5.9 The Power of Aggregation
In a famous article, the late evolutionary biologist Stephen Jay Gould argued that means (and medians) are not real; they are just abstractions (Gould, The Median Isn’t the Message). What is real is the variation, because the variation is the actual data points. Although he is right, the mean is still an incredibly powerful tool for predicting the future. The reason has to do with the balancing of error.
Let’s say you run a pizza restaurant and you want to predict how many pizzas you are going to sell in the next week. This is important, because you want to make sure you have enough ingredients on hand to meet the demand, but not so much that you have to throw anything away.
As it happens, it would be practically impossible to predict which individual people are going to come into the restaurant and order a pizza during a given week, because there is just too much variation. But if you know the average number of pizzas sold per week across a random sample of weeks, you can be fairly confident in using it as your prediction: the number sold next week is probably going to be close to that average.
This is all due to the power of aggregation, that is, putting things together. Individuals are hard to predict, but the more things you add together, the more stable and predictable the resulting sum or average becomes. The reason is that the error variation balances out: some scores pull the mean higher and some pull it lower. When all’s said and done, the pulls in one direction are roughly balanced by the pulls in the opposite direction, and you are left with something close to the average.
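The balancing described above can be seen in a small simulation. The sketch below is only an illustration under stated assumptions: the individual "scores" are made-up random values between 0 and 10, the function name `spread_of_means` is hypothetical, and the standard deviation is used as one convenient measure of how much the group means vary. It compares the means of groups made up of 1, 10, and 100 scores.

```python
import random
import statistics


def spread_of_means(group_size, n_groups=2000, seed=1):
    """Simulate many groups and return how much their means vary.

    Each group holds `group_size` made-up scores drawn uniformly
    between 0 and 10 (an assumption for illustration only).
    The return value is the standard deviation of the group means.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(n_groups):
        scores = [rng.uniform(0, 10) for _ in range(group_size)]
        means.append(statistics.mean(scores))
    return statistics.stdev(means)


# Compare single scores with means of 10 and of 100 scores.
for size in (1, 10, 100):
    print(size, round(spread_of_means(size), 2))
```

Under these assumptions, the printed spread shrinks as the group size grows: single scores are scattered widely, while means of 100 scores cluster tightly around 5, the middle of the range.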