Course Outline

list Introduction to Statistics: A Modeling Approach

3.0 Examining Distributions

The Concept of Distribution

Assuming we have a tidy data set to work with, the next step in data analysis is to begin looking at the variation in your measures. This leads us to one of the most fundamental concepts in statistics, the concept of distribution. Wild (2006) defines the concept of distribution as “the pattern of variation in a variable or set of variables” and it is like a “lens” through which we can view variation in data (figure from Wild, C. (2006).)

The concept of distribution is complex; most people do not understand it all at once. If you find it difficult or vague, don’t worry. Just know, for now, that it will be a major concept throughout this course, and we will keep adding more dimensions to it as we progress.

Thinking about distributions requires you to think abstractly, at a higher level, about your data. You must shift your thinking from a focus on the individual observations in your data set (e.g., the 20 people you have sampled) to a focus—first, on just one attribute along which the observations vary; and second, to a focus on the pattern of variation in the attribute across the sample.

Note that not just any bunch of numbers can be thought of as a distribution. The numbers must all be measures of the same attribute. So, for example, if you have measures of height and weight on a sample of 20 people, you can’t just lump the height and weight numbers into a single distribution. You can, however, examine the distribution of height and the distribution of weight separately.

Data are inherently complex; even a small data set includes lots of numbers and lots of variation. By using the concept of distribution, we can begin to see all this variation as characteristic of a single thing: a distribution. The concept of distribution allows us to see the whole as greater than the sum of the parts; the forest for the trees.