Introduction to Statistics: A Modeling Approach


We have started our journey with data—what we end up with after we turn variation in the world into numbers. The process of creating data starts with sampling, and then measurement. We organize data into columns and rows, where the columns represent the variables (e.g., Thumb) that we have measured and the rows, the objects to which we applied our measurement (e.g., students). Each cell of the table holds a value, representing that row’s measurement for that variable (e.g., one student’s thumb length).

Before analyzing data, we often want to manipulate it in various ways. We may create summary variables, filter out rows with missing data, and so on. We may even change what a row represents by aggregating measures across rows, which produces a new data frame.
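To make these manipulations concrete, here is a minimal sketch in plain Python. It is only an illustration: the table, its values, the `Sex` column, and the derived `ThumbCm` variable are all made up for this example, and we assume thumb length was recorded in millimeters. Each dictionary is one row (a student), and each key is a variable.

```python
from statistics import mean

# Hypothetical data frame: one row per student, one key per variable.
# None marks a missing measurement. All values are invented for illustration.
students = [
    {"Student": 1, "Sex": "female", "Thumb": 58.0},
    {"Student": 2, "Sex": "male", "Thumb": 66.0},
    {"Student": 3, "Sex": "female", "Thumb": None},
    {"Student": 4, "Sex": "male", "Thumb": 62.0},
    {"Student": 5, "Sex": "female", "Thumb": 54.0},
]

# 1. Filter out rows with missing data.
complete = [row for row in students if row["Thumb"] is not None]

# 2. Create a new variable from an existing one
#    (assumes Thumb is in millimeters; ThumbCm is a hypothetical name).
for row in complete:
    row["ThumbCm"] = row["Thumb"] / 10

# 3. Aggregate across rows: a row is now a group of students, not one student.
groups = {}
for row in complete:
    groups.setdefault(row["Sex"], []).append(row["Thumb"])

by_sex = [
    {"Sex": sex, "MeanThumb": mean(values), "N": len(values)}
    for sex, values in groups.items()
]

for row in by_sex:
    print(row)
```

Notice that the last step changes the meaning of a row: `complete` has one row per student, while `by_sex` has one row per group.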

But let’s keep our eye on the prize: we care about variation in data because we are interested in variation in the world. Every sample comes from some larger population. And here we see the ultimate problem with data: it won’t always look like the thing it came from. Much of statistics is devoted to understanding and dealing with this problem.
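A small simulation can show what it means for data to not look like the thing it came from. This is a sketch under stated assumptions, not a claim about real thumbs: the population is simulated, and its size (10,000), its normal shape, and its mean and standard deviation (60 and 9) are arbitrary choices for illustration.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so that reruns give the same draws

# Hypothetical population: 10,000 simulated thumb lengths
# (assumed normal with mean 60 and standard deviation 9).
population = [random.gauss(60, 9) for _ in range(10_000)]
population_mean = mean(population)

# Draw five small samples (20 values each, without replacement)
# and compute the mean of each one.
sample_means = [mean(random.sample(population, 20)) for _ in range(5)]

for i, m in enumerate(sample_means, start=1):
    print(f"sample {i}: mean = {m:.1f} (population mean = {population_mean:.1f})")
```

Every sample is drawn from the same population, yet the sample means differ from one another and from the population mean. In a real study we usually see only one such sample, which is why this problem matters.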