Course Outline

list Introduction to Statistics: A Modeling Approach

Fitting the Empty Model

The simple model we have started with—using the mean to model the distribution of a quantitative variable—is sometimes called the empty model or null model. If the mean is our model, then fitting the model to data simply means calculating the mean of the distribution.

Let’s think this through in the context of students’ thumb lengths. We will use a tiny data set, which we’ve put in a data frame called TinyFingers.

TinyFingers

The whole data set is just six observations. Make a histogram of the distribution of six thumb lengths (Thumb). Add in a blue line to show where the mean is.

require(mosaic) require(ggformula) # set up TinyFingers StudentID <- c(1,2,3,4,5,6) Thumb <- c(56, 60, 61, 63, 64, 68) TinyFingers <- data.frame(StudentID, Thumb) # modify this to save favstats for Thumb length TinyThumb.stats <- # modify this to draw a vline representing the mean in blue gf_histogram(~ Thumb, data = TinyFingers) %>% gf_vline() # modify this to save favstats for Thumb length TinyThumb.stats <- favstats( ~ Thumb, data=TinyFingers) # modify this to draw a vline representing the mean in blue gf_histogram(~ Thumb, data = TinyFingers) %>% gf_vline(xintercept = ~ mean, color = "blue", data = TinyThumb.stats) ex() %>% check_object("TinyThumb.stats") %>% check_equal() ex() %>% check_function("favstats") %>% check_arg("x") %>% check_equal() ex() %>% check_function("favstats") %>% check_arg("data") %>% check_equal() ex() %>% check_function("gf_histogram") %>% check_arg("object") %>% check_equal() ex() %>% check_function("gf_histogram") %>% check_arg("data") %>% check_equal() ex() %>% check_function("gf_vline") %>% check_arg("xintercept") %>% check_equal() ex() %>% check_function("gf_vline") %>% check_arg("color") %>% check_equal() ex() %>% check_function("gf_vline") %>% check_arg("data") %>% check_equal() ex() %>% check_error()
DataCamp: ch5-6

It’s easy to fit the empty model—it’s just the mean (62 in this case). But later you will learn to fit more complex models to your data. We are going to teach you a way of fitting models in R that you can use now for fitting the empty model, but that will also work later for fitting more complex models.

The R function we are going to use is lm(), which stands for “linear model.” (We’ll say more about why it’s called that in a later chapter.) Here’s the code we use to fit the empty model, followed by the output.

lm(Thumb ~ NULL, data = TinyFingers)

Although the output seems a little strange, with words like “Coefficients” and “Intercept,” it does give you back the mean of the distribution (62), as expected. Thus, this function finds the best fitting number for our model. The word “NULL” is another word for “empty” (as in “empty model.”)

It will be helpful to save the results of this model fit in an R object. Here’s code that uses lm() to fit the empty model, then saves the results in an R object called TinyEmpty.model:

TinyEmpty.model <- lm(Thumb ~ NULL, data = TinyFingers)

If you want to see what the model estimates are after running this code, you can just type the name of the object where you saved the model:

TinyEmpty.model

We seem to be making a big deal about having calculated the mean of six numbers! But trust us, it will make more sense once you see where we go with it. One point is worth making now, however. Remember, the goal of statistics is to understand the DGP. The mean of the data distribution gives us our best estimate of the mean of the population that results from the DGP.

It may not be a very good estimate—after all, it is only based on a small amount of data—but it’s the best one we can come up with based on the available data. It also is an unbiased estimate, meaning that it is just as likely to be too high as it is too low.

Now that you have fit the empty model to the tiny set of data, use lm() to fit the empty model to our full data set, Fingers.

Modify the code below to create a histogram of Thumb; draw a vertical line where the mean is; fit the empty model; and save the model to an R object called Empty.model.

require(mosaic) require(ggformula) require(supernova) # modify this to fit the empty model of Thumb Empty.model <- # this prints the best fitting number Empty.model # save the favstats for Thumb (this is helpful for drawing a line) Thumb.stats <- # make a histogram of Thumb and draw the line for the mean gf_histogram() %>% gf_vline(xintercept = ) # modify this to fit the empty model of Thumb Empty.model <- lm(Thumb ~ NULL, data = Fingers) # this prints the best fitting number Empty.model # save the favstats for Thumb (this is helpful for drawing a line) Thumb.stats <- favstats(~Thumb, data = Fingers) # make a histogram of Thumb and draw the line for the mean gf_histogram(~Thumb, data = Fingers) %>% gf_vline(xintercept = ~mean, data = Thumb.stats) ex() %>% check_object("Empty.model") %>% check_equal() ex() %>% check_output_expr("Empty.model") ex() %>% check_object("Thumb.stats") %>% check_equal() ex() %>% check_function("gf_histogram") %>% check_arg("object") %>% check_equal() ex() %>% check_function("gf_histogram") %>% check_arg("data") %>% check_equal() ex() %>% check_function("gf_vline") %>% check_arg("xintercept") %>% check_equal() ex() %>% check_function("gf_vline") %>% check_arg("data") %>% check_equal() success_msg("Keep up the great work!")
Have you set xintercet = ~mean and data = Thumb.stats?
DataCamp: ch5-7

Responses