Introduction to Statistics: A Modeling Approach
5.3 Fitting the Empty Model
The simple model we have started with—using the mean to model the distribution of a quantitative variable—is sometimes called the empty model or null model. If the mean is our model, then fitting the model to data simply means calculating the mean of the distribution.
Let’s think this through in the context of students’ thumb lengths. We will use a tiny data set, which we’ve put in a data frame called TinyFingers.
TinyFingers
The whole data set is just six observations. Make a histogram of the distribution of six thumb lengths (Thumb). Add in a blue line to show where the mean is.
require(mosaic)
require(ggformula)

# set up TinyFingers
StudentID <- c(1, 2, 3, 4, 5, 6)
Thumb <- c(56, 60, 61, 63, 64, 68)
TinyFingers <- data.frame(StudentID, Thumb)

# save favstats for Thumb length
TinyThumb.stats <- favstats(~ Thumb, data = TinyFingers)

# draw a vline representing the mean in blue
gf_histogram(~ Thumb, data = TinyFingers) %>%
  gf_vline(xintercept = ~ mean, color = "blue", data = TinyThumb.stats)
It’s easy to fit the empty model—it’s just the mean (62 in this case). But later you will learn to fit more complex models to your data. We are going to teach you a way of fitting models in R that you can use now for fitting the empty model, but that will also work later for fitting more complex models.
The R function we are going to use is lm(), which stands for “linear model.” (We’ll say more about why it’s called that in a later chapter.) Here’s the code we use to fit the empty model, followed by the output.
lm(Thumb ~ NULL, data = TinyFingers)
Although the output seems a little strange, with words like “Coefficients” and “Intercept,” it does give you back the mean of the distribution (62), as expected. Thus, this function finds the best-fitting number for our model. The word “NULL” is another word for “empty” (as in “empty model”).
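As a quick check on that claim, the sketch below refits the empty model using only base R and compares the intercept with the result of mean(). The formula Thumb ~ 1 is included as another common way to write an intercept-only model; that spelling is our addition and is not used elsewhere in this text. The object names null_fit and one_fit are just illustrative.

```r
# Rebuild the six-observation data set from the text
TinyFingers <- data.frame(
  StudentID = 1:6,
  Thumb = c(56, 60, 61, 63, 64, 68)
)

# Fit the empty model as written in the text
null_fit <- lm(Thumb ~ NULL, data = TinyFingers)

# Assumption: an equivalent intercept-only formula (not used in the text)
one_fit <- lm(Thumb ~ 1, data = TinyFingers)

# The mean of the six thumb lengths
mean(TinyFingers$Thumb)   # 62

# Both fits report a single coefficient, the intercept, equal to that mean
coef(null_fit)
coef(one_fit)
```

Either way, the one number the model estimates is the mean of Thumb.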
It will be helpful to save the results of this model fit in an R object. Here’s code that uses lm() to fit the empty model, then saves the results in an R object called TinyEmpty.model:
TinyEmpty.model <- lm(Thumb ~ NULL, data = TinyFingers)
If you want to see what the model estimates are after running this code, you can just type the name of the object where you saved the model:
TinyEmpty.model
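Saving the fit also lets you pull individual pieces out of it. The sketch below uses two base R accessor functions, coef() and predict(), which this section has not introduced; treat it as an optional illustration rather than part of the exercise. The names estimate and predictions are hypothetical.

```r
# Rebuild the tiny data set and the saved model from the text
TinyFingers <- data.frame(
  StudentID = 1:6,
  Thumb = c(56, 60, 61, 63, 64, 68)
)
TinyEmpty.model <- lm(Thumb ~ NULL, data = TinyFingers)

# Extract the single estimate (the intercept) as a plain number
estimate <- coef(TinyEmpty.model)[["(Intercept)"]]
estimate

# The empty model predicts the same value for every student
predictions <- predict(TinyEmpty.model)
predictions
```

Because the empty model has no explanatory variable, every student gets the same prediction: the mean.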
We seem to be making a big deal about having calculated the mean of six numbers! But trust us, it will make more sense once you see where we go with it. One point is worth making now, however. Remember, the goal of statistics is to understand the data generating process (DGP). The mean of the data distribution gives us our best estimate of the mean of the population that results from the DGP.
It may not be a very good estimate (after all, it is based on only a small amount of data), but it’s the best one we can come up with based on the available data. It is also an unbiased estimate, meaning that it is just as likely to be too high as it is too low.
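One way to get a feel for what “unbiased” means is to simulate it. The sketch below invents a population (normally distributed thumb lengths with mean 60 and standard deviation 9; these numbers are assumptions chosen for illustration, not values from the course data), draws many samples of six, and looks at how the sample means fall around the true mean. All object names are hypothetical.

```r
set.seed(42)  # make the simulation reproducible

# Assumed population parameters (illustrative only)
pop_mean <- 60
pop_sd <- 9
n <- 6             # same sample size as TinyFingers
n_samples <- 10000

# Draw many samples of size n and record each sample's mean
sample_means <- replicate(
  n_samples,
  mean(rnorm(n, mean = pop_mean, sd = pop_sd))
)

# The average of the sample means should land close to pop_mean
mean(sample_means)

# Roughly half of the sample means should fall above pop_mean
prop_too_high <- mean(sample_means > pop_mean)
prop_too_high

# Individual estimates still vary quite a bit from sample to sample
sd(sample_means)
```

In this symmetric, made-up population, any single sample mean can miss by a few millimeters, but the misses are not systematically high or low.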
Now that you have fit the empty model to the tiny set of data, use lm() to fit the empty model to our full data set, Fingers.
Create a histogram of Thumb, draw a vertical line where the mean is, fit the empty model, and save the model to an R object called Empty.model. The completed code looks like this:
require(mosaic)
require(ggformula)
require(supernova)

# fit the empty model of Thumb
# (Fingers is the full course data set, assumed to be loaded already)
Empty.model <- lm(Thumb ~ NULL, data = Fingers)

# this prints the best fitting number
Empty.model

# save the favstats for Thumb (this is helpful for drawing a line)
Thumb.stats <- favstats(~ Thumb, data = Fingers)

# make a histogram of Thumb and draw the line for the mean
gf_histogram(~ Thumb, data = Fingers) %>%
  gf_vline(xintercept = ~ mean, data = Thumb.stats)