Introduction to Statistics: A Modeling Approach
7.3 Generating Predictions From the Model
Predicting Future Observations
Now that you have fit the Sex model, you can use your estimates to make predictions about future observations. Doing this requires you to use your model as a function. Think of a function like a machine: you put something in, you get something out. In this case, you will put in a value (e.g., “female”) for your explanatory variable (Sex), and get out a predicted thumb length.
We can think about how to use the TinySex.model as a function. Recall that our model, once fit, looked like this:
\[Y_{i}=59+6X_{i}+e_{i}\]
To turn this into a function, we remove the error term. If our goal is to model the variation, we want the error term there. But if our goal is to predict, we are going to ignore error and just do our best! We also change the \(Y_{i}\) to \(\hat{Y}_{i}\), which indicates a predicted score for person i. Our prediction function, then, looks like this:
\[\hat{Y}_{i}=59+6X_{i}\]
We leave out the error term because every person will have a different error term. If we knew their error, we could predict their score exactly. But since we don’t—because remember, we are predicting a new observation—all we can do is predict their score based on their sex.
This prediction function is straightforward to use. If we want to predict what the next observed thumb length will be, we can see that if the next student sampled is female, their predicted thumb length is 59. If they are male, the prediction is (59 + 6), or 65.
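To make the "machine" idea concrete, here is a minimal sketch of the prediction function written as an R function. The name predict_thumb is hypothetical (it is not part of any package), and the sketch assumes that \(X_{i}\) is coded 0 for female and 1 for male, as the predictions of 59 and 65 above imply.

```r
# Hypothetical helper: the fitted model with the error term removed.
# Assumes X is coded 0 for "female" and 1 for "male".
predict_thumb <- function(sex) {
  x <- as.numeric(sex == "male")  # convert the Sex value to 0 or 1
  59 + 6 * x                      # Y-hat = 59 + 6X
}

predict_thumb("female")
predict_thumb("male")
predict_thumb(c("female", "male", "male"))
```

Put a value of Sex in, and you get a predicted thumb length out. Because the function is vectorized, it can also take several new students at once.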
Using R to Predict Future Observations
If the numbers are easy to add or subtract, it’s not that hard to do this in your head. But of course, we won’t always want to do it in our heads for more complex models.
We have a couple of new R functions that you can use to make it easier to generate a prediction: b0() and b1(). Run the three lines of R code in the window below and see if you can figure out what these new functions do.
require(tidyverse)
require(mosaic)
require(supernova)
TinyFingers <- data.frame(
  Sex = rep(c("female", "male"), each = 3),
  Thumb = c(56, 60, 61, 63, 64, 68)
)
# This creates the TinySex.model
TinySex.model <- lm(Thumb ~ Sex, data = TinyFingers)
# Run this code
TinySex.model
b0(TinySex.model)
b1(TinySex.model)
The b0() function takes a model as its input and returns the parameter estimate for the first parameter, which in this case is the mean Thumb of females. The b1() function returns the parameter estimate for the second parameter, the increment from the mean of females to the mean of males.
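One way to check this interpretation is to compute the group means directly and compare them with the estimates. The sketch below uses base R's coef() instead of b0() and b1() so that it runs without extra packages; it assumes the two estimates correspond to the first and second entries of coef(). The names est, mean_female, and mean_male are introduced here for illustration only.

```r
TinyFingers <- data.frame(
  Sex = rep(c("female", "male"), each = 3),
  Thumb = c(56, 60, 61, 63, 64, 68)
)
TinySex.model <- lm(Thumb ~ Sex, data = TinyFingers)

# Parameter estimates from the fitted model (base R)
est <- unname(coef(TinySex.model))

# Group means computed directly from the data
mean_female <- mean(TinyFingers$Thumb[TinyFingers$Sex == "female"])
mean_male <- mean(TinyFingers$Thumb[TinyFingers$Sex == "male"])

est[1]                    # first estimate
mean_female               # mean Thumb of females
est[2]                    # second estimate
mean_male - mean_female   # increment from females to males
```

The first estimate should match the mean for females, and the second should match the difference between the two group means.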
In the window below, see if you can write a single line of R code that will use both of these new functions (b0() and b1()) to return the predicted value for a new male’s thumb length.
require(tidyverse)
require(mosaic)
require(supernova)
TinyFingers <- data.frame(
  Sex = rep(c("female", "male"), each = 3),
  Thumb = c(56, 60, 61, 63, 64, 68)
)
# This creates the TinySex.model
TinySex.model <- lm(Thumb ~ Sex, data = TinyFingers)
# Write a line of R code that uses both b0() and b1() functions
# to return the predicted Thumb length of a male
b0(TinySex.model) + b1(TinySex.model)
[1] 65
Generating “Predicted” Values for the Sample Data
As we did in Chapter 5, we also will want to generate model predictions for our sample data. It seems odd to predict values when we already know the actual values. But it’s actually very useful to do so, because then we can calculate residuals from the model predictions.
To get predicted values from the TinySex.model, we use the predict() function:
predict(TinySex.model)
1 2 3 4 5 6
59 59 59 65 65 65
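Called with only a model, predict() returns one prediction for each case in the sample. Base R's predict() also accepts a newdata argument, which ties this back to predicting future observations. The sketch below is one way to do that; the data frame new_students is hypothetical and stands in for students who were not in the sample.

```r
TinyFingers <- data.frame(
  Sex = rep(c("female", "male"), each = 3),
  Thumb = c(56, 60, 61, 63, 64, 68)
)
TinySex.model <- lm(Thumb ~ Sex, data = TinyFingers)

# Hypothetical new observations that were not part of the sample
new_students <- data.frame(Sex = c("female", "male"))

# Predictions for the new cases, based only on their Sex
new_predictions <- predict(TinySex.model, newdata = new_students)
new_predictions
```

The column name in newdata has to match the explanatory variable used when the model was fit (here, Sex).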
Let’s say you want to save these predicted values for each person as a variable called Sex.predicted (in the TinyFingers data frame). See if you can complete the R code to do this.
require(tidyverse)
require(mosaic)
require(Lock5Data)
require(supernova)
TinyFingers <- data.frame(
  Sex = rep(c("female", "male"), each = 3),
  Thumb = c(56, 60, 61, 63, 64, 68)
)
TinySex.model <- lm(Thumb ~ Sex, data = TinyFingers)
# this saves the predictions as a new variable in TinyFingers
TinyFingers$Sex.predicted <- predict(TinySex.model)
# this prints the TinyFingers data frame
TinyFingers
     Sex Thumb Sex.predicted
1 female    56            59
2 female    60            59
3 female    61            59
4   male    63            65
5   male    64            65
6   male    68            65
Notice that our predictions are a single number for each person: 59 for each female and 65 for each male. Each person gets a single predicted thumb length; we never predict both of these values for a single person. But different people will get different predicted outcomes based on their sex.
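As noted at the start of this section, the reason to save predictions for the sample is that residuals follow directly from them. The sketch below shows the arithmetic, assuming the usual definition of a residual as the observed value minus the predicted value. The variable name Sex.resid is hypothetical and chosen only for this example.

```r
TinyFingers <- data.frame(
  Sex = rep(c("female", "male"), each = 3),
  Thumb = c(56, 60, 61, 63, 64, 68)
)
TinySex.model <- lm(Thumb ~ Sex, data = TinyFingers)

# Save the model's prediction for each person
TinyFingers$Sex.predicted <- predict(TinySex.model)

# Residual = observed thumb length minus predicted thumb length
TinyFingers$Sex.resid <- TinyFingers$Thumb - TinyFingers$Sex.predicted

TinyFingers
```

Base R's resid() function applied to the model should give the same values as this subtraction.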
Try the function predict() on the full data set. Recall that you fit the model to the full data set, Fingers. You saved the model as Sex.model. Now see if you can generate predictions from the model and save the predictions as a variable in the Fingers data frame.
require(tidyverse)
require(mosaic)
require(Lock5Data)
require(supernova)
# here is the model we fit before
Sex.model <- lm(Thumb ~ Sex, data = Fingers)
# generate a prediction from Sex.model for every case in Fingers
Fingers$Sex.predicted <- predict(Sex.model)
# this will print out 10 lines of Fingers
head(select(Fingers, Sex, Thumb, Sex.predicted), 10)
      Sex Thumb Sex.predicted
1    male 66.00      64.70267
2  female 64.00      58.25585
3  female 56.00      58.25585
4    male 58.42      64.70267
5  female 74.00      58.25585
6  female 60.00      58.25585
7    male 70.00      64.70267
8  female 55.00      58.25585
9  female 60.00      58.25585
10 female 52.00      58.25585
We’ve learned how to specify and fit models. We then took those models and used them (as functions) to make predictions for future observations, and also to generate predictions for each person in our sample data. We turn next to examine the residuals from our model—the variation left over after we subtract out our model.