Generating Predictions From the Model
Predicting Future Observations
Now that you have fit the Sex model, you can use your estimates to make predictions about future observations. Doing this requires you to use your model as a function. Think of a function like a machine: you put something in, you get something out. In this case, you will put in a value (e.g., “female”) for your explanatory variable (Sex), and get out a predicted thumb length.
It’s easy to do this in your head with the TinySex.model. Recall that our model, once fit, looked like this:
\[Y_{i}=59+6X_{i}+e_{i}\]
To turn this into a function, we remove the error term. If our goal is to model the variation, we want the error term there. But if our goal is to predict, we are going to ignore error and just do our best! We also change the \(Y_{i}\) to \(\hat{Y}_{i}\), which indicates a predicted score for person i. Our prediction function, then, looks like this:
\[\hat{Y}_{i}=59+6X_{i}\]
We leave out the error term because every person will have a different error term. If we knew their error, we could predict their score exactly. But since we don’t—because remember, we are predicting a new observation—all we can do is predict their mean based on their sex.
This prediction function is easy to use. If we want to predict what the next observed thumb length will be, we can see that if the next student sampled is female, their predicted thumb length is 59. If they are male, the prediction is (59 + 6), or 65.
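The prediction function above can also be written out by hand in R. This is only a minimal sketch, not part of the course code: the name predict_thumb is hypothetical, and it assumes that X is coded 0 for a female and 1 for a male, which matches how the estimates 59 and 6 are used in the text.

```r
# Hand-written version of the prediction function: Y-hat = 59 + 6 * X
# Assumption: X is 0 for "female" and 1 for "male"; predict_thumb is a made-up name
predict_thumb <- function(sex) {
  stopifnot(sex %in% c("female", "male"))
  x <- ifelse(sex == "male", 1, 0)
  59 + 6 * x
}

predict_thumb("female")  # 59
predict_thumb("male")    # 65
```

Notice that there is no error term anywhere in the function: it returns only the model's best guess.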
It’s not that hard to do this in your head, especially for a simple model. But of course, we won’t want to do it in our heads for more complex models. R provides a way to turn a model into a function, which we then can use to predict future observations based on the model. It’s called makeFun().
makeFun() takes a model as its input (e.g., TinySex.model) and then returns an R object that works as a function. To use the function, you have to save it somewhere. We will do that for you, using the following code to create the function TinySex.fun.
TinySex.fun <- makeFun(TinySex.model)
Once you have created this function, you can query it for a prediction. This is how we ask the function to make a prediction for a male.
TinySex.fun("male")
Try running it in the DataCamp window below. Try it out for “female” as well.
require(mosaic)
require(ggformula)
Fingers <- read.csv(file="https://raw.githubusercontent.com/UCLATALL/introstatsmodeling/master/datasets/fingers.csv", header=TRUE, sep=",")
# set up tiny data set
Thumb <- c(56, 60, 61, 63, 64, 68)
Sex <- c("female","female","female","male","male","male")
TinyFingers <- data.frame(Sex, Thumb)
TinySex.model <- lm(Thumb ~ Sex, data = TinyFingers)
TinySex.fun <- makeFun(TinySex.model)
# run this code
TinySex.fun("male")
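To see where these predictions come from, it can help to look at the fitted estimates directly. The sketch below uses only base R (lm() and coef()) on the same tiny data set. The variable name b is just an illustrative choice, and this is one way to check the arithmetic, not a description of how makeFun() works internally.

```r
# Rebuild the tiny data set and fit the model using base R only
Thumb <- c(56, 60, 61, 63, 64, 68)
Sex <- c("female", "female", "female", "male", "male", "male")
TinyFingers <- data.frame(Sex, Thumb)
TinySex.model <- lm(Thumb ~ Sex, data = TinyFingers)

# Named vector of estimates: the intercept and the increment for males
b <- coef(TinySex.model)
b

# The prediction for a female is the intercept;
# the prediction for a male is the intercept plus the increment
unname(b["(Intercept)"])                 # 59
unname(b["(Intercept)"] + b["Sexmale"])  # 65
```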
Now try creating a function based on Sex.model, the model we fit to the larger Fingers data set, using the DataCamp window below. Then, use the function to compute the predicted thumb length of a female based on the Sex.model.
require(mosaic)
require(ggformula)
Fingers <- read.csv(file="https://raw.githubusercontent.com/UCLATALL/introstatsmodeling/master/datasets/fingers.csv", header=TRUE, sep=",")
Sex.model <- lm(Thumb ~ Sex, data = Fingers)
# make a function and save it as Sex.fun
Sex.fun <- makeFun(Sex.model)
# this will return Sex.fun's prediction of Thumb length for a female
Sex.fun("female")
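For comparison, base R's predict() function can make the same kind of single prediction through its newdata argument. The sketch below uses the tiny data set so that it runs without downloading anything. The data frame name new_student is hypothetical, and the column in it must be named Sex to match the explanatory variable in the model.

```r
# Rebuild the tiny data set and fit the model using base R only
Thumb <- c(56, 60, 61, 63, 64, 68)
Sex <- c("female", "female", "female", "male", "male", "male")
TinyFingers <- data.frame(Sex, Thumb)
TinySex.model <- lm(Thumb ~ Sex, data = TinyFingers)

# A new, not-yet-observed student (hypothetical name): we know only their sex
new_student <- data.frame(Sex = "female")

# Predicted thumb length for that student, based on the model
predict(TinySex.model, newdata = new_student)
```

The result should be 59, the same value you would get by plugging a female into the prediction function by hand.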
Generating “Predicted” Values for the Sample Data
As we did in Chapter 5, we also will want to generate model predictions for our sample data. It seems odd to predict values when we already know the actual values. But it’s actually very useful to do so, because then we can calculate residuals from the model.
To get predicted values from the TinySex.model, we use the predict() function:
predict(TinySex.model)
Let’s say you want to save these predicted values for each person as a variable called Sex.predicted in the TinyFingers data frame. See if you can complete the R code to do this.
require(mosaic)
require(ggformula)
Fingers <- read.csv(file="https://raw.githubusercontent.com/UCLATALL/introstatsmodeling/master/datasets/fingers.csv", header=TRUE, sep=",")
# set up tiny data set
Thumb <- c(56, 60, 61, 63, 64, 68)
Sex <- c("female","female","female","male","male","male")
TinyFingers <- data.frame(Sex, Thumb)
TinySex.model <- lm(Thumb ~ Sex, data = TinyFingers)
TinySex.fun <- makeFun(TinySex.model)
# save the model's predictions as a new variable
TinyFingers$Sex.predicted <- predict(TinySex.model)
# this prints the TinyFingers data frame
TinyFingers
Notice that our predictions are a single number for each person: 59 for each female and 65 for each male. Each person gets a single predicted thumb length; we never predict both of these values for a single person. But different people will get different predicted outcomes based on their sex.
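Once each person has a predicted value, the residual is just the actual thumb length minus the prediction. The sketch below shows that subtraction on the tiny data set using base R only. The column name Sex.resid is a hypothetical choice made for this illustration, and resid() is used only to check the hand calculation.

```r
# Rebuild the tiny data set and fit the model using base R only
Thumb <- c(56, 60, 61, 63, 64, 68)
Sex <- c("female", "female", "female", "male", "male", "male")
TinyFingers <- data.frame(Sex, Thumb)
TinySex.model <- lm(Thumb ~ Sex, data = TinyFingers)

# One prediction per person: 59 for each female, 65 for each male
TinyFingers$Sex.predicted <- predict(TinySex.model)

# Residual = actual value minus predicted value (hypothetical column name)
TinyFingers$Sex.resid <- TinyFingers$Thumb - TinyFingers$Sex.predicted
TinyFingers

# Check the hand calculation against R's own residuals
all.equal(unname(resid(TinySex.model)), unname(TinyFingers$Sex.resid))
```

For this tiny data set the residuals work out to -3, 1, 2 for the three females and -2, -1, 3 for the three males.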
Try the function predict() on the full data set. Recall that you fit the model to the full data set, Fingers. You saved the model as Sex.model. Now see if you can generate predictions from the model and save the predictions as a variable in the Fingers data frame.
require(mosaic)
require(ggformula)
Fingers <- read.csv(file="https://raw.githubusercontent.com/UCLATALL/introstatsmodeling/master/datasets/fingers.csv", header=TRUE, sep=",")
# here is the code you wrote before
Sex.model <- lm(Thumb ~ Sex, data = Fingers)
# generate predictions from Sex.model
Fingers$Sex.predicted <- predict(Sex.model)
# this will print out the first 10 lines of Fingers
head(select(Fingers, Sex, Thumb, Sex.predicted), 10)
We’ve learned how to specify and fit models. We then took those models and used them (as functions) to make predictions for future observations, and also to generate predictions for each person in our sample data. We turn next to examine the residuals from our model—the variation left over after we subtract out our model.