Introduction to Statistics: A Modeling Approach

Generating Predictions From the Model

Predicting Future Observations

Now that you have fit the Sex model, you can use your estimates to make predictions about future observations. Doing this requires you to use your model as a function. Think of a function like a machine: you put something in, you get something out. In this case, you will put in a value (e.g., “female”) for your explanatory variable (Sex), and get out a predicted thumb length.

It’s easy to do this in your head with the TinySex.model. Recall that our model, once fit, looked like this:

\[Y_{i}=59+6X_{i}+e_{i}\]

To turn this into a function, we remove the error term. If our goal is to model the variation, we want the error term there. But if our goal is to predict, we are going to ignore error and just do our best! We also change the \(Y_{i}\) to \(\hat{Y}_{i}\), which indicates a predicted score for person i. Our prediction function, then, looks like this:

\[\hat{Y}_{i}=59+6X_{i}\]

We leave out the error term because every person will have a different error. If we knew a person’s error, we could predict their score exactly. But because we are predicting a new observation, we don’t know it, so all we can do is predict the mean for their sex.

This prediction function is easy to use. If we want to predict what the next observed thumb length will be, we can see that if the next student sampled is female, their predicted thumb length is 59. If they are male, the prediction is (59 + 6), or 65.
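The arithmetic above can be written out as a tiny R function. This is a hand-written sketch rather than anything the course code produces: the name predict_thumb is hypothetical, and it assumes \(X_{i}\) is coded 0 for female and 1 for male, which is the coding that yields the predictions of 59 and 65.

```r
# Hand-written version of the prediction function: Y-hat = 59 + 6 * X
# Assumption: X is 0 for "female" and 1 for "male"
predict_thumb <- function(sex) {
  x <- ifelse(sex == "male", 1, 0)  # translate the label into the 0/1 code
  59 + 6 * x                        # no error term: we only predict the group mean
}

predict_thumb("female")  # 59
predict_thumb("male")    # 65
```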

It’s not that hard to do this in your head, especially for a simple model. But of course, we won’t want to do it in our heads for more complex models. R provides a way to turn a model into a function, which we then can use to predict future observations based on the model. It’s called makeFun().

makeFun() takes a model as its input (e.g., TinySex.model) and then returns an R object that works as a function. To use the function, you have to save it somewhere. We will do that for you, using the following code to create the function TinySex.fun.

TinySex.fun <- makeFun(TinySex.model)

Once you have created this function, you can query it for a prediction. This is how we ask the function to make a prediction for a male.

TinySex.fun("male")

Try running it in the DataCamp window below. Try it out for “female” as well.

require(mosaic)
require(ggformula)
Fingers <- read.csv(file="https://raw.githubusercontent.com/UCLATALL/intro-stats-modeling/master/datasets/fingers.csv", header=TRUE, sep=",")
# set up tiny data set
Thumb <- c(56, 60, 61, 63, 64, 68)
Sex <- c("female","female","female","male","male","male")
TinyFingers <- data.frame(Sex, Thumb)
TinySex.model <- lm(Thumb ~ Sex, data = TinyFingers)
TinySex.fun <- makeFun(TinySex.model)
# run this code
TinySex.fun("male")
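To see where the prediction comes from, the same result can be reproduced in base R without makeFun(). The sketch below is an illustration, not the course's method: it rebuilds the tiny data set, reads the fitted estimates with coef(), and asks predict() for a prediction on a one-row data frame (the newdata argument) that stands in for the next student sampled.

```r
# Rebuild the tiny data set and fit the model (base R only)
Thumb <- c(56, 60, 61, 63, 64, 68)
Sex <- c("female", "female", "female", "male", "male", "male")
TinyFingers <- data.frame(Sex, Thumb)
TinySex.model <- lm(Thumb ~ Sex, data = TinyFingers)

# The estimates: intercept (mean for females) and the increment for males
coef(TinySex.model)

# Prediction for a new male student; should match TinySex.fun("male")
male_prediction <- predict(TinySex.model, newdata = data.frame(Sex = "male"))
male_prediction

# Prediction for a new female student
female_prediction <- predict(TinySex.model, newdata = data.frame(Sex = "female"))
female_prediction
```

With this data the intercept is 59 and the increment for males is 6, so the two predictions are 65 and 59.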

Now try creating a function based on Sex.model, the model we fit to the larger Fingers data set, using the DataCamp window below. Then, use the function to compute the predicted thumb length of a female based on the Sex.model.

require(mosaic)
require(ggformula)
Fingers <- read.csv(file="https://raw.githubusercontent.com/UCLATALL/intro-stats-modeling/master/datasets/fingers.csv", header=TRUE, sep=",")
Sex.model <- lm(Thumb ~ Sex, data = Fingers)
# make a function and save it as Sex.fun
Sex.fun <- makeFun(Sex.model)
# this will return Sex.fun's prediction of Thumb length for a female
Sex.fun("female")
Hint: use makeFun() with Sex.model to create Sex.fun.

Generating “Predicted” Values for the Sample Data

As we did in Chapter 5, we also will want to generate model predictions for our sample data. It seems odd to predict values when we already know the actual values. But it’s actually very useful to do so, because then we can calculate residuals from the model.

To get predicted values from the TinySex.model, we use the predict() function:

predict(TinySex.model)

Let’s say you want to save these predicted values for each person as a variable called Sex.predicted (in the TinyFingers data frame). See if you can complete the R code to do this.

require(mosaic)
require(ggformula)
Fingers <- read.csv(file="https://raw.githubusercontent.com/UCLATALL/intro-stats-modeling/master/datasets/fingers.csv", header=TRUE, sep=",")
# set up tiny data set
Thumb <- c(56, 60, 61, 63, 64, 68)
Sex <- c("female","female","female","male","male","male")
TinyFingers <- data.frame(Sex, Thumb)
TinySex.model <- lm(Thumb ~ Sex, data = TinyFingers)
TinySex.fun <- makeFun(TinySex.model)
# save the model's predictions as a new variable
TinyFingers$Sex.predicted <- predict(TinySex.model)
# this prints the TinyFingers data frame
TinyFingers
Hint: use the predict() function.

Notice that our predictions are a single number for each person: 59 for each female and 65 for each male. Each person gets a single predicted thumb length; we never predict both of these values for a single person. But different people will get different predicted outcomes based on their sex.
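Because each person now has both an observed and a predicted value, the residuals mentioned earlier are one subtraction away. The following base-R sketch is illustrative only: the column name Sex.resid is a hypothetical choice, and the comparison with resid() is included simply as a check on the hand calculation.

```r
# Rebuild the tiny data set and fit the model (base R only)
Thumb <- c(56, 60, 61, 63, 64, 68)
Sex <- c("female", "female", "female", "male", "male", "male")
TinyFingers <- data.frame(Sex, Thumb)
TinySex.model <- lm(Thumb ~ Sex, data = TinyFingers)

# One prediction per person: the mean for that person's sex
TinyFingers$Sex.predicted <- unname(predict(TinySex.model))

# Residual = observed thumb length minus predicted thumb length
# (Sex.resid is a hypothetical column name used for this illustration)
TinyFingers$Sex.resid <- TinyFingers$Thumb - TinyFingers$Sex.predicted

TinyFingers

# Check: R's own residuals agree with the subtraction above
all.equal(TinyFingers$Sex.resid, unname(resid(TinySex.model)))
```

Here the predictions are 59, 59, 59, 65, 65, 65, so the residuals are -3, 1, 2, -2, -1, 3.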

Try the function predict() on the full data set. Recall that you fit the model to the full data set, Fingers. You saved the model as Sex.model. Now see if you can generate predictions from the model and save the predictions as a variable in the Fingers data frame.

require(mosaic)
require(ggformula)
Fingers <- read.csv(file="https://raw.githubusercontent.com/UCLATALL/intro-stats-modeling/master/datasets/fingers.csv", header=TRUE, sep=",")
# here is the code you wrote before
Sex.model <- lm(Thumb ~ Sex, data = Fingers)
# generate predictions from Sex.model
Fingers$Sex.predicted <- predict(Sex.model)
# this will print out the first 10 lines of Fingers
head(select(Fingers, Sex, Thumb, Sex.predicted), 10)
Hint: use the predict() function.

We’ve learned how to specify and fit models. We then took those models and used them (as functions) to make predictions for future observations, and also to generate predictions for each person in our sample data. We turn next to examine the residuals from our model—the variation left over after we subtract out our model.
