Introduction to Statistics: A Modeling Approach

Quantifying Model Fit With Sums of Squares

In the empty model, you will recall, we used the mean as the model, i.e., as the predicted score for every observation. We developed the intuition that the mean is a better-fitting model (that there is less error around the model) when the spread of the distribution is small than when it is large.

Calculating Sums of Squares: Empty Model (Review)

In the previous chapter, we quantified error using the sum of the squared deviations (SS, or Sum of Squares) around the mean, a measure that is minimized precisely at the mean. Under the empty model, all of the variation is unexplained—that’s why it is called “empty.” But it does show us clearly how much variation there is left to explain, measured in sum of squares.
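As a quick check on that definition, the sum of squares around the mean can be computed by hand. This is a minimal sketch that assumes only the six TinyFingers thumb lengths used in this chapter; the names `deviations`, `SS.empty`, and `SS.other` are made up for illustration.

```r
# The six thumb lengths from the TinyFingers data set
Thumb <- c(56, 60, 61, 63, 64, 68)

# Under the empty model, the prediction for every observation is the mean
deviations <- Thumb - mean(Thumb)

# Square each deviation and add them up to get the sum of squares
SS.empty <- sum(deviations^2)
SS.empty   # 82

# Any other single-number model leaves more squared error, e.g. 61
SS.other <- sum((Thumb - 61)^2)
SS.other   # 88
```

The second calculation uses 61 only as an arbitrary alternative to the mean; any value other than 62 would also give a sum of squares larger than 82.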

Remind yourself how to use the anova() function to get the SS left over after fitting the empty model for our TinyFingers thumb length data.

    require(mosaic)
    require(ggformula)

    # set up tiny data set
    Thumb <- c(56, 60, 61, 63, 64, 68)
    Sex <- c("female", "female", "female", "male", "male", "male")
    TinyFingers <- data.frame(Sex, Thumb)
    TinyFingers$Sex <- as.factor(TinyFingers$Sex)
    TinySex.model <- lm(Thumb ~ Sex, data = TinyFingers)
    TinyFingers$Sex.predicted <- predict(TinySex.model)
    TinyFingers$Sex.resid <- TinyFingers$Thumb - TinyFingers$Sex.predicted
    TinyFingers$Sex.resid2 <- resid(TinySex.model)
    TinyEmpty.model <- lm(Thumb ~ NULL, data = TinyFingers)
    TinyFingers$Empty.pred <- predict(TinyEmpty.model)

    # here is the code you wrote before
    TinyEmpty.model <- lm(Thumb ~ NULL, data = TinyFingers)

    # write code to get the SS leftover from TinyEmpty.model
    anova(TinyEmpty.model)
Hint: Use the anova() function.

Calculating Sums of Squares: Sex Model

How do we quantify the error around our new—more complex—model, where sex is used to predict thumb length?

We quantify error around the more complex model in the same way we did for the empty model. We simply generate the residuals, square them, and then sum them to get the sum of squares left after fitting our model.
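The three steps just described (generate the residuals, square them, sum them) can be sketched directly in R. This assumes the same TinyFingers data used throughout the chapter; `residuals.sex` and `SS.sex` are illustrative names, not objects defined by the course.

```r
# Rebuild the tiny data set
Thumb <- c(56, 60, 61, 63, 64, 68)
Sex <- factor(c("female", "female", "female", "male", "male", "male"))
TinyFingers <- data.frame(Sex, Thumb)

# Fit the Sex model: each person's prediction is the mean of their group
TinySex.model <- lm(Thumb ~ Sex, data = TinyFingers)

# Step 1: generate the residuals (observed minus predicted)
residuals.sex <- resid(TinySex.model)

# Step 2: square them; Step 3: sum them
SS.sex <- sum(residuals.sex^2)
SS.sex
```

The value of `SS.sex` should agree with the sum of squares that anova() reports in its Residuals row for the same model.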

Go ahead and modify this code to get the SS left over for the TinySex.model.

    require(mosaic)
    require(ggformula)

    # set up tiny data set
    Thumb <- c(56, 60, 61, 63, 64, 68)
    Sex <- c("female", "female", "female", "male", "male", "male")
    TinyFingers <- data.frame(Sex, Thumb)
    TinyFingers$Sex <- as.factor(TinyFingers$Sex)
    TinySex.model <- lm(Thumb ~ Sex, data = TinyFingers)
    TinyFingers$Sex.predicted <- predict(TinySex.model)
    TinyFingers$Sex.resid <- TinyFingers$Thumb - TinyFingers$Sex.predicted
    TinyFingers$Sex.resid2 <- resid(TinySex.model)
    TinyEmpty.model <- lm(Thumb ~ NULL, data = TinyFingers)
    TinyFingers$Empty.pred <- predict(TinyEmpty.model)

    # modify this code to find the SS of TinySex.model
    # (starter code: anova(TinyEmpty.model))
    anova(TinySex.model)
Hint: Make sure to specify the correct model.

We now have calculated two leftover (or residual) sums of squares. The first, 82, is for the empty model. The second, 28, is for the Sex model.
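To see both numbers side by side without reading them off two ANOVA tables, one option is base R's deviance() function, which for a model fit with lm() returns the residual sum of squares. This is a sketch under the assumption that the TinyFingers data are set up as above; `SS.leftover` is a made-up name.

```r
# Rebuild the tiny data set and fit both models
Thumb <- c(56, 60, 61, 63, 64, 68)
Sex <- factor(c("female", "female", "female", "male", "male", "male"))
TinyFingers <- data.frame(Sex, Thumb)
TinyEmpty.model <- lm(Thumb ~ NULL, data = TinyFingers)
TinySex.model <- lm(Thumb ~ Sex, data = TinyFingers)

# deviance() on an lm fit gives the sum of squared residuals
SS.leftover <- c(empty = deviance(TinyEmpty.model),
                 sex   = deviance(TinySex.model))
SS.leftover
```

The two entries should match the leftover sums of squares from the two anova() calls, 82 for the empty model and 28 for the Sex model.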


The empty model has already minimized the sum of squares as much as a single number can. We can now take that SS as our starting point: this is how much total error we have to explain. As soon as we add an explanatory variable (in this case Sex) to the model, the sum of squares for error can only decrease or stay the same; it cannot increase. If the new variable has no predictive value at all in the data, the sum of squares stays the same. But it’s rare for a variable to have no predictive value at all.
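One way to convince yourself of this is to add a predictor that is pure noise and check that the leftover sum of squares still does not go up. The sketch below is illustrative only: `Noise` is a hypothetical variable generated at random, not part of the course data, and the seed is arbitrary.

```r
# Thumb lengths from the TinyFingers data set
Thumb <- c(56, 60, 61, 63, 64, 68)

# A made-up predictor with no real connection to thumb length
set.seed(1)  # arbitrary seed, only so the example is repeatable
Noise <- rnorm(length(Thumb))

# Leftover sum of squares for the empty model and for the noise model
SS.empty <- deviance(lm(Thumb ~ NULL))
SS.noise <- deviance(lm(Thumb ~ Noise))

# The noise model cannot leave more error than the empty model
# (a tiny tolerance guards against floating-point rounding)
SS.noise <= SS.empty + 1e-8
```

Even a meaningless predictor usually picks up a little chance variation in a small sample, so `SS.noise` will typically be slightly below `SS.empty` rather than exactly equal to it.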

Visualizing Sums of Squares

Let’s watch another video that explains where we are at this point. In her previous video in Chapter 6, Dr. Ji demonstrated the concept of sum of squares using our TinyFingers data set. We literally drew squares when we “squared the residuals.” She showed that the sum of squared deviations is minimized at the mean.

In this video, Dr. Ji shows us how we can visualize sum of squares from the Sex model, and also how we can compare the sum of squares from the Sex model against the empty model.

If you want to try it yourself, we have provided the data below to copy and paste into the little “sample data” box, along with the link to the applet:

Sex Thumb
0 56
0 60
0 61
1 63
1 64
1 68
