8.6 Assessing Model Fit with Sum of Squares
Finally, let’s examine the fit of our regression model by running the supernova() function on our model. At the same time, let’s compare the table we get from the regression model (Height_model) with the one we produced before for the Height2Group_model.
supernova(Height2Group_model)
supernova(Height_model)
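Height2Group Model
Analysis of Variance Table (Type III SS)
Model: Thumb ~ Height2Group

                               SS  df      MS      F    PRE     p
----- --------------- | --------- --- ------- ------ ------ -----
Model (error reduced) |   830.880   1 830.880 11.656 0.0699 .0008
Error (from model)    | 11049.331 155  71.286
----- --------------- | --------- --- ------- ------ ------ -----
Total (empty model)   | 11880.211 156  76.155
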
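Height Model
Analysis of Variance Table (Type III SS)
Model: Thumb ~ Height

                               SS  df       MS      F    PRE     p
----- --------------- | --------- --- -------- ------ ------ -----
Model (error reduced) |  1816.862   1 1816.862 27.984 0.1529 .0000
Error (from model)    | 10063.349 155   64.925
----- --------------- | --------- --- -------- ------ ------ -----
Total (empty model)   | 11880.211 156   76.155
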
Remember, the total sum of squares is the sum of squared deviations (or more generally, residuals) from the empty model. Total sum of squares is all about the outcome variable, and isn’t affected by the explanatory variable or variables. And when we compare statistical models, as we are doing here, we always are modeling the same outcome variable.
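To make that definition concrete, here is a minimal sketch that computes the total sum of squares for the six thumb lengths in the small applet sample from this section. It is written in plain Python purely for illustration (the course itself works in R), and the function name ss_total is a hypothetical helper, not part of any package. Notice that no explanatory variable appears anywhere in the calculation.

```python
# Six thumb lengths from the small sample used with the applet.
thumb = [56, 60, 61, 63, 64, 68]

def ss_total(outcome):
    """Sum of squared deviations of each score from the empty model (the mean)."""
    grand_mean = sum(outcome) / len(outcome)
    return sum((y - grand_mean) ** 2 for y in outcome)

print(ss_total(thumb))  # 82.0
```

Whether we later explain Thumb with Height2Group or with Height, this number stays the same, because only the outcome variable goes into it.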
Partitioning Sums of Squares
If you want to try out the app Dr. Ji uses in the video, you can open the sum of squares applet at http://www.rossmanchance.com/applets/RegShuffle.htm. Copy and paste the data below into the little “sample data” box to reproduce Ji’s examples.
Height2Group  Thumb
0  56
0  60
1  61
0  63
1  64
1  68

Height  Thumb
62  56
66  60
67  61
63  63
68  64
71  68
For any model with an explanatory variable (what we have been calling “complex models”), the SS Total can be partitioned into the SS Error and the SS Model. The SS Model is the amount by which the error is reduced under the complex model (e.g., the Height model) compared with the empty model.
As we developed previously for the group models, SS Model is easily calculated by subtracting SS Error from SS Total. This is the same, regardless of whether you are fitting a group model or a regression model. Error from the model is defined in the former case as residuals from the group means, and in the latter, residuals from the regression line.
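The subtraction can be checked against the two supernova tables in this section. The sketch below is plain Python used only as a calculator (the course itself works in R); the variable names are made up for illustration, and the numbers are copied from the Error and Total rows of the tables.

```python
# SS values copied from the two supernova tables in this section.
ss_total_value = 11880.211  # identical for both models: it depends only on Thumb
ss_error = {
    "Height2Group_model": 11049.331,
    "Height_model": 10063.349,
}

# SS Model = SS Total - SS Error, rounded to the three decimals the tables report.
ss_model = {name: round(ss_total_value - err, 3) for name, err in ss_error.items()}

for name, value in ss_model.items():
    print(name, value)
```

The results agree with the Model rows of the tables (830.880 for the group model and 1816.862 for the regression model), so the regression model reduces more of the error from the empty model.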
It also is possible to calculate the SS Model in the regression model directly, in much the same way we did for the group model. Recall that for the group model, SS Model was the sum of the squared deviations of each person’s predicted score (their group mean) from the Grand Mean. In the regression model, SS Model is calculated in exactly the same way, except that each person’s predicted score is defined as a point on the regression line. The Grand Mean is the same in both cases.
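The direct calculation can be sketched with the six-person applet sample from this section. This is an illustrative Python version under stated assumptions, not the course's R workflow: the helper names (group_predictions, regression_predictions, sums_of_squares) are hypothetical, and the regression line is fit with the ordinary least-squares formulas for slope and intercept.

```python
# The six-person sample from this section.
height_group = [0, 0, 1, 0, 1, 1]
height = [62, 66, 67, 63, 68, 71]
thumb = [56, 60, 61, 63, 64, 68]

def mean(values):
    return sum(values) / len(values)

def group_predictions(groups, outcome):
    """Predict each person's score as the mean of their group."""
    group_means = {
        g: mean([y for gi, y in zip(groups, outcome) if gi == g])
        for g in set(groups)
    }
    return [group_means[g] for g in groups]

def regression_predictions(x, outcome):
    """Predict each person's score as a point on the least-squares line."""
    x_bar, y_bar = mean(x), mean(outcome)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, outcome))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [intercept + slope * xi for xi in x]

def sums_of_squares(predicted, outcome):
    """Return (SS Model, SS Error, SS Total) for a set of predictions."""
    grand_mean = mean(outcome)
    ss_model = sum((p - grand_mean) ** 2 for p in predicted)
    ss_error = sum((y - p) ** 2 for y, p in zip(outcome, predicted))
    ss_total = sum((y - grand_mean) ** 2 for y in outcome)
    return ss_model, ss_error, ss_total

group_ss = sums_of_squares(group_predictions(height_group, thumb), thumb)
regression_ss = sums_of_squares(regression_predictions(height, thumb), thumb)
print("group model:     ", [round(v, 3) for v in group_ss])
print("regression model:", [round(v, 3) for v in regression_ss])
```

In both cases the SS Model computed directly from the predictions plus the SS Error equals the SS Total of 82, which is the same partition obtained by subtraction. For this small sample the SS Model is about 32.67 for the group model and about 53.18 for the regression model.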