Introduction to Statistics: A Modeling Approach
8.4 Examining Residuals from the Model
Now you're on a roll! You probably remember from the previous chapter how to save the residuals from a model. We can do the same thing with a regression model: whenever we fit a model, we can generate both predictions and residuals from the model.
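As a small illustration of generating both predictions and residuals from a fitted model, here is a minimal sketch. It uses made-up data (the data frame `toy` and its variables `x` and `y` are hypothetical, not part of the Fingers data set) and checks that each observed score equals its prediction plus its residual.

```r
# Hypothetical stand-in data; these values are simulated, not real measurements
set.seed(10)
toy <- data.frame(x = runif(50, min = 60, max = 75))
toy$y <- 5 + 0.8 * toy$x + rnorm(50, sd = 6)

# Fit a regression model, then save its predictions and residuals
toy.model <- lm(y ~ x, data = toy)
toy$predicted <- predict(toy.model)
toy$residual <- resid(toy.model)

# Look at the first few rows: y, its prediction, and what is left over
print(head(toy))

# Largest gap between y and (prediction + residual); should be essentially 0
gap <- max(abs(toy$y - (toy$predicted + toy$residual)))
print(gap)
```

Any gap printed here reflects only floating-point rounding, since a residual is simply the observed score minus the model's prediction.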
Try to generate the residuals from the Height.model that you fit to the full Fingers data set.
require(mosaic)
require(ggformula)
require(dplyr)  # for filter()
Fingers <- read.csv(file="https://raw.githubusercontent.com/UCLATALL/introstatsmodeling/master/datasets/fingers.csv", header=TRUE, sep=",")
# this is for measurement section
#Fingers <- arrange(Fingers, desc(Sex))
#Fingers$FamilyMembers[1] <- 2
#Fingers$Height[1] <- 62
#Fingers$Sex <- recode(Fingers$Sex, '1' = "female", '2' = "male")
Fingers <- data.frame(Fingers)
# clean up variable types
Fingers$Sex <- as.factor(Fingers$Sex)
Fingers$RaceEthnic <- as.numeric(Fingers$RaceEthnic)
Fingers$SSLast <- as.numeric(Fingers$SSLast)
Fingers$Year <- as.numeric(Fingers$Year)
Fingers$Job <- as.numeric(Fingers$Job)
Fingers$MathAnxious <- as.numeric(Fingers$MathAnxious)
Fingers$Interest <- as.numeric(Fingers$Interest)
Fingers$GradePredict <- as.numeric(Fingers$GradePredict)
Fingers$Thumb <- as.numeric(Fingers$Thumb)
Fingers$Index <- as.numeric(Fingers$Index)
Fingers$Middle <- as.numeric(Fingers$Middle)
Fingers$Ring <- as.numeric(Fingers$Ring)
Fingers$Pinkie <- as.numeric(Fingers$Pinkie)
Fingers$Height <- as.numeric(Fingers$Height)
Fingers$Weight <- as.numeric(Fingers$Weight)
# keep only plausible thumb lengths
Fingers <- filter(Fingers, Thumb >= 33 & Thumb <= 100)
set.seed(2)
# fit the regression model: Thumb explained by Height
Height.model <- lm(Thumb ~ Height, data = Fingers)
# save the residuals from Height.model as a new variable in Fingers
Fingers$Height.resid <- resid(Height.model)
# make a histogram of Height.resid
gf_histogram(~ Height.resid, data = Fingers)
The residuals from the regression line are centered at 0, just as they were from the empty model, the two-group model, and the three-group model. In those previous models, this was true by definition: deviations of scores around the mean always sum to 0 because the mean is the balancing point of the residuals. The negative and positive residuals cancel each other out exactly.
It turns out this is also true of the best-fitting regression line: the residuals from each score to the regression line sum to 0, by definition. In this sense, too, the regression line is similar to the mean of a distribution: it perfectly balances the scores above and below the line.
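The balancing property can be checked numerically. The following is a minimal sketch using simulated data (the data frame `toy` and its values are hypothetical stand-ins, not the Fingers data set); it fits an empty model and a regression model and sums the residuals from each.

```r
# Hypothetical stand-in data; simulated, not the Fingers data set
set.seed(1)
toy <- data.frame(Height = rnorm(100, mean = 66, sd = 4))
toy$Thumb <- 10 + 0.8 * toy$Height + rnorm(100, sd = 8)

# Empty model (mean only), written here as an intercept-only formula
empty.model <- lm(Thumb ~ 1, data = toy)
# Regression model with Height as the explanatory variable
height.model <- lm(Thumb ~ Height, data = toy)

# Sum the residuals from each model
sum.empty <- sum(resid(empty.model))
sum.height <- sum(resid(height.model))

print(sum.empty)
print(sum.height)
```

Both sums should come out as 0 apart from tiny floating-point rounding error, so R may print a very small number such as one in scientific notation rather than an exact 0.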