10.5 Interpreting Confidence Intervals
To summarize where we are: confidence intervals are constructed around our parameter estimate (in the case of the empty model, the sample mean). A 95% confidence interval tells us that we can be 95% confident that the true parameter (here, the mean of the population) lies within a certain range. The interval is symmetrical around the sample mean, extending the same critical distance below the sample mean as it does above it.
The size of a confidence interval tells us how much fluctuation there is in our parameter estimate. It can be expressed in the original units of measurement (e.g., mm) or as a number of standard errors above and below the mean. The larger the standard error, the wider the confidence interval. We also saw that the confidence interval (because it depends on the standard error) is determined in part by 1) our degrees of freedom, which reflect the sample size, and 2) the standard deviation of the population.
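To make the link between the standard error, the degrees of freedom, and the interval concrete, here is a minimal sketch in R. The `thumb` values are made up for illustration (they are not the course data), and `~ 1` is the standard base-R spelling of an intercept-only model, used here in place of the `~ NULL` form.

```r
# Hypothetical measurements in mm (made-up values, for illustration only)
thumb <- c(58, 61, 55, 64, 60, 57, 66, 59, 62, 63)

n      <- length(thumb)
df     <- n - 1                 # degrees of freedom for the empty model
se     <- sd(thumb) / sqrt(n)   # estimated standard error of the mean
t_crit <- qt(0.975, df)         # number of standard errors for 95% confidence

# Critical distance in the original units (mm)
critical_distance <- t_crit * se
by_hand <- c(lower = mean(thumb) - critical_distance,
             upper = mean(thumb) + critical_distance)

# confint() on the empty model should give the same interval
empty_model  <- lm(thumb ~ 1)
from_confint <- confint(empty_model)

by_hand
from_confint
```

A larger `se` (from more spread or a smaller sample) or a larger `t_crit` (from fewer degrees of freedom) both widen the interval.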
Units of the Confidence Interval
The actual size of the 95% confidence interval is in the units of the estimate. In the case of the empty model of thumb length, the 95% confidence interval can be computed with the confint() function:
confint(Empty.model)
Try computing the 95% confidence interval for PoundsLost by housekeepers in the MindsetMatters data frame. Remember, the confidence interval is computed based on model estimates, so fit and print the empty model first.
#load packages
require(ggformula)
require(mosaic)
require(supernova)
require(Lock5Data)
require(Lock5withR)
require(okcupiddata)
# set up exercise
MindsetMatters$PoundsLost <- MindsetMatters$Wt2 - MindsetMatters$Wt
# fit and print the empty model for PoundsLost
Empty.model <- lm(PoundsLost ~ NULL, data = MindsetMatters)
Empty.model
# compute the confidence interval around this estimate
confint(Empty.model)
Compute the critical distance away from the estimate of \(b_{0}\) in pounds using this confidence interval.
#load packages
require(ggformula)
require(mosaic)
require(supernova)
require(Lock5Data)
require(Lock5withR)
require(okcupiddata)
# set up exercise
MindsetMatters$PoundsLost <- MindsetMatters$Wt2 - MindsetMatters$Wt
Empty.model <- lm(PoundsLost ~ NULL, data = MindsetMatters)
# compute the critical distance (half the width of the interval)
ci <- confint(Empty.model)
(ci[1, 2] - ci[1, 1]) / 2
You can think of this confidence interval (-1.7 to -0.44) as telling us about the variability in our estimate. Even though the average pounds lost in the sample of housekeepers was -1.07 pounds, we are reasonably confident that the true population mean could be as low as -1.7 or as high as -0.44.
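The critical distance for this interval can be worked out directly from the rounded numbers reported above. The following is a small arithmetic sketch, not output from the fitted model; the variable names are chosen here for illustration.

```r
# Interval and estimate as reported in the text (rounded values, in pounds)
lower    <- -1.70
upper    <- -0.44
estimate <- -1.07

# Half the interval width is the critical distance
critical_distance <- (upper - lower) / 2
critical_distance

# Because the interval is symmetrical, both of these give the same distance
upper - estimate
estimate - lower
```

All three expressions come to about 0.63 pounds, so the interval can be read as -1.07 plus or minus roughly 0.63.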
Keeping Your Distributions Straight
All this started because our best estimate of the population mean was the sample mean. But we tried to quantify the possible error in our estimate of the population mean. After all, samples do not look just like the population they come from.
The 95% confidence interval says that we can be 95% confident that the true population mean (which we don’t know and can never measure) is within a certain range. If this range is wide, our estimate is less precise; if it is narrow, our estimate is more precise.
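One way to see what "95% confident" means is to simulate a situation where the population mean is known and check how often the interval captures it. The sketch below assumes a normally distributed population with made-up parameters (`true_mean`, `true_sd`); these values are hypothetical, and the seed is arbitrary.

```r
set.seed(10)        # arbitrary seed, for reproducibility

true_mean <- 60     # hypothetical population mean
true_sd   <- 9      # hypothetical population standard deviation
n         <- 30     # size of each sample
n_samples <- 2000   # number of simulated samples

# For each simulated sample, fit the empty model and check whether
# its 95% confidence interval contains the true population mean
covers <- replicate(n_samples, {
  y  <- rnorm(n, mean = true_mean, sd = true_sd)
  ci <- confint(lm(y ~ 1))
  ci[1, 1] <= true_mean && true_mean <= ci[1, 2]
})

# Proportion of intervals that contain the true mean
mean(covers)
```

Under these assumptions the proportion should land close to 0.95. In real data analysis we have only one sample and one interval, so we never learn whether ours is one of the intervals that missed.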
The reason we invoked sampling distributions is to model variation in our estimate (in this case the mean) across samples. Sampling distributions help us deal with sampling variation: how much samples from the same population (or data generating process) might vary. We used the sampling distribution to estimate the critical distance, or how far our sample mean might be from an extremely low or high population mean.
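Sampling variation can be sketched by simulation: draw many samples from the same data generating process and look at how their means vary. The population values below (`pop_mean`, `pop_sd`) are hypothetical, chosen only for illustration, and the population is assumed to be normal.

```r
set.seed(92)        # arbitrary seed, for reproducibility

pop_mean <- 60      # hypothetical population mean
pop_sd   <- 9       # hypothetical population standard deviation
n        <- 25      # size of each sample

# Simulated sampling distribution of the mean: 5000 samples, one mean each
sample_means <- replicate(5000, mean(rnorm(n, mean = pop_mean, sd = pop_sd)))

# Spread of the sample means, compared with the theoretical standard error
sd(sample_means)
pop_sd / sqrt(n)

# Range that holds the middle 95% of sample means
quantile(sample_means, c(0.025, 0.975))
```

The standard deviation of the simulated means should sit close to `pop_sd / sqrt(n)`, which is the standard error that the critical distance is built from.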
The sample mean is the best point estimate we have of the population mean. So, we start there in trying to estimate the range of possible population parameters. But we know that our sample could have produced a particularly low or high mean. The confidence interval helps us keep those possibilities in mind.