Introduction to Statistics: A Modeling Approach
10.5 Interpreting Confidence Intervals
To summarize where we are: confidence intervals are constructed around our parameter estimate, which in the case of the empty model is the sample mean. A 95% confidence interval tells us that we can be 95% confident that the interval contains the true parameter (here, the mean of the population). More precisely, if we repeated the sampling procedure many times, about 95% of the intervals constructed this way would contain the true parameter. The interval is typically symmetrical with respect to the sample mean, extending the same distance below the sample mean as it does above it.
The size of a confidence interval tells us how much fluctuation there is in our parameter estimate. It can be expressed in the original units of measurement (e.g., mm) or in terms of number of standard errors above and below the mean. The larger the standard error, the wider the confidence interval. We also realized that the confidence interval (because it is dependent on the standard error) is determined in part by 1) our degrees of freedom, and 2) the standard deviation of the population.
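To make the roles of the standard error and the degrees of freedom concrete, we can build the interval by hand and compare it with what confint() reports. This is only a sketch: the thumb lengths below are made-up values (not the course data), and it assumes the usual t-based interval for an empty model.

```r
# Hypothetical thumb lengths in mm (made-up values, for illustration only)
thumb <- c(56, 60, 61, 58, 64, 59, 63, 57, 62, 60)

n      <- length(thumb)
df     <- n - 1                   # degrees of freedom for the empty model
se     <- sd(thumb) / sqrt(n)     # estimated standard error of the mean
t_crit <- qt(0.975, df)           # critical t for a 95% interval

# Margin of error in the original units (mm)
margin_by_hand <- t_crit * se

# The same margin expressed as a number of standard errors
margin_in_se <- margin_by_hand / se

# Interval built by hand: estimate plus or minus the margin of error
ci_by_hand <- mean(thumb) + c(-1, 1) * margin_by_hand

# Interval reported by R for the fitted empty model
ci_from_model <- confint(lm(thumb ~ NULL))

ci_by_hand
ci_from_model
```

A larger standard deviation or a smaller sample makes se larger, and fewer degrees of freedom make t_crit larger; either change widens the interval.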
Units of the Confidence Interval
The actual size of the 95% confidence interval is in the units of the estimate. In the case of the empty model of thumb length, the 95% confidence interval is shown below.
confint(Empty.model)
               2.5 %   97.5 %
(Intercept) 58.72794 61.47938
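One way to see that the interval is in the units of the estimate is to change the units and watch the interval change with them. The sketch below uses made-up thumb lengths (hypothetical values, not the course data) measured in mm and then converted to cm.

```r
# Hypothetical thumb lengths in mm (made-up values, for illustration only)
thumb_mm <- c(56, 60, 61, 58, 64, 59, 63, 57, 62, 60)

# The same measurements expressed in cm
thumb_cm <- thumb_mm / 10

# 95% confidence intervals from the empty model in each unit
ci_mm <- confint(lm(thumb_mm ~ NULL))
ci_cm <- confint(lm(thumb_cm ~ NULL))

ci_mm
ci_cm   # each bound is one tenth of the corresponding bound in mm
```

The interval describes the same uncertainty either way; only the scale on which it is reported differs.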
Try computing the 95% confidence interval for PoundsLost by housekeepers in the MindsetMatters data frame. Remember, the confidence interval is computed based on model estimates, so fit and print the empty model first.
packages <- c("mosaic", "Lock5withR", "Lock5Data", "supernova", "ggformula", "okcupiddata")
lapply(packages, library, character.only = TRUE)
MindsetMatters$PoundsLost <- MindsetMatters$Wt2 - MindsetMatters$Wt
# Fit and save the empty model for PoundsLost
Empty.model <- lm(PoundsLost ~ NULL, data = MindsetMatters)
# Print your empty model
Empty.model
# Compute the confidence interval around this estimate
confint(Empty.model)
Compute the margin of error (in pounds) around the estimate of \(b_{0}\) using this confidence interval.
packages <- c("mosaic", "Lock5withR", "Lock5Data", "supernova", "ggformula", "okcupiddata")
lapply(packages, library, character.only = TRUE)
MindsetMatters <- MindsetMatters %>% mutate(PoundsLost = Wt2 - Wt)
# This saves the empty model to Empty.model
Empty.model <- lm(PoundsLost ~ NULL, data = MindsetMatters)
# Compute the margin of error
# There are many ways to calculate the margin of error
# One way: upper bound of the interval minus the estimate
confint(Empty.model)[[2]] - mean(MindsetMatters$PoundsLost)
# Another way: half the width of the interval
(confint(Empty.model)[[2]] - confint(Empty.model)[[1]]) / 2
[1] 0.631077
You can think of this confidence interval (-1.70 to -0.44) as telling us about the variability in our estimate. Even though the average value of PoundsLost in the sample of housekeepers was -1.07 pounds (that is, an average loss of 1.07 pounds), we are reasonably confident that the true population mean could be as low as -1.70 and as high as -0.44.
Keeping Your Distributions Straight
All this started because our best estimate of the population mean was the sample mean. But we tried to quantify the possible error in our estimate of the population mean. After all, samples do not look exactly like the population they come from.
The 95% confidence interval says that we can be 95% confident that the true population mean (which we don’t know and can never measure) is within a certain range. If this range is wide, our estimate is less precise; if it is narrow, our estimate is more precise.
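What "95% confident" means can be checked with a small simulation. In the sketch below we invent a population whose mean we do know (the mean, standard deviation, sample size, and number of samples are all arbitrary assumptions chosen for illustration), draw many samples from it, and count how often the 95% confidence interval captures the true mean.

```r
set.seed(10)

# Assumed population values (hypothetical, chosen only for this simulation)
pop_mean  <- 60
pop_sd    <- 9
n         <- 50      # size of each sample
n_samples <- 2000    # number of simulated samples

# For each simulated sample, fit the empty model and check whether
# its 95% confidence interval contains the true population mean
covers <- replicate(n_samples, {
  y  <- rnorm(n, mean = pop_mean, sd = pop_sd)
  ci <- confint(lm(y ~ NULL))
  ci[1] <= pop_mean && pop_mean <= ci[2]
})

# Proportion of intervals that contain the true mean
coverage <- mean(covers)
coverage
```

The proportion should land close to 0.95, although it will vary a little from run to run if the seed is changed.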
The reason we invoked sampling distributions is to model variation in our estimate (in this case the mean) across samples. Sampling distributions help us deal with sampling variation—how much samples from the same population (or Data Generating Process) might vary. We used the sampling distribution to estimate the margin of error—how far off the population mean could be from our estimate.
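A simulated sampling distribution shows this sampling variation directly. The sketch below assumes a normally distributed population with a made-up mean and standard deviation (hypothetical values, not estimates from any course data), draws many samples, and compares the spread of the sample means with the theoretical standard error.

```r
set.seed(20)

# Assumed population values (hypothetical, chosen only for this simulation)
pop_mean <- 60
pop_sd   <- 9
n        <- 50       # size of each sample

# Simulate a sampling distribution: the mean of each of 5000 samples
sample_means <- replicate(5000, mean(rnorm(n, mean = pop_mean, sd = pop_sd)))

# Spread of the simulated sampling distribution of the mean
simulated_se <- sd(sample_means)

# Theoretical standard error of the mean
theoretical_se <- pop_sd / sqrt(n)

simulated_se
theoretical_se
```

The two values should be close. The standard error that goes into the margin of error is an estimate of this spread, computed from a single sample.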
The sample mean is the best point estimate we have of the population mean. So, we start there in trying to estimate the range of possible population parameters. But we know that our sample could have produced a particularly low or particularly high sample mean. The confidence interval helps us keep those possibilities in mind.