Introduction to Statistics: A Modeling Approach

Interpreting Confidence Intervals

To summarize where we are: confidence intervals are constructed around our parameter estimate, which in the case of the empty model is the sample mean. A 95% confidence interval tells us that we can be 95% confident that the true parameter (here, the mean of the population) lies within a certain range. The interval is symmetrical around the sample mean, extending the same critical distance below the sample mean as it does above it.
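
As a minimal sketch of this symmetry, the interval returned by confint() can be rebuilt by hand as the sample mean plus or minus a critical distance. The data are simulated, and the names `sim` and `thumb` are assumptions made for illustration, not the course data.

```r
# Simulated measurements (hypothetical stand-in for a real variable)
set.seed(10)
sim <- data.frame(thumb = rnorm(50, mean = 60, sd = 9))

# Fit the empty model and ask R for the 95% confidence interval
Empty.model <- lm(thumb ~ NULL, data = sim)
ci <- confint(Empty.model)

# Rebuild the same interval by hand
n <- nrow(sim)
b0 <- mean(sim$thumb)                            # parameter estimate
se <- sd(sim$thumb) / sqrt(n)                    # standard error of the mean
critical_distance <- qt(0.975, df = n - 1) * se  # t critical value times SE

by_hand <- c(b0 - critical_distance, b0 + critical_distance)

ci        # interval from confint()
by_hand   # same endpoints, computed manually
```

Both endpoints sit exactly one critical distance from the sample mean, one below and one above.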

The size of a confidence interval tells us how much our parameter estimate might fluctuate from sample to sample. It can be expressed in the original units of measurement (e.g., mm) or as a number of standard errors above and below the mean. The larger the standard error, the wider the confidence interval. We also saw that the confidence interval (because it depends on the standard error) is determined in part by 1) our degrees of freedom and 2) the standard deviation of the population.
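
To make those two influences concrete, here is a small sketch of how the width of a 95% interval for a mean responds to sample size and spread. The helper `ci_width` is a hypothetical function written for this illustration; it uses a sample standard deviation `s` as the estimate of the population standard deviation.

```r
# Width of a 95% confidence interval for a mean, given the
# sample standard deviation s and the sample size n
ci_width <- function(s, n) {
  se <- s / sqrt(n)                 # standard error of the mean
  2 * qt(0.975, df = n - 1) * se    # one critical distance on each side
}

ci_width(s = 9, n = 25)    # baseline
ci_width(s = 9, n = 100)   # more degrees of freedom: narrower interval
ci_width(s = 18, n = 25)   # larger standard deviation: wider interval
```

With the sample size held fixed, doubling the standard deviation doubles the width; increasing the sample size shrinks both the standard error and the t critical value.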

Units of the Confidence Interval

The actual size of the 95% confidence interval is in the units of the estimate. In the case of the empty model of thumb length, the 95% confidence interval (in mm) is computed with the code below.

confint(Empty.model)

Try computing the 95% confidence interval for PoundsLost by housekeepers in the MindsetMatters data frame. Remember, the confidence interval is computed based on model estimates, so fit and print the empty model first.

# load packages
require(ggformula)
require(mosaic)
require(supernova)
require(Lock5Data)
require(Lock5withR)
require(okcupiddata)

# set up exercise
MindsetMatters$PoundsLost <- MindsetMatters$Wt2 - MindsetMatters$Wt

# fit and print the empty model for PoundsLost
Empty.model <- lm(PoundsLost ~ NULL, data = MindsetMatters)
Empty.model

# compute the confidence interval around this estimate
confint(Empty.model)

When creating your empty model, use `PoundsLost ~ NULL`. Remember to use the MindsetMatters data frame.

Compute the critical distance away from the estimate of \(b_{0}\) in pounds using this confidence interval.

# load packages
require(ggformula)
require(mosaic)
require(supernova)
require(Lock5Data)
require(Lock5withR)
require(okcupiddata)

# set up exercise
MindsetMatters$PoundsLost <- MindsetMatters$Wt2 - MindsetMatters$Wt
Empty.model <- lm(PoundsLost ~ NULL, data = MindsetMatters)

# compute the critical distance

You can think of this confidence interval (-1.7 to -0.44) as telling us about the uncertainty in our estimate. Even though the mean of PoundsLost in the sample of housekeepers was -1.07 pounds, we are reasonably confident that the true population mean could be as low as -1.7 or as high as -0.44.
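
The arithmetic linking these three numbers can be sketched as follows, using only the rounded endpoints quoted in the text (the variable names are illustrative). Because the interval is symmetrical, the estimate is its midpoint and the critical distance is half its width.

```r
# Rounded endpoints of the interval reported in the text
lower <- -1.70
upper <- -0.44

estimate <- (lower + upper) / 2            # midpoint of the interval
critical_distance <- (upper - lower) / 2   # half the width, in pounds

estimate            # -1.07
critical_distance   # 0.63
```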

Keeping Your Distributions Straight

All this started because our best estimate of the population mean was the sample mean. But we tried to quantify the possible error in our estimate of the population mean. After all, samples do not look just like the population they come from.

The 95% confidence interval says that we can be 95% confident that the true population mean (which we don’t know and can never measure) is within a certain range. The wider this range, the less precise our estimate; the narrower the range, the more precise it is.
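
One way to see what "95% confident" means is a simulation, sketched here under stated assumptions: we invent a population whose mean we do know (the values of `true_mean`, `true_sd`, and `n` are hypothetical), draw many samples from it, and check how often the interval from each sample captures that mean.

```r
set.seed(2024)
true_mean <- 60   # hypothetical population mean, known only because we simulate
true_sd <- 9      # hypothetical population standard deviation
n <- 30           # size of each sample

# For each simulated sample: fit the empty model, compute the 95%
# confidence interval, and record whether it contains the true mean
covers <- replicate(2000, {
  y <- rnorm(n, mean = true_mean, sd = true_sd)
  ci <- confint(lm(y ~ NULL))
  ci[1] <= true_mean && true_mean <= ci[2]
})

coverage <- mean(covers)   # proportion of intervals that captured the true mean
coverage
```

The proportion should land close to 0.95. In real research we get only one sample and one interval, and we cannot tell whether it is one of the intervals that captured the population mean.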

The reason we invoked sampling distributions is to model variation in our estimate (in this case the mean) across samples. Sampling distributions help us deal with sampling variation, that is, how much samples from the same population (or data generating process) might vary. We used the sampling distribution to estimate the critical distance, or how far our sample mean might be from an extremely low or high population mean.
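
A simulated sampling distribution makes this variation visible. The sketch below assumes a hypothetical normal population (`pop_mean`, `pop_sd`, and `n` are made-up values) and compares the spread of many sample means with the standard error predicted by theory.

```r
set.seed(7)
pop_mean <- 60   # hypothetical population mean
pop_sd <- 9      # hypothetical population standard deviation
n <- 30          # size of each sample

# Draw 5000 samples and keep the mean of each one
sample_means <- replicate(5000, mean(rnorm(n, mean = pop_mean, sd = pop_sd)))

sd(sample_means)    # spread of the simulated sampling distribution
pop_sd / sqrt(n)    # standard error predicted by theory
```

The two values should be close. The standard error is the standard deviation of the sampling distribution, and the critical distance is built from it.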

The sample mean is the best point estimate we have of the population mean. So, we start there in trying to estimate the range of possible population parameters. But we know that our sample could have produced a particularly low or high sample mean. The confidence interval helps us keep those possibilities in mind.
