Standard Deviation
The standard deviation (written as \(s\)) is simply the square root of the variance. We generally prefer thinking about error in terms of standard deviation because it yields a number that makes sense using the original scale of measurement. So, for example, if you were modeling weight in pounds, variance would express the error in squared pounds (not something we are used to thinking about), whereas standard deviation would express the error in pounds.
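The point about units can be checked numerically. The sketch below uses a small made-up vector of weights (the values and the names weight_lb and weight_oz are illustrative assumptions, not data from the course) and re-expresses it in ounces. The standard deviation rescales by the same factor of 16 as the data, while the variance rescales by 16 squared.

```r
# Hypothetical weights in pounds (made-up values for illustration only)
weight_lb <- c(150, 162, 171, 145, 180)
# The same weights expressed in ounces (1 pound = 16 ounces)
weight_oz <- weight_lb * 16

# Standard deviation stays on the scale of the data: it grows by a factor of 16
sd_ratio <- sd(weight_oz) / sd(weight_lb)
# Variance is in squared units: it grows by a factor of 16^2 = 256
var_ratio <- var(weight_oz) / var(weight_lb)

sd_ratio
var_ratio
```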
Here are two formulas that represent the standard deviation:
\[s = \sqrt{s^2}\]
\[s = \sqrt{\frac{\sum_{i=1}^n (Y_i - \bar{Y})^2}{n-1}}\]
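The longer formula can be traced one step at a time in R. This is a minimal sketch on a made-up vector (the values and the name thumb are assumptions for illustration, not the course data): subtract the mean from each score, square and sum the deviations, divide by n - 1, and take the square root.

```r
# Hypothetical thumb lengths in mm (made-up values, not the Fingers data)
thumb <- c(52, 56, 60, 61, 66)

n <- length(thumb)
residuals_from_mean <- thumb - mean(thumb)   # each score minus the mean
ss <- sum(residuals_from_mean^2)             # sum of squared deviations
variance <- ss / (n - 1)                     # s squared
s <- sqrt(variance)                          # s, the standard deviation

# The hand calculation should agree with R's built-in functions
all.equal(variance, var(thumb))
all.equal(s, sd(thumb))
```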
To calculate standard deviation in R, we use sd(). Here is how to calculate the standard deviation of our Thumb data from TinyFingers.
sd(TinyFingers$Thumb)
There are actually a few different ways to get the standard deviation of a variable. The most direct is the function sd(). You can also take the square root of the variance by combining the functions sqrt() and var(). Yet another, and possibly more useful, way is to use good old favstats(). Try all three of these methods to calculate the standard deviation of Thumb from the larger Fingers data frame.
require(mosaic)
require(ggformula)
require(supernova)
# calculate the standard deviation of Thumb from Fingers
sd(Fingers$Thumb)
# calculate the standard deviation with sqrt() and var()
sqrt(var(Fingers$Thumb))
# calculate the standard deviation with favstats()
favstats(~ Thumb, data = Fingers)
Sum of Squares, Variance, and Standard Deviation
We have discussed three ways of quantifying error around our model. All start with residuals, but they aggregate those residuals in different ways to summarize total error.
All of them are minimized at the mean, and so all are useful when the mean is the model for a quantitative variable.
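The claim that all three measures are smallest at the mean can be checked with a simple grid search. The sketch below uses made-up data and hypothetical helper names (ss_around, var_around, sd_around). Each helper measures error around an arbitrary candidate value b rather than around the mean. Dividing by n - 1 and taking the square root do not change where the minimum falls, so all three should point to the same value.

```r
# Hypothetical data (made-up values for illustration)
y <- c(52, 56, 60, 61, 66)
n <- length(y)

# Error around an arbitrary candidate model value b, measured three ways
ss_around  <- function(b) sum((y - b)^2)
var_around <- function(b) ss_around(b) / (n - 1)
sd_around  <- function(b) sqrt(var_around(b))

# Try a grid of candidate values and find where each measure is smallest
candidates <- seq(50, 70, by = 0.5)
best_ss  <- candidates[which.min(sapply(candidates, ss_around))]
best_var <- candidates[which.min(sapply(candidates, var_around))]
best_sd  <- candidates[which.min(sapply(candidates, sd_around))]

# Compare the three minimizing values with the mean of y
c(best_ss, best_var, best_sd, mean(y))
```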
Thinking About Quantifying Error in MindsetMatters
Below is a histogram of the amount of weight lost (PoundsLost) by each of the 75 housekeepers in the MindsetMatters data frame.
Use R to create an empty model of PoundsLost. Call it Empty.model. Then find the SS, Variance, and Standard Deviation around this model.
require(mosaic)
require(ggformula)
MindsetMatters <- read.csv(file="https://raw.githubusercontent.com/UCLATALL/introstatsmodeling/master/datasets/mindsetmatters.csv", header=TRUE, sep=",")
MindsetMatters$PoundsLost <- MindsetMatters$Wt - MindsetMatters$Wt2
# create an empty model of PoundsLost from MindsetMatters
Empty.model <- lm(PoundsLost ~ NULL, data = MindsetMatters)
# find SS, var, and sd
# there are multiple correct solutions; this is one of them
anova(Empty.model)
var(MindsetMatters$PoundsLost)
sd(MindsetMatters$PoundsLost)
There are multiple ways to compute these in R, but here is one set of possible outputs.
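As one more route to the same three numbers, they can be built directly from the residuals of an empty model. This is a sketch on a made-up data frame (toy and its values are assumptions standing in for MindsetMatters, not the real data): sum the squared residuals to get SS, divide by the residual degrees of freedom to get the variance, and take the square root to get the standard deviation.

```r
# Hypothetical data frame standing in for MindsetMatters (made-up values)
toy <- data.frame(PoundsLost = c(2, -1, 4, 0, 3, 1))

# Empty model: the mean is the only parameter
toy_model <- lm(PoundsLost ~ NULL, data = toy)

ss <- sum(resid(toy_model)^2)       # sum of squared residuals
v  <- ss / df.residual(toy_model)   # variance = SS / (n - 1)
s  <- sqrt(v)                       # standard deviation

c(SS = ss, variance = v, sd = s)
```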
Notation for Mean, Variance, and Standard Deviation
Finally, we also use different symbols to represent the variance and standard deviation of a sample, on one hand, and the population, on the other. Sample statistics are also called estimates because they are our best estimates of the DGP parameters. We have summarized these symbols in the table below (pronunciations are in parentheses).
Variance is the mean squared error. It is an average of the squared deviations from the mean. In tables, it may be shortened to Mean Square or MSE. You now know that just means variance. Remember the column headed “Mean Sq” in the output (see below) from the anova() function? That is, in fact, the variance.
anova(Empty.model)
var(Fingers$Thumb)
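The match between “Mean Sq” and the variance can be confirmed by pulling that cell out of the anova() table. The sketch below uses a made-up data frame (toy and its values are assumptions, not the Fingers data) and assumes the empty model's table has a single row named Residuals, which is how base R labels it.

```r
# Hypothetical data frame standing in for Fingers (made-up values)
toy <- data.frame(Thumb = c(52, 56, 60, 61, 66))

# Fit the empty model and request its ANOVA table
toy_model <- lm(Thumb ~ NULL, data = toy)
tab <- anova(toy_model)

# Extract the Mean Sq entry and compare it with var()
mean_sq <- tab["Residuals", "Mean Sq"]
all.equal(mean_sq, var(toy$Thumb))
```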