Outcome and Explanatory Variables
We now have a sense of what it means to explain variation in one variable by variation in another. In the previous section, for example, we saw that variation in thumb length could be explained, in part, by variation in sex. In this section we begin developing a language for the different roles that variables play in relationships where one variable explains variation in another.
A variable is a variable is a variable. But as soon as we start explaining variation, we need to decide which variable’s variation we are trying to explain, and which variables are doing the explaining.
The outcome variable is the variable whose variation we are trying to explain.
In this course, the tools and methods we use will focus on a single outcome variable at one time.
The explanatory variables are the variables we use to explain variation in the outcome variable. Although we will consider only one outcome variable at a time, this course does allow for the possibility of using multiple explanatory variables at once.
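R's formula notation mirrors this distinction: the outcome goes on the left of the `~` and the explanatory variables go on the right, as in the `tally(WtLost ~ Condition, ...)` call used later in this section. The sketch below is only an illustration using base R; `Age` is a hypothetical second explanatory variable chosen for the example, and no data frame is needed to build a formula.

```r
# One outcome, one explanatory variable (both named in this section)
one_explanatory <- WtLost ~ Condition

# One outcome, two explanatory variables.
# Age is a hypothetical second variable, used here only for illustration.
two_explanatory <- WtLost ~ Condition + Age

# all.vars() lists the variable names in a formula, left-hand side first
outcome_name      <- all.vars(one_explanatory)[1]
explanatory_names <- all.vars(two_explanatory)[-1]

print(outcome_name)
print(explanatory_names)
```

Reading a formula as "outcome explained by explanatory variables" is a convenient habit for keeping the two roles straight.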
You may or may not have heard the terms “outcome variable” and “explanatory variable.” We will use these terms throughout, but if you’ve taken statistics before, or read any research reports, you will no doubt have encountered a number of different terms used to represent the same distinction. Some of these are presented in the table below.
It is important to figure out whether a variable is quantitative or categorical. But it is equally important to figure out whether a variable is an outcome or explanatory variable. Making this latter distinction requires a greater understanding of the context that the data pertain to, and the purpose for collecting the data. Let’s think about a situation and try to figure out what the outcome variable might be.
In the MindsetMatters data frame, we have the results of an experiment where a researcher informed a randomly chosen group of housekeepers (41 of them) that the work they do satisfies the Surgeon General’s recommendations for an active lifestyle (which is true), giving them examples to illustrate why their work is considered good exercise. The other 34 housekeepers were told nothing.
Whether an individual housekeeper was informed or not was recorded in the variable called Condition (either Informed or Uninformed). The researcher also recorded whether each housekeeper lost weight or not after four weeks in a categorical variable called WtLost (either lost or not lost). The first six rows of the data frame for these two variables are shown below.
head(select(MindsetMatters, Condition, WtLost))
Write some code to examine the distribution of the outcome variable.
# creates MindsetMatters data frame
MindsetMatters <- read.csv(file="https://raw.githubusercontent.com/UCLATALL/intro-stats-modeling/master/datasets/mindset-matters.csv", header=TRUE, sep=",")
# load packages
library(mosaic)
library(tidyverse)
# create WtLost: TRUE when Wt2 is lower than Wt
MindsetMatters$WtLost <- MindsetMatters$Wt2 < MindsetMatters$Wt
# recode TRUE/FALSE as the labels "lost" / "not lost"
MindsetMatters$WtLost <- ifelse(MindsetMatters$WtLost, "lost", "not lost")
# Write code to make the most appropriate visualization for the outcome variable
gf_bar(~ WtLost, data = MindsetMatters)
We’ve used a bar graph (gf_bar()) to visualize the distribution of WtLost.
You can use gf_facet_grid() with any plot, not just histograms. Try creating a bar graph of WtLost (like the one above), but chain on a facet grid to compare the outcome across conditions.
# Create a bar graph of WtLost then use gf_facet_grid() to compare the outcome across conditions
gf_bar(~ WtLost, data = MindsetMatters) %>%
  gf_facet_grid(Condition ~ .)
There is a real limitation in this graph. Because the two groups have different sample sizes (41 informed housekeepers versus 34 uninformed), comparing the heights of the bars directly can mislead. You have to compare the proportion of housekeepers who lost weight in each group, mentally adjusting for the difference in group size. If these were histograms, we would just use a density histogram instead. But R doesn’t make it as easy to do this for bar graphs.
A better way to make the comparison using relative frequencies is to use the tally() function. Here’s some code that would give us a more easily interpretable comparison:
tally(WtLost ~ Condition, data = MindsetMatters, format = "proportion")
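To see why proportions are easier to compare than counts when group sizes differ, here is a minimal base R sketch of the same idea. It does not use the MindsetMatters data: the `toy` data frame is invented purely for illustration, and `table()` with `prop.table()` stands in for `tally()` as an assumed rough equivalent.

```r
# Toy data, invented for illustration only (NOT the MindsetMatters results):
# 10 informed and 5 uninformed cases, with 4 "lost" in each group.
toy <- data.frame(
  Condition = c(rep("Informed", 10), rep("Uninformed", 5)),
  WtLost    = c(rep("lost", 4), rep("not lost", 6),
                rep("lost", 4), rep("not lost", 1))
)

# Raw counts: rows are WtLost, columns are Condition
counts <- table(toy$WtLost, toy$Condition)

# Divide each column by its total so that each condition sums to 1
props <- prop.table(counts, margin = 2)

print(counts)
print(props)
```

In this made-up example the two groups have the same count of "lost" cases, yet the proportions differ because the groups are different sizes. That is the comparison a table of proportions makes visible and a bar graph of counts hides.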
So far we have considered both quantitative (e.g., Thumb) and categorical (e.g., WtLost) outcomes. We have also looked at some categorical explanatory variables (e.g., Sex and Condition). But we haven’t yet looked at any examples of quantitative explanatory variables.
But there isn’t any reason to think that we couldn’t. Perhaps a quantitative variable like age or initial weight might help us predict how much weight a housekeeper will lose. We’ll come back around to the idea of a quantitative explanatory variable later.
In summary, whether a histogram or a tally will be more useful depends on the type of outcome variable: a histogram suits a quantitative outcome (such as Thumb), while a tally suits a categorical one (such as WtLost).
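As a rough sketch of that rule of thumb, the hypothetical helper below checks the type of an outcome variable and suggests a summary. `suggest_summary()` is not part of mosaic or any other package, and the values passed to it are made up for illustration.

```r
# Hypothetical helper (not from any package): suggests a summary
# based on whether the outcome variable is quantitative or categorical.
suggest_summary <- function(outcome) {
  if (is.numeric(outcome)) {
    "histogram"   # quantitative outcome
  } else {
    "tally"       # categorical outcome (character, factor, logical)
  }
}

# Made-up values for illustration only
thumb_mm <- c(60.2, 55.1, 63.8, 58.4)
wt_lost  <- c("lost", "not lost", "lost", "lost")

print(suggest_summary(thumb_mm))
print(suggest_summary(wt_lost))
```

The check is deliberately simple. A categorical variable stored as numeric codes would be misclassified, so in practice you still need to know what the variable represents.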