Introduction to Statistics: A Modeling Approach
Doing Statistics With R
Speaking of doing, how are you going to do the data analysis part of this course? The answer is: you are going to use R (yes, it’s just called R, the letter). R is a free, open-source programming language commonly used by statisticians. Open source means that R’s source code is freely available and that R is developed and maintained by a community of contributors rather than by a single company. So basically, anyone can contribute to R and help make it better.
Technology is a fundamental part of doing statistics these days. In fact, most of what we do in terms of data analysis would not be possible without computers, and most statistics courses include learning to use software for data analysis. There are many different software packages available. We chose to use R for two reasons: First, it’s free. Second, it’s a coding language.
You may already know a bit about computer coding (or programming). But if you don’t, it’s worth demystifying it a little. Computers manipulate data rapidly and accurately—something we need to do in statistics. A coding language is the language we use for telling a computer what to do. It’s really that simple.
You may be thinking: coding language; that sounds hard! It may, in fact, be a little harder than just learning to use a statistics package with a point-and-click interface. But don’t worry: we will take you through it step by step, slowly. You might even enjoy it.
We want you to learn some R because we believe writing code will help you understand statistics better than clicking buttons in a statistics package. It will also give you a skill at the end of this course that you didn’t have before! You can even put it on your resume (as in, “Basic knowledge of data analysis with R”).
Representing the same concept in different forms (called “re-representation”) helps make learning more robust. In this course, you will use a number of different representations: words, graphs, tables, mathematical notation, and R. Making connections between these different representations will deepen your understanding.
Try Some R Code
For example, here’s a bit of R (what we sometimes refer to as “code”). Read the code in the window below. What do you think it will do? Press the Run button and see what happens.
Look in the Console window. You can see that R displayed “Hello world!” Note: when we tell R to print(), R interprets that to mean, “Display in the console.” You just figured out a little bit of R.
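A minimal sketch of code that produces this output, assuming the exercise made a single print() call with the text in quotation marks:

```r
# Display the text in the console
print("Hello world!")
```

In a standard R console the output appears as [1] "Hello world!"; the [1] simply marks the first element of the result.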
Let’s try another one. Read the code and see if you can guess what it will do. Then press the Run button in the window below.
This bit of code printed out the sum of 1, 5, and 10 (that is, 16). You are already learning a bit of code. Notice that the R console here shows the code that it ran, followed by the result of that code.
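A minimal sketch of code that gives this result, assuming the exercise passed the three numbers to R’s sum() function:

```r
# Add up the three numbers; the console displays the total
sum(1, 5, 10)
```

The numbers inside the parentheses are the values to add; changing them changes the total.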
The rectangular window that you just interacted with is called a DataCamp exercise. DataCamp exercises are constructed to help you learn R without having to install anything or do anything special to your computer. You can just focus on learning R.
The Windows of R: script.R and R Console
In the DataCamp exercise, you’ll see a few different windows. The part where you type the code (i.e., the instructions for what the computer should do) is called the script window. The window where the code actually runs, and where the results appear, is called the R console.
Throughout the course, type your R code into the script.R window. When you are done, press Run, and you will see R execute your instructions in the R console window. If you want, you can type an instruction directly into the console. But you can only run one command at a time in the console, whereas you can enter a sequence of commands in the script window. We recommend you just type into the script.R window for now.
These windows may look different depending on how wide your browser window is. If your window is wide enough, the windows will look like this, with the script.R window on the left and the R console window on the right:
But if your browser is a little bit less wide, the two windows will be tabbed one behind the other, like this:
Just be sure to notice which window you are typing in. Once you run a command, the R console window will come to the front to show you the results. So, before typing more commands or revising your script, be sure to click back over to the script.R window using the tabs.
Sometimes we will write things in script.R that we want R to ignore. These are called comments and they start with a #. R will ignore comments, and just execute the code. In this course we will use the comments as a way to give you instructions for R exercises. In the DataCamp window below, try typing whatever you want after a # at the front of the line. Then press Run.
# type whatever you want
# see... blah blah blah
If you want to write a comment that takes more than one line, it’s a good idea to put a # at the beginning of each line.
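As an illustration (the wording of the comments here is made up for the example), a comment that spans two lines might look like this, with one line of code after it so that R has something to run:

```r
# This comment is too long to fit on one line,
# so the second line starts with its own # too.
5 + 1  # a comment can also follow code on the same line
```

R skips everything from each # to the end of that line and runs only 5 + 1.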
You can also use R like a basic calculator. Try running this code to see the results in the R console. Just press Run.
# a few basic arithmetic things
5 + 1
10 - 3
2 * 4
9 / 3
Notice that you can put more than one line of code—or set of instructions—in a single script window. When you press the Run button, all the commands in the window will be run, one after the other in the order in which they appear.
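To see the ordering for yourself, here is a small sketch (the specific lines are only an example) that mixes the commands introduced so far. The results should appear in the console in the same top-to-bottom order as the code:

```r
# Each line runs in order, from top to bottom
print("first")
sum(1, 5, 10)
2 * 4
print("last")
```

If you swap two of the lines and press Run again, their results swap places in the console too.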
How to Learn the Most from DataCamp
The Run button will run your code in the DataCamp window. The Submit button will submit your answer to be graded. You’ll learn the most by writing code, running it, and trying again until it works. Once you know how it works, submit it for grading. There won’t always be a Submit button; in those few cases, your answer won’t be graded.
There is also a Hint button. Don’t be too quick to click it. You’ll learn more if you try on your own without the hint first. After you click the Hint button, you might see a Solution button. It’s tempting to look at the solution, but trust us: you won’t learn R unless you try writing code on your own.
The longer you try, even when it feels frustrating, the more you will learn.
Course Note: A DataCamp Sandbox is Always Available
We will always provide a DataCamp window when you need one. But, sometimes you may just want to try something out.
You can click the link in the sidebar that says DataCamp Sandbox and it will open an empty DataCamp window in a new tab. If you leave it open there, you will always have a handy place to run some R code.
DataCamp Now, RStudio Later
In this course we will run all of our R code in the DataCamp windows. DataCamp is a company that focuses on teaching people data science and coding online. If you get interested in learning more about R after this course, you can try some of DataCamp’s more advanced courses.
DataCamp is great for learning R. But later, when you start doing actual data analysis projects, you will probably use a different software package called RStudio. RStudio is an application that lets you write and run R code on your computer.
There is no need to download and install RStudio on your computer at this point. But if you want to do it later, as you get a better grasp of R, here is a link to the RStudio page as well as a PDF of instructions we wrote for our students. RStudio, by the way, is also a free resource.
Because R is open source, people are always inventing new things to do with R, so there is always more to learn.