Datasets
Right-click on any file below to download it or copy the URL for use in R, Excel, or other software.
- Animal Taxonomy
- Bigfoot Temp & Moonphase
- Bigfoot Temp, Moon & Wind
- Boys Growth
- Chemical Reactions
- Climate Change (Blue Hill)
- Climate Change (Monthly)
- Contaminated Water
- Credit Scores
- Cricket Chirps
- Elephants
- Hospital & Antibiotics
- Insect Frequency
- Insect Frequency Table
- LA Ozone
- Lung Capacities
- Male Brain Weights
- Penguin Body Mass
- Mammals
- Old Faithful
- Sodium Restricted mEQ
- Temp & Heart Rate
- King Tut's Mummy Curse
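Once a URL is copied, it can be passed straight to read.csv(). The sketch below is only an illustration: it writes a tiny made-up CSV to a temporary file so that it runs offline, and the column names and values are invented rather than taken from any dataset above. With a real dataset, the copied URL would replace the temporary path.

```r
# Stand-in for a downloaded dataset: a tiny made-up CSV in a temporary file.
tmp <- tempfile(fileext = ".csv")
writeLines(c("chirps,temp", "20,88.6", "16,71.6", "19.8,93.3"), tmp)

# read.csv() takes a local path or a URL string in the same way.
crickets <- read.csv(tmp)

str(crickets)    # column names and types
head(crickets)   # first few rows
nrow(crickets)   # 3
```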
RMarkdown Starter Kit
Tip: Knit to HTML first — it causes far fewer problems than knitting directly to PDF. If you need a PDF, open the HTML file in your browser and print to PDF from there (File → Print → Save as PDF). Set your YAML output to html_document as shown in the skeleton below.
Every RMarkdown document begins with a YAML header (the block between the --- lines) and contains a mix of R code chunks and written commentary. Here is the minimum skeleton:
A few things to remember:
- The setup chunk at the top should load packages and read in data. Setting include=FALSE keeps it from cluttering your output.
- Every code chunk opens with ```{r} and closes with ```. Missing either one will cause a knitting error.
- Use ## for section headings. Write your interpretations in plain text outside the code chunks.
- Knit early and often — don't wait until the end to check if your document compiles.
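The points above can be sketched as a minimal document. This is an assumed layout, not a copy of the course template: the title, author, the data frame name dat, the column name variable, and the URL placeholder are all hypothetical and should be replaced with your own.

````markdown
---
title: "Project Title"
author: "Your Name"
output: html_document
---

```{r setup, include=FALSE}
# Load packages and read in data here (placeholder URL)
library(dplyr)
dat <- read.csv("paste-dataset-URL-here")
```

## Question 1

```{r}
mean(dat$variable)
```

Write your interpretation here, in plain text outside the chunk.
````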
Project Templates
These starter files have the YAML header, data import, and question sections already set up. Open one in RStudio and fill in your code and interpretations.
Worked Exemplar
This completed RMarkdown file analyzes the Temperature and Heart Rate dataset and demonstrates what a finished project should look like — code, output, and written interpretations together.
R Command Reference
Commands are grouped by the modules where they are introduced. If you need a command from a later module, you may need to define a custom function first — see the project templates for those.
Modules 01–05: Data Manipulation and Descriptive Statistics
| Command | What It Does |
|---|---|
| read.csv("url") | Read a CSV file into a data frame |
| library(dplyr) | Load the dplyr package for data manipulation |
| filter(df, condition) | Subset rows that meet a condition |
| df$variable | Access a single column from a data frame |
| mean(x) | Arithmetic mean |
| median(x) | Median |
| sd(x) | Standard deviation |
| var(x) | Variance |
| cor(x, y) | Correlation coefficient between two variables |
| length(x) | Number of elements in a vector |
| sum(x) | Sum of elements (also counts TRUE values in a logical vector) |
| c(a, b, c) | Combine values into a vector |
| hist(x) | Histogram |
| boxplot(x, horizontal=TRUE) | Boxplot (horizontal orientation) |
| plot(x, y) | Scatterplot |
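
A short sketch of several commands from the table above, using made-up values chosen so the results are easy to check by hand. The names temp and chirps are illustrative only and are not taken from the course datasets.

```r
# Made-up values for illustration only.
temp   <- c(2, 4, 4, 5, 7, 8)
chirps <- 2 * temp + 1        # exactly linear in temp, so the correlation is 1

length(temp)        # 6
mean(temp)          # 5
median(temp)        # 4.5
var(temp)           # 4.8
sd(temp)            # square root of the variance
sum(temp > 4)       # 3 values are greater than 4
cor(temp, chirps)   # 1

# Columns of a data frame are accessed with $.
df <- data.frame(temp, chirps)
mean(df$chirps)     # 11
```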
Modules 06–08: Probability Distributions and Sampling
| Command | What It Does |
|---|---|
| dbinom(x, n, p) | Binomial probability: P(X = x) |
| pbinom(x, n, p) | Cumulative binomial probability: P(X ≤ x) |
| qbinom(prob, n, p) | Binomial quantile: smallest x where P(X ≤ x) ≥ prob |
| dpois(x, lambda) | Poisson probability: P(X = x) |
| ppois(x, lambda) | Cumulative Poisson probability: P(X ≤ x) |
| dgeom(x, p) | Geometric probability: P(X = x), where x counts failures before the first success |
| pgeom(x, p) | Cumulative geometric probability: P(X ≤ x), with x counted the same way |
| punif(x, min, max) | Cumulative uniform probability: P(X ≤ x) |
| qunif(prob, min, max) | Uniform quantile |
| pnorm(x, mean, sd) | Cumulative normal probability: P(X ≤ x) |
| qnorm(prob, mean, sd) | Normal quantile: value where P(X ≤ x) = prob |
| sqrt(x) | Square root |
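
The p-functions return lower-tail probabilities, so upper tails and intervals are built by subtraction. The following sketch uses arbitrary parameter values to show the pattern; the numbers are illustrative and do not come from a course problem.

```r
# Binomial with n = 10 trials and success probability p = 0.5
dbinom(3, 10, 0.5)        # P(X = 3)  = 0.1171875
pbinom(3, 10, 0.5)        # P(X <= 3) = 0.171875
1 - pbinom(3, 10, 0.5)    # P(X > 3)  = 0.828125
qbinom(0.5, 10, 0.5)      # 5: smallest x with P(X <= x) >= 0.5

# Geometric: x counts failures before the first success
dgeom(2, 0.5)             # 0.125 = 0.5^2 * 0.5

# Uniform on [0, 10]
punif(3, 0, 10)           # 0.3

# Standard normal
pnorm(1.96, 0, 1)                   # about 0.975
pnorm(1, 0, 1) - pnorm(-1, 0, 1)    # P(-1 < X <= 1), about 0.683
qnorm(0.975, 0, 1)                  # about 1.96
```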
Module 09: Assessing Normality
| Command | What It Does |
|---|---|
| stripchart(x, method="stack") | Stacked dot plot for spotting outliers and skewness |
| qqnorm(x) | Normal quantile plot; points should fall close to a straight line if the data are approximately normal |
| qqline(x) | Add the reference line to a normal quantile plot |
| rnorm(n, mean, sd) | Generate n random values from a normal distribution |
| sample(x, size) | Draw a random sample of a given size from a vector (without replacement by default) |
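
A minimal sketch of how these commands fit together, using simulated data in place of a real variable. The seed, sample size, mean, and standard deviation are arbitrary choices for illustration.

```r
set.seed(1)                           # makes the random draws reproducible
x <- rnorm(30, mean = 50, sd = 10)    # 30 simulated values

# Rounding first lets equal values stack on top of each other.
stripchart(round(x), method = "stack")

# Normal quantile plot with its reference line.
qqnorm(x)
qqline(x)

# Five values drawn from x without replacement.
s <- sample(x, 5)
```

Because x was generated from a normal distribution, the points in the quantile plot should sit close to the line; real data with skew or outliers will bend away from it.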
Modules 10–11: Hypothesis Testing and Multiple Testing
| Command | What It Does |
|---|---|
| one.samp.t.test.sum(...) | One-sample t-test from summary statistics (custom function — see project templates) |
| one.samp.t.test.data(...) | One-sample t-test from raw data (custom function — see project templates) |
| one.samp.prop.test(...) | One-sample proportion z-test (custom function — see project templates) |
| pt(t, df) | Cumulative t-distribution probability: P(T ≤ t) |
| Ps * k | Bonferroni adjustment: multiply p-values by the number of tests (adjusted values above 1 are reported as 1) |
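
The sketch below shows how pt() turns a t statistic into a two-sided p-value and how a Bonferroni adjustment is applied. It is a by-hand calculation with made-up summary statistics, not the course's custom functions, whose arguments are not shown here. The built-in p.adjust() is included as a cross-check.

```r
# Made-up summary statistics for illustration only.
xbar <- 98.2   # sample mean
mu0  <- 98.6   # hypothesized mean
s    <- 0.7    # sample standard deviation
n    <- 25     # sample size

t_stat  <- (xbar - mu0) / (s / sqrt(n))        # about -2.86
p_value <- 2 * pt(-abs(t_stat), df = n - 1)    # two-sided p-value

# Bonferroni: multiply each p-value by the number of tests, capped at 1.
Ps <- c(0.004, 0.02, 0.40)
k  <- length(Ps)
pmin(1, Ps * k)                         # 0.012 0.060 1.000
p.adjust(Ps, method = "bonferroni")     # same result from base R
```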
Module 12: Two-Sample Tests
| Command | What It Does |
|---|---|
| t.test(x, y, alternative=...) | Two-sample t-test (independent samples; Welch's unequal-variance version by default) |
| t.test(x, y, paired=TRUE) | Paired t-test |
| prop.test(c(x1,x2), c(n1,n2)) | Two-sample proportion test |
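
A short sketch of the three calls, using made-up measurements and counts. The vector names before and after are illustrative, and the numbers are not from any course dataset.

```r
# Made-up heart rates for illustration only.
before <- c(72, 75, 78, 71, 74, 77)
after  <- c(70, 74, 75, 70, 71, 76)

# Independent samples (Welch's test by default).
welch <- t.test(before, after, alternative = "two.sided")
welch$p.value

# Paired test: the same six subjects measured twice.
paired <- t.test(before, after, paired = TRUE)
paired$parameter    # degrees of freedom: 6 pairs - 1 = 5

# Two proportions: 15 of 50 successes versus 25 of 50.
props <- prop.test(c(15, 25), c(50, 50))
props$estimate      # sample proportions 0.3 and 0.5
```

Storing the result in an object makes it easy to pull out single pieces such as the p-value instead of reading them off the printed summary.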
Module 13: Regression
| Command | What It Does |
|---|---|
| lm(y ~ x1 + x2) | Fit a simple or multiple linear regression model |
| summary(model) | Coefficients, p-values, and R-squared for a fitted model |
| coefficients(model) | Extract the regression coefficients |
| fitted(model) | Predicted values from the model |
| resid(model) | Residuals (observed − predicted) |
| plot(fitted(m), resid(m)) | Residual plot to check model assumptions |
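
A minimal sketch of the regression workflow. It uses faithful, the Old Faithful data frame that ships with R, as a stand-in; this is an assumption for illustration and may not match the course's Old Faithful file.

```r
# Built-in data: eruption length (minutes) and waiting time to the next eruption.
model <- lm(waiting ~ eruptions, data = faithful)

summary(model)         # coefficients, p-values, and R-squared
coefficients(model)    # intercept and slope
head(fitted(model))    # first few predicted values
head(resid(model))     # first few residuals

# Residual plot: look for random scatter around the zero line.
plot(fitted(model), resid(model))
abline(h = 0, lty = 2)
```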
Common Errors
These are the errors students run into most often when knitting RMarkdown documents. If your document won't knit, start here.
Object not found
Error in mean(mam$gestation) : object 'mam' not found
Your data frame hasn't been created yet. This usually means you didn't run the setup chunk. In RStudio, click the green arrow on the setup chunk first, or knit the whole document (the setup chunk runs automatically when knitting).
Could not find function
Error in filter(mam, bodywt < 100) : could not find function "filter"
The package that contains this function hasn't been loaded. Make sure library(dplyr) is in your setup chunk and that the setup chunk has been run.
Unexpected end of input
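Error: unexpected end of input
A code chunk is missing its closing ```. Scroll through your .Rmd file and make sure every chunk that opens with ```{r} has a matching ``` on its own line. The same error appears when a parenthesis, brace, or quotation mark inside a chunk is never closed.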
Unexpected symbol / Unexpected string constant
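Error: unexpected symbol in "mean(mam gestation"
There's a typo in your R code. In this example, the $ between mam and gestation is missing, so R sees two names in a row and stops at the second one. (A stray space after $ is harmless, because R ignores whitespace around $.) Check for missing operators or commas, unmatched parentheses or quotation marks, and column names that contain spaces.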
Non-numeric argument to binary operator
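Error in x * y : non-numeric argument to binary operator
You're trying to do math on something that isn't a number. This often happens when a column was read in as text (for example, numbers that include commas or units), when a number is wrapped in quotation marks, or when a misspelled name refers to a function instead of your data. Check the type with class() and double-check your variable names against the data frame.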
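A quick way to diagnose this error is to inspect the names and types involved before doing arithmetic. The sketch below uses a made-up data frame with a hypothetical weight column; it is one possible check, not a required procedure.

```r
# Made-up data: 'weight' is stored as text because one entry includes a unit.
df <- data.frame(weight = c("12", "15", "18 kg"), stringsAsFactors = FALSE)

names(df)             # check spelling against the actual column names
is.null(df$wieght)    # TRUE: a misspelled column name returns NULL, not an error
class(df$weight)      # "character", so df$weight * 2 would fail

# Strip the unit and convert to numbers before doing arithmetic.
weight_num <- as.numeric(sub(" kg", "", df$weight))
weight_num * 2        # 24 30 36
```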
Knitting produces a blank or incomplete document
(No error message; the document just looks wrong)
Make sure your written text is outside code chunks. Text inside a chunk is treated as R code. Also check that you have a blank line before and after each ## heading.
Cannot open connection / Cannot open URL
Error in file(file, "rt") : cannot open the connection
R can't reach the data file. If you're loading from a URL, make sure you're connected to the internet and the URL is correct. If loading a local file, check that the file path is right and the file is in your working directory.
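For local files, the check can be done in code before reading. The helper below is a hypothetical sketch (read_if_present and the file name mammals.csv are invented for illustration); file.exists() works only on local paths, so it does not apply to URLs.

```r
# Hypothetical helper: read a CSV only if the file exists,
# otherwise report where R was looking.
read_if_present <- function(path) {
  if (!file.exists(path)) {
    message("Not found: ", path, " (working directory: ", getwd(), ")")
    return(NULL)
  }
  read.csv(path)
}

mam <- read_if_present("mammals.csv")   # hypothetical file name
```

If the message shows an unexpected working directory, move the file next to your .Rmd file or correct the path.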
Plots not appearing in output
(No error; the plot just doesn't show up in the knitted document)
Make sure the plotting command is inside a code chunk. Also check that the chunk doesn't have include=FALSE or eval=FALSE in its header, which would suppress the output.