Statistics Projects

Using the Materials

Many of the projects below have data sets associated with them in an Excel file. Each project tile has links to related background information. The materials are posted here to be used; use them freely and edit them as you see fit. The thumbnail graphs in the tiles, along with the curve fitting, were done with R. New: the R files that were used to create the graphs have been added to each tile.

Introductory Statistics

Arctic Ice Data

  • Data: Excel File (7/18)
  • R Script
  • Project: The file contains average Arctic Ice Extent from 1979 to present by month. It is excellent data for linear regression assignments.
  • Skills:  Linear regression with meaningful residual plots.
  • Notes: March (winter peak month) and September (summer low month) are particularly interesting. The 95% confidence intervals for the two slopes nearly separate, and may separate depending on the impact of the latest update. The residual plot for March suggests a line is acceptable, but the plot for September suggests curvature (see graph). The difference is at least partly explained by a positive feedback loop due to changing albedo (the fraction of solar energy reflected back into space).
  • Learn more about albedo feedback loops at Windows to the Universe.
  • Learn more about arctic ice at the National Snow and Ice Data Center.
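
The slope comparison and residual check described in the notes can be sketched as follows. This is a minimal illustration in Python, not the R script linked from the tile. The two series are placeholder values (one straight decline, one accelerating decline), not the ice extent data, and the interval uses a normal quantile as an approximation to the t quantile that R's `confint` would use.

```python
import math
from statistics import NormalDist, mean

def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Standard error of the slope from the residual variance (n - 2 degrees of freedom).
    s2 = sum(r ** 2 for r in residuals) / (len(x) - 2)
    se_slope = math.sqrt(s2 / sxx)
    return slope, intercept, se_slope, residuals

def slope_interval(slope, se_slope, level=0.95):
    """Approximate confidence interval for the slope (normal quantile, not t)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return slope - z * se_slope, slope + z * se_slope

# Placeholder series, NOT real ice extent: the sine term is a stand-in for year-to-year scatter.
years = list(range(1979, 2019))
t = [yr - 1979 for yr in years]
march_like = [16.5 - 0.04 * ti + 0.1 * math.sin(3 * ti) for ti in t]
sept_like = [7.5 - 0.04 * ti - 0.001 * ti ** 2 + 0.1 * math.sin(3 * ti) for ti in t]

for name, series in [("straight", march_like), ("curved", sept_like)]:
    slope, intercept, se, residuals = fit_line(years, series)
    low, high = slope_interval(slope, se)
    # Curvature shows up as residuals with one sign at the ends and the other in the middle.
    ends = mean(residuals[:5] + residuals[-5:])
    middle = mean(residuals[15:25])
    print(f"{name}: slope {slope:.4f} ({low:.4f}, {high:.4f}), "
          f"end residuals {ends:+.3f}, middle residuals {middle:+.3f}")
```

With the real March and September columns in place of the placeholders, two intervals that do not overlap would indicate clearly different rates of decline, and a sign pattern in the residuals like the one in the curved series would suggest that a line is not the right model.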

Global Temperature Anomalies

U.S. Oil Consumption

  • Data: Excel File (1/19)
  • Project: Word
  • R script
  • Skills: Interpreting Normal Curves
  • Notes: Overall U.S. oil production doesn't fit a normal model, largely because of the substantial increase in tight oil production. The Excel file contains U.S. crude oil production and tight oil production since 2000. The normal model is fit to crude oil less tight oil using Solver in Excel. The Excel sheet itself is interesting because it is set up to fit the data knowing only what Hubbert knew (I think) as well as with all the current data. One can change the total oil extracted value to see that the peak and the fit don't change much.
  • Learn more about peak oil: See world oil production.
  • Learn more about tight oil: Student Energy and the Union of Concerned Scientists
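
The fitting idea in the notes, a normal curve scaled by an assumed total amount of oil extracted, can be sketched in a few lines. This is only an illustration under stated assumptions: a coarse grid search stands in for Excel's Solver, the function names are made up for this example, and the production values are generated from the model itself rather than taken from the Excel file.

```python
from statistics import NormalDist

def normal_curve(year, total, peak_year, width):
    """Annual production under a normal model: total area times the normal density."""
    return total * NormalDist(peak_year, width).pdf(year)

def sse(years, production, total, peak_year, width):
    """Sum of squared differences between the data and the model curve."""
    return sum((p - normal_curve(y, total, peak_year, width)) ** 2
               for y, p in zip(years, production))

def fit_normal(years, production, total, peak_grid, width_grid):
    """Grid search for the peak year and width that minimise the squared error."""
    _, peak_year, width = min(
        (sse(years, production, total, pk, w), pk, w)
        for pk in peak_grid
        for w in width_grid
    )
    return peak_year, width

# Placeholder production values generated from the model itself, NOT real oil data.
years = list(range(1900, 2011))
production = [normal_curve(y, 200, 1970, 25) for y in years]
peaks, widths = range(1960, 1981), range(20, 31)

print(fit_normal(years, production, 200, peaks, widths))  # (1970, 25), the generating values
print(fit_normal(years, production, 220, peaks, widths))  # same data, larger assumed total
```

Rerunning the fit with a different assumed total, as in the last line, mirrors the experiment suggested in the notes: compare the fitted peak year across totals to see how sensitive it is.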

Lead and Crime

  • Data: Excel File (historic data - no updates)
  • R Script
  • Project: The file contains historic lead consumption from gasoline and crime rates for violent crimes, assault, rape, and robbery. It also includes unemployment rates.  The data is excellent for linear regression with lags (noted in the file) as well as multiple regression (use lead and unemployment to predict crime).
  • Notes: There is evidence of a linear correlation between lead and crime with a lag, and there are plausible explanations for causation. See the summary by Kevin Drum, Lead: America's Real Criminal Element.
  • Learn more about lead and crime: Rick Nevin (has peer-reviewed publications on lead and crime; read "Horror" on his website for an overview); Kevin Drum, Race, Lead, and Juvenile Crime and Lead Crime Links; and an updated lead-crime roundup for 2018 by Kevin Drum.
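
Regression with a lag means pairing each lead value with the crime value some fixed number of years later. The sketch below shows one way to line up the two series and compare correlations across candidate lags. The series and the four-step lag are invented for illustration; the lags that apply to the real data are the ones noted in the Excel file.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def lagged_pairs(predictor, response, lag):
    """Pair predictor[t] with response[t + lag], dropping the unmatched ends."""
    return predictor[:len(predictor) - lag], response[lag:]

def best_lag(predictor, response, lags):
    """Lag with the highest correlation between predictor and later response."""
    return max(lags, key=lambda k: pearson(*lagged_pairs(predictor, response, k)))

# Invented series, NOT the lead or crime data: the response follows the predictor 4 steps later.
lead_like = [max(0, 10 - abs(t - 12)) for t in range(30)]
crime_like = [3 + 2 * lead_like[t - 4] if t >= 4 else 3 for t in range(30)]

for lag in (0, 2, 4, 6):
    r = pearson(*lagged_pairs(lead_like, crime_like, lag))
    print(f"lag {lag}: r = {r:.3f}")
print("best lag:", best_lag(lead_like, crime_like, range(0, 9)))
```

Once a lag is chosen, the lagged pairs can be passed to any linear regression routine. A high correlation at some lag is still only a correlation, which is why the notes point to separate arguments for causation.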

World Oil Consumption

Mean vs Distribution, A Cold February in DC, and a Senator

  • Project: Word
  • Skills: Understanding the difference between a mean and a distribution.
  • Sustainability Goals: Understanding that as the planet warms on average, a given location can still have below-average temperatures.
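
A small calculation makes the point concrete. Assuming, purely for illustration, that monthly temperature anomalies at one location are roughly normal with a standard deviation of 4 degrees, shifting the mean up by 1 degree still leaves a large share of months below the old average. The numbers are hypothetical and are not taken from the project file.

```python
from statistics import NormalDist

def prob_below(threshold, mean, sd):
    """Probability that a normally distributed value falls below the threshold."""
    return NormalDist(mean, sd).cdf(threshold)

# Hypothetical values: old mean anomaly 0, new mean anomaly +1, standard deviation 4.
before = prob_below(0.0, mean=0.0, sd=4.0)
after = prob_below(0.0, mean=1.0, sd=4.0)

print(f"Chance of a below-old-average month before warming: {before:.2f}")  # 0.50
print(f"Chance of a below-old-average month after warming:  {after:.2f}")   # 0.40
```

The mean moved, but the distribution still overlaps the old average heavily, so a single cold month says little about the trend.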

World Grain

  • Data: Excel File (4/17)
  • Project: Which of the grain production and consumption time series are linear? Of those that are linear, rank the fits. How closely do the rates of change of production and consumption for each grain match? Which grain has the fastest change in production, and which in consumption? From 2017 to 2050, world population is expected to increase from about 7.3 billion to 9.3 billion, a 27% increase in population. Which grains are on target to increase in production by 27% by 2050?
  • Skills: Linear regression.
  • Notes: The data contains grain (barley, corn, millet, oats, rice, rye, sorghum, and wheat) production, consumption, and ending stocks, both as totals and per capita. Most of the production and consumption data is linear.
  • Learn more about food: FAO article World’s future food security “in jeopardy” due to multiple challenges, report warns. A four-minute interview about the report is here.
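
The 27% target and the "on target" question come down to simple arithmetic on a fitted line. A minimal sketch follows, assuming a grain's production has already been fit with a line. The slopes and 2017 values below are hypothetical and are not taken from the Excel file.

```python
def percent_increase(start, end):
    """Percent change from start to end."""
    return 100 * (end - start) / start

def projected_increase(slope, intercept, start_year, end_year):
    """Percent change in a fitted line's value between two years."""
    start = intercept + slope * start_year
    end = intercept + slope * end_year
    return percent_increase(start, end)

# Population target from the project text: about 7.3 billion to 9.3 billion.
target = percent_increase(7.3, 9.3)
print(f"Population increase: {target:.1f}%")  # 27.4%

# Hypothetical fitted lines (units of production per year), chosen only for illustration.
examples = {
    "grain A": {"slope": 10, "value_2017": 1000},
    "grain B": {"slope": 5, "value_2017": 750},
}
for name, line in examples.items():
    intercept = line["value_2017"] - line["slope"] * 2017
    growth = projected_increase(line["slope"], intercept, 2017, 2050)
    verdict = "on target" if growth >= target else "below target"
    print(f"{name}: {growth:.1f}% by 2050, {verdict}")
```

Note that a constant slope gives a smaller percentage increase for a grain that starts from a larger 2017 value, so both the slope and the starting level matter when answering the question.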

Hourly Wage by Race and Gender

  • Data: Excel File (9/18)
  • R Script
  • Project: The Excel file contains a number of data sets suitable for linear regression. It contains median and average hourly wages (in 2017 dollars) for men and women, broken down by White, Black, and Hispanic.
  • Skills: Linear regression.
  • Notes: Data from EPI. There is plenty to discuss in comparing hourly wages by median and average. For example, the graph here shows median hourly wages of men and women. The wage gap has closed, but partly because men's pay has decreased. The dynamics are not exactly the same when you look at averages instead of medians, which makes this great for a stats class. Let the discussion begin.