Tag Archives: statistics

How do you tell a story with data and maps – Beto vs Cruz?

FiveThirtyEight has an excellent article on the 2018 Texas Senate race and its possible implications for future elections. The article, What Really Happened In Texas by Kirk Goldsberry (11/14/18), analyzes voting patterns by county and compares 2014 to 2018. Their graph, copied here, is the fourth in a series of maps and largely summarizes the earlier ones.

Cruz won by 220,000 votes last week. But in Harris County alone, 500,000 more people voted in the 2018 midterms than had voted in 2014. In Dallas County, 300,000 more people voted than in the last midterms, and in Travis, Bexar and Tarrant counties, 200,000 more people voted.

Indeed, aside from some noteworthy increase in voter numbers in suburban Dallas, the biggest white circles on the map above tend to hover over Beto country. Meanwhile, the darkest red counties — the places that carried Cruz back to Washington — have exhibited very little, if any, change in the number of votes cast compared to 2014. Those areas may be staunchly red, but they’re also staunchly stagnant too. O’Rourke almost won in 2018 by taking roughly 60 percent of the vote in the big five counties. This map suggests that if Democrats can repeat that feat as these places continue to grow, that may be all they need to turn Texas blue.

The data for their analysis comes from the Texas Secretary of State election results. The 2014 data is available by following the Historical Election Results (1992-current) link, and the 2018 data is available through a link along the top. This is a stats project in the making (do this for your home state). The article can also be used in a QL course.
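
For the county-level project, a minimal Python sketch of the core calculation might look like the following. It assumes you have already copied the total votes cast per county for each year out of the Secretary of State results into dictionaries; the function name `vote_change` and the county totals in the example are hypothetical placeholders, not real Texas results.

```python
# Sketch: rank counties by the change in total votes cast between two midterms.
# ASSUMPTION: county totals were copied from the Texas Secretary of State
# results into {county: total_votes} dicts. Demo numbers are made up.

def vote_change(votes_2014: dict, votes_2018: dict) -> list:
    """Return (county, absolute change, percent change), largest increase first."""
    rows = []
    for county, old in votes_2014.items():
        new = votes_2018.get(county)
        if new is None or old == 0:
            continue  # skip counties missing from either year (or with no baseline)
        rows.append((county, new - old, 100 * (new - old) / old))
    return sorted(rows, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical placeholder totals, for illustration only.
    demo_2014 = {"County A": 100_000, "County B": 50_000, "County C": 20_000}
    demo_2018 = {"County A": 160_000, "County B": 55_000, "County C": 20_500}
    for county, change, pct in vote_change(demo_2014, demo_2018):
        print(f"{county}: {change:+,} votes ({pct:+.1f}%)")
```

Sorting by the absolute change mirrors the size of the white circles on the map; sorting by percent change instead would highlight fast-growing small counties.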

How does the digital divide impact secondary education for different groups?

The Pew Research Center article Nearly one-in-five teens can’t always finish their homework because of the digital divide by Monica Anderson and Andrew Perrin (10/26/18) provides insight into how a lack of internet access affects students’ ability to complete homework. Their chart (copied here) gives the percent of school-age children without high-speed internet by race and income. A second chart gives the results of a survey about how this affects homework. In particular,

One-quarter of black teens say they are at least sometimes unable to complete their homework due to a lack of digital access, including 13% who say this happens to them often. Just 4% of white teens and 6% of Hispanic teens say this often happens to them. (There were not enough Asian respondents in this survey sample to be broken out into a separate analysis.)

The article includes a link at the bottom to the results and methodology, which include sample sizes. This makes the article particularly useful for statistics courses.

Who misses school the most?

The EPI article Student absenteeism – Who misses school and how missing school matters for performance by Emma García and Elaine Weiss (9/25/18) provides a detailed account of absenteeism by race and gender. For example, their chart here shows the percent of students who missed three or more days of school in the month before the 2015 NAEP mathematics assessment. There are noticeable differences: the percentages of Black, White, and Asian (non-ELL) students who missed three or more days that month are 23%, 18.3%, and 8.8%, respectively.

Why does this matter?

In general, the more frequently children missed school, the worse their performance. Relative to students who didn’t miss any school, those who missed some school (1–2 school days) accrued, on average, an educationally small, though statistically significant, disadvantage of about 0.10 standard deviations (SD) in math scores (Figure D and Appendix Table 1, first row). Students who missed more school experienced much larger declines in performance. Those who missed 3–4 days or 5–10 days scored, respectively, 0.29 and 0.39 standard deviations below students who missed no school. As expected, the harm to performance was much greater for students who were absent half or more of the month. Students who missed more than 10 days of school scored nearly two-thirds (0.64) of a standard deviation below students who did not miss any school. All of the gaps are statistically significant, and together they identify a structural source of academic disadvantage.

These results “… identify the distinct association between absenteeism and performance, net of other factors that are known to influence performance.” The article has 12 graphs or charts, with data available for each, including one that reports p-values.

How much have fall temperatures risen?

According to the Climate Central post Fall Warming Trends Across the U.S. (9/5/18), the average fall temperature for the U.S. has risen nearly 3°F since 1970 (see their graph copied here). Why does this matter?

Insects linger longer into the fall when the first freeze of the season comes later in the year. A new study from the Universities of Washington and Colorado indicates that for every degree (Celsius) of warming, global yields of corn, rice, and wheat would decline 10 to 25 percent from the increase in insects. Those losses are expected to be worst in North America and Europe.

The article has a drop-down menu for selecting cities across the U.S., which displays a graph similar to the one copied here for the selected city. They don’t post the data used to create the graphs, but they do explain their data sources under methodology.

A statistics project could have students create this graph for their hometown. One way to obtain the data was noted in our post, What do we know about nighttime minimum temperatures?: Go to NOAA’s Local Climatological Data Map. Click on the wrench under Layers. Use the rectangle tool to select your local weather station. Check off the station and click Add to Cart. Follow the directions from there, being sure to select the CSV file. You will get an email link to the data within a day. Note: each request is limited to a ten-year period, so you will need to repeat the process to get the full data set available for your station.
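
Once the files arrive, a Python sketch like the one below could stitch the ten-year downloads together and compute each year's average fall (September to November) temperature. It assumes each LCD CSV has a `DATE` column and a daily average temperature column named `DailyAverageDryBulbTemperature`, and that flagged values carry a trailing `s`; these column names, the flag, and the `lcd_*.csv` file names are assumptions, so check them against your own files.

```python
import csv
from collections import defaultdict
from pathlib import Path
from typing import Optional

# ASSUMPTIONS: column names and the trailing "s" flag below are guesses about
# the LCD export format; adjust them to match the headers in your files.
DATE_COL = "DATE"
TEMP_COL = "DailyAverageDryBulbTemperature"
FALL_MONTHS = {9, 10, 11}  # meteorological fall: Sep-Nov


def parse_temp(raw: str) -> Optional[float]:
    """Convert a field like '58' or '58s' to float; None if blank or missing."""
    raw = raw.strip().rstrip("s")
    try:
        return float(raw)
    except ValueError:
        return None


def fall_means(paths) -> dict:
    """Average daily mean temperature over Sep-Nov for each year, across files."""
    daily = {}  # 'YYYY-MM-DD' -> temp; a dict drops days repeated across files
    for path in paths:
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                temp = parse_temp(row.get(TEMP_COL) or "")
                if temp is None:
                    continue  # rows without a daily value are skipped
                daily[row[DATE_COL][:10]] = temp
    by_year = defaultdict(list)
    for day, temp in daily.items():
        year, month = int(day[:4]), int(day[5:7])
        if month in FALL_MONTHS:
            by_year[year].append(temp)
    return {year: sum(t) / len(t) for year, t in sorted(by_year.items())}


if __name__ == "__main__":
    # Pass every ten-year file you downloaded (hypothetical file names).
    files = sorted(Path(".").glob("lcd_*.csv"))
    for year, mean in fall_means(files).items():
        print(year, round(mean, 1))
```

The yearly means can then be plotted against year to reproduce a graph like Climate Central's for your own station.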

How much do countries spend on education?

The answer depends on how spending is measured. The Statista post The Countries Spending the Most on Education by Martin Armstrong (9/12/2018) reports spending as a share of gross domestic product for primary, secondary and post-secondary non-tertiary education, as well as for tertiary education. By this measure, Norway spends the most. But if the measure used is expenditure per student as a share of GDP per capita, the top spender is South Korea (Norway is fifth). Our graph here is a scatter plot of the two measures by country.

The data is from OECD.Stat. Go to Education and Training, Education at a Glance, Financial resources invested in education, Education finance indicators, and finally Expenditure per student as share of GDP per capita. The measure can be changed under Indicator at the top of the spreadsheet. Definitions of the measures can be found in the OECD Handbook for Internationally Comparative Education Statistics (page 99).

Download the csv file and R-script used here.
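
The linked R script produced our graph. As a separate illustration of why the "top spender" changes with the measure, the Python sketch below compares how countries rank under two measures using a Spearman rank correlation. The four countries and their values are hypothetical placeholders, not OECD figures, and the helper names are illustrative.

```python
# Sketch: do two spending measures rank countries the same way?
# ASSUMPTION: values in the demo are hypothetical placeholders, NOT OECD data.

def ranks(values):
    """Rank values from largest (1) to smallest, averaging ranks for ties."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    countries = ["Country A", "Country B", "Country C", "Country D"]
    share_of_gdp = [6.0, 5.0, 4.0, 3.0]     # spending as % of GDP (hypothetical)
    per_student = [25.0, 30.0, 20.0, 22.0]  # per student, % of GDP per capita (hypothetical)
    print("Top by share of GDP:", countries[share_of_gdp.index(max(share_of_gdp))])  # Country A
    print("Top per student:", countries[per_student.index(max(per_student))])       # Country B
    print("Spearman rho:", round(spearman(share_of_gdp, per_student), 2))          # 0.6
```

A rho well below 1 says the two measures order countries differently, which is the same point the scatter plot makes visually.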

What do we know about plastics?

The Our World in Data article Plastic Pollution by Hannah Ritchie and Max Roser (Sept 2018) is a detailed summary of plastic pollution with 20 charts. For example, one chart is a time series of plastic production (downloaded and posted here) showing that the world produced 381 million tons of plastic in 2015. In the same year, only about 20% of plastic waste was recycled (second chart in the article). The article also covers plastic waste generation:

Packaging, for example, has a very short ‘in-use’ lifetime (typically around 6 months or less). This is in contrast to building and construction, where plastic use has a mean lifetime of 35 years. Packaging is therefore the dominant generator of plastic waste, responsible for almost half of the global total.

Who produces the most plastic waste?

… we see the per capita rate of plastic waste generation, measured in kilograms per person per day. Here we see differences of around an order of magnitude: daily per capita plastic waste across the highest countries – Kuwait, Guyana, Germany, Netherlands, Ireland, the United States – is more than ten times higher than across many countries such as India, Tanzania, Mozambique and Bangladesh.

As always with Our World in Data, the data associated with each graph is downloadable.

What do we know about nighttime minimum temperatures?

The recent Climate.gov article Extreme overnight heat in California and the Great Basin in July 2018 by Rebecca Lindsey (8/8/18) provides an overview and puts the July 2018 heat in the context of long-term trends.

As the NCEI’s Deke Arndt has blogged about before, nighttime low temperatures are increasing faster than daytime high temperatures across most of the contiguous United States. For much of the West and Southwest, July’s record-breaking nighttime heat is a new highpoint in a long-term trend—one that has rapidly accelerated in recent decades. In California, average overnight low temperature in July rose by 0.3°F per decade over the historical record (1895-2018), but since 2000, the pace of warming has accelerated to 1.3°F per decade.

Here is an example of why this matters:

According to Tim Brown, director of NOAA’s Western Region Climate Center (WRCC), it’s a pattern that has serious consequences for wildfires and those who combat them. When temperatures cool off overnight, it’s not just a physical relief for firefighters who may be working in conditions that push the limits of human endurance; fire behavior itself relaxes as temperatures drop, winds grow calmer, and relative humidity rises.

The graph here of California July minimum temperatures is from the article. A stats course can have students create a similar graph for their hometown. Go to NOAA’s Local Climatological Data Map. Click on the wrench under Layers. Use the rectangle tool to select your local weather station. Check off the station and click Add to Cart. Follow the directions from there, being sure to select the CSV file. You will get an email link to the data within a day. Note: each request is limited to a ten-year period, so you will need to repeat the process to get the full data set available for your station.
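
With a year-by-year series of July average lows in hand, students can estimate warming rates like the ones quoted above (0.3°F per decade over the full record versus 1.3°F per decade since 2000). The Python sketch below fits an ordinary least-squares line and reports the slope per decade. It assumes you have already built a `{year: July average minimum}` dictionary from the NOAA files; `slope_per_decade` is an illustrative helper, and the demo series is synthetic, not station data.

```python
from typing import Dict, Optional

# Sketch: OLS warming rate in degrees per decade.
# ASSUMPTION: series maps year -> July average minimum (°F) built from NOAA data.


def slope_per_decade(series: Dict[int, float], start_year: Optional[int] = None) -> float:
    """OLS slope of the yearly values on year, scaled to degrees per decade."""
    pts = [(y, v) for y, v in series.items() if start_year is None or y >= start_year]
    if len(pts) < 2:
        raise ValueError("need at least two years of data")
    n = len(pts)
    mean_y = sum(y for y, _ in pts) / n
    mean_v = sum(v for _, v in pts) / n
    sxy = sum((y - mean_y) * (v - mean_v) for y, v in pts)
    sxx = sum((y - mean_y) ** 2 for y, _ in pts)
    return 10 * sxy / sxx  # per-year slope times 10 years


if __name__ == "__main__":
    # Synthetic series: +0.03 °F/yr through 1999, then +0.13 °F/yr from 2000 on.
    demo = {y: 60 + 0.03 * (y - 1950) for y in range(1950, 2000)}
    demo.update({y: 61.5 + 0.13 * (y - 2000) for y in range(2000, 2019)})
    print(f"full record: {slope_per_decade(demo):.2f} °F/decade")
    print(f"since 2000:  {slope_per_decade(demo, 2000):.2f} °F/decade")
```

Comparing the full-record slope with the post-2000 slope is a simple way for students to see whether warming at their station has accelerated, as it has in California.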

The map here shows statewide minimum temperature ranks for July 2018. It is from NOAA’s National Temperature and Precipitation Maps page. Under Products, select Statewide Minimum Temperature Ranks and choose the desired time period. A map similar to the one in the article can be generated by selecting CONUS Gridded Minimum Temperature Ranks.

What is the story of suicides in the U.S.?

The Conversation article Why is suicide on the rise in the US – but falling in most of Europe? by Steven Stack (6/28/18) tries to get at the story. The first chart (copied here) clearly shows that the suicide rate rose from 1999 to 2015 overall, and considerably more for the 45-54 age group (a ready-made regression problem for a stats course). A second chart shows changes in suicide rates in Western European countries:

However, suicide rates in other developed nations have generally fallen. According to the World Health Organization, suicide rates fell in 12 of 13 Western European between 2000 and 2012. Generally, this drop was 20 percent or more. For example, in Austria the suicide rate dropped from 16.4 to 11.5, or a decline of 29.7 percent.

The obvious question is why?

There has been little systematic research explaining the rise in American suicide compared to declining European rates. In my view as a researcher who studies the social risk of suicide, two social factors have contributed: the weakening of the social safety net and increasing income inequality.

The article has two more charts showing that the U.S. ranks low in social welfare expenditures as a percent of GDP and high in income inequality. In all instances, the data is available for download, with links to the original sources.
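
The Austria figures quoted above make a quick percent-change exercise. Recomputing from the rounded rates gives a decline of about 29.9 percent rather than the quoted 29.7 percent; the small gap may reflect rounding in the published rates, which is itself worth discussing with students. A minimal Python sketch (`percent_change` is an illustrative helper):

```python
def percent_change(old: float, new: float) -> float:
    """Percent change from old to new; negative values are declines."""
    return 100 * (new - old) / old


# Austria's suicide rates as quoted in the article (2000, then 2012).
print(f"{percent_change(16.4, 11.5):.1f}%")  # prints -29.9%
```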

Do we disagree with factual statements that we think are opinions?

The Pew Research Center’s article Distinguishing Between Factual and Opinion Statements in the News by Amy Mitchell, Jeffrey Gottfried, Michael Barthel, and Nami Sumida (6/18/18) addresses this question and more.

A new Pew Research Center survey of 5,035 U.S. adults examines a basic step in that process: whether members of the public can recognize news as factual – something that’s capable of being proved or disproved by objective evidence – or as an opinion that reflects the beliefs and values of whoever expressed it.

We focus on section 4, Americans overwhelmingly see statements they think are factual as accurate, mostly disagree with factual statements they incorrectly label as opinions. A person who labels a factual statement as an opinion is also likely to disagree with it (see table copied here). For example, 41% of those surveyed said that the statement “Spending on Social Security, Medicare, and Medicaid make up the largest portion of the U.S. federal budget” was an opinion, and 82% of those respondents disagreed with it.

This is an excellent article for a QL or Stats course, as it is rich with data, graphs, and charts. You can also discuss why 41% of those surveyed thought a measurable statement (How much of the federal budget goes to Social Security, Medicare, and Medicaid?) was an opinion. The article also describes its methodology in detail and includes data tables.
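
The 41% and 82% figures combine by the multiplication rule: the share of all respondents who both labeled the Social Security statement an opinion and disagreed with it is P(opinion) × P(disagree | opinion). A short Python check follows; Pew's published percentages are rounded and weighted, so the result is approximate, and `joint_share` is an illustrative helper.

```python
def joint_share(p_label: float, p_disagree_given_label: float) -> float:
    """Multiplication rule: P(opinion and disagree) = P(opinion) * P(disagree | opinion)."""
    return p_label * p_disagree_given_label


share = joint_share(0.41, 0.82)
print(f"{share:.1%} of all respondents labeled it an opinion and disagreed")  # 33.6%
```

So roughly one in three respondents both misclassified this measurable statement and rejected it.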

What is the CEO to worker pay gap?

U.S. publicly held companies now have to report CEO and median worker pay (a requirement of the Dodd-Frank Wall Street Reform and Consumer Protection Act of 2010), and Bloomberg has an article, Alphabet CEO Page Makes a Tiny Fraction Compared to Its Median Employee by Alicia Ritcey and Jenn Zhao (5/15/18), with an interactive graph (see image). Mattel “wins” with a CEO-to-median-worker pay ratio of 4,987 to 1. Walmart “wins” in the consumer staples category with a ratio of 1,188 to 1. In the interactive graph, a button at the top right hides outliers. This is useful, but be conscious of whether it is on or off.

The Guardian article ‘CEOs don’t want this released’: US study lays bare extreme pay-ratio problem by Edward Helmore (5/16/18) provides some context and a summary. The Bloomberg graph is being updated daily. Rep. Keith Ellison’s staff prepared the report Rewarding or Hoarding? An Examination of Pay Ratios Revealed by Dodd-Frank, which has data on the first 225 Fortune 500 companies to report and details on the data collection. The data in the report can be used in statistics courses to test for differences by sector. At some point Bloomberg may post a spreadsheet of the data (or one could ask for it).
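
One way to test for a difference by sector is a permutation test on the difference in median pay ratios, which avoids assuming that the heavily skewed ratios are normally distributed. The Python sketch below is a minimal version; the two lists of ratios are hypothetical placeholders, not figures from the report, and `permutation_test` is an illustrative helper rather than a library function.

```python
import random
from statistics import median


def permutation_test(group_a, group_b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in medians between two groups."""
    rng = random.Random(seed)
    observed = abs(median(group_a) - median(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign sector labels at random
        diff = abs(median(pooled[:n_a]) - median(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value away from exactly 0.
    return observed, (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical pay ratios for two sectors; NOT figures from the report.
    sector_x = [120, 180, 250, 300, 410, 520]
    sector_y = [60, 75, 90, 110, 140, 160]
    obs, p = permutation_test(sector_x, sector_y)
    print(f"observed difference in medians: {obs}, p ≈ {p:.4f}")
```

Medians are used because a few extreme ratios, like Mattel's, would dominate a comparison of means.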