Some interesting publicly available data sets

Steve Simon

2015-07-15

I’ve been looking for a few interesting data sets for use as teaching examples. I wanted data associated with peer-reviewed publications. It’s a difficult and tedious search, but here are a few promising leads.

Normative Data for Email Writing by Native Speakers of British English. Authors: Lindsey Thiel, Karen Sage, Paul Conroy. Journal of Open Psychology Data. Available in html format.

This is a simple Word document that might be useful for text analytics. Forty-two subjects were asked to write short emails with a three-minute time limit. Each subject wrote three emails:

  1. an email to a friend arranging a meeting,
  2. an email to a friend describing a recent trip, and
  3. an email to a local political representative about an issue of concern to the writer.
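
As a rough idea of what a first pass at text analytics on these emails might look like, here is a minimal Python sketch that computes a few simple summaries for one email. The function names (`tokenize`, `summarize_email`) and the sample sentence are made up for illustration; the sample is not taken from the data set, and you would first need to copy the email text out of the Word document.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return runs of letters and straight apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def summarize_email(text):
    """Return simple length and vocabulary summaries for one email."""
    tokens = tokenize(text)
    n_words = len(tokens)
    n_unique = len(set(tokens))
    return {
        "n_words": n_words,
        "n_unique": n_unique,
        # Share of distinct words; defined as 0.0 for an empty email.
        "type_token_ratio": n_unique / n_words if n_words else 0.0,
        "top_words": Counter(tokens).most_common(3),
    }


# Hypothetical placeholder text, NOT an email from the data set.
sample = "Hi Sam, are you free on Friday? We could meet at the cafe on Friday at noon."
summary = summarize_email(sample)
print(summary)
```

Because every email was written under the same time limit, summaries like word count could plausibly be compared across the three email types, though that is a suggestion rather than anything the authors report here.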

Data from “Demographic Influences on Disgust: Evidence from A Heterogeneous Sample.” Authors: Uri Berger, David Anaki. Journal of Open Psychology Data. Available in html format.

The disgust scale is a 25-item survey with three subscales (core disgust, animal reminder, and contamination-based disgust). The survey was given to 1,414 Israeli citizens. The data set includes the individual items, the composite values, and a wealth of demographic data.

Jessica F. Stephenson, Cock Van Oosterhout, Ryan S. Mohammed, and Joanne Cable. 2015. Parasites of Trinidadian guppies: evidence for sex- and age-specific trait-mediated indirect effects of predators. Ecology 96:489–498. Available in html format.

The key variable in this data set, weight, is a function of the site location and the number and types of predators.
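
A simple way to start exploring a relationship like this is to compare mean weight across groups. The sketch below is only an illustration: the column names (`site`, `n_predators`, `weight`) and the rows are toy placeholders, not values or variable names from the published data, and a real analysis would likely use a regression model rather than group means.

```python
from collections import defaultdict
from statistics import mean


def mean_weight_by_group(rows, keys):
    """Average the 'weight' field within each combination of the given keys."""
    groups = defaultdict(list)
    for row in rows:
        group = tuple(row[k] for k in keys)
        groups[group].append(row["weight"])
    return {group: mean(values) for group, values in groups.items()}


# Toy placeholder rows; names and numbers are assumptions, not the real data.
rows = [
    {"site": "A", "n_predators": 0, "weight": 0.10},
    {"site": "A", "n_predators": 0, "weight": 0.14},
    {"site": "B", "n_predators": 2, "weight": 0.20},
    {"site": "B", "n_predators": 2, "weight": 0.30},
]

by_site = mean_weight_by_group(rows, ["site"])
by_site_and_predators = mean_weight_by_group(rows, ["site", "n_predators"])
print(by_site)
print(by_site_and_predators)
```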

Here is a nice list of journals that publish full data sets.

You can find an earlier version of this page on my blog.