Dataset recommendations from StatLib


StatLib is a very old resource on the Internet with interesting datasets. The documentation is uneven in quality, and some of the datasets are too simple to use in my classes. Other files use obscure formats. But here are a few datasets that look interesting.

Cloud seeding experiment

This is a fairly simple file. While the documentation is brief, the setting is simple enough that you can probably figure things out.

College rankings

There are two separate files here, and each is available in two different formats. The documentation is adequate.

Speeding tickets

This is a direct link to an Excel file. It looks interesting, but there is no documentation and some of the number codes are impossible to associate with the appropriate labels.

Florida vote counts in 2000 election

The counts should, perhaps, be converted into percentages. In addition to the well-documented issues in Palm Beach County, the results in Dade County are also interesting.
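As a minimal sketch of the conversion to percentages, the Python below divides each candidate's count by the county total. The county names, candidate labels, and counts are invented for illustration and are not taken from the StatLib file; the function name to_percentages is likewise hypothetical.

```python
# Invented counts for illustration only; these are NOT the real Florida figures.
counts = {
    "County A": {"Candidate X": 400, "Candidate Y": 500, "Candidate Z": 100},
    "County B": {"Candidate X": 3000, "Candidate Y": 1000, "Candidate Z": 0},
}


def to_percentages(row):
    """Convert one county's raw vote counts into percentages of its total."""
    total = sum(row.values())
    if total == 0:
        # Avoid dividing by zero for a county with no recorded votes.
        return {name: 0.0 for name in row}
    return {name: 100.0 * n / total for name, n in row.items()}


percentages = {county: to_percentages(row) for county, row in counts.items()}

for county, row in percentages.items():
    print(county, {name: round(p, 1) for name, p in row.items()})
```

Percentages put large and small counties on the same scale, which makes it easier to spot a county where a minor candidate did unusually well.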

Irish education

Some of the terms may be confusing if you are not familiar with educational systems outside the United States. Note, for example, that according to Wikipedia, “The Leaving Certificate Examination (Irish: Scrúdú na hArdteistiméireachta), commonly referred to as the Leaving Cert or (informally) the Leaving (Irish: Ardteist), is the final exam of the Irish secondary school system and the university matriculation examination in Ireland.”

Plasma retinol

This dataset supports an analysis of certain blood chemistry values and how they relate to various dietary factors.

VA lung cancer trial

A study of a test treatment for lung cancer with information about the severity of the cancer, age of the patient, and survival times. Note that the event “censored” means that the patient was still alive at the time listed, so the true survival time is at least that long.
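To show how censored observations enter an analysis, here is a small sketch of the Kaplan-Meier estimator in plain Python. The times and event flags are made up for illustration and do not come from the VA trial file; the function name kaplan_meier is an assumption of this example. A censored patient counts as "at risk" up to the listed time but never counts as a death.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct death time.

    events[i] is True if a death was observed at times[i],
    and False if the observation was censored at that time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather everyone whose listed time equals t.
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            removed += 1
            i += 1
        if deaths > 0:
            # Only observed deaths lower the survival estimate.
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after t.
        n_at_risk -= removed
    return curve


# Made-up data: five patients, two of them censored (False).
times = [2, 3, 3, 5, 8]
events = [True, True, False, True, False]
curve = kaplan_meier(times, events)

for t, s in curve:
    print(t, round(s, 3))
```

Treating the censored times as if they were deaths would understate survival, while dropping those patients entirely would throw away the information that they lived at least that long.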