Dataset recommendations from the Journal of Statistics Education


The Journal of Statistics Education (now called the Journal of Statistics and Data Science Education) maintains an archive of interesting datasets. The data documentation is mostly good, and you can find out more about these datasets in the journal articles themselves. Here are a few datasets that look interesting (I am skipping datasets from this archive that I have already used in class).

A very old resource on the Internet with interesting datasets is StatLib. The quality of the documentation is a bit uneven, and some of the datasets are too simple to be used in my classes. Other files use oddball or obscure formats. But here are a few datasets that look interesting. I raise a question that might be worth exploring for each of these datasets.

2004 cars and trucks data and documentation

What features of a car/truck influence the price or the gas mileage?

Completion times for software projects data and documentation

Does the computer language or database influence the amount of time it takes to complete a programming project?

Calcium measurements in elderly patients data and documentation

Do measurements of calcium and other parameters depend on the age and gender of the patient? Note that there are two versions of the dataset. The first retains some of the original transcription errors, I presume to let students see whether they can spot obvious errors. The link above is to the dataset after correcting the transcription errors.

Electric bills data and documentation

How much does the weather influence home electric bills?

Film reviews data and documentation

Does the length, number of cast members, or release year of a film have an impact on its rating?

Fish catch data and documentation

How much do size measurements vary among different fish species?

Fruitfly lifespan data and documentation

Does the lifespan of a male fruitfly depend on his sexual activity?

Used cars data and documentation

What features of a used car influence its sales price?