Data sources for a proposed course on secondary data analysis

Steve Simon


I am giving a talk at the Joint Statistical Meetings (JSM) in Toronto. I’m still tweaking the slides just a few hours before the talk. The title is “Data sources for a proposed course on secondary data analysis.” On this page, I want to provide a link to the PDF file of the slides and share a story about this talk.

You can get a PDF file of the individual slides, or a six-up version without the transition slides. I tossed out the transition slides from the latter version to save paper, but the actual content is still complete.

Just as a philosophical point: I don’t like PowerPoint slides. They are generally less informative than a web page with narrative. I will eventually turn these PowerPoint slides into a web page, but for now, you might find them useful if only to see the links to the various data sources. For the record, the four data sets that I talk about can be found at

Note 2024-05-28: The link listed in the original web page and in the talk no longer works and has been updated.

This talk evolved out of a working group in the Department of Biomedical and Health Informatics. Every month, one of us would share information about an interesting secondary data set that we had worked with. Part of this was to help researchers who might be preparing a grant and wanted to include a secondary data analysis component in that grant. A second part was to develop something that might eventually turn into a course on secondary data analysis.

On one of the email discussion boards that I participate in, someone asked for possible speakers for a special contributed session for JSM about using “big data” in a classroom setting. I offered to talk about some of the work that I had presented at the secondary data analysis working group, because some of those data sets are big. They’re not big enough to merit the official term “big data” with the quote marks and all the aura and prestige, but I thought it would still be a reasonable fit. I got some good feedback and we planned to have a slate of four speakers and a discussant.

The date of February 2 came along, which is not only Groundhog Day, but the last day to submit your abstract for the Joint Statistical Meetings. I thought, “Uh oh!” because I hadn’t prepared an abstract yet and I didn’t want to let the group down by failing to get my abstract in on time. So I hustled and prepared something, and I got it in with a few hours to spare. I found out the next day that no one else in this group had submitted an abstract.

I was mad, because I was trying to fill in a gap and offer a different perspective on “big data” to contrast with what others were preparing. But (unlike these other people) I feel that you have to follow through with a commitment once you make it. Now I can laugh at it, but back then I was steaming.

Anyway, they found another session to put me in, and it is a pretty good fit. The first talk in the session is “How many licks to the center of a Tootsie Roll Pop,” which appeals to the nostalgia of those of us old enough to remember the commercial. I presume it is an in-class exercise with actual data collection. We’ll see.

So anyway, wish me luck. I need to add this weblink to my slides and I’m ready to go.

You can find an earlier version of this page on my original website.