Why secondary data analysis takes a lot longer

Steve Simon


Someone posted a question noting that while most of the statistical consulting projects they worked on finished in a reasonable time frame, a few were outliers: they took a lot longer and required a lot more effort from the statisticians. They wondered whether these outliers shared any common features, so they asked if anyone else had identified methodological features of projects that ran overtime. I thought my answer was still worth sharing.

I can only offer a subjective impression, but in my experience, the projects that take a lot longer are secondary data analyses. There are two reasons for this. First, the data management in a secondary data analysis is often a lot harder than you might expect. You find a lot of unexpected inconsistencies that you have to resolve before you can be confident that your results are right.

Second, journal reviewers are often pretty harsh. They’ll tell you that your data set is the wrong data set for your hypothesis. When this happens, it’s really hard to argue back, and re-analyzing the data a different way won’t help if they are convinced you have the wrong data. Your only option is to go to a different journal. Reviewers can also be nitpickers, and secondary data analysis has lots of pickable nits if you really want to go down that road.

For what it’s worth, a small retrospective chart review of your own patients, which is a type of secondary data analysis, doesn’t seem to have this problem. It’s when you are using someone else’s data and not your own that the problems occur.

You can find an earlier version of this page on my blog.