What sort of statistical training is needed for basic scientists?

Steve Simon



Someone wrote to a mailing list sponsored by the American Statistical Association asking about what resources to use in a statistics class aimed at basic scientists (as opposed to public health students and clinical scientists). I offered a few general recommendations.

Robin Penslar’s book on Research Ethics should be incorporated somewhere into any researcher’s curriculum. I would think that a statistics class is as good a place as any, since many ethical problems involve data fudging, violation of statistical protocols, and carelessness with private data. These are issues that statisticians can and should speak about.

There is also a wealth of data available on the Internet these days that students should review to identify projects. I would encourage the use of a large data set, such as one from genomics. One fascinating data set looks at microarray expression levels of 19 different human tissue types in 30 different individuals. The basic setup is described at

and the article has a link to supplemental research data that includes the full data set in a text file that is surprisingly easy to manipulate.

There are other interesting data sources like this. Perhaps it would be interesting to ask students to provide a simple analysis of a subset of a very large data set like this one. The days of having to live with small toy data sets are over.

Anyone working in a laboratory should be familiar with the basic tools of quality control including control charts, fishbone diagrams, and Pareto charts. If you were really ambitious, you might consider screening designs as well.
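To make the control chart idea concrete, here is a minimal sketch of the limits for an individuals chart, one of several kinds of control chart. It assumes a single measurement per run and uses the common moving-range rule (center line plus or minus 2.66 times the average moving range). The function names and the sample values are hypothetical, made up only for illustration.

```python
from statistics import mean


def individuals_chart_limits(values):
    """Return (lower, center, upper) limits for an individuals control chart."""
    if len(values) < 2:
        raise ValueError("need at least two measurements")
    center = mean(values)
    # Moving range: absolute difference between consecutive measurements.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    # 2.66 is approximately 3 / d2, where d2 = 1.128 for ranges of two points.
    half_width = 2.66 * mean(moving_ranges)
    return center - half_width, center, center + half_width


def out_of_control(values):
    """Return the positions of measurements that fall outside the limits."""
    lower, _, upper = individuals_chart_limits(values)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical daily readings from a laboratory control sample.
readings = [10, 11, 10, 11, 10, 11, 10, 11, 10, 20]
flagged = out_of_control(readings)
```

In practice the limits are usually computed from a stable baseline period and then applied to new measurements, rather than from the same data being judged, as this short sketch does.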

As a general rule, a course for basic scientists should place more emphasis on randomized designs, especially block designs and multifactorial designs. It should place less emphasis on epidemiology topics, such as case-control designs and risk adjustment models. Of course, you can’t totally ignore epidemiology.
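A randomized block design is simple enough that students can generate one themselves. The sketch below assumes a randomized complete block design, in which every treatment appears exactly once in each block and the run order is shuffled independently within each block. The function name, the block labels, and the treatment labels are all hypothetical.

```python
import random


def randomized_block_assignment(blocks, treatments, seed=None):
    """Return a dict mapping each block to a random ordering of the treatments."""
    # A dedicated generator keeps the plan reproducible when a seed is given.
    rng = random.Random(seed)
    plan = {}
    for block in blocks:
        order = list(treatments)
        # Shuffle separately within each block so that every block
        # receives all of the treatments in its own random order.
        rng.shuffle(order)
        plan[block] = order
    return plan


# Hypothetical example: three laboratory days as blocks, three treatments.
plan = randomized_block_assignment(["day 1", "day 2", "day 3"], ["A", "B", "C"], seed=1)
for block, order in plan.items():
    print(block, order)
```

Recording the seed along with the plan lets someone else regenerate exactly the same assignment later.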

You can find an earlier version of this page on my original website.
