Technometrics review of Statistical Evidence

The February 2007 issue of Technometrics has a review of my book, Statistical Evidence in Medical Trials. The review is not available on the web, unless you subscribe to Technometrics. The review, written by Richard Goldstein, was fairly critical, as you can judge by the final paragraph.

“Notwithstanding my negative comments and tone above, there are valuable points here; they are, however, generally too hard to find and some of them are undercut by the author’s misguided attempt to be “fair.” If the author were to clean up the typographical errors and omissions and highlight the main points, the result would be a much better book.”

There are a lot of interesting comments in the review that I disagree with: not comments about the book itself, but Dr. Goldstein’s assertions about the research process. These comments are worth highlighting on my weblog because they illustrate some of the unresolved controversies in statistics. Dr. Goldstein’s first contention was that I was underselling the benefits of randomization.

“However, at the end of the chapter he has a section titled Counterpoint: Randomized Trials are Overrated which undermines much of his earlier material. I was particularly bothered here that this last section was not balanced. Further, one of the weaknesses of this chapter was that there was insufficient comparison of the results from research areas where some studies were randomized and others were not. Given the recent press about hormone replacement therapy studies, this seems very odd.”

I like randomized trials, and I encourage the use of randomization whenever possible. I do, however, stand behind my assertion that randomization is overrated, in spite of the recent hormone replacement trials. First, citing a single study where randomized results led to a different conclusion than observational data is, at best, anecdotal evidence. A systematic review, like the one done in Concato 2000, is better evidence, but even this study is not perfect.

Also, the women who volunteered for the randomized studies were quite a bit different from the women who were studied in the observational studies. I think it’s a mistake to ignore the lessons from the hormone replacement studies, but it is equally a mistake to overstate their significance.

Many people more intelligent than me (and I would include Dr. Goldstein in that list) have taken a much stronger stance and believe that a single well-designed randomized trial will trump any number of observational studies. This is an issue that will continue to engender debate in the research community.

Later in the review, Dr. Goldstein comments about replication:

“Chapter 4 deals with other evidence about the research topic and whether it agrees with the article being read. While the author makes useful points, I doubt that many will agree with what he says about replication (repeating the same experiment is not worthwhile).”

I would again respectfully disagree. A mindless replication that merely repeats the same set of biases is unhelpful. It does have some limited value in that it reduces the uncertainty associated with sampling error. But the types of studies that need replication are ones where bias, rather than sampling error, is the more serious concern.
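To make the distinction concrete, here is a minimal Python sketch. It rests on a simplifying assumption that is mine and not something from the book or the review: every replication shares the same fixed bias and has an independent sampling error with the same standard error. The function name and the example numbers are purely illustrative.

```python
import math


def pooled_rmse(bias, standard_error, n_replications):
    """Root mean squared error of the average of identical replications.

    Assumes (hypothetically) that every replication carries the same bias
    and an independent sampling error with the same standard error.
    """
    # Averaging independent replications shrinks only the sampling part.
    sampling_part = standard_error / math.sqrt(n_replications)
    # The shared bias is untouched by averaging.
    return math.sqrt(bias ** 2 + sampling_part ** 2)


if __name__ == "__main__":
    for k in (1, 4, 16, 64):
        print(k, round(pooled_rmse(bias=0.5, standard_error=1.0, n_replications=k), 3))
```

Under these assumptions the total error can never fall below the size of the bias, no matter how many times the same design is repeated.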

The final comment that I take issue with is about measures of absolute risk. Dr. Goldstein writes:

“I also note that the author appears to favor interpretable measures of risk (as I do), but that he considers number needed to treat (NNT; the additional number of patients to be treated to have one more success than the alternative treatment) and number needed to harm (NNH) as “the most easily interpretable measures of risk” (p. 125). As a patient, I do not agree: for me, 3% versus 1% success is quite different from 99% versus 97% success, yet the NNT is 50 in each case.”

So what measure of risk works better here? The odds ratio is about 3 for both cases. The relative risk is 3 for the first case and 1.02 for the second case. Does either of these measures shine a better light on the problem?
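For readers who want to check the arithmetic, here is a small Python sketch that computes these measures for Dr. Goldstein's two scenarios. The function name and dictionary keys are my own illustrative choices; the success rates are the ones quoted in the review.

```python
def risk_measures(p_treatment, p_control):
    """Return common effect measures for two success probabilities."""
    risk_difference = p_treatment - p_control
    odds_treatment = p_treatment / (1 - p_treatment)
    odds_control = p_control / (1 - p_control)
    return {
        "risk_difference": risk_difference,
        # Number needed to treat is the reciprocal of the risk difference.
        "nnt": 1 / risk_difference,
        "relative_risk": p_treatment / p_control,
        "odds_ratio": odds_treatment / odds_control,
    }


low = risk_measures(0.03, 0.01)   # 3% versus 1% success
high = risk_measures(0.99, 0.97)  # 99% versus 97% success

if __name__ == "__main__":
    for label, measures in (("3% vs 1%", low), ("99% vs 97%", high)):
        summary = ", ".join(f"{name}={value:.2f}" for name, value in measures.items())
        print(f"{label}: {summary}")
```

Both scenarios share a risk difference of 0.02, so the NNT is 50 in each. The odds ratio is also identical (about 3.06), while the relative risk falls from 3 to about 1.02. Only the relative risk separates the two cases, and it does so by making the second one look negligible.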

I don’t want to criticize Dr. Goldstein for raising these issues. He is highlighting a perspective on research that is different from mine, and many people have commented on both sides of these controversies.

I also don’t want to imply that criticisms of the book and my writing style are not valid. I’m perfectly willing to accept that criticism, and I hope that the second printing of the book will assuage some of Dr. Goldstein’s concerns.

References:

John Concato, Nirav Shah, Ralph I. Horwitz. Randomized, Controlled Trials, Observational Studies, and the Hierarchy of Research Designs. The New England Journal of Medicine 2000; 342(25): 1887-1892.

You can find an earlier version of this page on my old website.

Steve Simon

2007-01-16