I’m giving a talk for the American College of Allergy, Asthma, and Immunology with the title “Use of diagnostic tests for making clinical decisions.” Here’s an abstract of this talk:
“Not all diagnostic tests are created equal. Some are so bad that they cause more harm than good. After reviewing the general formulas for sensitivity and specificity, I will outline the five phases of research for development of a diagnostic test proposed by Margaret Pepe. I will then explain why research in the early phases provides an insufficient evidence base for making clinical decisions about the utility of a diagnostic test. Finally, I will illustrate how to apply a diagnostic test in a practical setting that incorporates clinical judgment and accounts for individual patient variation. In this talk, you will learn how to: describe the limitations of diagnostic tests, summarize the five phases of diagnostic test development, and apply diagnostic tests in a practical setting.”
The five phases of development of a diagnostic test appear on page 215 of
- The Statistical Evaluation of Medical Tests for Classification and Prediction. Margaret Sullivan Pepe (2003) Oxford, UK: Oxford University Press. Publisher’s website.
Phase 1 is purely exploratory. Its objective is to “identify promising tests and settings for application,” and it uses a case-control study with convenience sampling. Phase 2 is a retrospective attempt to validate the test. The objective is to determine whether the test can achieve minimal standards for sensitivity and specificity. This phase also uses a case-control design, but with more careful sampling from the overall population. Phase 3 is a retrospective attempt to refine the test. One refinement is to develop criteria for determining when the test is positive. Another refinement is to examine covariates that can affect the performance of the test. Studies in this phase may also compare competing tests and develop a recommended sequence for applying multiple tests. Phase 4 is a prospective attempt to validate the test in a more realistic setting. This phase uses a prospective cohort and is able to estimate prevalence-dependent characteristics of the test, such as the positive predictive value. Phase 5 is an attempt to measure the true impact of the test on important health outcomes and the economic effect of the test. It uses a randomized design in which some patients are selected to receive the new test and others are selected to receive standard care without access to the new test.
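To see why the positive predictive value has to wait for a Phase 4 cohort, it helps to work through Bayes' rule. Here is a minimal sketch in Python. The function name `predictive_values` and the numbers (90% sensitivity, 90% specificity, three prevalence values) are made up for illustration and do not come from Pepe's book.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    # Expected proportions of the population in each cell of the 2x2 table.
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)

    ppv = true_pos / (true_pos + false_pos)  # P(disease | test positive)
    npv = true_neg / (true_neg + false_neg)  # P(no disease | test negative)
    return ppv, npv


# Hypothetical test: same sensitivity and specificity, three different settings.
for prevalence in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(0.90, 0.90, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

The sensitivity and specificity never change, but the positive predictive value climbs from about 8% at 1% prevalence to 90% at 50% prevalence. A case-control study fixes the ratio of cases to controls by design, so it cannot tell you which of these settings you are in.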
There are several biases that can occur in the evaluation of a diagnostic test. First, there is spectrum bias, a problem that occurs in most case-control designs. When you select a group of patients who overtly manifest the disease in question and compare them to a group of patients who clearly do not have the disease, you have a black-and-white comparison. All the intermediate cases, the ones most difficult to diagnose properly, are omitted from the study. As a result, these case-control designs tend to overstate the sensitivity and specificity of the test because they do not include the full spectrum of the disease process.
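A small numerical sketch can make spectrum bias concrete. Every number below is hypothetical: I assume the diseased patients split into overt and mild cases, the non-diseased patients split into clearly healthy people and people with look-alike conditions, and the test performs worse in the harder subgroups. The helper `pooled_rate` is just a weighted average.

```python
def pooled_rate(strata):
    """Weighted average of per-stratum rates.

    strata: iterable of (share_of_group, rate_within_stratum) pairs.
    """
    strata = list(strata)
    total_share = sum(share for share, _ in strata)
    return sum(share * rate for share, rate in strata) / total_share


# Hypothetical diseased patients: (share of all diseased, P(test positive)).
diseased = {"overt": (0.40, 0.95), "mild": (0.60, 0.60)}
# Hypothetical non-diseased patients: (share of all non-diseased, P(test negative)).
non_diseased = {"clearly healthy": (0.70, 0.98), "look-alike": (0.30, 0.80)}

# Black-and-white case-control study: only the extreme subgroups are sampled.
sens_extremes = pooled_rate([diseased["overt"]])
spec_extremes = pooled_rate([non_diseased["clearly healthy"]])

# Full spectrum: every subgroup contributes in proportion to its share.
sens_full = pooled_rate(diseased.values())
spec_full = pooled_rate(non_diseased.values())

print(f"extremes only: sensitivity={sens_extremes:.3f}, specificity={spec_extremes:.3f}")
print(f"full spectrum: sensitivity={sens_full:.3f}, specificity={spec_full:.3f}")
```

Under these assumed numbers, the extremes-only study reports a sensitivity of 0.95 and a specificity of 0.98, while the full spectrum gives 0.74 and about 0.93. The size of the gap depends entirely on the made-up shares and rates, but the direction is the point.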
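Another bias occurs if the gold standard is not applied to all the patients in the study. When the gold standard is invasive (e.g., a surgical biopsy), both doctors and patients have a strong desire to avoid it, especially when the diagnostic test is negative. If the decision to apply the gold standard depends on the result of the diagnostic test, the verified patients are no longer representative of all the patients tested. This can produce verification bias (also known as workup bias).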
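Here is a rough sketch of how verification bias distorts the estimates. The cohort is hypothetical: 1,000 patients, 100 of them diseased, a test with 80% sensitivity and 90% specificity, and an assumed policy where 90% of test-positive patients but only 10% of test-negative patients get the biopsy. I also assume the decision to verify depends only on the test result.

```python
def sens_spec(tp, fp, fn, tn):
    """Return (sensitivity, specificity) from the four cells of a 2x2 table."""
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical cohort if every patient received the gold standard.
full = {"tp": 80, "fp": 90, "fn": 20, "tn": 810}

# Assumed verification rates, driven only by the diagnostic test result.
verify_if_positive = 0.90
verify_if_negative = 0.10

# Expected counts among the patients who actually get the gold standard.
verified = {
    "tp": full["tp"] * verify_if_positive,
    "fp": full["fp"] * verify_if_positive,
    "fn": full["fn"] * verify_if_negative,
    "tn": full["tn"] * verify_if_negative,
}

true_sens, true_spec = sens_spec(**full)
naive_sens, naive_spec = sens_spec(**verified)

print(f"all patients verified: sensitivity={true_sens:.3f}, specificity={true_spec:.3f}")
print(f"verified subset only:  sensitivity={naive_sens:.3f}, specificity={naive_spec:.3f}")
```

With these made-up numbers, the verified subset suggests a sensitivity of about 0.97 and a specificity of 0.50, compared to 0.80 and 0.90 in the full cohort. Most of the false negatives and true negatives never get a biopsy, so sensitivity looks better than it is and specificity looks worse.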
A third bias can occur if the gold standard used to diagnose disease is itself imperfect. Still another bias may occur if those performing the diagnostic test are not blinded to the results of the gold standard, or if those performing the gold standard evaluation are not blinded to the results of the diagnostic test.
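The effect of an imperfect gold standard can also be sketched numerically. The example below assumes that the errors of the new test and the errors of the reference standard are independent once you know the patient's true disease status. That is a strong assumption, and the function name `apparent_accuracy` and all of the numbers are hypothetical.

```python
def apparent_accuracy(prevalence, test_sens, test_spec, ref_sens, ref_spec):
    """Sensitivity and specificity of a test as measured against an imperfect reference.

    Assumes test and reference errors are conditionally independent
    given the true disease status.
    """
    healthy = 1 - prevalence

    # Joint probabilities of (test result, reference result).
    both_positive = (prevalence * test_sens * ref_sens
                     + healthy * (1 - test_spec) * (1 - ref_spec))
    both_negative = (prevalence * (1 - test_sens) * (1 - ref_sens)
                     + healthy * test_spec * ref_spec)

    # Marginal probabilities of the reference result.
    ref_positive = prevalence * ref_sens + healthy * (1 - ref_spec)
    ref_negative = 1 - ref_positive

    apparent_sens = both_positive / ref_positive  # P(test + | reference +)
    apparent_spec = both_negative / ref_negative  # P(test - | reference -)
    return apparent_sens, apparent_spec


# Hypothetical: a good test (95%/95%) judged against a flawed reference (90%/95%).
app_sens, app_spec = apparent_accuracy(
    prevalence=0.20, test_sens=0.95, test_spec=0.95, ref_sens=0.90, ref_spec=0.95
)
print(f"apparent sensitivity={app_sens:.3f}, apparent specificity={app_spec:.3f}")
```

Under these assumptions, a test that is truly 95% sensitive and 95% specific appears to be about 79% sensitive and 93% specific, because the reference standard's mistakes get counted against the new test. If the two tests tend to make the same mistakes on the same patients, the bias can run in the other direction.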
You can find an earlier version of this page on my old website.