Sample size for a confidence interval

Steve Simon

2000-01-26

Categories: Blog post. Tags: Confidence intervals, Sample size justification

Dear Professor Mean

Dear Frantic,

400 million records? I bet your fingers are tired from all that typing.

There are several approaches for determining the sample size. The simplest is to estimate what sample size will provide confidence intervals that are narrow enough for your needs. You might say

By the way

What else do I need to specify?

Beyond specifying how narrow you really need the intervals to be

If you can’t pull out a few hundred records in advance

When you find that publication

Example

If you let D represent the minimum detectable difference and S represent the standard deviation

![](http://www.pmean.com/images/confid37.gif){width="95" height="55"}

Suppose you wanted a confidence interval for average cholesterol level to have a precision of plus or minus 2 units. And let’s suppose that the standard deviation for cholesterol in a population similar to yours is 50 units. If we wanted a 99% confidence interval (let’s be extravagant

![](http://www.pmean.com/images/confid38.gif){width="211" height="55"}

which we round up to 4,148.
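If you would like to check this arithmetic yourself, here is a minimal Python sketch. It assumes the formula in the image is n = (Z × S / D)², which is consistent with the numbers in the example, and it assumes the rounded table value Z = 2.576 for a 99% confidence interval. The function name `sample_size_for_mean` is made up for illustration.

```python
import math

# Assumption: rounded table value of Z for a 99% confidence interval.
Z_99 = 2.576


def sample_size_for_mean(half_width, std_dev, z=Z_99):
    """Sample size so that a confidence interval for a mean has
    roughly the requested half-width (plus or minus half_width).

    Assumed formula: n = (z * std_dev / half_width) ** 2, rounded up.
    """
    return math.ceil((z * std_dev / half_width) ** 2)


# Cholesterol example: plus or minus 2 units, standard deviation of 50 units.
print(sample_size_for_mean(half_width=2, std_dev=50))  # 4148
```

A more precise value of Z (about 2.5758) gives a slightly smaller answer, so treat the last digit or two as approximate.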

What if I am estimating a proportion?

If you’re estimating a proportion rather than a mean

If P is your guess at what the proportion should be

![](http://www.pmean.com/images/confid36a.gif){width="155" height="56"}

Suppose we wanted to estimate the proportion of adverse drug events to plus or minus 1.5% and we know that the proportion will be around 12%. Again

![](http://www.pmean.com/images/confid35.gif){width="268" height="55"}

which we round up to 3,115.
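The same kind of check works for a proportion. This sketch assumes the formula in the image is n = Z² × P × (1 − P) / D², which reproduces the number above, and again assumes the rounded table value Z = 2.576 for a 99% confidence interval. The function name `sample_size_for_proportion` is hypothetical.

```python
import math

# Assumption: rounded table value of Z for a 99% confidence interval.
Z_99 = 2.576


def sample_size_for_proportion(half_width, p, z=Z_99):
    """Sample size so that a confidence interval for a proportion has
    roughly the requested half-width (plus or minus half_width).

    Assumed formula: n = z**2 * p * (1 - p) / half_width**2, rounded up.
    Both p and half_width are on the 0-to-1 scale, not percentages.
    """
    return math.ceil(z ** 2 * p * (1 - p) / half_width ** 2)


# Adverse drug event example: plus or minus 1.5%, proportion around 12%.
print(sample_size_for_proportion(half_width=0.015, p=0.12))  # 3115
```

Note that the proportion and the precision both go in as fractions (0.12 and 0.015), not as percentages.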

At this point

If you really have no idea what the proportion might be

Summary

Frantic Frank needs to randomly select some records from a database that has 400 million of them. He wants to know how many records he should select. Professor Mean suggests that confidence intervals would be a good way to summarize information from this type of random sample. He suggests that you select enough records so your confidence intervals are reasonably narrow.

Further reading

  1. The case for confidence intervals in controlled clinical trials. M. Borenstein. Controlled Clinical Trials 1994: 15(5); 411-28.
  2. The use of predicted confidence intervals when planning experiments and the misuse of power when interpreting results. Steven Goodman. Annals of Internal Medicine 1994: 121(3); 200-206.
  3. Confidence limits and sample size in quarantine research. HM Couey. Forum: Journal of Economic Entomology 1986: 79(4); 887-90.

You can find an earlier version of this page on my original website.