Stratified random sample.

Steve Simon


Dear Professor Mean, What is a stratified random sample and why would I want to use one?

A stratified random sample is an alternative to a simple random sample that can provide more precision for the same number of subjects. In a simple random sample, you would select subjects randomly from a single large pool. In a stratified random sample, you would divide this large pool of subjects into several groups (strata) and then randomly select subjects from within each group. The number of subjects selected from each group is fixed by design.
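
Here is a minimal sketch in Python of the selection procedure just described. The function name stratified_sample, its arguments, and the made-up pool of subjects are all illustrative assumptions, not part of any particular survey package.

```python
import random

def stratified_sample(pool, stratum_of, n_per_stratum, seed=None):
    """Randomly select a fixed number of subjects from each stratum.

    pool          -- list of subjects
    stratum_of    -- function that returns the stratum label of a subject
    n_per_stratum -- dict mapping each stratum label to the number to select
    """
    rng = random.Random(seed)

    # Divide the large pool into groups (strata).
    groups = {}
    for subject in pool:
        groups.setdefault(stratum_of(subject), []).append(subject)

    # Select the designed number of subjects, without replacement,
    # from within each group.
    sample = []
    for label, n in n_per_stratum.items():
        sample.extend(rng.sample(groups[label], n))
    return sample

# Hypothetical pool: 90 boys and 10 girls, identified by (id, sex).
pool = [(i, "boy") for i in range(90)] + [(i, "girl") for i in range(90, 100)]
sample = stratified_sample(pool, lambda s: s[1], {"boy": 9, "girl": 1}, seed=1)
```

Every run of this sketch returns exactly 9 boys and 1 girl, whereas a simple random sample of 10 from the same pool could easily contain no girls at all. Note that rng.sample raises a ValueError if you ask for more subjects than a stratum contains.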

A stratified sample makes sense when your data are heterogeneous, but they can easily be split into strata that are more homogeneous. In other words, use a stratified sample when there is a lot of variability between strata and little variability within strata.
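
A small simulation can illustrate the gain in precision. The sketch below assumes an invented population with two equally sized strata whose means are far apart (10 and 50) and whose within-stratum spread is small; the numbers are chosen only to make the effect visible.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population: lots of variability between strata,
# little variability within each stratum.
stratum_a = [rng.gauss(10, 1) for _ in range(500)]
stratum_b = [rng.gauss(50, 1) for _ in range(500)]
population = stratum_a + stratum_b

def simple_mean(n):
    # Mean of a simple random sample drawn from the single large pool.
    return statistics.mean(rng.sample(population, n))

def stratified_mean(n):
    # Mean of a stratified sample: half from each (equally sized) stratum.
    half = n // 2
    return statistics.mean(rng.sample(stratum_a, half) + rng.sample(stratum_b, half))

# Repeat each design many times and compare how much the estimates vary.
simple_var = statistics.variance([simple_mean(20) for _ in range(2000)])
stratified_var = statistics.variance([stratified_mean(20) for _ in range(2000)])
```

In this setup the stratified estimates vary far less than the simple random sample estimates, because the stratified design removes the chance variation in how many subjects come from each stratum. If the two strata had similar means, the advantage would largely disappear.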

The number of subjects that you select from each stratum should normally be proportional to the size of that stratum. For example, in a population of asthmatic children, you might know that 90% of the subject pool are boys. Then it would make sense to randomly select exactly 9 boys for every girl that you sample.
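
The arithmetic of proportional allocation can be sketched as follows. The function name proportional_allocation and the pool of 900 boys and 100 girls are assumptions made for illustration; the rounding rule used here (largest remainder) is just one reasonable way to handle allocations that do not come out to whole numbers.

```python
def proportional_allocation(stratum_sizes, total_n):
    """Split total_n across strata in proportion to stratum size."""
    population = sum(stratum_sizes.values())

    # Whole-number share for each stratum, plus the remainder left over.
    allocation, remainders = {}, {}
    for label, size in stratum_sizes.items():
        allocation[label], remainders[label] = divmod(total_n * size, population)

    # Hand out any subjects lost to rounding, largest remainder first.
    leftover = total_n - sum(allocation.values())
    for label in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        allocation[label] += 1
    return allocation

# Hypothetical pool in which 90% of the children are boys.
allocation = proportional_allocation({"boys": 900, "girls": 100}, 50)
# allocation is {"boys": 45, "girls": 5}, which is 9 boys for every girl
```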

In other situations, you may by design decide to oversample in some of your small but important strata. For example, many national surveys of health trends will purposefully oversample minority populations. This ensures that there are enough subjects in these groups to allow you to study minority-specific health problems.
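
One practical consequence of oversampling, which goes a step beyond the description above, is that the oversampled strata are overrepresented in the raw data, so overall estimates are usually weighted back to the population proportions. The sketch below uses entirely made-up stratum sizes and sample means to show the idea; the names are hypothetical.

```python
def stratified_mean(stratum_sizes, stratum_means):
    """Overall mean, weighting each stratum mean by its population size."""
    total = sum(stratum_sizes.values())
    return sum(stratum_sizes[h] * stratum_means[h] for h in stratum_sizes) / total

# Hypothetical design: the minority stratum is 10% of the population
# but makes up half of the sample.
population_sizes = {"majority": 9000, "minority": 1000}
sample_sizes = {"majority": 100, "minority": 100}
sample_means = {"majority": 10.0, "minority": 20.0}  # invented outcome values

# Each sampled subject stands in for this many people in the population.
weights = {h: population_sizes[h] / sample_sizes[h] for h in population_sizes}

weighted = stratified_mean(population_sizes, sample_means)
unweighted = (
    sum(sample_sizes[h] * sample_means[h] for h in sample_sizes)
    / sum(sample_sizes.values())
)
```

With these invented numbers the weighted estimate is 11.0, while the unweighted average of the pooled sample is 15.0, which overstates the influence of the oversampled group.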

You can use a stratified sample only when you know which subjects belong to which strata prior to data collection. If you only find out information about the strata during the interview or testing of subjects, then you can’t control the selection of subjects in advance.

You can find an earlier version of this page on my original website.