Coding race/ethnicity

Steve Simon


If you have to collect data on the race and/or ethnicity of your research subjects, you should be aware of the official U.S. government definitions that all federal agencies have to follow. You don’t necessarily have to follow these guidelines yourself, but they do offer a reasonably standardized way to code your data.

The first interesting issue with the government definitions is that they split race and ethnicity into two separate variables. The category commonly called Hispanic is not really a racial category. Hispanics are people who have origins in places like Cuba, Mexico, and Central and South America. These people could and do identify themselves with a particular racial category (usually Native American, Black, or White) as well as the Hispanic category.

Hispanic is an ethnic category, not a racial category. The American Anthropological Association defines ethnicity as

*the identification with population groups characterized by common ancestry, language and custom.*

Another issue is how to identify people who wish to select more than one race or ethnicity category. The official government policy is to allow respondents to choose more than one category rather than creating a new category such as multiracial.
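
One way to store "choose more than one" responses is a separate 0/1 indicator for each race category, rather than a single code. The sketch below is only an illustration of that idea. The function name `encode_races` is hypothetical, and the category labels are an assumption based on the 1997 federal standard, so check them against whatever standard you are required to follow.

```python
# Illustrative labels only (assumed, not taken from the text above).
RACE_CATEGORIES = [
    "American Indian or Alaska Native",
    "Asian",
    "Black or African American",
    "Native Hawaiian or Other Pacific Islander",
    "White",
]


def encode_races(selected):
    """Return one 0/1 indicator per race category for a single respondent."""
    unknown = set(selected) - set(RACE_CATEGORIES)
    if unknown:
        raise ValueError(f"Unrecognized categories: {sorted(unknown)}")
    return {category: int(category in selected) for category in RACE_CATEGORIES}


# A respondent who checks two boxes gets two indicators set to 1,
# instead of being moved into a single "multiracial" code.
row = encode_races(["Asian", "White"])
```

Ethnicity would sit in its own variable alongside these indicators, which keeps the two questions separate in the data set.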

The federal government encourages the use of greater detail when appropriate, as long as the results can be aggregated into the approved categories if needed. So researchers who visit Indian reservations might want to divide the Native American category into various tribal affiliations.
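
Collecting greater detail is safe as long as every detailed code maps back to exactly one approved category. Here is a minimal sketch of that roll-up. The tribal affiliations, the mapping `DETAIL_TO_BROAD`, and the function `aggregate` are all hypothetical examples, and the broad label is an assumed official name for the Native American category.

```python
from collections import Counter

# Hypothetical mapping from detailed codes to the broader category each rolls up to.
DETAIL_TO_BROAD = {
    "Navajo": "American Indian or Alaska Native",
    "Cherokee": "American Indian or Alaska Native",
    "Lakota": "American Indian or Alaska Native",
    "White": "White",
}


def aggregate(detailed_codes):
    """Count respondents per broad category; raise KeyError on an unmapped code."""
    return Counter(DETAIL_TO_BROAD[code] for code in detailed_codes)


responses = ["Navajo", "Lakota", "White", "Navajo"]
broad_counts = aggregate(responses)
```

Letting an unmapped code raise an error, rather than silently dropping it, is a deliberate choice here: it flags any detailed category that cannot be aggregated before it reaches a summary table.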

There was some push to include a category for Arab or Middle Eastern people, but this was not included in the final standard. Interestingly, people with origins in the Middle East are asked to include themselves in the White category.

The minimum number of race categories that federal agencies need to collect includes:

Terms to avoid include:

The minimum number of ethnicity categories that federal agencies need to collect includes:

If the patient chooses the race/ethnicity categories, then the regulations require asking both race and ethnicity questions and asking the ethnicity question first. If self-identification is impossible or impractical (e.g., when race/ethnicity is coded on a death certificate), then the regulations allow the two questions to be combined, with the following minimum number of categories:

If you aggregate the data during a statistical summary, the regulations suggest using


The full document on federal standards was available in html format, but the link is now broken.

You can find an earlier version of this page on my original website.