So you’re thinking about a retrospective chart review

Steve Simon


If you are designing a retrospective chart review, you should talk to a statistician early in the process. There are lots of statistical issues that you must think about during the concept development phase of your research. Here is a broad overview of these issues.

You should start off by taking advantage of the HIPAA regulations that allow for “Reviews preparatory to research.” The provisions of this regulation, found at 45 CFR 164.512(i)(1)(ii), allow you to review medical charts prior to getting IRB approval, if the purpose is to plan the research. According to the U.S. Department of Health and Human Services, preparatory to research means

“Representations from the researcher, either in writing or orally, that the use or disclosure of the protected health information is solely to prepare a research protocol or for similar purposes preparatory to research, that the researcher will not remove any protected health information from the covered entity, and representation that protected health information for which access is sought is necessary for the research purpose.” HIPAA, For Professionals, Special Topics, Research.

There are several things that you can do here. You can investigate how many patient charts there are that might fit your inclusion criteria. There’s no point doing a retrospective chart review if you only have a couple of charts that you could use in the study. Depending on what you find, you can examine what impact modifications to the inclusion criteria or changes in the date ranges of your study have on the number of charts that you find. You can review the charts to see if the information that you need to abstract is available and easy to code.

All of these fall under the realm of feasibility. In addition to feasibility, you might also use the preparatory to research activities rule to refine your research question.

There are several important restrictions. Although you can (and should) do this work before getting your full protocol approved, you still have to make some “representation” that you are conducting a review preparatory to research. This might be as simple as a conversation with a member of your IRB, but some IRBs might ask you to fill out a form. Other IRBs will ask you to wait until the IRB approves that form. But even if your IRB has no requirements for reviews preparatory to research, you would be well advised to talk to someone and not make unwarranted assumptions about what this review might entail.

Also varying from institution to institution is how much you can do under the umbrella of reviews preparatory to research. The rules are pretty explicit that you can’t take any Protected Health Information (PHI) with you, but if you are employed by the “covered entity” that has these medical records, you might have a fair amount of latitude.

Some institutions have developed anonymized data repositories that you can also use for preparatory to research activities and sometimes even for the research itself. Typically, you need to get some sort of authorization first. They want you to promise not to do bad things with the data. If you have such a repository, please take the time to work with it before you write your protocol.

Take some time now to define and refine your research hypothesis. I need to be careful here, because not all research requires a research hypothesis, and this occurs more often with retrospective chart reviews than with other types of research. One reason you may not have a research hypothesis is that your goal is not to try to prove or disprove a particular hypothesis, but rather to characterize your patients in some way.

You could force this into a research hypothesis framework but often the effort is contrived. If you do not have a research hypothesis, try to state the goals of your research. For example, your goal might be to identify what therapy options are commonly used, what is the general prognosis, what are the resources needed, etc. for a particular patient population. You might dispense with a research hypothesis if you have little or no interest in comparing this patient population to another population and if you are also not interested in making major subgroup comparisons within this patient population.

If you do decide to formulate a research hypothesis (and this is certainly appropriate for most retrospective chart reviews in spite of what I wrote above), take the time now to carefully define all the elements of your hypothesis.

I like to structure a research hypothesis as if you were developing an answerable question in Evidence Based Medicine. That means that you define your hypothesis according to the PICO acronym (Patient, Intervention, Comparison group, and Outcome). Not every research hypothesis can fit into the PICO framework, and some of them only use two or three of the P, I, C, and O. But even so, I have found that PICO often helps you avoid many of the ambiguities inherent in some research protocols.

Start with the P. Who are the patients you are interested in studying and what are you doing (or have you done) to those patients? Think about your patients in terms of disease, demographics, procedures done, physical location, care requirements, or possibly some combination of these. Define your patients also in terms of time. Are you looking at a year’s worth of data or ten years’ worth?

Sometimes you define the patients and their intervention in one fell swoop. For example, you plan to study pregnant women who undergo a C-section. The C-section might become your I, your intervention. But if that’s not the case, think about what intervention you are interested in examining. This might be something that is actively done to a patient, or it might be something that a patient is exposed to.

Not every study has an intervention or exposure. If you want to compare men to women or old people to young people, or other demographic comparisons, then don’t try to create an intervention or exposure here. Your PICO is really just a PCO, and that’s fine.

Now think about the O. What outcome or outcomes do you want to measure? These have to be outcomes that are documented in the patient’s electronic health record, and you probably won’t be able to administer something new like a quality of life questionnaire unless this is routinely done for all patients as part of their normal care.

There are two types of outcomes: those that are of direct interest to your patients, and those that are indirect indicators of outcomes that are of direct interest to your patients. The latter, surrogate outcomes, have a bad reputation, but this reputation is largely undeserved. If you use them carefully, surrogate outcomes can be quite helpful in research, especially when direct outcomes are difficult to come by.

Finally, think about your C, your comparison group. Sometimes the I defines the C. If you are studying a surgical intervention for a particular medical condition, your comparison group is often those patients with the medical condition who get a pharmaceutical intervention instead of getting cut open. Sometimes you split the P in half, so that some of the patient population becomes the comparison group. Suppose you are interested in pregnant women, and your desire is to compare pre-term births among these women with full-term births.

Also keep in mind that not every research study has a comparison group, and sometimes the comparison is internal (e.g., comparison of medication usage before and after a particular intervention). If you have an explicit comparison group, though, make sure that the comparison group is different from your patient population, but different in only one dimension. If your patient population is kids with asthma who visit the emergency room during a weekday between 9am and 5pm, then your comparison group might be ADULTS with asthma who visit the emergency room on a weekday between 9am and 5pm, or it might be kids with BRONCHITIS who visit the emergency room on a weekday between 9am and 5pm, or it might be kids with asthma who have an OUTPATIENT visit on a weekday between 9am and 5pm, or it might be kids with asthma who visit the emergency room on an EVENING/WEEKEND.

Once you’ve specified your P, I, C, and O, take a look at them for any ambiguities. Do your terms have objective definitions that are easily identified? You want research that is reproducible. You’re asking for a lot of trouble if your definitions for P, I, C, or O are subjective or if they are difficult to identify in the patient’s chart.

If there are unavoidable ambiguities, you have two choices. You can abandon this research for something that has a greater chance of reproducibility, or you can evaluate and quantify the degree to which the ambiguity affects the research. The latter is often done by having two people work independently on the ambiguous features and seeing how often they agree or disagree.

For example, some of the inclusion criteria that define the patient population might be based on subjective elements. Have two researchers review the same record (each blinded to the results of the other) and see how often one researcher decides to include a patient in the study and the other decides to exclude that patient. You might schedule meetings to resolve these discrepancies or have a third person adjudicate them, but the key point is that the frequency with which discrepancies occur is a measure of how subjective the inclusion criteria are. This becomes a data value that you report when you publish. You could use a similar approach if there is subjectivity in assessing whether an intervention/exposure occurred or whether a particular outcome has been achieved.
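The agreement rate described above is often summarized with a chance-corrected statistic such as Cohen’s kappa. Here is a minimal sketch in plain Python; the two reviewers’ include/exclude decisions are made-up illustrative data, not from any real study.

```python
# Quantify inter-rater agreement on include/exclude decisions with Cohen's kappa.
# The decision lists below are made-up illustrative data, not from a real study.

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters making binary include/exclude calls."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of charts where both raters agree.
    p_obs = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement by chance, from each rater's marginal inclusion rates.
    p_a_inc = rater_a.count("include") / n
    p_b_inc = rater_b.count("include") / n
    p_exp = p_a_inc * p_b_inc + (1 - p_a_inc) * (1 - p_b_inc)
    return (p_obs - p_exp) / (1 - p_exp)

rater_a = ["include", "include", "exclude", "include", "exclude",
           "include", "exclude", "include", "include", "exclude"]
rater_b = ["include", "include", "exclude", "exclude", "exclude",
           "include", "exclude", "include", "include", "include"]

kappa = cohens_kappa(rater_a, rater_b)
print(f"Observed kappa: {kappa:.2f}")
```

By one common rule of thumb, kappa above roughly 0.6 is considered substantial agreement; values much below that suggest the criteria are too subjective to apply reliably.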

Now you need to think about what particular medical records go into your retrospective chart review. Is it a single chart per patient, or do you have to review multiple charts? Do you have to match information across multiple systems? The more times you have to look for information about a patient and the more places you have to look for information, the more you need a plan for storing and matching this information.

The other big consideration is whether you can get the data electronically, or whether you have to review the patient charts and then abstract information manually. The latter requires re-typing your data and is a possible source of ambiguity and inconsistency. As you abstract information, you may need to document certain decisions that you make (e.g., characterizing an event as a pre-term birth and death only if the gestational age was above a certain value). This documentation will help assure that you will make the same decision later, should the same situation occur again. It’s best, though not always possible, to document as much as you can about potential abstracting decisions prior to data collection.

If you abstract information from a chart, you should seriously consider the use of a professional database such as REDCap. The abstracting process is already difficult and error prone, so you don’t want to make things worse by using paper forms for abstracting or an Excel spreadsheet.

In this era of the electronic health record, you can often get all the data you need without any need to abstract, but be careful here. Find out what format the electronic records will come to you in and make sure that you will be able to import the data into a statistical analysis program like SPSS. Make sure that you understand how the electronic format will handle missing data values. Often there is a conversion problem, and the last thing you want is a zero-year-old mother in your pregnancy study because somewhere your unknown age got converted to a zero.
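As a concrete illustration of that screening step, here is a sketch in plain Python that flags implausible ages after import and recodes them as missing; the field names and cutoffs are hypothetical.

```python
# After importing electronic records, screen for impossible values that often
# signal missing data coded as zero. Field names and cutoffs are hypothetical.

def screen_maternal_age(records, min_age=10, max_age=60):
    """Flag ages outside a plausible range and recode them as missing (None)."""
    flagged = []
    for rec in records:
        age = rec.get("maternal_age")
        if age is None or not (min_age <= age <= max_age):
            flagged.append(rec["patient_id"])
            rec["maternal_age"] = None  # treat implausible values as missing
    return flagged

records = [
    {"patient_id": "A01", "maternal_age": 27},
    {"patient_id": "A02", "maternal_age": 0},   # unknown age exported as zero
    {"patient_id": "A03", "maternal_age": 33},
]

flagged = screen_maternal_age(records)
print("Flagged for review:", flagged)
```

Reviewing the flagged records by hand, rather than silently dropping them, lets you document how often the export mangled a missing value.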

The review preparatory to research should have told you what your sample size will be, more or less, but now you have to justify your sample size from the perspective of precision or power. While pretty much every prospective research study has a justification of sample size, you may find that your IRB does not demand such a justification for a retrospective chart review. The reason for this, perhaps, is that IRBs want to ensure that the benefits of the research, represented by a nice tight confidence interval, are worth the pain and suffering that your patients endure as part of your research. The thing about a retrospective chart review that lets you off the hook is that all the pain and suffering for your patients is pain and suffering that occurred in the past. Your review of their charts does not, in some reviewers’ minds, add any extra risk to the patients.

Now, the people who do not demand a justification of your sample size are, at best, only half right. There is a risk to the patients in a retrospective chart review, a risk of disclosure of confidential health information. You should not ignore this risk. You can’t go rummaging around in the electronic health records unless you can demonstrate with reasonable confidence that you will produce something useful when you are done. This means that you must produce a statement about power or precision before you start your research.

Here is a fictional story, exaggerated beyond all reality, that makes my point about justifying your sample size. The story is about a researcher who gets a six-year, ten-million-dollar research grant. After the work is done, the researcher produces a final report stating “This is a new and innovative surgical procedure and we are 95% confident that the cure rate is somewhere between 3% and 98%.”

Now the IRB shouldn’t approve your study until you can convince them that your research won’t produce a confidence interval that goes from 3% to 98%. It doesn’t have to be a great justification because while a breach of confidentiality is not trivial, it is still far less than what you have in a prospective research study where there is the same or greater risk of a breach of confidentiality and potential for other harms from the extra poking and prodding that often comes with a prospective trial.

The key difference, however, is that in a prospective clinical trial, the risk increases as the sample size increases. So too much power or precision is a problem for a prospective trial. You are asking extra subjects to endure the pain and risk associated with a trial to help you achieve a gold-plated study, a study with an unnecessarily large amount of power and precision. So you have to plan a sample size for a prospective clinical trial like Goldilocks would: not too big and not too small.

In a retrospective chart review, the risk of a breach of confidentiality does not increase as the sample size increases. So if you end up with way too much precision and power, no one had to pay a price for that extra effort. Too large a sample size is never a problem with a retrospective chart review. The only thing you have to show is that the sample size is not so small that you get uninformatively wide confidence intervals when you’re done.

I’ve already written some general guidance about sample size justification, so I won’t repeat those details here. If your IRB approves your research without a sample size justification, I wouldn’t be too surprised. But you should still run some power or precision calculations, even if the IRB doesn’t ask for it. You’re investing a lot of time and energy in this work, and I’m sure you would be even madder than your IRB if after all that work was done, the best you could provide is a confidence interval going from 3% to 98%.
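If you want to run such a precision calculation yourself, here is a sketch using the normal approximation for a proportion’s 95% confidence interval; the anticipated proportion and target precision are illustrative assumptions, not recommendations.

```python
import math

def ci_halfwidth(p, n, z=1.96):
    """Approximate 95% CI half-width for a proportion (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)

def n_for_halfwidth(p, halfwidth, z=1.96):
    """Smallest n giving at most the desired half-width at the given proportion."""
    return math.ceil((z / halfwidth) ** 2 * p * (1 - p))

# With only 20 charts and an anticipated rate near 50%, the interval is wide:
print(round(ci_halfwidth(0.5, 20), 2))   # roughly +/- 0.22, i.e. 22 points

# To pin the rate down to +/- 10 percentage points in the worst case (p = 0.5):
print(n_for_halfwidth(0.5, 0.10))        # about 97 charts
```

Even a back-of-the-envelope calculation like this is enough to rule out a study that could only ever report a 3% to 98% interval.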

There’s another issue with sample size, though, that you can’t ignore. Every research study has resource limitations and in a retrospective chart review, that resource is your time. The patient records are free to look at, but it takes time to abstract the information and put it into a database like REDCap. Unless your data collection is totally automated and you get the raw data just by pushing a button, you need to make sure that you have enough time to collect the data. Take some time to extract data from a small number of charts (say 5 or 10) and then extrapolate those results to your proposed sample size. Will you get the complete data set within a few days, or will it take longer to get your data than it takes the Chicago Cubs to win the World Series?
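The extrapolation from a small pilot is simple arithmetic, sketched here in Python with hypothetical pilot timings.

```python
# Extrapolate total abstracting time from a small pilot.
# The pilot timings (minutes per chart) and target n are hypothetical.

pilot_minutes = [22, 18, 25, 30, 21]   # time to abstract 5 pilot charts
target_n = 200                         # proposed sample size

mean_minutes = sum(pilot_minutes) / len(pilot_minutes)
total_hours = mean_minutes * target_n / 60

print(f"About {mean_minutes:.0f} min/chart, {total_hours:.0f} hours total")
```

Dividing the total by the hours per week you can realistically devote to abstracting tells you whether the timeline is days, months, or a Cubs-length drought.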

The extraction of data from the health records is the only step in your research where the time is proportional to the size of the data. Everything else is fixed: the time to write the protocol, the time to shepherd it through the IRB, the time to conduct your data analysis, the time to prepare your presentation or publication. Some of these steps will not come quickly, but they take about the same amount of time no matter how many or how few patients you get. The abstracting is the one step that is highly dependent on your sample size. So make sure you have a handle on the time this step requires, and scale the study back if the amount of time needed to abstract all your data is too great a burden to bear.

If you’re like most people, you are nervous about writing your data analysis plan. Don’t be. The data analysis plan is one of the easiest things you’ll have to specify. Data from a chart review has all the limitations of retrospective data and all the limitations of non-randomized data. So don’t work on a fancy gold-plated data analysis plan.

Every data analysis plan is going to be different, but here are some of the common elements. You’re going to summarize the demographics of your patients using means and standard deviations for continuous variables and percentages for categorical variables.
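A minimal sketch of that kind of demographic summary, using only the Python standard library; the records and field names are hypothetical.

```python
import statistics

# Summarize demographics: mean/SD for continuous variables, percentages for
# categorical ones. The records and field names below are hypothetical.

records = [
    {"age": 34, "sex": "F"}, {"age": 29, "sex": "F"},
    {"age": 41, "sex": "M"}, {"age": 37, "sex": "F"},
]

ages = [r["age"] for r in records]
print(f"Age: mean {statistics.mean(ages):.1f}, SD {statistics.stdev(ages):.1f}")

n_female = sum(r["sex"] == "F" for r in records)
print(f"Female: {100 * n_female / len(records):.0f}%")
```

In practice your statistical package produces this table for you; the point is only that the plan can promise something this simple.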

The statistical test that you use for your formal research hypothesis could take many forms, but whatever form it takes, you probably want to mention that you will use a two-sided test with an alpha level of 0.05. There are times when you would use a one-sided test instead of a two-sided test, and there are times when the test is neither one-sided nor two-sided. Also, there are times when you want an alpha level of 0.01 or 0.10. But in most settings, a two-sided test with an alpha level of 0.05 is just fine.
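As one concrete example of such a test, here is a two-sided two-proportion z-test sketched in plain Python using the normal approximation; the event counts are hypothetical, and in practice you would let your statistical package do this for you.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Two-sided two-proportion z-test (normal approximation, pooled variance)."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical counts: 30/100 events in one group vs 15/100 in the other.
z, p = two_proportion_z(30, 100, 15, 100)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

With these made-up counts the p-value falls below 0.05, so the two-sided test at alpha = 0.05 would reject the null hypothesis of equal proportions.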

You should also specify what statistical software you will use to conduct your analysis. The remaining details will vary from study to study. If this were a submission to FDA, you might have to spell out your statistical models in excruciating detail, but the IRB doesn’t want or need anything other than a general description.

So in summary, if you are thinking about developing a retrospective chart review research study, conduct a review preparatory to research to estimate the number of patient records available, consider what particular medical records you want to review and how you will abstract the important variables, specify your research hypothesis using the PICO format, justify your sample size with greater concern about sample sizes that are too small and not so much concern about sample sizes that are too big, and lay out a data analysis plan that is not excessively complex or detailed.

One final suggestion. Plan ahead as you prepare the paperwork that outlines your research proposal for IRB review. Write it in a style that would let you dump it directly into the introduction and methods section of a research publication. You can’t write the results section, unless you’re clairvoyant, or the discussion section for that matter, but even having a small piece of the paper written now will make you feel so much better.

You can find an earlier version of this page on my blog.