So you’re thinking about a systematic overview

Steve Simon


If you are designing a systematic overview, you should talk to a statistician early in the process. There are lots of little things that you can do to make your research more rigorous. Here is a broad overview of these issues.

First things first, you should talk to a medical librarian. A librarian can help you in a lot of ways above and beyond what a statistician can help you with. I’m going to focus on the statistical issues associated with a systematic overview, but will also comment where appropriate on issues that the librarian can help you with.

You may have seen the term meta-analysis instead of systematic overview. These terms are used interchangeably, which is not surprising because they are so similar. There is a technical distinction, however. Meta-analysis is the statistical combination of quantitative information from multiple research studies. A systematic overview is a rigorous review of the peer-reviewed literature. You must have a systematic overview in order to run a meta-analysis. But a systematic overview will not always lead to a quantitative combination of information.

The word “systematic” in systematic overview is not there by accident. The systematic overview is intended to be a reproducible form of research, so you need to prepare a research protocol that describes all of the steps that you plan to take. In many ways, the protocol for a systematic overview resembles the protocol for a prospective clinical trial. In particular, the research papers in your systematic overview are like the patients in a clinical trial, except that you don’t have to read an informed consent statement to a research paper.

First, define in clear and unambiguous terms what types of research studies you are looking for in your systematic overview. These are like the inclusion and exclusion criteria that you see in a prospective clinical trial. Tread cautiously here. Cast your net too widely and you are likely to have difficulty with heterogeneity, the statistical mixing of apples and oranges. But a little bit of heterogeneity is actually a benefit. If you study an intervention across a wide range of implementations, a wide range of patient populations, and a wide range of research designs, and the results are reasonably consistent, you have found evidence of a robust intervention that works consistently no matter what.

If you are thinking about inclusion and exclusion criteria for research studies, you might start first by considering the type of intervention or exposure. Make sure that you define your boundaries clearly. You may, for example, be interested in weight loss programs. Would you want to include programs that are dietary only, or is some combination of diet and exercise okay? How about weight loss programs that include some type of drug or some type of surgery? You need to think in terms of both inclusions and exclusions.

You may also want to place restrictions on the types of patients that are being studied. Do you want to look only at moderately ill patients, only at severely ill patients, or any combination? Do you want to limit the ages of the patients being studied? Do you want to restrict the geographical boundaries (e.g., studies in the United States only)? Do you want to limit where patients receive their treatments (e.g., outpatient only)?

Think about how far back you want to go. Do you want studies only in the past decade or would earlier studies be okay?

Finally, think about any restrictions that you might place on the research design. Do you want only blinded studies, only randomized studies, only prospective studies? Do you care if the researchers used a surrogate outcome? Do you want to look only at a certain duration of follow-up time (e.g., no short term studies)?

You may decide on other exclusions as well. Remember that you want the systematic overview to be reproducible, so define your inclusion and exclusion criteria in sufficient detail that another reasonably intelligent person would end up picking the same studies that you did.

In your protocol, explain where you will find your research papers. There’s a lot of debate in the research community about how rigorous you need to make your literature search. Most people recommend that you review several bibliographic databases, not just PubMed. There are many unpublished research studies, and you need to identify how you will try to find these studies as well. A medical librarian can help you design a reasonably comprehensive literature search strategy. One factor that you may overlook, if you don’t consult with a librarian first, is how to save your search strategy. A saved search strategy ensures the reproducibility of your work and also allows you to refine your search strategy as needed.

There are sensitive searches and specific searches and you need to balance these appropriately. A sensitive search is one that is likely to find a study if that study is out there. You get a more sensitive search by looking at multiple bibliographic databases, hand searching selected journals, sending out requests to prominent researchers in this area, and reviewing clinical trial registries. You also improve sensitivity by including a wide range of synonyms and using MeSH terms in your search. How much effort do you want to put in here? That’s a tough call. A sensitive search does reduce the risk of bias in your systematic overview. There’s a fair amount of empirical data to support the notion that your systematic overview will be overly optimistic if you leave out papers that are harder to find: because they didn’t make it into a peer-reviewed journal, because they were published in a journal not indexed by the major databases, because they were written in a language other than English, and so forth. So your efforts to improve sensitivity will pay off in producing a systematic overview that is more authoritative.

But too much of this might be overkill. There’s a saying that the good is the enemy of the great and I tend to fall on the side of the good rather than the great. The effort you spend producing an ever more exquisitely sensitive search is effort that could have been spent producing a second or even a third good systematic overview.

The particular topic you want to research may be an important consideration here. Certain types of therapies, such as complementary and alternative medicine, are likely to have lots of research in obscure journals and a lot of research that was conducted but never published. So for an area like this, improving your sensitivity may be really important.

The flip side of the coin is a specific search. A specific search does not produce a lot of false positives: papers that don’t really meet your inclusion criteria. It’s nearly impossible to produce a highly specific search unless you are willing to live with mediocre sensitivity. So get used to the fact that you’ll be reviewing a lot more papers than the ones that meet your inclusion and exclusion criteria. Your librarian can help some here. In particular, if you are looking for certain factors associated with the research design, such as randomized studies only, there are some filters that have been tested and shown to work reasonably well.
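If you happen to have a benchmark set of papers that you already know belong in your review, you can put rough numbers on this trade-off. That benchmark set is an assumption for the sake of illustration, as are the function name and the paper identifiers in this Python sketch. Note that it reports precision (the share of retrieved papers that are relevant) rather than a true specificity, because you cannot count all the irrelevant papers that your search correctly ignored.

```python
def search_performance(retrieved_ids, known_relevant_ids):
    """Return (sensitivity, precision) for one search against a benchmark set.

    retrieved_ids: identifiers of every paper the search returned.
    known_relevant_ids: identifiers of papers already known to be relevant
    (a hypothetical benchmark set, not something every review will have).
    """
    retrieved = set(retrieved_ids)
    relevant = set(known_relevant_ids)
    if not retrieved or not relevant:
        raise ValueError("both sets must be non-empty")
    hits = retrieved & relevant
    sensitivity = len(hits) / len(relevant)   # share of relevant papers found
    precision = len(hits) / len(retrieved)    # share of retrieved papers that are relevant
    return sensitivity, precision


if __name__ == "__main__":
    sens, prec = search_performance(
        retrieved_ids=["p1", "p2", "p3", "p4"],
        known_relevant_ids=["p1", "p2", "p5"],
    )
    print(f"sensitivity={sens:.2f}, precision={prec:.2f}")
```

A search that returns many more papers will usually push the first number up and the second number down, which is the trade-off described above.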

Because you aren’t likely to get a highly specific search, it’s okay to screen some studies out, based just on the abstract. The abstract is often a highly imperfect summary of a research study, but for the purposes of deciding which studies obviously need to be excluded, the abstract does a pretty good job. Keep two things in mind, though. First, if the paper does not have an abstract, you have to read the entire paper. Second, if there is any uncertainty in whether a paper meets your inclusion criteria based just on the abstract, you have to read the full paper.

Keep track of the number of papers that you have identified and the number which were excluded based just on a review of the abstract.

Now for the papers that didn’t obviously get excluded based just on the abstract, you need to get some help. First, you’ll need to translate any papers not written in English. More importantly, you need to have a second person read these papers independently from you. Your goal is reproducibility and one of the best ways to establish reproducibility is to demonstrate that two independent reviewers both came to the same conclusions about whether a paper should be included or excluded. Track the number of papers that you and your friend disagreed on. You can resolve this disagreement through the use of a third party to adjudicate the dispute. If the number of disagreements is too high, then rethink your inclusion and exclusion criteria and write them with more precision.
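One common way to summarize how well two reviewers agree is Cohen's kappa, which adjusts the raw agreement rate for the agreement you would expect by chance. Kappa is my suggestion here rather than a requirement, and the function name and the include/exclude labels in this Python sketch are made up for illustration.

```python
def cohen_kappa(decisions_a, decisions_b):
    """Cohen's kappa for two reviewers' decisions on the same list of papers."""
    if len(decisions_a) != len(decisions_b) or not decisions_a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(decisions_a)
    # Proportion of papers on which the two reviewers agree.
    observed = sum(a == b for a, b in zip(decisions_a, decisions_b)) / n
    # Agreement expected by chance, from each reviewer's marginal rates.
    labels = set(decisions_a) | set(decisions_b)
    expected = sum(
        (decisions_a.count(label) / n) * (decisions_b.count(label) / n)
        for label in labels
    )
    if expected == 1.0:
        return 1.0  # both reviewers used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


def count_disagreements(decisions_a, decisions_b):
    """Number of papers that need a third party to adjudicate."""
    return sum(a != b for a, b in zip(decisions_a, decisions_b))


if __name__ == "__main__":
    you = ["include"] * 4 + ["exclude"] * 6
    friend = ["include"] * 3 + ["exclude"] * 6 + ["include"]
    print(count_disagreements(you, friend), round(cohen_kappa(you, friend), 3))
```

How low a kappa is "too low" is a judgment call, so report both the count of disagreements and the kappa value.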

All along the way, you have to make sure that a study is not double-counted. Some duplicates come because you used multiple bibliographic databases, and these are not too hard to find and remove. The bigger problem is when researchers like a study so much that they publish it several times in several different locations. There are sometimes legitimate reasons for this: one paper looks at the primary outcomes and another looks at the secondary outcomes. One paper might look at the short term results and a paper on the long term results appears later. Sometimes a paper appears originally in the native language and then gets republished in an English language journal. In a multi-center study, one paper might focus on the summary across all centers and another might look at a single center because that center conducted additional research and collected additional data above and beyond what the other centers did. When the reasons for multiple publications are legitimate, the authors will normally cite the related papers.

What is more worrisome is covert duplicate publications, where researchers publish their research results twice without letting anyone know. This might be an effort to pad someone’s resume, but it represents an abuse of the free reviews provided by the editors and peer reviewers. Tracking covert duplicate publications is not easy. Sometimes (but not always) you can deduce this by looking at the author lists, or reviewing some of the details of the studies.

If you end up counting some of the studies twice, you are likely to produce an overly optimistic bias, because the studies with very striking results are more likely to have duplicate publications.
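The easier half of this problem, the same record coming back from more than one bibliographic database, can be handled with a short script. Here is a minimal Python sketch. The record layout (a dictionary with "title", "year", and "doi" keys) and the sample DOIs are invented for illustration; the export from your own databases will look different. It does nothing to catch covert duplicate publications, which still need a human reader.

```python
import re


def normalize_title(title):
    """Lowercase a title and collapse punctuation and spacing differences."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(records):
    """Keep the first copy of each record, matching on DOI or on title plus year."""
    seen_keys = set()
    unique = []
    for record in records:
        keys = {("title", normalize_title(record["title"]), record.get("year"))}
        doi = (record.get("doi") or "").strip().lower()
        if doi:
            keys.add(("doi", doi))
        if keys & seen_keys:
            continue  # already saw this DOI or this title/year combination
        seen_keys |= keys
        unique.append(record)
    return unique


if __name__ == "__main__":
    # Hypothetical records; the DOIs are placeholders, not real papers.
    records = [
        {"title": "Diet and Exercise for Weight Loss: A Randomized Trial.",
         "year": 2010, "doi": "10.1000/abc"},
        {"title": "Diet and exercise for weight loss - a randomized trial",
         "year": 2010, "doi": None},
        {"title": "Surgery for weight loss", "year": 2012, "doi": "10.1000/xyz"},
    ]
    print(len(records), "records,", len(deduplicate(records)), "after deduplication")
```

Record how many duplicates you removed, just as you record the number of papers excluded at the abstract stage.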

Now that you have all the papers in hand that satisfy your inclusion and exclusion criteria, you need to build a database from these papers. Recruit your friend again here, because the process of pulling data from research papers is not as clear and unambiguous as you might think. So both you and your friend read the paper independently and fill your database independently. If you can demonstrate that you and your friend produce pretty much the same data, then you have established that the systematic overview is reproducible. Get a third party to adjudicate the few differences that you do have.

If you disagree a lot, then you have to start over again with more precise definitions of what you want. This will really test your friendship and you might want to spring for a fancy dinner after all the work is done.

So what do you put in your database? First, you need information about the studies themselves. In particular, you want information on the interventions, the patients being studied, and the type of research design. Focus on information from the studies that are likely to be sources of heterogeneity. If your intervention is defined quite broadly, for example, then you need to categorize which interventions are evaluated in which studies.

Many systematic overviews use quality scores (the Jadad score is one example), and you and your friend should apply the quality scores independently. Does your friend like flowers? Maybe a dozen roses in addition to that fancy dinner is in order here.

Your database needs information from all the relevant outcome measures. You can decide in advance that you will exclude certain outcome measures, of course, but even with these measures excluded, you will be shocked at how many different ways researchers can measure the same thing.

The two most common types of outcome measures are continuous outcome measures and binary outcome measures. For continuous outcome measures, you need to record six numbers: the mean, standard deviation, and sample size in the treatment group and the mean, standard deviation, and sample size in the control group. For binary outcome measures, you need to record four numbers: the number of events and the total sample size in the treatment group and the number of events and the total sample size in the control group. Not every study, of course, will have every outcome measure, so develop a missing value code.
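The six numbers and the four numbers described above might be laid out like this. It is only a sketch in Python: the class and field names are my own, and using None as the missing value code is one choice among several.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ContinuousOutcome:
    """Six numbers for one continuous outcome in one study (None = missing)."""
    study_id: str
    mean_treatment: Optional[float] = None
    sd_treatment: Optional[float] = None
    n_treatment: Optional[int] = None
    mean_control: Optional[float] = None
    sd_control: Optional[float] = None
    n_control: Optional[int] = None

    def is_complete(self):
        return all(getattr(self, f.name) is not None for f in fields(self))


@dataclass
class BinaryOutcome:
    """Four numbers for one binary outcome in one study (None = missing)."""
    study_id: str
    events_treatment: Optional[int] = None
    n_treatment: Optional[int] = None
    events_control: Optional[int] = None
    n_control: Optional[int] = None

    def is_complete(self):
        return all(getattr(self, f.name) is not None for f in fields(self))


if __name__ == "__main__":
    # Hypothetical study that did not report a standard deviation for one group.
    row = ContinuousOutcome("study-01", 5.0, None, 30, 4.0, 1.2, 28)
    print(row.is_complete())
```

If you and your friend each fill in records like these independently, comparing the two sets field by field shows you where you disagree.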

There are lots of complications that can occur, and you should consult with a statistician about how to handle these cases. For example, some studies may not report a standard deviation. Sometimes you can infer what the standard deviation would be based on other measures of variability, but do this carefully. Some studies might use paired patients and some might use unpaired patients. There’s no easy way to combine paired and unpaired data. Some studies rely on survival curves, and for these studies, you should try to get access to the original data sets.
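As an example of inferring a standard deviation, here are two conversions that are widely used when a paper reports a standard error or a confidence interval for a single group's mean instead of the standard deviation. This is a Python sketch with function names of my own choosing. The confidence interval version assumes a 95% interval built from a normal approximation, which is shaky for small samples, so check with your statistician before relying on it.

```python
import math


def sd_from_se(standard_error, n):
    """Standard deviation from the standard error of a single group's mean."""
    return standard_error * math.sqrt(n)


def sd_from_ci(lower, upper, n, z=1.96):
    """Standard deviation from a confidence interval for a single group's mean.

    Assumes the interval is mean +/- z * SE, with z = 1.96 for a 95% interval.
    """
    standard_error = (upper - lower) / (2.0 * z)
    return sd_from_se(standard_error, n)


if __name__ == "__main__":
    print(sd_from_se(2.0, 25))
    print(sd_from_ci(6.08, 13.92, 25))
```

Neither conversion applies to a standard error or interval reported for the difference between groups; that case needs a different calculation.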

If you don’t have these complications, the standard meta-analysis is not that difficult. Even so, you should probably use specialized software for this. The two basic choices for analyses are a fixed effects meta-analysis and a random effects meta-analysis. I strongly prefer the random effects meta-analysis, because it mirrors closely how you might analyze a multi-center clinical trial. (I often joke that meta-analysis is a multi-center trial where each center gets to use its own protocol.)
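To show what the two choices do differently, here is a bare-bones Python sketch of inverse-variance pooling. For the random effects version I have used the DerSimonian-Laird estimate of the between-study variance, which is one common choice and not necessarily the one your software uses. The function names and the three made-up studies are for illustration only, and this is no substitute for specialized software.

```python
import math


def pool_fixed(effects, variances):
    """Fixed effects estimate: weight each study by 1 / variance."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    estimate = sum(w * y for w, y in zip(weights, effects)) / total
    return estimate, math.sqrt(1.0 / total)


def dersimonian_laird_tau2(effects, variances):
    """Between-study variance (tau squared) by the DerSimonian-Laird method."""
    weights = [1.0 / v for v in variances]
    fixed, _ = pool_fixed(effects, variances)
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    if c <= 0:
        return 0.0  # a single study gives no information about tau squared
    return max(0.0, (q - df) / c)


def pool_random(effects, variances):
    """Random effects estimate: add tau squared to every study's variance."""
    tau2 = dersimonian_laird_tau2(effects, variances)
    estimate, se = pool_fixed(effects, [v + tau2 for v in variances])
    return estimate, se, tau2


if __name__ == "__main__":
    # Hypothetical effect sizes and their variances from three studies.
    effects = [0.0, 2.0, 4.0]
    variances = [1.0, 1.0, 4.0]
    print("fixed: ", pool_fixed(effects, variances))
    print("random:", pool_random(effects, variances))
```

The random effects weights are more nearly equal across studies, and the standard error is never smaller than the fixed effects one, which is why the random effects interval is usually wider.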

There are measures of heterogeneity that you can use, and funnel plots that some researchers like to see as an assessment of publication bias. But there is a fair amount of criticism of both measures of heterogeneity and funnel plots. When you’re writing your protocol, it’s best to mention that you will calculate these things because you can always leave them off in your publication.
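Two of the heterogeneity measures you are most likely to run across are Cochran's Q and the I-squared statistic. I am naming these as examples; they are not the only options. This Python sketch computes both from effect sizes and their variances, using a function name and three made-up studies chosen for illustration.

```python
def heterogeneity(effects, variances):
    """Return (Q, degrees of freedom, I squared as a proportion)."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I squared: share of the variation in Q beyond what chance would produce.
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i_squared


if __name__ == "__main__":
    # Hypothetical effect sizes and variances from three studies.
    q, df, i_squared = heterogeneity([0.0, 2.0, 4.0], [1.0, 1.0, 4.0])
    print(f"Q={q:.2f} on {df} df, I-squared={100 * i_squared:.0f}%")
```

With only a handful of studies, both numbers are very imprecise, which is part of the criticism mentioned above.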

The one graph that is universally used is the forest plot and it is actually a pretty good plot. One final choice that you will face is with binary outcome measures. For these measures, you can combine them as odds ratios, or as relative risks, or as absolute risks. This choice is trickier than you might think because sometimes the statistic that you use for a binary outcome can create heterogeneity. In general, I like to combine odds ratios, but there are sometimes good arguments to be made for combining relative risks.
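To see how the three statistics differ, here they are computed from the same four numbers for a single study. This is a Python sketch with a function name of my own, and the counts are made up. It does not handle studies with zero events in a group, which need special treatment that you should discuss with your statistician.

```python
def binary_effect_measures(events_treatment, n_treatment, events_control, n_control):
    """Odds ratio, relative risk, and absolute risk difference for one study."""
    risk_treatment = events_treatment / n_treatment
    risk_control = events_control / n_control
    odds_treatment = events_treatment / (n_treatment - events_treatment)
    odds_control = events_control / (n_control - events_control)
    return {
        "odds_ratio": odds_treatment / odds_control,
        "relative_risk": risk_treatment / risk_control,
        "risk_difference": risk_treatment - risk_control,
    }


if __name__ == "__main__":
    # Hypothetical study: 10 of 100 events on treatment, 20 of 100 on control.
    for name, value in binary_effect_measures(10, 100, 20, 100).items():
        print(f"{name}: {value:.3f}")
```

The odds ratio and the relative risk are close when events are rare and drift apart as events become common, which is one way the choice of statistic can create or hide heterogeneity.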

The one thing that is different in meta-analysis is that subgroup analysis is strongly encouraged. If you’re familiar at all with other types of research design, you know that the research community is generally hostile towards subgroup analysis. There are good reasons for this, but in meta-analysis, one of your biggest concerns is heterogeneity. If you can show that the results across various subgroups are reasonably consistent, then you have established that heterogeneity does not exist (or perhaps it exists but does not produce any problems). If you find that some subgroups are different from others, then you have removed the problem of heterogeneity by focusing on smaller, more homogeneous subsets. You can also think of subgroup analysis as a type of sensitivity analysis. One of the strengths of the forest plot, by the way, is that it easily accommodates subgroup analyses. That’s probably the reason it is so popular.

So what subgroups should you look at? This may depend on what types of studies you find, but prior to the study, try to imagine potential sources of heterogeneity. Is there a lot of variation in patient populations? Is there a broad range of research designs that are commonly used? Is the duration of follow-up quite different? Areas where you might expect to see heterogeneity are areas that you should plan subgroup analyses around. Then allow for the possibility of conducting several post hoc subgroup analyses based on what you find during data collection.
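Mechanically, a subgroup analysis just means pooling within each subgroup and then comparing the pooled results. Here is a minimal Python sketch. To keep it short it uses simple inverse-variance (fixed effects) pooling inside each subgroup, which is a simplification on my part, and the function name, subgroup labels, and numbers are invented.

```python
from collections import defaultdict


def pool_by_subgroup(studies):
    """Pool effect sizes within each subgroup.

    studies: iterable of (subgroup_label, effect, variance) tuples.
    Returns {subgroup_label: (pooled_estimate, standard_error)}.
    """
    sums = defaultdict(lambda: [0.0, 0.0])  # label -> [sum of w * y, sum of w]
    for label, effect, variance in studies:
        weight = 1.0 / variance
        sums[label][0] += weight * effect
        sums[label][1] += weight
    return {
        label: (weighted_sum / total, (1.0 / total) ** 0.5)
        for label, (weighted_sum, total) in sums.items()
    }


if __name__ == "__main__":
    # Hypothetical studies tagged by where patients were treated.
    studies = [
        ("outpatient", 0.0, 1.0),
        ("outpatient", 2.0, 1.0),
        ("inpatient", 4.0, 4.0),
    ]
    for label, (estimate, se) in pool_by_subgroup(studies).items():
        print(f"{label}: {estimate:.2f} (SE {se:.2f})")
```

Pooled estimates that sit close together across subgroups suggest a consistent effect; estimates that sit far apart tell you where the heterogeneity is coming from.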

I wish you the best of luck on your systematic overview. They are a lot of work, but they are also very important.

This is the third in a series of blog entries: So you’re thinking about research. The other two entries are:

So you’re thinking about a pilot study, and

So you’re thinking about a retrospective chart review.

You can find an earlier version of this page on my blog.