Researchers Face Hurdles to Evaluate COVID Evidence

October 09, 2020 | Science Magazine

Jeffrey Brainard

Systematic reviews promise to reveal the best, safest treatments, but during the coronavirus pandemic, many have quickly become outdated. Image by Mongta Studio / Shutterstock. Undated.

Science’s COVID-19 reporting is supported by the Pulitzer Center and the Heising-Simons Foundation.

Soon after the COVID-19 pandemic started, physicians at the Mayo Clinic wanted guidance, and fast. How should they treat patients with blood clots? What outcomes should they expect for pregnant women?

They didn’t have time to sort through the flood of scientific papers, recalls Hassan Murad, a Mayo internist. Some studies reported results that contradicted others, and medical societies hadn’t agreed on clinical guidelines.

Normally, such confusion would be sorted out with the help of “systematic reviews”: exhaustive, scholarly evaluations of relevant literature and data by groups of specialists. Systematic reviews often yield conclusions about the safest, most effective treatments that are more authoritative than any single study. But the reviews typically take 1 or 2 years to complete. And in the middle of a pandemic, “you don’t have time to do that,” says Murad, who specializes in conducting evidence-based reviews. So, this spring, he and colleagues hit the accelerator: They created an archive of relevant scientific articles and quickly produced seven reviews about COVID-19, some taking just 3 days to complete.

Murad’s group isn’t alone. Responding to calls from the World Health Organization (WHO) and other public health groups, scientists around the world have fast-tracked efforts to evaluate COVID-19 treatments, producing about 2000 systematic reviews.

Researchers say the reviews have helped clarify which treatments work and have highlighted knowledge gaps about COVID-19. But now the growing body of reviews itself is under scrutiny. Scientists reviewing the reviews are grappling with a pressing question: In the push for quick answers, have pandemic-related evaluations of studies sacrificed thoroughness and rigor?

Quickly outdated

Some early answers are discouraging. Many COVID-19 reviews have flaws and have failed to reliably separate useful signals from the scientific noise, researchers conclude.

For example, an unpublished analysis of some 240 reviews about drug treatments for the disease found 95% were already out of date. Each was missing at least one randomized controlled trial, the type of clinical study designed to exclude the effects of confounding factors, says internist Gabriel Rada of the Pontifical Catholic University of Chile, an author of the analysis. In many cases, the reviewers missed trials that had already been published. In other cases, new individual studies appeared subsequently, making the reviews out of date. Both kinds of flaws have plagued reviews on other topics, but the high percentage of COVID-19 reviews found to be incomplete only months after publication is unprecedented, Rada says, and reflects the extraordinary volume and speed of this literature.

He identified a related problem: a shortage of randomized controlled trials. Analyzing more than 35,000 papers and preprints about COVID-19 in a database called Epistemonikos, Rada’s team found that just 211 used this trial design, which can be expensive and time-consuming. “The systematic review is not a magic wand that is going to change the quality of the primary studies,” Rada says.

Another warning sign about the reviews’ reliability is questionable methodology. In a sample of 109 systematic reviews about COVID-19, researchers in Japan found that most earned middling scores on a standard test of methodological rigor, according to the team’s preprint study posted 2 September on medRxiv. However, nine reviews by Cochrane, an international consortium that has pioneered the practice of systematic reviewing, earned high scores.

In addition, Rada says, many systematic reviews duplicate efforts; 53 reviews, for instance, dealt with the effectiveness of hydroxychloroquine, an antimalarial medication that has been used to treat COVID-19. Although President Donald Trump and other political leaders have touted it as a remedy, the reviews have not found evidence of its benefits.

Even before the pandemic, concerns about rigor and timeliness dogged systematic reviews. Specialists have suggested that too few review authors have appropriate training and experience. These worries have grown because of the flood of scientific papers about COVID-19 and the disease’s enormous global impact.

Making reviews relevant

Despite such problems, specialists say a few systematic reviews have influenced clinical practice by shedding light on the only two drugs shown by large randomized controlled trials to alleviate severe symptoms of COVID-19. WHO published a meta-analysis of studies on the steroid dexamethasone on 2 September confirming studies’ findings that it reduced the severity of symptoms. WHO strongly endorsed the drug’s use in severe cases.

But another systematic review, published in The BMJ in July, concluded with only low certainty that the antiviral drug remdesivir reduced the duration of hospital stays, a weaker assessment than that of the largest single study of the drug. “I think the effect has been that physicians have become more cautious about prescribing this drug,” Rada says. “Some countries, Chile for instance, have decided not to recommend or even register [remdesivir for COVID-19 treatment] until better evidence is available.”

Some authors of systematic reviews recognize shortcomings in the COVID-19 reviews and are looking for ways to improve the process of producing them, says John Lavis, director of the McMaster Health Forum in Canada, which collaborates with WHO to develop evidence-based health policies internationally. He likens the first half of this year to a sprint that has since become a marathon, one in which more deliberate planning of reviews can help improve care for COVID-19 patients and public policy about the pandemic.

To avoid out-of-date reviews, at least six groups have been developing an approach little used before the pandemic: “living reviews” that they plan to update as new relevant primary studies emerge. The July BMJ analysis was one of the first published.

Sustaining living reviews presents challenges. Each review update, like the original review, requires rigorously searching for relevant scientific papers—and considering whether to rework the conclusions to account for new findings. And university scientists typically receive no professional rewards for updates despite the considerable amount of work. “At the moment, we do it for the public good rather than for academic credit,” says Paul Glasziou of Bond University’s Institute for Evidence-Based Healthcare in Australia.

But Glasziou is hopeful about lightening the load for both original systematic reviews and updates. In the Journal of Clinical Epidemiology this year, he and colleagues reported completing a new systematic review about urinary tract infections in only 2 weeks using several novel approaches to quicken the work—steps Glasziou says can also help with reviews about COVID-19. For instance, he used software to automate repetitive tasks, such as excluding nonrandomized studies.

Another way to bolster the quality of systematic reviews is to share best practices for creating them and then highlight the best results. Those are among the goals of the COVID-19 Evidence Network to Support Decision-making (COVID-END). The international consortium comprises 50 organizations, including Cochrane and the Campbell Collaboration, that have conducted systematic reviews. The consortium is encouraging coordination of efforts to reduce duplication, says Jeremy Grimshaw, a leader of COVID-END and a scientist at the Ottawa Hospital Research Institute. “Globally, we can probably get much more bang for our buck if we coordinate what we’re doing, rather than do this in a piecemeal way,” he says.

In August, the group debuted a list of the best, most up-to-date evidence syntheses on COVID-19. It now includes more than 140 reviews on topics that include treatments, protective equipment for health workers, and care for patients with long-lasting symptoms. The list is unusual because its managers assessed the quality of each review’s methodology using a standard test; the entries display the reviews’ scores and, for some, the dates when authors updated them.

The list also reveals what some call a glaring gap in the COVID-19 literature: relatively little evidence about tools other than drugs to stop transmission of the virus, such as wearing masks, distancing in public, and quarantining. The inventory contains about 80 reviews about drugs and symptoms but only half that number about nondrug public health measures. Some 2000 clinical trials of drugs have been registered, but only a handful of trials of nondrug measures are underway.

“We’re investing two orders of magnitude less” than on drug studies to research “the things that actually work at the moment,” Glasziou says. “And when you’ve got a trillion dollar problem, it seems crazy.”

Sorting out behavioral questions

Other researchers, however, caution that quantitative reviews won’t be the most helpful tools for setting guidance and policy on behavioral measures against COVID-19. Effects of interventions such as social distancing and wearing masks are often challenging to measure, says Trisha Greenhalgh, a primary care physician at the University of Oxford who has written extensively about evidence-based medicine.

For example, she and colleagues recently reviewed evidence about why the disease has spread so quickly in slaughterhouses. Many possible causes could interact in complex ways, she says, including indoor humidity, workers’ close proximity to one another, and even their shouting, which can propel viral particles. Greenhalgh and her co-authors drafted a “narrative review,” a qualitative approach that summarizes multiple strands of evidence using expert judgment. But a leading infectious disease journal rejected it because the authors didn’t conduct a quantitative systematic review.

“It would have been impossible,” Greenhalgh says. “And that’s not what was needed. What was needed was clarification and a deepening of understanding about the problem, not some spurious estimate of effect size.”

Especially during a pandemic, she says, policymakers and clinicians should not wait for definitive quantitative reviews and instead should act to protect the public using the best available evidence, revisiting decisions as new evidence emerges. This spring, she notes, U.K. leaders and some epidemiologists resisted requiring face masks to protect against transmission of the virus, citing a lack of support from controlled trials and systematic reviews. But Greenhalgh and others argued that other kinds of evidence in favor of face masks were compelling. By June, the government began to issue such rules for some types of public spaces.

In a June editorial in PLOS Medicine, Greenhalgh wrote, “History will one day tell us whether adherence to ‘evidence-based practice’ helped or hindered the public health response to COVID-19.”
