What is already known on this topic?

Although a growing number of COVID-19 LSRs are being published, no studies have assessed their methodological and reporting quality, both of which are crucial for informing clinical practice and policy-making.

What this study adds?

Our study aimed to evaluate the methodological and reporting quality of published COVID-19 LSRs and to identify factors that could affect their quality.

What is the implication?

Low-quality COVID-19 LSRs may undermine the confidence of clinicians and policymakers in the evidence, thereby hindering its translation into practice. This study serves as a reference for future research on COVID-19 LSRs.

Introduction

The Coronavirus Disease 2019 (COVID-19) pandemic is still a major public health problem on a global scale. Subsequent evidence has confirmed that it is caused by a novel coronavirus, initially referred to as 2019-novel coronavirus (2019-nCoV) by the World Health Organization (WHO) [1]. The WHO declared the COVID-19 outbreak a pandemic on March 11, 2020 [2]. As of September 19, 2022, there have been 610,393,563 confirmed cases of COVID-19 and 6,508,521 deaths reported to WHO [3]. Researchers worldwide are working diligently to understand COVID-19 as quickly as possible. However, amidst the massive amount of published evidence, "false information" and "false conclusions" have emerged, forming an "infodemic" that can increase clinicians' workload and hinder problem-solving efforts [4,5,6].

Systematic reviews (SRs) and meta-analyses (MAs) are the results of a rigorous scientific process consisting of several well-defined steps, including a systematic literature search, an evaluation of the quality of each included study and a quantitative or narrative synthesis of the results obtained [7]. SRs and MAs are often considered the highest level of evidence in evidence-based medicine, as they can bridge the gap between clinical research and clinical practice [8, 9]. Healthcare decision-makers in search of reliable information increasingly turn to SRs for the best summary of the evidence [10, 11]. However, traditional SRs are either not updated or updated only at long intervals (Cochrane SRs require updates every two years), which is inadequate for the rapidly evolving COVID-19 pandemic [12, 13]. Failure to keep reviews current during the COVID-19 pandemic may lead to significant inaccuracy [14].

Living systematic reviews (LSRs), proposed by Julian Elliott and colleagues in 2014, are a type of SR that is continually updated as new evidence becomes available [14, 15]. Research during the COVID-19 pandemic meets all three conditions for conducting an LSR: (1) the review question is a particular priority for decision-making; (2) there is an important level of uncertainty in the existing evidence; (3) there is likely to be emerging evidence that will affect the conclusions of the LSR [14, 16]. Therefore, LSRs have become increasingly important during the COVID-19 pandemic.

Well-conducted SRs provide an excellent snapshot of evidence [17]; conversely, poor methodological and reporting quality may reduce the confidence of clinicians and policymakers in the conclusions of SRs [18, 19]. Therefore, it is necessary to assess the quality of SRs before applying their conclusions to clinical or public health practice [20]. The same holds true for LSRs, a type of SR that has become even more important during the COVID-19 pandemic. Although studies have assessed the methodological and reporting quality of COVID-19 SRs [21,22,23], to our knowledge, none has yet evaluated the quality of COVID-19 LSRs.

Therefore, the main objective of this study is to evaluate the methodological quality and reporting quality of LSRs on COVID-19, while the secondary objective is to investigate potential factors that may influence the overall quality of COVID-19 LSRs. The findings of this study will provide useful insights for the development of future COVID-19 LSRs.

Methods

A cross-sectional study was conducted. Given the continued spread of COVID-19 and the rapid development of LSRs, we did not publish a protocol before conducting this study. This study was reported according to the Strengthening the Reporting of Observational studies in Epidemiology (STROBE) guidelines [24] (Supplementary material Appendix I).

Search strategy

Six databases were systematically searched: Medline, Excerpta Medica Database (Embase), Cochrane Library, China National Knowledge Infrastructure (CNKI), Wanfang Database and China Science and Technology Journal Database (VIP). We searched the databases from their inception until December 9, 2021, and conducted an additional search on May 13, 2022. The primary search terms included living systematic review, living meta-analysis, etc. (Supplementary material Appendix II). No sample size calculation was performed; all eligible studies were included.

Given that preprints are not peer-reviewed and their results may still change, we did not search preprint databases [25]. Because COVID-19 LSRs may have multiple versions owing to regular updates, we gave priority to the version that provided more information.

Inclusion and exclusion criteria

Inclusion criteria: (1) The study type is an SR; (2) The title or abstract clearly identifies it as a “living systematic review” (using this or similar terminology); (3) The clinical topic of the systematic review is related to COVID-19.

Exclusion criteria: (1) Unavailable articles; (2) Withdrawn COVID-19 LSRs; (3) Living evidence maps.

Selection and information extraction

The retrieved articles were imported into EndNote X8 (Thomson Corporation, Thomson ResearchSoft, USA) for duplicate removal and selection. Two review authors (Jiefeng Luo and Zheng Liu) independently screened the articles in duplicate, with any disagreements resolved by a third author (Zhe Chen). Article selection involved two steps: first, we excluded obviously irrelevant articles based on their titles and abstracts; then, we assessed the remaining articles by reading their full texts.

The data extraction was conducted independently and in duplicate by the review authors (Jiefeng Luo and Zheng Liu), with any disagreements being resolved by a third author (Zhe Chen). The data extraction form was designed in advance based on the pre-extracted data from ten COVID-19 LSRs. The data extraction form included title, first author, year of publication, country and region of publication, journal of publication and eight factors that might affect overall quality of COVID-19 LSRs. These factors include impact factor (IF), number of authors, number of institutions, number of included studies, whether there were international collaborations (yes or no), whether authors stated their funding sources (yes or no), whether the study was pre-registered in any registration platform (yes or no), and whether authors reported compliance with the PRISMA statement (yes or no).

Methodological quality assessment

The methodological quality of COVID-19 LSRs was assessed independently and in duplicate by two review authors (Jiefeng Luo and Zheng Liu), with any disagreements resolved by a third author (Zhe Chen). Although numerous tools are available for assessing the methodological quality of SRs, we chose AMSTAR-2 because of its widespread use and established validity and reliability [19, 26, 27]. AMSTAR-2 consists of 16 domains, of which Domains 2, 4, 7, 9, 11, 13, and 15 are critical. Each domain is answered "yes", "partial yes", or "no". The methodological quality of SRs was classified into four levels: high (no or one non-critical weakness), moderate (more than one non-critical weakness), low (one critical flaw with or without non-critical weaknesses) and critically low (more than one critical flaw with or without non-critical weaknesses). Domains 11, 12 and 15 do not apply if no meta-analysis was performed. Because multiple non-critical weaknesses may reduce confidence in a review, we additionally rated any LSR with more than 4 non-critical weaknesses as "low" rather than "moderate".
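The rating rules above can be illustrated with a short sketch. This is not the authors' procedure (they report using Excel and manual assessment); the function name `rate_amstar2`, its parameters, and the choice to count only "no" answers as weaknesses are assumptions made for illustration.

```python
# Critical AMSTAR-2 domains as listed in the text above.
CRITICAL_DOMAINS = {2, 4, 7, 9, 11, 13, 15}
# Domains that do not apply when no meta-analysis was performed.
META_ANALYSIS_ONLY = {11, 12, 15}


def rate_amstar2(answers, has_meta_analysis=True, max_noncritical=4):
    """Return the overall AMSTAR-2 level for one review.

    answers maps a domain number (1-16) to "yes", "partial yes" or "no".
    Assumption: only "no" counts as a weakness; "partial yes" does not.
    """
    critical_flaws = 0
    noncritical_weaknesses = 0
    for domain, answer in answers.items():
        if not has_meta_analysis and domain in META_ANALYSIS_ONLY:
            continue  # not applicable, so never counted as a weakness
        if answer != "no":
            continue
        if domain in CRITICAL_DOMAINS:
            critical_flaws += 1
        else:
            noncritical_weaknesses += 1

    if critical_flaws > 1:
        return "critically low"
    if critical_flaws == 1:
        return "low"
    # Study-specific rule: more than 4 non-critical weaknesses is "low".
    if noncritical_weaknesses > max_noncritical:
        return "low"
    if noncritical_weaknesses > 1:
        return "moderate"
    return "high"


# Hypothetical reviews, not taken from the study data.
all_yes = {d: "yes" for d in range(1, 17)}
print(rate_amstar2(all_yes))                         # high
print(rate_amstar2({**all_yes, 1: "no", 3: "no"}))   # moderate
print(rate_amstar2({**all_yes, 7: "no"}))            # low
print(rate_amstar2({**all_yes, 7: "no", 13: "no"}))  # critically low
```

Keeping the threshold as the `max_noncritical` parameter makes the study-specific downgrade rule explicit and easy to vary in a sensitivity check.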

Reporting quality assessment

The reporting quality of COVID-19 LSRs was assessed independently and in duplicate by two review authors (Jiefeng Luo and Zheng Liu), with any disagreements resolved by a third author (Zhe Chen). The PRISMA statement, which aims to guide complete reporting and to improve the transparency and reporting quality of SRs [18], was used to assess the reporting quality of the included COVID-19 LSRs. Although various PRISMA extensions are available to facilitate reporting on different types or aspects of SRs, we chose the PRISMA 2020 statement as the single assessment tool [28], because different versions of the PRISMA statement contain items that are not directly comparable.

The PRISMA 2020 statement includes seven sections (title, abstract, introduction, methods, results, discussion and other information) with 27 items. Each item was assessed as "yes", "partial yes", or "no" based on the degree of compliance with the reporting criteria. We counted the number of "yes" responses for each COVID-19 LSR and treated a larger number of "yes" responses as indicating better reporting quality.

Statistical analysis

Excel 2019 (Microsoft Corporation, WA, USA) was used to quantitatively analyze and qualitatively describe the included COVID-19 LSRs. Categorical variables, namely AMSTAR-2 levels, international collaboration (yes or no), funding (yes or no), pre-registration (yes or no), and reported compliance with the PRISMA statement (yes or no), were summarized as frequencies and percentages. Continuous variables, namely the number of "yes" responses to the PRISMA 2020 statement, the number of "yes" responses to AMSTAR-2, IF, number of institutions, number of authors, and number of included studies, were summarized as mean, median, standard deviation (SD), and range.
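As a rough illustration of these summaries (the study itself used Excel), the following sketch counts "yes" responses per review and then computes the descriptive statistics named above. The helper names and the input data are hypothetical, and the use of the sample SD (n - 1 denominator) is an assumption, since the text does not say which SD was reported.

```python
from collections import Counter
from statistics import mean, median, stdev


def count_yes(item_answers):
    """Number of items answered exactly "yes" ("partial yes" is not counted)."""
    return sum(1 for answer in item_answers if answer == "yes")


def describe_continuous(values):
    """Mean, median, sample SD and range for a continuous variable."""
    return {
        "mean": mean(values),
        "median": median(values),
        "sd": stdev(values),  # sample SD (n - 1 denominator); an assumption
        "range": (min(values), max(values)),
    }


def describe_categorical(values):
    """Frequency and percentage (one decimal place) for each category."""
    total = len(values)
    return {
        category: (n, round(100 * n / total, 1))
        for category, n in Counter(values).items()
    }


# Made-up PRISMA 2020 assessments (27 items each), not the study data.
reviews = [
    ["yes"] * 20 + ["partial yes"] * 4 + ["no"] * 3,
    ["yes"] * 25 + ["no"] * 2,
    ["yes"] * 18 + ["partial yes"] * 9,
]
yes_counts = [count_yes(r) for r in reviews]
print(yes_counts)  # [20, 25, 18]
print(describe_continuous(yes_counts))
print(describe_categorical(["yes", "yes", "no", "yes"]))
```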

To investigate factors that could affect the methodological and reporting quality of COVID-19 LSRs, we conducted linear regression analysis on the eight factors described above. The analysis proceeded in two steps: first, we performed univariate linear regression on each of the eight factors; then, we performed multiple linear regression on the factors that were statistically significant in the univariate analysis. Factors that remained statistically significant in the multiple linear regression were defined as determinants of quality [29]. We used the variance inflation factor (VIF) to assess multicollinearity among study features, and a VIF ≥ 5 was taken to indicate high collinearity [30].
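For readers unfamiliar with the VIF, the following is a minimal sketch of how it can be computed, assuming the usual definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all other predictors. The function names and example data are hypothetical; the authors do not state which software produced their VIF values.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(y, predictors):
    """R^2 of an ordinary least squares fit of y on predictors plus intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = len(design[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)]
           for p in range(k)]
    xty = [sum(row[p] * y[i] for i, row in enumerate(design)) for p in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    y_bar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def variance_inflation_factors(columns):
    """VIF for each column; fails on perfectly collinear columns."""
    vifs = []
    for j, target in enumerate(columns):
        others = [c for i, c in enumerate(columns) if i != j]
        vifs.append(1.0 / (1.0 - r_squared(target, others)))
    return vifs


# Made-up study features, not the study data.
n_authors = [3, 5, 8, 10, 12, 15, 20, 22]
n_institutions = [2, 3, 5, 6, 8, 9, 13, 14]
n_included = [10, 40, 5, 60, 25, 80, 15, 50]
for name, vif in zip(["authors", "institutions", "included studies"],
                     variance_inflation_factors(
                         [n_authors, n_institutions, n_included])):
    flag = "high collinearity" if vif >= 5 else "acceptable"
    print(f"{name}: VIF = {vif:.1f} ({flag})")
```

In this made-up example the numbers of authors and institutions rise together, so both receive a VIF well above the threshold of 5.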

Results

Study selection

A total of 1,132 articles were initially identified, and 1,043 remained after duplicates were removed with EndNote X8. After screening titles and abstracts to exclude obviously irrelevant articles, 156 articles remained. Finally, after full-text review, 64 COVID-19 LSRs were included [31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94]. The flow diagram of the screening process is presented in Fig. 1. The titles of the excluded studies and the reasons for their exclusion are presented in Appendix VI of the Supplementary Material.

Fig. 1 Flow diagram of screening

Study characteristics

Table 1 summarizes the characteristics of the 64 included COVID-19 LSRs, with additional details available in Appendix III-V. Most COVID-19 LSRs were published in high-impact Science Citation Index (SCI) journals, with 23% having an IF greater than 10. LSRs typically involved multiple institutions and authors, with an average of 9.27 institutions and 14.53 authors per LSR. The number of included studies in LSRs ranged from 0 [37, 73] to 728 [67]. Most COVID-19 LSRs reported following the PRISMA statement (78.1%), involved collaboration across multiple countries (59.4%), and were funded (81.3%). Additionally, a majority of the COVID-19 LSRs were registered (81.3%).

Table 1 Study characteristics

Methodological quality

Across the 64 COVID-19 LSRs, the AMSTAR-2 assessment yielded an average of 13 "yes" responses, with a median of 11, a range of 6 to 16, and a standard deviation of 2.68. Half (50%) of the COVID-19 LSRs were rated "critically low", 23.4% "low", 4.7% "moderate", and only 21.9% "high". Figure 2 displays the distribution of methodological quality levels.

Fig. 2 AMSTAR-2 levels distribution

Domains 1, 4, 5, 8, 9, 11 and 16 were satisfied by more than 90% of COVID-19 LSRs. Domain 10 had the lowest compliance, with only 31.25% of COVID-19 LSRs satisfying it. Figure 3 shows a heatmap of the AMSTAR-2 assessment results for each domain in the 64 included COVID-19 LSRs. The figure shows that LSRs' adherence to critical domains was better than their adherence to non-critical domains.

Fig. 3 Heat map of AMSTAR-2

Reporting quality

Across the 64 COVID-19 LSRs, the PRISMA 2020 assessment yielded an average of 21 "yes" responses, with a median of 21, a range of 13 to 27, and a standard deviation of 4.18. Figure 4 displays the PRISMA evaluation results for each item, presented as the percentage of "yes" responses.

Fig. 4 Proportion of "yes" for each item in PRISMA 2020 Statement

More than 90% of the COVID-19 LSRs fully reported 9 items (Items 1, 3, 4, 5, 6, 8, 17, 23, and 26), whereas fewer than 50% fully reported 5 items (Items 14, 15, 16, 21, and 22). "Title", "Rationale", "Objectives", and "Information sources" had the best reporting quality, with 98% of COVID-19 LSRs fully reporting them. In contrast, "Certainty of evidence" had the worst reporting quality, with only 41% of COVID-19 LSRs fully reporting it. Figure 5 shows a heatmap of the PRISMA 2020 assessment results for each item in the 64 included COVID-19 LSRs. The figure shows that the sections with poor adherence were methods, results and other information.

Fig. 5 Heat map of PRISMA 2020 statement

Results of correlation analyses

The results of univariate and multivariate analyses of the correlation between the eight factors and the overall quality of COVID-19 LSRs are shown in Table 2.

Table 2 Linear regression results of PRISMA statement and AMSTAR-2

Table 2 shows that the number of included studies and registration were associated with AMSTAR-2 levels, and together these variables explained 19.2% of the variation in AMSTAR-2 levels. The number of included studies and funding were associated with the number of "yes" responses to the PRISMA 2020 statement, and together these variables explained 14.2% of the variation in that number.

Discussion

The concept of the LSR was proposed by Julian Elliott and colleagues more than nine years ago, but interest in LSRs remained limited until the outbreak of COVID-19, which triggered a surge in related research [95]. At present, LSR methods are still being explored. It is therefore important to summarize and analyze the quality of existing LSRs and to determine potential influencing factors. To our knowledge, this is the first study to assess the quality of COVID-19 LSRs and to attempt to identify potential influencing factors. We believe that this is crucial in guiding the conduct of future COVID-19 LSRs.

The methodological quality of 73.4% of COVID-19 LSRs was assessed as low or critically low. For Domains 10 and 12, the compliance rates were below 50%. Domain 10 asks whether SR authors reported the funding information of the included studies, and its compliance rate was only 31.25%. Studies funded by corporations may be biased in favor of the sponsor. It is therefore helpful for SR authors to extract and report the funding information of the included studies so that readers can judge its influence on the SR. We recommend that future COVID-19 LSR authors and journal editors adhere to the requirements of Domain 10. Domain 12 assesses whether SR authors evaluated the potential impact of risk of bias (ROB) in individual studies on the results. The compliance rate for this domain was 31.25%, indicating that a significant proportion of the included COVID-19 LSRs did not meet this criterion. When authors include RCTs of varying quality, Domain 12 becomes particularly crucial, as RCTs with a high risk of bias can distort findings and reduce the credibility of the evidence [96]. Therefore, we recommend that authors of COVID-19 LSRs employ regression analysis to assess the impact of bias on the results when including RCTs of different quality, or restrict the analysis to studies with a low risk of bias to observe the stability of the results.

Regarding reporting quality, the average number of "yes" responses to the PRISMA 2020 statement per COVID-19 LSR was 21, which accounted for only 77.8% of all items, and Items 14, 21, and 22 had compliance rates below 50%. Items 14 and 21 ask whether the authors addressed "reporting biases" in the methods and results, respectively. We speculate that compliance with these two items was low because, in the early stages of conducting a COVID-19 LSR, the number of included studies was small, so the authors did not consider reporting biases (the Cochrane Handbook recommends using a funnel plot to test for reporting biases when including more than 10 studies). We recommend that authors specify the method for testing "reporting biases" in the protocol before conducting a COVID-19 LSR. When the number of included studies is too small, the reason for not testing "reporting biases" should be explained in the results. Item 22 asks authors to present assessments of the certainty or confidence in the body of evidence for each assessed outcome. Currently, the Grading of Recommendations, Assessment, Development and Evaluations (GRADE) tool is widely used to grade the quality of evidence for each outcome. Murad suggested that GRADE rating results can be used as triggers for retiring LSRs from the living mode [97]. Therefore, we strongly recommend that authors of COVID-19 LSRs use the GRADE tool to grade the quality of evidence for each outcome.

Through linear regression, we identified three factors associated with the quality of COVID-19 LSRs: the number of included studies, funding, and registration (all positively correlated with quality). Given that including more studies may be associated with higher-quality COVID-19 LSRs, we recommend that authors conduct a more comprehensive search and make every effort to include all eligible studies. Existing evidence has confirmed the high academic value of gray literature during the COVID-19 pandemic [98]. Therefore, authors may consider including eligible gray literature, as appropriate, to enhance the quality of COVID-19 LSRs. Funding may allow a more diverse author team (e.g. methodologists, informaticians), and sustaining LSRs requires funding [99], so we suggest that authors seek adequate funding for their research. Many studies have shown that registration is positively correlated with SR quality [100,101,102], as it helps to prevent authors from selectively reporting findings that favor publication. Therefore, we recommend that all authors of COVID-19 LSRs register their reviews and make their protocols available on websites or in journals.

Surprisingly, claiming adherence to the PRISMA statement was not associated with better reporting quality of COVID-19 LSRs. We speculate that this is because almost half of the LSRs (46.9%) followed the PRISMA 2009 statement (the PRISMA 2020 statement was not released until March 2021), which differs significantly from the PRISMA 2020 statement.

The potential association among international cooperation, number of authors, and number of institutions (i.e., more authors means more institutions, which necessarily implies more international collaboration) may be one of the reasons why they do not show a correlation with the quality of COVID-19 LSRs.

Similar to Zheng's findings [95], our study found that over 89% of COVID-19 LSRs were published in SCI journals (this figure was 76.8% in Zheng's findings), with more than 64% of these journals having an IF greater than 5. This reflects the importance that high-impact journals place on COVID-19 LSRs and facilitates the wide dissemination of evidence [22]. In Zheng's findings, over 97% of LSRs were published in English. Similarly, in this study, all the COVID-19 LSRs were published in English. This suggests that LSR authors prioritize international communication of their findings, which could help overcome language barriers in translating clinical evidence into practice across different countries.

Despite conducting a comprehensive search, we did not identify any studies evaluating the quality of COVID-19 LSRs. Only one study, published in 2023, that evaluated the quality of different versions of LSRs was available for comparison with this study. A. Akl and his colleagues assessed the methodological and reporting quality of 64 LSRs (base version) published from February 2013 to April 2021 using AMSTAR-2 and the PRISMA 2009 statement, respectively [103]. The methodological quality findings of the two studies were generally consistent, except for Domains 4, 8, 9, 10 and 12. A comparison of the proportion of "yes" in each AMSTAR-2 domain between this study and A. Akl's study is presented in Fig. 6. We speculate that these variations arise from differences in the inter-rater reliability of AMSTAR, which is influenced by the pairing of reviewers. Additionally, only 63% of the LSRs included in A. Akl's study were COVID-19 related. Because the PRISMA 2009 and PRISMA 2020 statements are not comparable, we did not compare reporting quality between this study and A. Akl's study. The primary objective of A. Akl's study was not to evaluate the quality of LSRs but rather to describe their characteristics and understand their life cycles. Consequently, several important data points, such as the methodological and reporting quality results for each individual LSR, were not accessible, and A. Akl's study does not provide prescriptive guidance for LSR authors. In contrast, our study focuses specifically on COVID-19-related LSRs, and its findings can offer valuable insights for future authors of COVID-19 LSRs.

Fig. 6 Comparison of the proportion of "yes" in each domain between this study and A. Akl's study for AMSTAR-2

Strengths and limitations

Our study has several strengths. Firstly, we believe that we are the first to evaluate the methodological and reporting quality of COVID-19 LSRs. Secondly, our study identifies potential factors that could affect the quality of COVID-19 LSRs, which could inform the future development of such studies. Thirdly, we conducted a systematic search, including an updated search in May 2022, to ensure that all eligible COVID-19 LSRs were included. However, our study also has limitations. Firstly, the PRISMA 2020 statement acknowledges that applying it to LSRs presents some challenges, such as reporting key data on the production process (e.g., search frequency, screening frequency, update frequency). Secondly, we used the AMSTAR-2 tool to assess all included COVID-19 LSRs, but this tool is designed for assessing SRs of healthcare interventions and is not suited to evaluating the "living" aspects of producing COVID-19 LSRs.

Conclusion

The methodological and reporting quality of COVID-19 LSRs needs improvement. Researchers conducting COVID-19 LSRs should take note of the quality-related factors identified in this study in order to generate higher-quality evidence.