Introduction

The coronavirus disease 2019 (COVID-19) pandemic has affected several million lives, strained healthcare systems, and dramatically changed people's lives globally. Because COVID-19 was a novel disease with dreaded consequences, desperate measures were tried in clinical practice, public health, and research settings to manage the condition, resulting in the prescription and promotion of several interventions without adequate supporting evidence [1]. The evidence base for pediatric management is even less robust, and is often extrapolated from practices in adults. Governmental agencies, public health institutions, and professional societies struggled to meet the challenges of the pandemic by rapidly developing guidelines to assist health care providers and the public [2].

However, guidelines should be produced through a formal process involving systematic search, critical appraisal, and synthesis of evidence on specific clinical questions [3]. Such guidelines produce evidence-based recommendations, coupled with judgements on the strength and quality of the evidence [2]. In contrast, the urgency created by the pandemic resulted in the publication of a large number of documents that bypassed the formal process but were labelled as guidelines [3]. Sadly, these well-intentioned (and sometimes helpful) documents may not be entirely trustworthy, owing to methodological limitations or conflicts of interest [2]. In some situations, they may result in the implementation of strategies that are not only ineffective, but possibly harmful [4]. The situation is worse for pediatric guidelines owing to the limited quantity and quality of primary evidence in children. Therefore, many pediatric guidelines have either been extrapolated from adult studies or are based on ‘expert opinions,’ adversely affecting their reliability. Even well-developed guidelines are generally applicable only to the healthcare settings for which they are developed. However, during the COVID-19 pandemic, many guidelines developed for entirely different settings were freely extrapolated to other settings.

These issues necessitate a thorough appraisal of currently available guidelines to evaluate their quality. However, there are no data on the methodological quality of COVID-19 guidelines applicable to pediatric practice in India. This study was undertaken to address this knowledge gap. The objective was to identify guidelines related to COVID-19 management in children, in the Indian healthcare setting, and formally appraise their methodological quality. This would enable health care professionals (and even the public) to take into consideration the quality of guidelines (and hence their reliability), when implementing them.

Materials and Methods

A systematic search was conducted to identify guidelines published on any aspect of COVID-19 management in children that could be applicable in India. Applicability was considered based on (i) presence of the clinical problem in India, (ii) availability of the intervention in India, and (iii) feasibility of implementing the intervention in India. Guidelines developed by Indian agencies were assumed to be applicable in India. The search was carried out through Medline (via PubMed), and the websites of the Government of India (GoI) Ministry of Health & Family Welfare, Indian Academy of Pediatrics (IAP), World Health Organization (WHO), American Academy of Pediatrics (AAP), Centers for Disease Control and Prevention (CDC), National Institutes of Health (NIH), National Institute for Health and Care Excellence (NICE), and American College of Rheumatology (ACR). Google and Google Scholar were also searched for additional guidelines. The searches were run without date restrictions, and were updated until 30 April 2021. The search strategy and output from each database are summarized in Supplementary material S1.

Guidelines developed for children (< 18 y) including neonates, published in English, and available in the public domain were included. Guidelines that were not developed specifically for COVID-19, those developed for both adults and children wherein the pediatric component could not be separately analyzed, and those not designed for human application were excluded.

After eliminating duplicate publications, guidelines were screened step-wise by title, then by abstract (or Introduction section), and then by full text. Guidelines not fulfilling the eligibility criteria were eliminated.

Each included guideline was independently appraised by at least two trained appraisers using the “Appraisal of Guidelines for Research and Evaluation II” (AGREE II) tool [5]. This is the current gold standard for guideline appraisal. It comprises 23 items, categorized into 6 domains viz. Scope and Purpose (3 items), Stakeholder Involvement (3 items), Rigor of Development (8 items), Clarity of Presentation (3 items), Applicability (4 items), and Editorial Independence (2 items). As per the instruction manual [5], the appraisers scored each item on a 7-point scale ranging from 1 to 7. Thus, for each guideline evaluated across 23 items, the total score by a single appraiser could range from 23 to 161. As per the AGREE-II manual, the scores of independent appraisers were added, so that the total score for each guideline could range from 46 to 322.

Then the following parameters were calculated:

$$Overall\;score\;for\;a\;guideline=\frac{(Sum\;of\;scores\;of\;two\;appraisers-46)}{322-46}\times100$$
$$Domain\;score=\frac{(Sum\;of\;item\;scores\;for\;the\;domain-Minimum\;possible\;score\;for\;the\;domain)}{\left(Maximum\;possible\;score\;for\;the\;domain-Minimum\;possible\;score\;for\;the\;domain\right)}\times100$$

Scores were expressed as percentages.
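The two formulas above share one structure: the obtained total is rescaled between the minimum and maximum totals possible for the items and appraisers involved. A minimal Python sketch of that calculation follows; the function name and the example ratings are hypothetical and are not taken from the study data.

```python
# Illustrative sketch only; names and example ratings are hypothetical.
MIN_ITEM, MAX_ITEM = 1, 7  # each AGREE II item is rated on a 7-point scale


def scaled_score(ratings_by_appraiser):
    """Return the scaled percentage score for one domain (or all 23 items).

    ratings_by_appraiser holds one list of item ratings per appraiser;
    every list must cover the same items.
    """
    n_appraisers = len(ratings_by_appraiser)
    n_items = len(ratings_by_appraiser[0])
    if any(len(r) != n_items for r in ratings_by_appraiser):
        raise ValueError("every appraiser must rate the same items")
    obtained = sum(sum(r) for r in ratings_by_appraiser)
    minimum = MIN_ITEM * n_items * n_appraisers  # e.g. 46 for 23 items, 2 appraisers
    maximum = MAX_ITEM * n_items * n_appraisers  # e.g. 322 for 23 items, 2 appraisers
    return (obtained - minimum) / (maximum - minimum) * 100


# Hypothetical ratings by two appraisers for the 3-item Scope and Purpose domain.
domain_1 = [[5, 6, 6], [6, 6, 7]]
print(round(scaled_score(domain_1), 1))  # (36 - 6) / (42 - 6) * 100 -> 83.3
```

Passing all 23 item ratings from both appraisers gives the overall score, since the minimum and maximum then evaluate to 46 and 322.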

Each appraiser underwent training on a set of guidelines (different from the included ones), to learn the process. Thereafter, a pilot appraisal was undertaken on another set of guidelines, before the appraisal of the guidelines included in this study. After the appraisers finalized their scores, items with an inter-rater difference of ≥ 2 points were reappraised. If the difference persisted, the appraisers discussed their scores, and if required, arbitration was done by the senior author.
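The reappraisal rule described above can be sketched as a simple comparison of two appraisers' item ratings. The function name and ratings below are hypothetical; the sketch only illustrates the "difference of 2 or more" criterion, not the discussion or arbitration steps.

```python
# Illustrative sketch only; names and example ratings are hypothetical.
def items_to_reappraise(ratings_a, ratings_b, threshold=2):
    """Return 1-based item numbers whose two ratings differ by >= threshold."""
    if len(ratings_a) != len(ratings_b):
        raise ValueError("both appraisers must rate the same items")
    return [
        item
        for item, (a, b) in enumerate(zip(ratings_a, ratings_b), start=1)
        if abs(a - b) >= threshold
    ]


appraiser_1 = [5, 6, 6, 3, 2, 4]
appraiser_2 = [6, 6, 4, 3, 5, 4]
print(items_to_reappraise(appraiser_1, appraiser_2))  # [3, 5]
```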

The median and interquartile range (IQR) of overall scores and domain scores were calculated. Spearman rank correlation coefficient (rho) was calculated for each guideline to assess the relationship of the scores assigned by the independent appraisers. Median (IQR) rho was also calculated across the included guidelines.
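For illustration, the summary statistics named above can be computed with the Python standard library alone. This is a sketch under stated assumptions: the source does not report which software or quartile convention was used, so the "inclusive" quartile method, the helper names, and the example ratings are all assumptions. Spearman's rho is obtained here as the Pearson correlation of average ranks, which handles tied ratings.

```python
# Illustrative sketch only; helper names and example ratings are hypothetical.
import statistics


def median_iqr(values):
    """Return (median, Q1, Q3); the quartile convention is an assumption."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.median(values), q1, q3


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions start..end
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks.

    Undefined (division by zero) if either appraiser gave identical
    ratings to every item.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical item ratings by two appraisers for one guideline.
appraiser_1 = [5, 6, 6, 3, 2, 4, 3, 5]
appraiser_2 = [6, 6, 4, 3, 3, 4, 2, 5]
print(spearman_rho(appraiser_1, appraiser_2))
```

Applying `spearman_rho` to each guideline and then `median_iqr` to the resulting coefficients mirrors the two-step summary described in the text.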

Three comparative analyses of the median (IQR) scores were undertaken, viz. (i) Indian vs. foreign guidelines, (ii) the first half of the included guidelines based on date of publication vs. the second half, and (iii) original versions vs. updated versions of guidelines having updates. The Mann–Whitney test was used for the first two comparisons, and the Wilcoxon signed-rank test for the third.
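As an illustration of the unpaired comparisons, the Mann–Whitney U statistic can be sketched as a count over all pairs of scores from the two groups. This is not the authors' implementation: the function name is hypothetical, and the p-value uses the large-sample normal approximation without tie or continuity correction, which is an assumption made for brevity. With small groups, an exact test from a statistical package would be preferable.

```python
# Illustrative sketch only; not the study's analysis code.
import math


def mann_whitney_u(sample_a, sample_b):
    """Return (U, two-sided p) for two independent samples.

    U counts the pairs (a, b) with a > b, scoring ties as 0.5. The p-value
    uses the normal approximation with no tie or continuity correction,
    so it is approximate.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    u_a = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in sample_a
        for b in sample_b
    )
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u_a, p_two_sided


# Hypothetical overall scores (%) for two groups of guidelines.
group_1 = [31.2, 28.4, 40.1, 35.0]
group_2 = [37.9, 48.6, 29.4, 52.3, 44.0]
print(mann_whitney_u(group_1, group_2))
```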

This study was approved by the Institute Ethics Committee of PGIMER Chandigarh vide INT/IEC/2021/SPL-251 dated 13/02/2021.

Results

The systematic literature search identified 1526 guidelines, of which 1520 remained after removing duplicates. Among these, 1381 guidelines were excluded based on screening of titles, followed by abstract/introduction. The eligibility criteria were applied to the full text of the remaining 139 guidelines, resulting in elimination of 77 guidelines. Thus, 62 guidelines were included in this analysis (Fig. 1). Only 8 of these were published in India [6,7,8,9,10,11,12,13].
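The selection counts reported above are internally consistent, as the following arithmetic shows. The figure of 6 duplicates is derived here from the reported totals and is not stated explicitly in the text.

$$1526-6=1520,\qquad 1520-1381=139,\qquad 139-77=62$$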

Fig. 1 PRISMA flow diagram for selection of pediatric guidelines. AAP American Academy of Pediatrics; ACR American College of Rheumatology; CDC Centers for Disease Control and Prevention; GoI Government of India; IAP Indian Academy of Pediatrics; NICE National Institute for Health and Care Excellence (UK); NIH National Institutes of Health; WHO World Health Organization

Of the 62 guidelines, 25 (40.3%) covered clinical diagnosis, diagnostic tests, clinical management, drug therapy, supportive care and preventive therapy. Another 25 (40.3%) were related only to prevention. Eight (12.9%) guidelines covered both clinical management and prevention.

The overall AGREE-II score of the 62 guidelines ranged from 4.7% to 72.8%, with median (IQR) 37.9% (29.4, 48.6). The median (IQR) domain scores are summarized in Table 1, and overall scores of the 62 guidelines are presented in Fig. 2. Only 3 (4.8%) guidelines had an overall score > 60%, the threshold commonly used to define ‘good’ quality, and only one guideline crossed the 70% threshold.

Table 1 AGREE-II scores of the included guidelines (n = 62)

Fig. 2 AGREE-II scores of pediatric guidelines (n = 62) arranged alphabetically. The x-axis shows the guidelines and the y-axis the overall score (%)

Among the six domains, the highest median score was in Domain 1, followed by Domain 4. The lowest score was in Domain 6. However, the widest variation was also observed in Domain 6. Two-thirds of the guidelines crossed the 60% threshold for Domain 1 (Scope and Purpose), whereas only 3.2% achieved this for Domain 3 (Rigor of Development). Table 2 summarizes the distribution of the domain-wise AGREE-II scores. Supplementary material S2 presents the domain scores of each included guideline.

Table 2 Distribution of domain-wise scores of the included guidelines (n = 62)

In the comparison of Indian versus foreign guidelines (Table 3), the median (IQR) overall score of the former was slightly lower, but the difference was not statistically significant. However, the Indian guidelines had significantly lower scores for two domains, viz. Rigor of Development and Applicability. This was despite the foreign guidelines themselves having low scores.

Table 3 Comparison of AGREE-II scores of Indian versus foreign guidelines; updated versions versus original versions of guidelines (n = 9); and first half versus second half of the guidelines

Nine (14.5%) guidelines had been updated during the study period. The median overall and domain scores of these guidelines were higher in the updated versions, but the differences were not statistically significant (Table 3). Only 3/9 guidelines showed meaningful improvement in the critically important domain of ‘Rigor of Development’.

Comparing the AGREE-II scores of the first half (i.e., first 31) versus second half of the published guidelines showed no statistically significant improvement in any of the domain scores (Table 3).

The median (IQR) correlation coefficient between the scores of the two appraisers (Fig. 3) was 0.80 (0.69, 0.83), suggesting strong correlation. About 75% of the guidelines showed correlation coefficient ≥ 0.7. Only two guidelines had correlation coefficient < 0.50.

Fig. 3 Correlation coefficient of the scores assigned by two appraisers for each of the 62 guidelines. The x-axis shows the guidelines, and the y-axis the correlation coefficient (rho)

Discussion

This systematic literature search identified several pediatric guidelines applicable to the Indian healthcare scenario, but the overall methodological quality was low. Overall scores did not differ significantly between Indian and international guidelines, and guidelines published later in the pandemic showed no significant improvement over earlier ones. Even updated/revised guidelines did not show significant improvement in methodological quality. Among the various domains, the critical one reflecting rigor of development remained weak across the guidelines.

Early in the pandemic, two studies analyzing guidelines available by March 2020 and April 2020 highlighted the problem of low methodological quality [14, 15]. Although most guidelines clarified their scope and purpose, the methodological process was seriously compromised. As few as 4% of guidelines were developed through systematic reviews of evidence, and most were based on informal expert consensus [15]. Similar findings were observed in another analysis that included guidelines published till August 2020 [16]. Analyses of guidelines restricted to medical specialties such as Anesthesiology [17] and Surgery [18], as well as guidelines focused on specific aspects of COVID-19 management [19], also showed low methodological quality.

As far as the authors know, this study is the first systematic appraisal of a large number of pediatric guidelines. Previous studies were either limited in focus or approach [20, 21]. One study on 20 pediatric guidelines separately assessed their methodological as well as reporting quality [22], and found no guidelines having high quality. The methodological quality and reporting quality of most guidelines went hand-in-hand, suggesting that low methodological quality was not related to inadequate reporting alone [22].

The present findings in pediatric guidelines, which align with observations in guidelines of other specialties, raise several important questions. First, is the problem unique to COVID-19 guidelines? Most COVID-19 guidelines were developed under the challenging circumstances of a raging pandemic caused by a hitherto unknown virus, an urgent need for guidance despite the lack of robust evidence, and pressure on policy-makers and healthcare professionals to take action. It could be argued that these circumstances alone explain the low quality. However, this perception is belied by the authors’ observation that guidelines developed later during the pandemic did not show better methodological quality, nor did those that were revised or updated. Further, there are specific processes for the rapid development of guidelines to meet public health emergencies [23, 24]. In addition, a comparison of guidelines developed for viral outbreaks causing public health emergencies showed that the methodological quality of COVID-19 guidelines was inferior to that of guidelines developed for outbreaks of SARS-CoV, MERS-CoV, Ebola virus, and Zika virus [25]. These diverse pieces of evidence suggest that COVID-19 guidelines are distinctly inferior in quality.

The second question is why this is happening. It could be because COVID-19 spread much faster (compared to previous outbreaks), affected the whole world (compared to limited geographies in other public health emergencies), and involved millions of people, pressuring Governments and healthcare systems to respond urgently. However, these theories still do not explain why more recent guidelines failed to reflect better methodological quality.

Third, what is the impact of poor methodological quality of guidelines? Some may argue that even if more robust methodology (consuming greater time and resources) had been employed, the resultant guidance may not have been different. This view is inappropriate for two reasons. First, studies have shown that guidelines developed with inadequate methodology made discrepant recommendations even for a limited set of interventions [22, 26], often promoting therapies that were later conclusively proven to be ineffective [22, 27] and subsequently recommended against. Second, the ethical, clinical and economic consequences of such practices have also been highlighted [1].

The authors considered whether the AGREE-II tool could be inadequate or inappropriate for COVID-19 guidelines, given the unique context and challenges. This question is relevant because guidance documents developed by prestigious guideline agencies also had low scores. In this context, it is worth noting that some items in the tool are open to interpretation, resulting in variable scoring. Further, the tool itself lacks a provision for awarding an overall score. After completing the item-by-item assessment objectively, appraisers are expected to provide two subjective judgements, viz. their assessment of the quality of the guideline, and whether they would recommend using the guideline [5]. Moreover, there is no absolute cut-off score that categorises guideline quality as ‘high’ or ‘low’, although higher scores reflect better quality, and vice versa. Despite this, some experts use cut-offs such as 60% as a minimum criterion to consider guidelines in their practice, and some classify 70% as the threshold for high(er) quality. On account of such challenges, some authors have suggested alternative tools for critical appraisal of COVID-19 guidelines, with a focus on the implementation context [28, 29], but these also have limitations and inconsistencies.

What is the way forward? The authors believe that it is essential to improve guideline quality, with attention to methodological and other issues that can limit bias during development. Some issues may be relatively easier to resolve than others. For example, almost all the included guidelines scored poorly in the domain of ‘editorial independence’, on account of poor reporting of funding sources and sponsors. For guidelines developed without vested interests, this should be relatively easy to rectify. Other issues are more complex, requiring methodological expertise to frame questions, conduct systematic reviews, appraise evidence, and use it to develop recommendations.

Needless to say, the process of evidence-based guideline development is resource intensive. Therefore, an alternative approach is that guidelines need not be developed separately by each and every healthcare agency. Instead, robust evidence-based guidelines developed by agencies recognized for producing trustworthy guidance can be considered for implementation in local healthcare settings. The international collaborative ‘COVID-19 Recommendations Map project’ provides a detailed catalogue of methodologically-appraised guidelines and recommendations, through a single user-friendly portal [2, 30]. Instead of developing new local guidelines, stakeholders can enhance efficiency by accessing the available guideline recommendations and then deciding to adopt (without modifications), or adapt (with locally appropriate modifications) them.

This study had several strengths, including a systematic search for guidelines relevant to the management of COVID-19 in children. The key aspect of guideline appraisal was conducted in a robust and transparent manner, minimizing several sources of bias. There are several limitations, notably searching a limited number of databases, restricting the search to English-language guidelines, excluding guidelines applicable to both adults and children, and ending the search on 30 April 2021. The nature of the AGREE-II tool, and its application by appraisers with varying levels of expertise and experience, can also create observer biases.

Conclusions

This systematic search identified several guidelines for the management of COVID-19 among children in India. However, the overall methodological quality was low, raising concerns about the credibility of the guideline development process. The low methodological quality was seen across Indian as well as international guidelines, those published during the early part of the pandemic as well as more recent guidelines, and even among updated versions of guidelines. The authors identified several issues that can be resolved, and suggested a way forward to improve the situation.