Introduction

Healthcare-associated pneumonia (HCAP) is a type of pneumonia that was described in the American Thoracic Society/Infectious Diseases Society of America (ATS/IDSA) guidelines in 20051. In Japan, a similar category was proposed in 2011 as nursing- and healthcare-associated pneumonia (NHCAP)2. Additionally, the ATS/IDSA 2019 guidelines recommended abandoning the category of HCAP and combining it with community-acquired pneumonia (CAP) to avoid unnecessary selection of extended antibiotic coverage3. However, the characteristics of the patients, the detection rate of multi-drug resistant pathogens, and pneumonia mortality rates are different between CAP and HCAP in some countries, including Japan4,5,6,7, and the concept of HCAP is still needed under the situation where there will be many elderly pneumonia patients in aging societies such as Japan. Therefore, proper prognostic tools are required to follow the appropriate clinical practice.

Severity assessment tools such as the pneumonia severity index (PSI) and CURB-65 find clinical use worldwide. In addition, tools like A-DROP and I-ROAD are widely used to predict mortality in Japan. PSI8,9, CURB-659,10,11, and A-DROP8,9,10,11 in CAP and I-ROAD12 in hospital-acquired pneumonia (HAP) are shown to have good predictive efficacies for mortality. But the utility of these tools for predicting mortality in patients with HCAP has been controversial because of conflicting reports7,13,14,15,16,17,18,19,20,21. The efficacy of these tools previously reported in patients with HCAP may be primarily influenced by the participants' social backgrounds, underlying diseases, and comorbidities. Therefore, studies targeting large populations are required.

We previously conducted a systematic review and meta-analysis on the utility of PSI and CURB-65 for predicting mortality in patients with HCAP22. We showed that these tools lacked significant capability in HCAP, though PSI may be slightly more useful than CURB-65. However, only a few studies were included in the meta-analysis (seven for PSI and eight for CURB-65). There was insufficient data for A-DROP and I-ROAD (accessed on July 16, 2015).

In this systematic review and meta-analysis, we re-validate the significance of PSI and CURB-65 and evaluate the usefulness of A-DROP and I-ROAD for predicting mortality in patients with HCAP, aiming to reveal the effectiveness of existing severity assessment tools.

Methods

Search and selection criteria

This study was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and the Meta-Analyses (PRISMA) statement and Meta-analysis of observational Studies in epidemiology (MOOSE) guidelines23,24.

We searched for studies using PubMed, Cochrane Library (trials), and Ichushi web database, and the following search words in PubMed were applied: “pneumonia [MeSH Terms]” AND (“healthcare associated pneumonia” OR “health-care-associated pneumonia” OR “healthcare-associated pneumonia” OR “nursing home acquired pneumonia” OR “nursing and healthcare associated pneumonia” OR “long term care facility” OR “extended-care facility”) AND (“severity score” OR “predict” OR “prognosis” OR “mortality score” OR “pneumonia severity index” OR “PORT score” OR “fine score” OR “A-DROP” OR “I-ROAD”) as previously reported22, while corresponding terms were used to search the Cochrane Library and Ichushi web database (accessed on August 22, 2022).

Inclusion and exclusion criteria

The inclusion criteria for eligible studies were as follows: prospective or retrospective studies targeting hospitalized patients with HCAP, nursing home-acquired pneumonia (NHAP) and/or NHCAP according to the 2005 ATS/IDSA guidelines1 and/or the 2017 Japanese Respiratory society (JRS) guidelines25, evaluating severity scores of PSI26, A-DROP27, I-ROAD12, or CURB-6528 and reporting mortality outcomes and raw data for the number of patients and deaths for any item of each severity grade, written in English or Japanese as original research articles. Exclusion criteria were as follows: studies involving children; case reports, conference reports, reviews; studies including patients who did not receive inpatient treatment in hospital because of possible significant biases for the treatment contents; studies with overlapping periods at the same medical institution; and studies lacking detailed data of namely true-positive, false-positive, true-negative, and false-negative values at any severity grade for mortality.

Data extraction and quality assessments

Two reviewers (SN and MK) independently assessed all the articles. The non-relevant studies were excluded based on the titles and abstracts after searching PubMed, Cochrane Library (trials), and Ichushi web database using the keywords, and the full texts of potentially appropriate titles and abstracts were further reviewed. The following information was collected from the included studies: geographic location, design, sample size, the mean age of participants, type of severity score, a common outcome, and mortality rate. The QUADAS-2, which includes four risk-of-bias domains and three domains of applicability29, was used to evaluate the risk of bias. Two investigators (SN and MK) evaluated the risk of bias using the QUADAS-2, and any disagreements were resolved by a third reviewer (NN) and discussed.

Severity grade of PSI, A-DROP, I-ROAD, and CURB-65

The detailed calculation parameters of these four assessment tools are demonstrated in Table 1. PSI26 is classified into a five-class according to total score of the prognostic factors and the severity grade was categorized into ≥ IV (moderate) and V (severe) when there was a total score of 91 or more and 131 or more points, respectively. A-DROP27 and CURB-6528 is a 6-point scoring system and “more than one point” and “ more than three points” for A-DROP and “more than two points” and “more than three points” for CURB-65 was categorized into ≥ II (moderate) and ≥ III (severe), respectively. I-ROAD12 is classified into three grades and it was categorized into severe when three or more prognostic factors of “Predictors of life expectancy” were applied. When less than three were applied, it was categorized as moderate when one or more of the “determinants of pneumonia severity” was positive, and it was categorized as low when none of the two applied.

Table 1 Calculation parameters of the PSI, A-DROP, I-ROAD, and CURB-65.

Outcomes

The primary outcome in this study was short-term mortality (28-day, 30-day or in-hospital mortality).

Statistical analysis

Paired forest plots and the pooled sensitivities, specificities, positive likelihood ratio (PLR), negative likelihood ratio (NLR) and diagnostic odds ratio (DOR) were calculated using the “midas” and “metandi” commands in the STATA 14 software (StataCorp LP, College Station, TX, USA), as previously reported22. In addition, the overall area under the curves (AUCs) of each severity assessment tools were calculated and compared with the Review Manager ver. 5.4 software. In eight studies14,18,30,31,32,33,34,35 where AUC was not described in the paper, the AUC was calculated based on the receiver operator characteristic (ROC) curves that were obtained from the raw data of the number of patients and fatalities for each severity grade using STATA 14 software. Statistical significance was set at a p-value of < 0.05. I2 statistics were used to evaluate the heterogeneity of the reported studies, as follows: 0–25%, low; 25–50%, moderate; 50–75%, high; 75–100%, very high.

Results

Database search and risk of bias assessment

A total of 2881 articles (PubMed 2276, Cochrane Library 134, and Ichushi web 471) were identified in the initial search, and 41 articles were potentially eligible after the first screening of the titles and abstracts. Next, the full text was reviewed, and 20 articles were excluded. Eventually, 21 observational studies were selected for this study (Fig. 1). The summary of the risk of bias using the QUADAS-2 in the included studies was shown in Fig. 2. For patient selection, three or two studies were evaluated as having a high risk of bias or high concern for applicability, respectively, because of the possibility of inappropriate exclusion or mismatched definition. In the patient flow and timing assessment, three studies were assessed to have a high risk of bias because of inappropriate omission or uncertainly of evaluation timing of reference standard.

Figure 1
figure 1

Flow chart of the study selection.

Figure 2
figure 2

Summary of risk of bias using QUADAS-2.

Included studies

The characteristics of the included studies are shown in Table 2. Among these 21 studies, five6,13,18,30,36 and 164,7,14,19,31,32,33,34,35,37,38,39,40,41,42,43 were prospective and retrospective studies, respectively. The category of pneumonia was eight7,19,31,33,36,38,39,42, seven13,18,32,34,35,41,43, and six4,6,14,30,37,40 for HCAP, NHCAP, and NHAP, respectively. Twelve4,7,13,14,19,30,31,36,39,41,42,43 studies for PSI, 124,7,13,18,19,32,33,34,35,38,41,43 for A-DROP, seven7,13,18,19,34,41,43 for I-ROAD, and 144,6,7,13,14,19,30,31,36,37,39,40,41,42 for CURB-65 were included in this study.

Table 2 Characteristics of included studies.

PSI

Twelve4,7,13,14,19,30,31,36,39,41,42,43 studies were included in the meta-analysis for the PSI score. Using a cut-off value of ≥ IV (moderate; n = 12), the pooled sensitivity, specificity, PLR, NLR, and DOR for mortality were calculated as 0.97 (0.94–0.98), 0.15 (0.10–0.21), 1.14 (1.08–1.20), 0.22 (0.12–0.38), and 5.09 (2.95–8.78), respectively (Table 3). Using a cut-off value of V (severe; n = 11), the pooled sensitivity, specificity, PLR, NLR, and DOR for mortality were 0.69 (0.60–0.77), 0.66 (0.60–0.72), 2.03 (1.82–2.27), 0.47 (0.38–0.58), and 4.32 (3.35–5.59), respectively. The forest plots and estimated sensitivities and specificities from each study are shown in Fig. 3a,b.

Table 3 Pooled characteristics of severity scores for predicting mortality.
Figure 3
figure 3

The paired forest plots of sensitivity and specificity for predicting mortality with PSI, A-DROP, I-ROAD, and CURB-65. Forest plots of sensitivity and specificity for mortality prediction with (a) PSI ≥ IV, (b) PSI V, (c) A-DROP ≥ I, (d) A-DROP ≥ III, (e) I-ROAD ≥ moderate, (f) I-ROAD ≥ severe, (g) CURB-65 ≥ II, and (h) CURB-65 ≥ III.

A-DROP

Twelve4,7,13,18,19,32,33,34,35,38,41,43 studies were included in the meta-analysis for the A-DROP score. Using a cut-off value of ≥ III (severe; n = 11), the pooled sensitivity, specificity, PLR, NLR, and DOR for mortality were 0.70 (0.62–0.76), 0.54 (0.45–0.62), 1.50 (1.33–1.70), 0.56 (0.48–0.66), and 2.66 (2.09–3.40), respectively (Table 3). The forest plots using these cut-offs and estimated sensitivities and specificities from each study are shown in Fig. 3c,d. In one study7, forest plots weren’t described in Figure because the data necessary to create it in both a cut-off value of ≥ I and ≥ III was insufficient.

I-ROAD

Seven7,13,18,19,34,41,43 studies were included in the meta-analysis for the I-ROAD score. Using a cut-off value of ≥ moderate (n = 5), the pooled sensitivity, specificity, PLR, NLR, and DOR for mortality were 0.92 (0.69–0.98), 0.44 (0.30–0.59), 1.66 (1.39–1.98), 0.18 (0.05–0.61), and 9.32 (2.86–30.3) respectively (Table 3). Using a cut-off value of severe (n = 7), the pooled sensitivity, specificity, PLR, NLR, and DOR for mortality were 0.67 (0.54–0.77), 0.63 (0.50–0.74), 1.78 (1.44–2.21), 0.53 (0.42–0.68), and 3.34 (2.35–4.75), respectively. The forest plots using these cut-offs and estimated sensitivities and specificities from each study are shown in Fig. 3e,f.

CURB-65

Fourteen4,6,7,13,14,19,30,31,36,37,39,40,41,42 studies were included in the meta-analysis for the CURB-65. Using a cut-off value of ≥ II (moderate; n = 13), the pooled sensitivity, specificity, PLR, NLR, and DOR for mortality were 0.91 (0.84–0.95), 0.28 (0.20–0.37), 1.26 (1.17–1.36), 0.33 (0.23–0.46), and 3.86 (2.74–5.44), respectively (Table 3). Using a cut-off value of ≥ III (severe; n = 14), the pooled sensitivity, specificity, PLR, NLR, and DOR for mortality were 0.63 (0.52–0.73), 0.63 (0.53–0.71), 1.70 (1.52–1.90), 0.58 (0.49–0.70), and 2.91 (2.34–3.62), respectively. The forest plots using these cut-offs and estimated sensitivities and specificities from each study are shown in Fig. 3g,h.

Comparisons of overall AUC among PSI, A-DROP, I-ROAD, and CURB-65

The overall AUC values were pooled from the AUC (95% CI) values reported in the included studies (Fig. 4). The overall AUCs were 0.70 (0.68–0.72), 0.70 (0.63–0.76), 0.68 (0.64–0.73), and 0.67 (0.63–0.71) for PSI, A-DROP, I-ROAD, and CURB-65 scores, respectively. No significant differences were observed (p = 0.66, I2 = 0%).

Figure 4
figure 4

Comparison of overall AUC for PSI, A-DROP, I-ROAD, and CURB-65.

Discussion

The present study evaluated the significance of PSI, A-DROP, I-ROAD, and CURB-65 for predicting mortality in HCAP patients. Our results indicate that these severity assessment tools cannot accurately predict mortality in patients with HCAP. In addition, there were no significant differences between these severity assessment tools.

It has been shown that PSI, A-DROP and CURB-65 in CAP and I-ROAD in HAP have high AUCs, nearly 0.8, for predicting mortality8,9,10,11,12. In this meta-analysis, the overall AUCs for these severity assessment tools for predicting mortality are 0.67–0.70, although only two reports showed high AUC values of over 0.8 for A-DROP38 and CURB-6541. AUC is often used to measure the accuracy in studies of severity assessment, and the discriminatory value based on AUC is evaluated as “poor” for 0.60–0.69, “moderate” for 0.70–0.79, “good” for 0.80–0.89, and “excellent” for 0.90–1.00, respectively44, although its criteria differ between studies45. In our study, PSI and A-DROP had “moderate” discriminative ability, while I-ROAD and CURB-65 showed “poor” discriminative ability when we follow this criteria. Overall, our results showed no significant capability for predicting mortality among the four assessment tools. Generally, patients with HCAP are highly heterogeneous, and their mortality is affected by various factors, including general conditions, laboratory data on admission to the hospital, comorbidities, antibiotic-resistant bacterial infections, and their social backgrounds, in addition, it may be also influenced by the rate of intensive care unit (ICU) admission and/or do not attempt resuscitation (DNAR); for example, the rates of ICU admission and DNAR were 0.9–26.4%6,13,14,18,30,31,37,39 and 24.0–55.6%7,18,37, respectively, although these numbers weren’t mentioned in all 21 studies. Thus, these severity assessment tools did not show enough predictive capability for mortality in HCAP patients. In addition, these results remained unchanged even when limited to NHCAP patients, although the comparison of severe grade between A-DROP and I-ROAD was only performed due to few studies evaluating PSI or CURB-65 (Supplementary Table S1 and Fig. S1).

This meta-analysis found no significant differences in overall AUCs between PSI and the remaining tools. PSI includes some comorbidities and physical and laboratory parameters as evaluation items and might be the best score for predicting mortality in the patients’ group with comorbidities such as HCAP46. In addition, the item “pH”, included in PSI and the SCAP score, is known as an indicator of metabolic acidosis under sepsis31. On the other hand, the item “age”, included in all of the severity assessment tools evaluated in this study, occupies a relatively large weight in PSI score but was not a significant risk factor for in-hospital mortality in NHCAP5. Further investigation is needed, but age and comorbidities may be overvalued in predicting pneumonia severity in elderly patients such as HCAP39. Furthermore, the influence of the general condition, such as “bedridden state” and “low serum albumin” as well as inflammatory biomarkers, such as “CRP level” and “neutrophil-to-lymphocyte ratio” has been shown for predicting mortality in elderly patients with pneumonia47,48. Therefore, these explain the low AUC value despite a large number of items, as the prognosis might be more strongly influenced by the ordinal general condition than the presence of comorbid diseases in these patients43. Similar to our results with low NLR, Chalmers et al. reported that PSI might be superior for identifying low-risk patients with low NLR (0.2 for ≥ IV and 0.5 for ≥ V) in patients with CAP49, although the AUC value in our results was low compared with that of CAP (0.82 for ≥ IV, 0.81 for ≥ V). Therefore, PSI may be useful for identifying low-risk patients in HCAP similar to CAP patients, and NLR below 0.1 is generally considered useful for diagnoses50.

A-DROP and CURB-65 are easy to use in daily clinical practice. However, these tools may not be ideal in patients with multiple comorbidities because these tools may underestimate the severity in the elderly patients with comorbidities51. In addition, most HCAP patients are over 65 years old, and the age index of A-DROP and CURB-65 might not be significant, although the utility of CURB, without the item “age”, was insignificant in patients with HCAP36,37. On the other hand, the results of this study showed that A-DROP and CURB-65 had almost similar predictive capabilities to PSI in the evaluation using overall AUC. PSI is relatively complex and often avoided in complicated environments such as an emergency room. Our results indicate that the predictive abilities of themselves were not enough to predict mortality, but A-DROP and CURB-65 can be one of the choices, instead of PSI, in clinical practice for HCAP owing to their evaluation conveniences.

Our previous study could not evaluate the utility of I-ROAD for predicting mortality in HCAP22 because there was only one report19 (accessed July 16, 2015). However, this study analyzed reports on I-ROAD published after 2015 (all Japanese studies). I-ROAD includes immunodeficiency and radiological findings, and these are a major difference from the other severity assessment tools, such as PSI, A-DROP and CURB-65. Indeed, it was reported that the prognostic ability of PSI and CURB-65 for mortality prediction in HCAP patients changed irrespective of immunosuppression36, consistent with our previous study22. In addition, radiological characteristics such as bilateral pneumonia were reported as independent risk factors for mortality in NHAP52. Although I-ROAD is not widely used outside Japan, it might be a viable choice in patients with HCAP since there was no significant difference between I-ROAD and other severity assessment tools although their low prognostic capability.

In addition to the major evaluation method listed above, there are various severity assessment tools such as the IDSA/ATS severity criteria13,31, M-ATS30,31, NHAP index16, NHAP model score14, qSOFA7,43, R-ATS rules30, SCAP31,36, SMART-COP16,31, SOAR14,31,37 and SOFA7, but none of them showed adequate prognostic capability. In Japan, sepsis evaluation using qSOFA and SOFA was recommended as the initial evaluation in the 2017 JRS guidelines for managing pneumonia in adults, in addition to severity assessment by PSI, A-DROP, or CURB-6525. Asai et al. demonstrated that SOFA scores in combination with qSOFA more accurately valuated the severity of HCAP7. On the other hand, it was reported that the evaluation based on clinical conditions such as malnutrition, acute mental status deterioration, health conditions requiring home care, recent hospitalization, and low BMI should be used for severity assessment52,53. We also showed the usefulness of combining hypoalbuminemia with the PSI or qSOFA, which increased the AUC for mortality from approximately 0.7 to 0.75 compared to PSI or qSOFA alone in NHCAP patients43. In addition, the efficacy of various serum biomarker such as the neutrophil to lymphocyte ratio, pro-adrenomedullin, prohormone forms of atrial natriuretic peptide, and heparin-binding protein for mortality prediction have been demonstrated in pneumonia patients54,55,56. Thus, combining new items might be needed to be considered for predicting mortality in HCAP patients.

There were some limitations in this systematic review and meta-analysis. First, the included reports had a large heterogeneity- a common drawback in meta-analyses57. In other word, this study had differences in each country, study design, category of pneumonia, study population, outcome, and the rates of ICU admission and DNR order, which we could not assess due to limited accessible data and a relatively small sample size. However, the heterogeneity in the HCAP population makes our findings significant. In addition, we evaluated only short-term mortality but evaluating the long-term mortality may be hoped in patient groups where a prolonged hospital stay is likely, such as HCAP cases. Second, the cut-off values of each assessment tools used for the AUC calculation may vary slightly in each report, but the cut-off values for the severity grade are generally defined in these four assessment tools and we believe that the influence on overall AUCs is therefore insignificant. Third, we could not evaluate the efficacy of A-DROP for scores more than “moderate” because many studies included in this analysis had a sensitivity of almost 100% and a specificity of 5% or less. But these results may indicate that the criteria of moderate grade in A-DROP do not have a mean for mortality prediction because most subjects, including those in the HCAP category, are adults aged 65 and above. Finally, this systematic review might have some selection bias due to the reason of limited searching database and languages included in search strategy.

In conclusion, the predictive role of PSI, A-DROP, I-ROAD, and CURB-65 for mortality was insufficient for predicting mortality in HCAP patients. We have described useful prognostic factors for mortality in HCAP patients, hoping to establish a more useful severity assessment tool with highly accurate prediction ability while considering the existing tools.