Key Summary Points

Lower respiratory tract infection (LRTI) burden estimates increasingly rely on administrative databases using International Classification of Diseases (ICD) codes, but no standard methodology exists.

Our systematic review of validation studies showed that most ICD code-based algorithms miss a substantial number of hospitalized LRTI cases in adults, whether community- or hospital-acquired cases.

ICD-based algorithms containing only nosocomial- or pathogen-specific ICD codes to identify hospital-acquired and pneumococcal pneumonia have poor sensitivity.

ICD-based algorithms should include all pneumonia ICD codes, in particular those for unspecified organisms, to estimate the incidence of all-cause hospitalized LRTI and hospital-acquired pneumonia in adults.

Introduction

Lower respiratory tract infections (LRTIs) are among the top five leading causes of morbidity and mortality worldwide. The most frequent LRTIs are community-acquired pneumonia (CAP) and hospital-acquired pneumonia (HAP), which differ in etiology, treatment, and mortality [1, 2]. Pneumonia signs and symptoms vary while diagnostic tests, including chest X-ray, are non-specific, making clinical diagnosis challenging particularly in adults [3]. The identification of pneumonia cases is even more complex in studies based on real-world databases. LRTIs can be caused by a variety of viral, bacterial, and fungal etiologies. However, in a high proportion of LRTI no pathogen is detected (e.g. up to 62% of hospitalized adult CAP cases) [4], usually because of limitations of traditional microbiological methods such as blood and sputum cultures, prior antibiotic use, and single specimen testing [5,6,7].

Estimating LRTI incidence is a cornerstone in assessing the epidemiologic burden of disease and, subsequently, the potential impact of vaccination programs targeting respiratory pathogens. The most reliable incidence data are generated by population-based cohort studies following patients with LRTI in a clearly defined population and using clinical records for case identification. Although such designs remain the gold standard in assessing epidemiological trends of LRTI, they are time-consuming and difficult to generalize across clinical settings. As an alternative, epidemiological studies increasingly rely on administrative databases based on International Classification of Diseases (ICD) codes, which are used primarily for administrative or financial purposes, such as hospital financing. However, study methodologies vary, and their validity for identifying LRTI cases and, therefore, for estimating LRTI burden in general, and CAP and HAP in particular, is largely unknown. Furthermore, existing literature on the LRTI burden focuses primarily on children rather than adults.

To address this gap, we conducted a systematic review to summarize the accuracy of ICD code-based algorithms for identifying community- and hospital-acquired LRTI cases and to propose best practices for such analyses.

Methods

The study protocol was registered on PROSPERO (International Prospective Register of Systematic Reviews) with registration number CRD42022299634. This article is based on previously conducted studies and does not contain any new studies with human participants or animals performed by any of the authors.

Study Selection and Data Extraction

A systematic search was conducted in Medline, EMBASE, and LILACS databases for studies published between January 1996 and January 2022 that validated the use of ICD code-based algorithms for the identification of hospitalized LRTIs (including CAP) in adults ≥ 18 years of age. Studies specific to COVID-19-related LRTIs were excluded. Details of the selection process, inclusion and exclusion criteria, as well as the search strategy are provided in Tables S1 and S2. The selection of studies was carried out in two stages. First, title and abstract screening was performed by two reviewers, and discrepancies were resolved by a third reviewer. Second, full-text screening was conducted by one reviewer, and 10% of non-selected articles were checked by a second reviewer. Hand searching was performed for the references of included articles and of previous systematic literature reviews (SLRs) identified, and grey literature was searched on OpenGrey [8]. Besides study and algorithm characteristics, we extracted the following validation measures: sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and positive/negative likelihood ratio (LR+ and LR−), when available. When LRs were not available, they were calculated as follows: LR+ = sensitivity/(1 − specificity) and LR− = (1 − sensitivity)/specificity. The risk of bias was assessed using the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) [9], which was tailored to the review. Table S3 outlines the components of each domain.
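
The validation measures listed above can all be derived from a 2 × 2 table comparing an algorithm with its reference standard. The following is a minimal sketch in Python, not the code used in the review; the class and function names are hypothetical, and the counts in the usage note are invented for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from comparing an ICD algorithm with a reference standard."""
    tp: int  # algorithm positive, reference positive
    fp: int  # algorithm positive, reference negative
    fn: int  # algorithm negative, reference positive
    tn: int  # algorithm negative, reference negative


def validation_measures(t: TwoByTwo) -> dict:
    """Return sensitivity, specificity, PPV, NPV, LR+ and LR-.

    Assumes every row and column margin of the table is non-zero.
    """
    sens = t.tp / (t.tp + t.fn)
    spec = t.tn / (t.tn + t.fp)
    ppv = t.tp / (t.tp + t.fp)
    npv = t.tn / (t.tn + t.fn)
    # Likelihood ratios as defined in the text; guard the degenerate cases.
    lr_pos = sens / (1 - spec) if spec < 1 else float("inf")
    lr_neg = (1 - sens) / spec if spec > 0 else float("inf")
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": ppv,
        "npv": npv,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
    }
```

For example, `validation_measures(TwoByTwo(tp=80, fp=10, fn=20, tn=190))` gives a sensitivity of 0.80, a specificity of 0.95, an LR+ of about 16, and an LR− of about 0.21.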

Data Analysis

Data from each study were summarized descriptively, by main outcome. When validation measures were not provided and numbers were available, we calculated the missing indicators, in particular LR+ and LR−. Pooled analysis was not considered because we sought to establish best practices for this type of analysis rather than quantify incidence.

Best Practices

The identification of best practices for establishing ICD-based algorithms to estimate hospitalized LRTI incidence was based on the sensitivity, specificity, LR+, and LR− of the algorithms, and not on predictive values, to avoid biased statistics from varying disease prevalence [10]. We compared values from studies testing different algorithms and drew on the discussion sections of included papers for lessons learned regarding ICD algorithms, from which we derived best practices.
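
The reason for setting predictive values aside can be shown with a short calculation. The sketch below, which uses hypothetical function names and invented sensitivity, specificity, and prevalence values, applies Bayes' rule to show that the PPV of one and the same algorithm shifts with the prevalence of pneumonia in the validation sample.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Illustrative values only (not taken from any included study).
SENS, SPEC = 0.70, 0.95

for prev in (0.05, 0.30):
    # Same algorithm, same sensitivity and specificity, different sample.
    print(f"prevalence={prev:.2f}  PPV={ppv(SENS, SPEC, prev):.2f}")
```

With these inputs the PPV is about 0.42 at 5% prevalence and about 0.86 at 30% prevalence, whereas sensitivity, specificity, and the likelihood ratios derived from them do not depend on the prevalence term.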

Results

Study Description

Our database search identified 1697 unique references and 838 studies via hand searching, of which 26 studies were eligible and contributed to the final analysis (Fig. 1). These were conducted in ten high-income countries (Table 1), including 14 (54%) from the USA [11,12,13,14,15,16,17,18,19,20,21,22,23,24]. Six (23%) studies included a separate analysis for patients with comorbidities [14, 18, 22, 23, 25, 26], including chronic obstructive pulmonary disease and immune suppression. Overall, 24 studies reported on all-cause LRTI, including 17 on any or community-acquired LRTI (two on LRTI only, five on any pneumonia, nine on CAP, and one on empyema) [11, 12, 14,15,16, 18,19,20,21,22, 25, 27,28,29,30,31,32] and seven on HAP [23, 24, 26, 33,34,35,36] (Table S4). Three studies involved pathogen-specific LRTI (one covered both, Table 3) [13, 17, 32].

Fig. 1 PRISMA diagram

Table 1 Characteristics and validation measures of ICD algorithms to identify LRTI in adults (excluding those specific to hospital-acquired infection)

Among the 53 algorithms, the reference standards used to confirm LRTI were based on either the review of medical files (63% of algorithms), including clinical, laboratory and/or radiological data, or on the diagnosis established by the treating physician (Table S4). Ten studies used reference standards based on explicit clinical criteria.

Overall, 22 of the 26 included studies were graded as having a risk of bias: 16 had at least one domain with unclear risk, and 11 had a high risk of bias in at least one domain (see Table S5 and Fig. S1).

Characteristics and Accuracy of ICD Algorithms

The algorithms included ICD-9 (n = 17 studies) or ICD-10 codes (n = 9), alone or in combination with additional criteria, such as length of stay (n = 2) or free-text search, such as natural language processing (n = 2). All studies, except three HAP studies, included ICD codes from the “classical” Pneumonia and influenza ICD group (ICD-9 480–488 or ICD-10 J10–J18), referred to here as classical pneumonia codes. These were the only codes in seven studies, while ten studies included codes for pneumonia due to specific pathogens (ICD-9 001–139 or ICD-10 from A and B groups) and/or other respiratory codes (codes from ICD-9 460–519 or the ICD-10 J group other than the pneumonia codes above). Some algorithms included (n = 7, five CAP and two HAP studies) or excluded (n = 2 CAP studies) “Aspiration pneumonia” codes (ICD-9 507 or ICD-10 J69). Classical pneumonia codes were required to be in primary position only (n = 7) or in any position (n = 12), or the position was not stated (n = 4). Three studies added codes of disease severity (e.g. respiratory failure and sepsis) in primary position when pneumonia codes were in secondary position. HAP studies included additional criteria, such as specific codes for nosocomial infection and/or pneumonia codes not present at admission (see below).
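
To make the notions of code group and code position concrete, the following Python sketch flags a hospital stay when a classical pneumonia code (ICD-10 J10–J18) appears in the required position. It is an illustration under assumptions rather than any study's algorithm: the function names are hypothetical, and the diagnosis list is assumed to be ordered with the primary diagnosis first.

```python
from typing import Sequence


def is_classical_pneumonia_icd10(code: str) -> bool:
    """True for codes in the ICD-10 J10-J18 group (e.g. 'J18.9')."""
    c = code.strip().upper().replace(".", "")
    if len(c) < 3 or c[0] != "J" or not c[1:3].isdigit():
        return False
    return 10 <= int(c[1:3]) <= 18


def flags_lrti(diagnoses: Sequence[str], position: str = "any") -> bool:
    """Apply the code-group rule to one stay.

    diagnoses[0] is assumed to be the primary diagnosis and the remaining
    entries the secondary diagnoses. position is 'primary' or 'any'.
    """
    if position == "primary":
        candidates = diagnoses[:1]
    elif position == "any":
        candidates = diagnoses
    else:
        raise ValueError("position must be 'primary' or 'any'")
    return any(is_classical_pneumonia_icd10(c) for c in candidates)
```

A stay coded `["I50", "J18.9"]` (heart failure as primary diagnosis, unspecified pneumonia as secondary) is missed with `position="primary"` but captured with `position="any"`; an aspiration pneumonia code such as `"J69.0"` falls outside J10–J18 and would need to be added explicitly.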

Among the 26 studies, eight reported only PPV and/or NPV and were excluded from the analysis on algorithm performance (see “Methods”), and data are in Table S6. The remaining 18 studies reported sensitivity and/or specificity and the performances of their algorithms are described below by clinical outcome. These include 16 studies on all-cause LRTI (six on HAP only) and two on pathogen-specific LRTI.

Any or Community-Acquired LRTI, All Causes

Among the ten studies on non-HAP all-cause LRTI (i.e. excluding those focusing on only HAP), two, three, and five studies involved any LRTI, any pneumonia, and any CAP, respectively (Table 1). The 18 algorithms did not differ across outcomes and included either ICD-9 (n = 15) or ICD-10 (n = 3) codes. Sensitivity was at least 80% in about half of the algorithms reporting it (10/18), while specificity was above 90% in 9/16 of them. LR+ was high in three-quarters of the algorithms (13/17, LR+ ≥ 5) and LR− was low in one-third of them (6/17, ≤ 0.20).

The three algorithms based on classical pneumonia codes in primary position only (with or without aspiration pneumonia codes) yielded a low sensitivity (range 55–72%) and high specificity (> 93%), with varying LR+ and LR− [11, 21, 32]. Sensitivity increased with minimal loss in specificity in algorithms that included codes of pneumonia severity (sepsis or respiratory failure) in the primary position when pneumonia was in a secondary position, by 15% compared to the above algorithm [11], and by 5% compared to the above algorithm combined with codes for other pathogens and respiratory codes [15]. Sensitivity increased by 13% when infection or other respiratory codes (such as empyema, pleurisy, or lung abscess) were added to pneumonia codes in primary position only, while specificity remained high [11]. In the four algorithms with pneumonia ICD codes in any position, sensitivity (57–98%), specificity (62–97%), LR+ (2–32), and LR− (0.02–0.46) varied [27,28,29, 32], and no major change was observed when other infection or respiratory codes were added [27]. Sensitivity (85–89%) was high and LR− was below 0.2 when text search (text mining or natural language processing) was added to ICD codes, while specificity (78–98%) and LR+ varied [19, 31]. More complex algorithms, using predictors identified through analysis (such as length of hospital stay), reached a high sensitivity (81–89%) and lower LR− (around 0.2) but a lower specificity (63–82%) and LR+ (2–5) [21].
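
The severity-code rule described above (a sepsis or respiratory failure code in primary position with pneumonia in a secondary position) can be sketched as follows. This is a hedged illustration: most included studies used ICD-9, whereas the sketch uses ICD-10 for brevity, and the severity prefixes A40, A41 (sepsis) and J96 (respiratory failure) are our assumption rather than a code list taken from the included studies. Function names are hypothetical.

```python
from typing import Sequence

# Assumed ICD-10 severity prefixes: A40/A41 sepsis, J96 respiratory failure.
SEVERITY_PREFIXES = ("A40", "A41", "J96")


def _norm(code: str) -> str:
    """Normalize a code: trim, uppercase, drop the decimal point."""
    return code.strip().upper().replace(".", "")


def is_pneumonia(code: str) -> bool:
    """True for classical pneumonia codes, ICD-10 J10-J18."""
    c = _norm(code)
    return (len(c) >= 3 and c[0] == "J" and c[1:3].isdigit()
            and 10 <= int(c[1:3]) <= 18)


def is_severity(code: str) -> bool:
    """True for the assumed sepsis / respiratory failure codes."""
    return _norm(code).startswith(SEVERITY_PREFIXES)


def flags_lrti_with_severity(diagnoses: Sequence[str]) -> bool:
    """Pneumonia in primary position, OR a severity code in primary
    position with pneumonia in any secondary position."""
    if not diagnoses:
        return False
    primary, secondary = diagnoses[0], diagnoses[1:]
    if is_pneumonia(primary):
        return True
    return is_severity(primary) and any(is_pneumonia(c) for c in secondary)
```

Under this rule a stay coded `["A41.9", "J18.9"]` is captured, while `["I50", "J18.9"]` is still missed, which is the trade-off behind the modest sensitivity gain with little loss of specificity.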

The influence of the reference standard used for case confirmation was illustrated in one study, in which confirmation by radiological data only led to a drop in both sensitivity (from 98% to 89%) and specificity (from 97% to 62%) compared to chart reviews of medical files for the same algorithm [29]. The type of patients also affected algorithm performance when different groups were included, with higher sensitivity and lower specificity in older adults (≥ 65 years) compared to younger adults (18–64 years) [21, 29], and higher sensitivity in hospitalized patients compared to those seen at emergency departments [19].

Hospital-Acquired Pneumonia, All Causes

Among the six studies (nine algorithms) on HAP (Table 2), five involved any HAP (two with ICD-9 and three with ICD-10) and one based on ICD-9 included ventilator-associated pneumonia (VAP) only [23, 24, 26, 33,34,35]. Six algorithms included specific ICD codes for HAP or VAP [24, 34, 35], and five included classical pneumonia codes (without HAP-specific codes in three algorithms) [23, 24, 26, 33]. All nine algorithms had ICD codes in secondary or any position (or not stated), and six required the ICD codes to be not present on admission [24, 26, 34]. The three algorithms that used HAP/VAP codes alone with the not-present-on-admission criterion had a very low sensitivity (≤ 25%), high specificity (≥ 98%), high LR+ (range 83–233) and poor LR− (0.75–0.77) [24, 34]. When classical pneumonia codes were added to the specific VAP code and the not-present-on-admission criterion, sensitivity increased to 61% and LR− improved (0.42–0.47) while specificity slightly declined (83–93%) and LR+ dropped to 4–9 [24]. The three algorithms including only classical pneumonia codes displayed a higher sensitivity (35–100%) than specific HAP/VAP codes, high specificity (99–100%), high LR+ (44–333) and varying LR− (0.00–0.65) [23, 26, 33]. Algorithms showed similar performance in patients with continuous invasive mechanical ventilation as in the total patient population [24].
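
The contrast between narrow (specific-code) and broad (specific plus classical codes) HAP algorithms can be expressed in a few lines. The sketch below is an assumption-laden illustration, not the algorithm of any included study: it uses the ICD-9 codes named in this review (997.31 for VAP, 480–488 for classical pneumonia) and assumes each diagnosis carries a boolean present-on-admission flag; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Diagnosis:
    code: str                   # ICD-9 code, e.g. "486" or "997.31"
    present_on_admission: bool  # assumed to be recorded per diagnosis


SPECIFIC_CODES = {"997.31"}  # ventilator-associated pneumonia


def is_classical_pneumonia_icd9(code: str) -> bool:
    """True for ICD-9 categories 480-488."""
    cat = code.strip()[:3]
    return cat.isdigit() and 480 <= int(cat) <= 488


def flags_hap(diagnoses: Sequence[Diagnosis],
              include_classical: bool = True) -> bool:
    """Flag a stay when a qualifying code was NOT present on admission.

    include_classical=False mimics a narrow, specific-code-only rule;
    include_classical=True adds the classical pneumonia codes.
    """
    for d in diagnoses:
        if d.present_on_admission:
            continue  # present at admission: treated as not hospital-acquired
        if d.code.strip() in SPECIFIC_CODES:
            return True
        if include_classical and is_classical_pneumonia_icd9(d.code):
            return True
    return False
```

A stay with `Diagnosis("486", present_on_admission=False)` is missed by the narrow rule and captured by the broad one, which mirrors the sensitivity gain reported when classical codes were added.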

Table 2 Characteristics and validation measures of ICD algorithms to identify hospital-acquired pneumonia in adults

Pathogen-Specific LRTI

All three studies assessing pathogen-specific LRTI included pneumococcal pneumonia, one study covered ten other pathogens [13], and they included 26 different algorithms (Table 3), after excluding data for which the reference standard included possible or probable cases [17]. Pathogen-specific ICD-9 codes were included in all algorithms, in primary (n = 3 algorithms), in primary or secondary with severity codes (n = 12), or any (n = 11) position. Sepsis or bacteremia general codes were added in 16 algorithms. The reference standard was based on laboratory tests, with or without clinical and radiological criteria.

Table 3 Characteristics and validation measures of ICD algorithms to identify pathogen-specific CAP in adults

The performance of algorithms varied across specific pathogens [13], with sensitivity ranging from 14% for parainfluenza to 96% for influenza and specificity being always high (≥ 98%). In the three studies involving pneumococcal pneumonia, sensitivity of the pneumococcal-specific code (ICD-9 481) alone was low to moderate (35–58%), while specificity was high (98–99%). In one study, sensitivity was 45% when the pneumococcal pneumonia code was in primary position and 58% when it was in any position. In the same study, sensitivity increased from 58% to 89% when the codes for pneumonia from organism unspecified were added (ICD-9 485–486, any position) [17]. However, specificity declined in the latter, from 98% to 45%. The addition of the acute respiratory failure code (518.81) did not improve performance [17].
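
As a worked example of this trade-off, the likelihood ratios implied by the sensitivity and specificity values quoted above for study [17] can be computed with the formulas given in the Methods. The figures below are our own arithmetic on the rounded percentages, not values reported by that study, and the function name is hypothetical.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple:
    """Return (LR+, LR-) from sensitivity and specificity (as fractions)."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


# ICD-9 481 in any position: sensitivity 58%, specificity 98%.
narrow = likelihood_ratios(0.58, 0.98)

# After adding ICD-9 485-486: sensitivity 89%, specificity 45%.
broad = likelihood_ratios(0.89, 0.45)

print(f"narrow: LR+={narrow[0]:.1f}, LR-={narrow[1]:.2f}")
print(f"broad:  LR+={broad[0]:.2f}, LR-={broad[1]:.2f}")
```

On these inputs LR+ falls from about 29 to about 1.6 while LR− improves from about 0.43 to about 0.24. The broader algorithm therefore misses fewer pneumococcal cases, but a positive result no longer distinguishes pneumococcal pneumonia from pneumonia of other or unknown etiology.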

Best Practices Learned from Included Studies

Distinguishing Community-Acquired from Hospital-Acquired Pneumonia

In the included studies, the distinction between CAP and HAP cases was made in the inclusion of pneumonia-suspected cases, in the algorithms applied to these, and/or in the reference standard. Across all 26 studies, including those not providing sensitivity and specificity, the case definitions for the inclusion of suspected HAP or CAP frequently included a time threshold, i.e. diagnosis or medical information being obtained or reported within, or later than, 24 or 48 h after hospital admission. At the algorithm level, seven of the nine CAP studies (10 algorithms) included codes (pneumonia codes, or severity codes with pneumonia as secondary) in primary position only [15, 16, 18,19,20,21, 32]. Other criteria to exclude HAP were antibiotic prescription within 72 h after admission [22], and the exclusion of patients with major trauma or elective surgery [15]. In HAP studies, the algorithm criteria most frequently used to exclude CAP were pneumonia ICD codes in secondary or in any position and/or not present on admission [24, 26, 34,35,36], and the use of specific HAP codes, such as U69.00 (elsewhere classified, hospital-acquired pneumonia) [34, 35], and 997.31 for VAP [24, 33, 36]. However, the performance of the differentiating criteria has not been evaluated.

Algorithms Including Code Position

In the five non-HAP LRTI studies comparing the sensitivity and/or specificity of algorithms [11, 15, 19, 21, 32], adding primary codes of pneumonia severity (sepsis or respiratory failure) when pneumonia was in secondary position or adding other respiratory codes to classical pneumonia codes improved sensitivity with minimal losses in specificity [11, 15]. The benefit of using pneumonia codes in primary or any position is not clear for CAP: sensitivity slightly improved in one study but other accuracy measures were not available [32], and another study made changes other than coding position [11]. Adding text search in the medical file, such as natural language processing, doubled the sensitivity relative to classical pneumonia codes alone in one study, with limited impact on specificity [19]. The benefit of adding or excluding the aspiration pneumonia code is also unclear, as studies applied other changes in algorithms [11, 21]. No comparison of performance between ICD-9 and ICD-10 versions is available.

In the single HAP study comparing algorithms, adding classical pneumonia codes improved the poor sensitivity of specific HAP code, with an ensuing slight decrease in specificity [24]. In the pathogen-specific CAP studies, including any position and adding other pneumonia codes to pathogen-specific codes increased sensitivity [17, 32], but specificity declined when codes for organism unspecified were included [17].

Discussion

In this systematic review of 26 validation studies, we found that the sensitivity of ICD code-based algorithms for identifying LRTI cases among hospitalized adults was below 80% in two-thirds of studies, while specificity exceeded 90% in the majority. For all-cause LRTI studies, algorithms that included only pneumonia codes in primary position had a particularly low sensitivity, which increased (with minimal loss in specificity) with the addition of codes for severe pneumonia, other LRTI-related infections, other respiratory diseases and free text search. The influence of coding position on sensitivity and specificity was unclear because algorithms differed in other characteristics that may have confounded this analysis. In HAP and pneumococcal-specific pneumonia, algorithms containing only nosocomial-specific or pathogen-specific ICD codes had poor sensitivity, which improved when broader pneumonia codes were added, in particular codes for unspecified organisms. The comparison of algorithm performance allowed us to derive some best practices that are summarized in Table 4.

Table 4 Summary of best practices for ICD-based algorithm to estimate the incidence of inpatient LRTI in adults, per clinical syndrome

Our review found that a majority of ICD-based algorithms reached a low sensitivity to identify LRTI cases among hospitalized adults. This finding is in line with two previous systematic reviews on hospitalized pneumonia, including one published during our review process [10], and one on hospital-acquired infections that included pneumonia [37]. The determination of the ICD codes that would improve the sensitivity of an algorithm is complex. The classical pneumonia codes were included in all algorithms, except four HAP-specific algorithms, but achieved low sensitivities overall. The addition of pneumonia severity codes (sepsis or respiratory failure) in primary position when pneumonia was in secondary position increased sensitivity by 5–15% with minimal changes to specificity in two studies on any pneumonia or CAP comparing different algorithms [11, 15]. The addition of other infections and/or respiratory codes was only explored in one study [11], in which sensitivity increased by 13% with the addition of other specific infections and respiratory codes. The limited impact of adding those codes was explained by the few patients who were assigned such codes, as these tend to be underreported in administrative databases [11]. Four studies reported the frequency of specific codes within the pneumonia and influenza group and revealed that the majority of ICD codes are rarely reported in hospitalized adult patients [13, 21, 29, 32]. One exception is the code for “organism unspecified” pneumonia cases (ICD-9 486 or ICD-10 J18.9), which was reported in 51–65% and 39–92% of LRTI or pneumonia cases in primary or any position, respectively, likely due to lack of etiological diagnosis at the time of coding [13, 21, 29, 32]. Therefore, it is essential to include the “organism unspecified” pneumonia code in studies aiming to measure LRTI incidence rates.

The influence of code position on sensitivity and specificity is poorly documented. Only one study on all-cause CAP compared pneumonia codes in primary vs. any position and reported a mild (8%) increase in sensitivity, but specificity was not reported [32]. Interestingly, several studies stated that pneumonia codes in secondary position better identify patients with severe pneumonia and underlying diseases [11, 27, 32], as pneumonia may be reported in secondary position only when it occurs in patients with other diseases, such as congestive heart failure or chronic bronchitis [11]. While not a formal validation study, one recently presented analysis in the UK found that ICD code-based pneumonia incidence using the first five positions correlated highly with estimates from a concurrent prospective study, but agreement decreased with age [38]. This suggests that algorithms aiming at deriving incidence rates of community-acquired LRTI cases in elderly patients should include pneumonia codes in any position, as this age group has a higher probability of presenting underlying diseases and severe pneumonia.

The distinction between CAP and HAP cases based on ICD codes is complex. In studies using the full medical file, the distinction is usually based on a time criterion, i.e. HAP is often defined as pneumonia with onset reported 24 or 48 h after hospital admission to be distinguished from CAP. Studies based on ICD databases only cannot apply this criterion because time after admission is not translated into ICD codes. There is also a lack of a specific code for HAP in ICD-9 and ICD-10, though some countries such as Germany and Switzerland introduced their own code. The ICD studies used different criteria to distinguish HAP from CAP, such as code position, specific HAP code, or pneumonia code not present at admission, but the performance of these criteria was not measured in these studies. Three studies provided some data on misclassification: 24% of all ICD-coded pneumonia cases were found to be HAP after clinical review [15], HAP represented 50% of misclassified cases in one CAP study [18], and CAP represented 25% of misclassified cases in one HAP study [35]. This highlights the need to introduce clear HAP codes in future ICD coding systems. The ICD-11 version, which came into effect in 2022, includes an additional code for nosocomial origin (i.e. XB25), but it remains to be seen whether it will be used in practice and how it will perform.

The added value of including or excluding ICD codes for aspiration pneumonia in algorithms is unclear. Aspiration, which is the inhalation of oropharyngeal or gastric content into the lower respiratory tract, may lead to aspiration pneumonitis, which is a chemical injury, and to aspiration pneumonia, which accounts for 5–15% of CAP cases [11]. Although there is some overlap between these syndromes, they are distinct clinical entities [39] but with similar microbiologic etiology [40]. Aspiration pneumonitis is often difficult to distinguish from CAP [39]. The seven algorithms including aspiration pneumonia codes covered different types of LRTI (any LRTI, any pneumonia, CAP, and HAP), those excluding it were CAP algorithms, and the performance of including or excluding it has not been measured separately. Its use is also controversial in the literature [11, 21, 22]. On one hand, including aspiration pneumonia codes may add patients who have pneumonitis instead of lung infection. On the other hand, its exclusion in CAP studies risks eliminating true CAP cases in nursing home residents or other frail patients [11]. Studies excluding aspiration codes may thus exclude a particular spectrum of patients.

Our review highlights the lack of a standard for the reference used to confirm true positives and true negatives among cases detected by ICD codes, as illustrated by the wide variety of criteria and processes, i.e. manual review of medical charts or diagnosis of the treating physician, both with or without explicit clinical criteria and standard forms. Only ten of the 26 studies used explicit clinical criteria in the process. A previous SLR also pointed to the variety of criteria and, in particular, to the lack of reproducibility and reliability of using the physician diagnosis of LRTI noted in the medical chart by manual review [10]. This process may be prone to subjective interpretation based on unspecific symptoms and/or clinical, laboratory, and/or imaging findings. However, LRTIs can have atypical presentations in frail and elderly patients, and such events may be missed by standardized clinical case definitions. Further, ICD code incidence studies primarily aim to capture diagnosed disease in line with current practice, which involves physician judgment. Specifically for methodological studies, greater standardization of the reference process would be helpful for future validation studies of ICD code algorithms to allow a quantitative combination of results among studies.

Our review of 26 eligible studies, 18 of which reported sensitivity and/or specificity, is to our knowledge the first to cover any LRTI, of community-acquired as well as hospital-acquired origin. Our added value is to compare the performance of the different algorithms to inform best practices and recommendations for future protocols for ICD-based studies to derive LRTI incidence rates. A main limitation is the lack of reproducibility and reliability of the clinical review used as reference standard to confirm pneumonia in a majority of validation studies. Standardization of criteria as a minimum, and prospective studies using explicit criteria to confirm LRTI, including laboratory testing to confirm pathogen-specific CAP, would address that limitation. The diversity of study characteristics and heterogeneity across patients (and their expected prevalence of pneumonia) also limit the generalizability of our results to other populations and settings. We also expect that our findings are biased by the influence of reimbursement rules, as many countries base their hospital financing on Diagnosis Related Groups, which include ICD codes, but we found no data to estimate the extent of this influence. The poor quality of some studies (41% at high risk of bias in at least one domain) limits the robustness of our conclusions. Lastly, many studies have identified that respiratory pathogens play a causative role in a broader group of cardiopulmonary conditions than LRTI (e.g. chronic obstructive pulmonary disease and chronic heart failure) [41], and the recommendations here are only intended to capture the subset of respiratory pathogen illness recognized as LRTI by the treating clinicians; the extended impact of these would be best captured by time-series modelling studies [42, 43].

Conclusion

Our systematic review highlights that many ICD code-based algorithms for detecting LRTI cases among hospitalized adults, and HAP in particular, are relatively specific but miss a substantial number of cases, as shown by their poor sensitivity. This can negatively affect burden of disease assessments and related assessments of the utility of preventive interventions. Best practices to estimate LRTI incidence in this population include the use of ICD codes from the pneumonia and influenza group for any subtype of LRTI, including HAP and pathogen-specific groups. The addition of codes for severe pneumonia, specific pathogens or other respiratory entities close to pneumonia may improve sensitivity but would capture a limited number of additional cases. Algorithms targeting elderly patients should prefer the use of pneumonia codes in any position to better capture those with underlying diseases and/or severe pneumonia. Future validation studies of ICD algorithms should aim at a higher quality, in particular through the use of a more objective and evidence-based gold standard. Further comparison of algorithms and their performance based on new and higher quality studies may help to derive more robust recommendations to measure the LRTI burden based on ICD codes.