Background

Despite the large burden of lower respiratory infections globally [1], it is difficult to estimate the proportion of hospitalizations attributable to influenza and respiratory syncytial virus (RSV) across countries or over time. Heterogeneous coding practices in hospital records across countries limit the comparability of administrative datasets from different locations and pose a challenge to producing global hospitalization estimates using influenza and RSV-coded inpatient admissions alone. Without the addition of laboratory test result data, administrative data may not accurately estimate inpatient disease burden, further complicating efforts to model burden at the population level. Absent accurate population estimates of the burden of specific respiratory diseases, it will be challenging to conduct cross-country comparisons, which are central to linking health policies (e.g., masking, vaccination campaigns) to outcomes.

The Burden of Influenza and RSV Disease (BIRD) project has developed an alternative method that may be useful for producing estimates of country-specific influenza and RSV burdens using administrative hospitalization data. This method generates rates of influenza and RSV-related acute lower respiratory illness (ALRI) hospitalizations across 44 countries by modeling the proportion of ALRI hospitalizations specifically attributable to RSV and influenza from literature estimates of laboratory-confirmed influenza and RSV among ALRI hospitalizations. The model can be applied to administrative data on country-specific influenza and RSV utilization. By comparing the results of the BIRD project method to those produced by raw extraction of ICD-coded RSV and influenza admission rates, we can estimate the potential under-attribution of ALRI to these specific causes.

Methods

At a high level, this study estimates influenza and RSV admission rates by modeling the proportion of ALRI admissions that are due to influenza and RSV respectively, and then multiplying these proportions by ALRI admission rates from clinical administrative data. Figure 1 below is a detailed flowchart of the processing steps used in this analysis, and each step is described in further detail in the following sections.

Fig. 1

Flowchart of ALRI admission processing and meta-analysis modeling. Flowchart of data processing and analysis conducted under this study. This diagram describes processing of ALRI admissions from clinical administrative data as well as the modeling and processing performed on RSV and Influenza meta-analysis proportions

ALRI admissions calculation

We extracted admission counts for ALRI from 29 inpatient all-cause admission datasets covering 44 countries and containing hospitalizations spanning the years 1990 to 2017, stratified by age in years or age groups depending on the source. These datasets included approximately 43 million admissions and represent all ICD-coded inpatient admission data used in the Global Burden of Disease Study, an international collaborative study led by the Institute for Health Metrics and Evaluation (IHME) at the University of Washington and supported by over 4800 researchers in more than 140 countries [1]. Additional detail on inpatient data from IHME is listed in Additional file 1. Because only 11 of the 44 datasets utilized in this study recorded secondary diagnoses, ALRI admissions were defined as those with a primary diagnosis code listed in Table 1 below.

Table 1 Acute lower respiratory infection ICD codes

The majority of clinical datasets in this analysis contain a subset of the country’s total inpatient utilization. For these non-comprehensive clinical sources, counts of ALRI admissions by age were divided by the total number of admissions in the dataset to produce age-specific proportions of inpatient utilization that have a primary ALRI diagnosis. This proportion was multiplied by IHME’s total inpatient utilization envelope to approximate a comprehensive rate of ALRI utilization by age and country. The envelope is produced using a spatio-temporal Gaussian process regression that smooths over geographic distance and year of hospitalization and that models admission rate per capita by age using IHME’s healthcare access and quality indicator, supply of inpatient hospital beds, and all-cause mortality as predictive covariates. More detail on the envelope estimation process, covariates used in the model, and results can be found in related Global Burden of Disease (GBD) publications [1].
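The scaling step for non-comprehensive sources can be illustrated with a minimal Python sketch. The function name and all numbers below are hypothetical and are not drawn from the study data; the envelope rate is assumed to be expressed as all-cause admissions per capita for the same age group, country, and year.

```python
def alri_rate_from_partial_source(alri_admissions, total_admissions, envelope_rate):
    """Approximate a comprehensive ALRI admission rate for one age group.

    alri_admissions:  admissions with a primary ALRI diagnosis in the source
    total_admissions: all-cause admissions recorded in the same source
    envelope_rate:    modeled all-cause admissions per capita (assumed input)
    """
    if total_admissions <= 0:
        raise ValueError("total_admissions must be positive")
    # Share of recorded inpatient utilization with a primary ALRI diagnosis.
    alri_share = alri_admissions / total_admissions
    # Scale that share to the modeled all-cause admission rate.
    return alri_share * envelope_rate


# Hypothetical inputs: 1,200 ALRI admissions among 40,000 recorded admissions,
# and an envelope of 0.10 all-cause admissions per person per year.
rate = alri_rate_from_partial_source(1_200, 40_000, 0.10)
rate_per_100k = rate * 100_000
print(round(rate_per_100k, 1))
```

The sketch assumes that the ALRI share observed in a partial source is representative of the ALRI share in the country's total inpatient utilization, which is the assumption implicit in the scaling described above.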

The UK Hospital Episode Statistics dataset [2] and Healthcare Cost and Utilization Project National Inpatient Sample (HCUP NIS) [3] are considered comprehensive datasets and the scaling described above was not applied to these sources. Instead, counts of admissions with a primary ALRI diagnosis in these sources were divided by the total population of that country to produce rates of ALRI admission by BIRD age group and year. Population estimates are produced as part of IHME’s GBD study and detailed information on the methods used to produce these estimates is available in related publications [1].

Most clinical administrative data is provided in age in years or occasionally in various aggregated age bins. The age groupings used for the BIRD analysis were at a higher level of aggregation than the majority of administrative sources used. Therefore, the final step in ALRI admission processing was to aggregate rate-space estimates to the BIRD analysis age groups, by summing both the numerator and denominator so that the rates of ALRI utilization are binned appropriately to match the rest of the analysis.
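As a sketch of the aggregation step, the Python below sums admissions (numerator) and population (denominator) within each coarse age bin before dividing. The cut points in `to_analysis_group` are hypothetical placeholders rather than the official BIRD age groups, and the records are invented for illustration.

```python
from collections import defaultdict


def to_analysis_group(age):
    """Map a single year of age to a coarse bin (hypothetical cut points)."""
    if age < 1:
        return "<1"
    if age < 5:
        return "1-4"
    if age < 18:
        return "5-17"
    if age < 50:
        return "18-49"
    if age < 65:
        return "50-64"
    return "65+"


def aggregate_rates(records):
    """records: iterable of (age_in_years, admissions, population)."""
    admissions = defaultdict(float)
    population = defaultdict(float)
    for age, n_admissions, n_people in records:
        group = to_analysis_group(age)
        admissions[group] += n_admissions  # sum numerators
        population[group] += n_people      # sum denominators
    return {g: admissions[g] / population[g] for g in admissions if population[g] > 0}


# Invented single-year-of-age records.
records = [(0, 50, 1_000), (1, 30, 1_000), (2, 10, 3_000)]
rates = aggregate_rates(records)
print(rates)
```

Summing numerators and denominators yields a population-weighted rate. In the invented example, ages 1 and 2 have single-year rates of 0.03 and about 0.0033, whose unweighted mean (about 0.017) differs from the aggregated rate of 40/4,000 = 0.01.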

While many of the data sources used in this analysis are also used in creating annual GBD estimates, there were some differences in data processing methods between the two projects that led to different estimates of rates of ALRI. GBD analysis adjusts inpatient data to account for readmissions, potential missingness of secondary inpatient diagnoses, unavailable outpatient data, and healthcare access and quality for every location. It aggregates inpatient data with claims and outpatient data to produce estimates of individuals who received any care for an ALRI diagnosis. Because this study was primarily focused on inpatient diagnoses of influenza or RSV, these additional corrections were not applied.

Influenza and RSV proportion estimation

Influenza and RSV admission rates were estimated by modeling the proportion of ALRI admissions attributable to each cause, stratified by age, year, and country, and then applying these proportions to total ALRI hospitalizations. The meta-analysis for this model included 156 independent studies on influenza-associated hospitalization rates covering 46 countries with data between 1979 and 2015 [4–159], and 204 studies on RSV admission rates covering 56 countries with data between 1982 and 2017 [4, 19, 73, 107, 133, 146, 160–356].
From each study, we extracted the sample size, the age range and location of the study cohort, total admissions for ALRI, and admissions for influenza and RSV, respectively. The proportions of ALRI admissions due to influenza and RSV were calculated for each location, age, and year present in the input study data.
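A minimal sketch of how one extracted study row could be converted into a model input is shown below. The log transform matches the log-scale outcome of the model described in this section; the delta-method standard error is a standard approximation that we assume here for illustration, since the text does not state how standard errors were derived. The function name and the numbers are hypothetical.

```python
import math


def log_proportion(cases, alri_admissions):
    """Return (p, ln p, approximate SE of ln p) for one study observation.

    cases:            ALRI admissions positive for the pathogen of interest
    alri_admissions:  all ALRI admissions tested in the study stratum
    """
    if not 0 < cases <= alri_admissions:
        raise ValueError("need 0 < cases <= alri_admissions")
    p = cases / alri_admissions
    # Delta method: Var(ln p_hat) ~ (1 - p) / (n * p) = (1 - p) / cases.
    se_log_p = math.sqrt((1.0 - p) / cases)
    return p, math.log(p), se_log_p


# Hypothetical study stratum: 30 influenza-positive among 200 ALRI admissions.
p, log_p, se = log_proportion(30, 200)
print(p, log_p, se)
```

Observations with zero positive cases cannot be log-transformed directly and would need separate handling, which this sketch deliberately leaves out.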

A Bayesian regularized trimmed meta-regression (MR-BRT) model was generated using ALRI admission meta-analysis data to produce estimates of the proportion of ALRI admissions due to each cause while accounting for within-study heterogeneity by age and location as well as error and bias between sources. Within the MR-BRT framework, the trend over age was modeled as a cubic spline with linear tails on the youngest and oldest age groups and an uninformative Gaussian prior. Linear tails were used to stabilize the age pattern at both ends of the age range, where sparse data can make the MR-BRT fit highly unstable.
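To illustrate what a cubic spline with linear tails means, the sketch below builds a natural cubic spline basis using the standard truncated power construction. This is not the MR-BRT implementation, whose internal basis may differ; the knot locations are hypothetical.

```python
def natural_cubic_basis(x, knots):
    """Evaluate a natural cubic spline basis at x.

    knots must be strictly increasing, with at least three values.
    Returns len(knots) basis values; any linear combination of them is
    cubic between knots and linear outside the first and last knot.
    """
    def pos_cube(v):
        return max(v, 0.0) ** 3

    last = knots[-1]

    def d(k):
        return (pos_cube(x - knots[k]) - pos_cube(x - last)) / (last - knots[k])

    n_knots = len(knots)
    basis = [1.0, float(x)]
    for k in range(n_knots - 2):
        # Subtracting d(second-to-last knot) cancels the quadratic and cubic
        # terms beyond the last knot, which forces a linear tail.
        basis.append(d(k) - d(n_knots - 2))
    return basis


# Hypothetical knots on age midpoint, in years.
knots = [1.0, 5.0, 20.0, 50.0, 80.0]
row = natural_cubic_basis(30.0, knots)
print(len(row))
```

Because every basis function is linear below the first knot and above the last, a fitted age curve cannot swing sharply in the youngest or oldest ages when few data points are available there.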

Location was used as a covariate at the IHME Global Burden of Disease’s super-region and regional levels, to account for potential geographic variation while informing estimations for locations with sparse data by the trend of those with a larger input evidence base. Region was used as a proxy for country-level heterogeneity in order to produce estimates where meta-analysis data was available and admissions data was not or vice versa. IHME’s regional categorization by country is available in related literature. Both region and super-region were modeled as a fixed effect with an uninformative Gaussian prior on each. The hierarchical structure of the super-regional and regional models results in child models that follow the same age trend as those of the parents.

The equation for the influenza and RSV MR-BRT models is shown in Eq. 1 below. Detail on the assumptions made by the mixed effects framework, the use of cubic splines on fixed effects, and estimation of the posterior using maximum likelihood estimation are available in related literature [357]. The MR-BRT framework is an R wrapper for the open source mixed effects LimeTr package, which could be used to replicate the modeling methods described here [358].

$$ \ln \left( p_{(\mathrm{flu} \mid \mathrm{RSV}),\,i,j} \right) = \mathrm{spline}\left( \mathrm{age}_{i,j}\,\beta_1 \right) + \ln \left( \mathrm{region}_{i,j}\,\beta_2 \right) + \ln \left( \mathrm{super\ region}_{i,j}\,\beta_3 \right) + Z_i u_{i,j} + \epsilon_{ij} $$
(1)

where $p_{(\mathrm{flu} \mid \mathrm{RSV}),\,i,j}$ is the proportion of ALRI admissions that are positive for influenza or RSV in observation $i$ for study $j$, $\mathrm{age}_{i,j}$ is computed using a spline-based matrix for age midpoint, $\mathrm{region}_{i,j}$ and $\mathrm{super\ region}_{i,j}$ are the fixed effects on GBD region and super-region, $Z_i$ is a linear map, $u_{i,j}$ are the random effects from meta-analysis study $j$ at observation $i$, and $\epsilon_{ij}$ are measurement errors with a specified covariance.

A hierarchical method was chosen a priori for this analysis as it allowed us to produce estimates for locations with little or no meta-analysis data while still accounting for location-specific randomness in meta-analysis estimates. In the final results of this analysis, location-level estimates maintain age heterogeneity based on the differences of age patterns for ALRI admission rates by each location.

Bootstrapping was performed by taking 1000 samples on the posterior of the MR-BRT model, and uncertainty from the samples was propagated through the remainder of the estimation process as 95% credible intervals.
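The sketch below shows one common way to summarize 1000 draws as a point estimate with a 95% interval, using the 2.5th and 97.5th percentiles. The percentile rule is an assumption on our part, and the draws are simulated from an arbitrary log-normal distribution rather than taken from the fitted model.

```python
import math
import random


def percentile(sorted_values, q):
    """Linear-interpolation percentile of pre-sorted values, q in [0, 100]."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1.0 - frac) + sorted_values[upper] * frac


def summarize_draws(draws):
    """Collapse draws into a mean and a 95% interval."""
    ordered = sorted(draws)
    return {
        "mean": sum(ordered) / len(ordered),
        "lower": percentile(ordered, 2.5),
        "upper": percentile(ordered, 97.5),
    }


# Simulated stand-in for 1000 draws of a proportion (not model output).
rng = random.Random(0)
draws = [math.exp(rng.gauss(math.log(0.05), 0.3)) for _ in range(1000)]
summary = summarize_draws(draws)
print(summary)
```

Keeping the full set of draws until the final step, rather than summarizing early, is what allows uncertainty to be carried through later multiplications.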

Final admission rate estimation

Admission counts and rates for influenza and RSV were calculated by multiplying annual ALRI admission count estimates by the proportions from the influenza and RSV mixed effects attribution models, by age group and location. Seasonality was excluded from the scope of this analysis because seasonal information was not consistently available in influenza and RSV meta-analysis literature. Each location with clinical data received the attribution model fit for the corresponding GBD region, unless no input data for the model existed, in which case an average of the models within the GBD super-region was used. Uncertainty was quantified using the upper and lower uncertainty interval from the fit of the mixed effects model. Due to meta-analysis data sparsity in older ages for the RSV attribution mixed effects model, admission rates and counts for RSV were only calculated for children under five.
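The selection rule and the final multiplication described above can be sketched as follows. Region names, proportions, and the ALRI rate are hypothetical, and the super-region fallback is implemented as a simple unweighted mean of the available regional proportions, which is our reading of "an average of the models".

```python
def pick_proportion(region, region_fits, region_to_super):
    """Return the regional proportion, or the super-region average if missing.

    region_fits:     {region: fitted proportion} for regions with input data
    region_to_super: {region: super_region} for every region
    """
    if region in region_fits:
        return region_fits[region]
    super_region = region_to_super[region]
    siblings = [
        p for r, p in region_fits.items() if region_to_super[r] == super_region
    ]
    if not siblings:
        raise LookupError(f"no fitted regions in super-region {super_region!r}")
    return sum(siblings) / len(siblings)


def cause_specific_rate(alri_rate, proportion):
    """ALRI admission rate multiplied by the attributable proportion."""
    return alri_rate * proportion


# Hypothetical fits for one age group.
region_fits = {"Region A": 0.30, "Region B": 0.20}
region_to_super = {"Region A": "S1", "Region B": "S1", "Region C": "S1"}

proportion = pick_proportion("Region C", region_fits, region_to_super)
rsv_rate = cause_specific_rate(2_000.0, proportion)  # ALRI rate per 100,000
print(proportion, rsv_rate)
```

In practice the same multiplication would be applied to each posterior draw of the proportion so that the interval on the final rate reflects model uncertainty.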

Influenza and RSV-coded primary admissions were extracted from a subset of clinical administrative datasets as illustrative scenarios in order to compare results of the BIRD analysis to direct ICD extraction with no adjustments. ICD codes used for this comparison can be found in Additional file 2. All locations used to illustrate the comparison contained at least 4-digit ICD detail, which was required to identify primary admissions for RSV.

To assess the limitation of using primary diagnosis alone for ALRI admissions, we extracted non-primary diagnosis detail from the HCUP NIS data which was used to produce US estimates [3]. Diagnosis levels available in HCUP NIS vary by state, but all available diagnosis detail up to the 30th inpatient diagnosis was included for this analysis. We compared primary and non-primary utilization for the year 2012 from this dataset, and applied influenza-attributable proportion estimates to the complete dataset in order to generate a comparison of influenza rates that include non-primary hospitalizations. We focused specifically on influenza for this subanalysis because of the substantial ALRI utilization as a non-primary diagnosis in older ages, where competing complications may end up coded as the primary discharge diagnosis [359,360,361,362].
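A minimal sketch of the primary versus any-position comparison is given below. The ICD-10 prefixes are a small illustrative subset of pneumonia codes and do not reproduce Table 1, and the admission records are invented.

```python
# Illustrative ICD-10 pneumonia prefixes only; not the study's Table 1.
ALRI_PREFIXES = ("J12", "J13", "J14", "J15", "J16", "J18")


def is_alri(code):
    """True if an ICD-10 code starts with one of the illustrative prefixes."""
    return code.startswith(ALRI_PREFIXES)


def count_by_position(admissions):
    """admissions: list of diagnosis lists, primary diagnosis first.

    Returns (primary_only_count, any_position_count).
    """
    primary = sum(1 for dx in admissions if dx and is_alri(dx[0]))
    any_position = sum(1 for dx in admissions if any(is_alri(c) for c in dx))
    return primary, any_position


# Invented admissions with up to two diagnoses each.
admissions = [
    ["J18.9", "I10"],    # ALRI as primary diagnosis
    ["I50.9", "J18.9"],  # ALRI as non-primary diagnosis
    ["I21.0"],           # no ALRI
    ["J12.1"],           # ALRI as primary diagnosis
]
primary, any_position = count_by_position(admissions)
fold_increase = any_position / primary
print(primary, any_position, fold_increase)
```

Dividing the any-position count by the primary-only count gives the fold increase attributable to non-primary diagnoses, which is the form in which this comparison is reported in the Results.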

Results

Figures 2 and 3 represent the number of sources of meta-analysis data for the proportion of ALRI admissions attributable to influenza and RSV, respectively. Meta-analysis sources varied in their age ranges and granularity, sample size, and the time range over which studies were conducted. All meta-analysis sources were used to inform the meta-regression analyses as described above.

Fig. 2

Map of influenza meta-analysis source data. Influenza meta-analysis data availability by country

Fig. 3

Map of RSV meta-analysis source data. RSV meta-analysis data availability by country

Metadata about each of IHME’s inpatient data sources is available in Additional file 1. Only the inpatient sources that were ICD-9 or ICD-10 coded were used in this analysis. While all sources listed had sufficient ICD detail to extract ALRI utilization rates, not all locations with inpatient admission data have at least 4-digit ICD coding which is required to identify RSV cases by ICD diagnosis alone (see Additional file 2 for the list of 4-digit RSV codes).

Figure 4 shows the proportion of ALRI admissions attributable to influenza and RSV at the super-regional level. Due to limited availability of meta-analysis data in older ages for RSV, as seen in the figure, admission rates for RSV were only estimated for the under 1 and 1 to 4 year age groups. Data for selected regions are tabulated in Table 2 below.

Fig. 4

Proportion of ALRI admissions attributable to influenza and RSV. Influenza and RSV proportion models and meta-analysis input data for all IHME super-regions. Data point and model line colors reflect the GBD super region. Size of data points is scaled by the standard error of each datum

Table 2 Proportion influenza and RSV positive by GBD super-region

In these results, influenza represents a substantial proportion of ALRI admissions in individuals aged 15 to 55 years, and a lower proportion in the oldest and youngest age groups. Conversely, RSV represents over 30% of all ALRI admissions for infants under 1 year and over 18% for children aged 1–4 years, but the proportion of ALRI admissions attributable to RSV drops dramatically in age groups beyond the age of 5 years.

Comparisons of admission rates calculated through the BIRD analysis versus those coded directly with influenza and RSV ICD codes for locations with sufficient ICD granularity are shown in Figs. 5 and 6, and tabulated in Tables 3 and 4. For almost all age groups, the methods as described in this paper estimated a higher national admission rate than the rate of directly coded influenza or RSV admissions in the same inpatient sources. Many inpatient data sources used at IHME are coded only to three or four digits, in which case it is less accurate or even not possible to estimate RSV admission rates. Detail on inpatient clinical sources and ICD granularity is listed in Additional file 1, and the ICD codes used to determine influenza and RSV inpatient admissions are listed in Additional file 2. The full dataset of BIRD estimates of influenza and RSV admissions by age, year, and country is available in Additional file 3.

Fig. 5

Influenza admission rate by BIRD analysis and ICD coding. Influenza admission rate per 100,000 people by age as produced by BIRD analysis (blue) and simple raw ICD code extraction (yellow). 95% CI shown for both estimates

Fig. 6

RSV admission rate by BIRD analysis and ICD coding. RSV admission rate per 100,000 by age as produced by BIRD analysis (blue) and simple raw ICD code extraction (yellow). 95% CI shown for both estimates

Table 3 Influenza rates by BIRD analysis and ICD code extraction for select locations
Table 4 RSV rates by BIRD analysis and ICD code extraction for select locations

As non-primary diagnoses were not available for the majority of sources of inpatient admission data, only primary diagnosis was used to expand the number of useable sources and retain consistency across locations. We conducted a sensitivity analysis comparing the average primary and non-primary admission rates for ALRI in the USA from 2002 to 2012 to illustrate the potential impact of limiting the analysis to ALRI as primary diagnosis only.

Influenza admission rates in the USA by primary-only diagnosis and by primary and non-primary diagnosis are shown in Fig. 7. Including non-primary diagnoses produced a 1.4-fold increase in rate estimates for children < 1 year, and nearly a 2.5-fold increase in rate estimates for the 18–49, 50–64, and 65+ age groups.

Fig. 7

Primary versus nonprimary influenza admission rate in the USA, 2012. Influenza admission rate by diagnosis position, in US HCUP NIS data. Uncertainty is capped in order to show estimated age pattern

Discussion

While influenza and RSV-associated healthcare utilization is acknowledged as a global problem, its magnitude remains poorly quantified because representative data are lacking in many locations, which makes assessing admission rates within or across countries challenging. Traditional methods of burden estimation based on laboratory-confirmed cases are not possible in most settings because testing patients with ALRI is not routine care. This analysis utilizes clinical administrative data, which is widely available across countries, and presents a means of utilization estimation that can be more robust than direct ICD extraction alone. The approach, however, has important limitations for influenza when considering older adults.

Although the true burden of RSV in children is unknown, estimates of RSV admission rates from this study are generally consistent with published literature on RSV hospital utilization in children under 5. Shi et al. estimate hospital admission rates of 26.3 (22.8–30.2) per 1000 in children aged 1–5 months, 11.3 (6.1–20.9) per 1000 in children 6–11 months, and 1.4 (0.9–2.0) per 1000 in children 12–59 months old in World Bank High Income countries [363]. Reeves et al. found admission rates for RSV of 35.1 (32.9–38.9) per 1000 in children under 1 year and 5.31 (4.46–6.59) per 1000 in children aged 1–4 years in England [364]. Estimates from the BIRD analysis as shown in Table 4 are lower in high-income settings for children under 1 year of age than either study, but fall between estimates of older children as described in the literature. Further discussion and comparisons of the results of the BIRD analysis for RSV to other RSV estimation methods are available in related literature [365].

Our estimated admission rates for influenza are generally lower than rates previously published, particularly in the 65+ age group [366, 367]. For the USA and Sweden at age 65+, the simple extracted ICD-coded admission rate from administrative datasets surpasses the rate produced by this study. The inclusion of non-primary diagnoses did increase estimates for influenza in the USA by more than 50%. Nonetheless, these rates are still lower than those produced by comparable studies in the oldest age group. Previous studies estimate that anywhere between 39.5 and 96.6% of all admissions across all ages for influenza have a primary diagnosis related to influenza, and the relative proportion of burden as a primary diagnosis in this analysis falls within that range [359,360,361,362]. While using only the primary diagnosis allowed us to maintain consistency with the 33 sources containing only primary diagnostic detail, future iterations of this method should consider inclusion of non-primary diagnoses for more comprehensive utilization estimation, even if at the expense of geographic coverage.

Estimates of the proportion of influenza-positive adults aged 65+ were also generally lower than existing literature. Jain et al. estimate that 4% of adults aged 65–79 years and 5% of adults aged 80 or older hospitalized for pneumonia in select US cities test positive for influenza [32]. Monto et al. report that 10.9% of adults aged 50 or older presenting with acute respiratory illness are influenza positive, in a study of families in Ann Arbor, Michigan over 3 years [69]. Our analysis estimates that 1.9% (0.02–8.4) of ALRI admissions in ages 65+ in IHME high-income settings are influenza-positive cases. While the upper bound of this estimate more closely aligns with existing published literature, the proportion positive estimated from the BIRD project is low because of data sparsity in the oldest ages. The age spline method used in the MR-BRT analysis depends on the age midpoint of meta-analysis input data instead of accounting for an age range, which reduces the number of estimates representing older ages. Inclusion of additional meta-analysis data and incorporation of more sophisticated age range splitting could produce more robust proportion estimates in older ages.

The methodology employed by this analysis is comparable to previous burden estimates for influenza produced by IHME in the application of a proportion model to estimates of total lower respiratory infection [368]. However, estimates from the BIRD project were formed using a categorical approach that did not account for the relative risk of ALRI in cases of confirmed influenza or RSV. Instead, the proportion of ALRI hospitalizations was assumed to be a proxy of total utilization. Additionally, the BIRD analysis focuses exclusively on inpatient hospital utilization instead of incidence or mortality, which reduced the assumptions made about how trends in utilization can be extended to other metrics. Finally, the hierarchical method of modeling proportion positive by region and super-region was a novel approach in burden analysis that allows locations with sparser meta-analysis data to have more robust proportion estimates over age. IHME’s GBD global influenza admission rate estimates were higher than most of those predicted for countries included in the BIRD analysis, at 123.8 per 100,000 (CI: 48.5–300.2) across all ages, compared with BIRD all-age rates ranging from 29.7 per 100,000 (CI: 3.64–101.7) in the USA to 195.81 per 100,000 (CI: 183.88–207.74) in the Philippines.

This study has limitations that are consistent with any analysis developed from clinical administrative data. Availability of inpatient admissions data in some lower- to middle-income countries and meta-analysis data for RSV in older children and adults limited the scope of this analysis, and additional sources of both types of data would improve accuracy of estimates. Availability of inpatient data and proportion meta-analysis at a seasonal or monthly granularity would allow for more relevant analysis during peak influenza and RSV seasons. Additionally, we encountered technical limitations in handling meta-analysis point estimates for proportion positive that span large age ranges, and in the assumption that influenza and RSV proportions across countries follow the same pattern over age. Finally, the rates estimated in this analysis represent utilization rates of influenza and RSV present in individuals who have a primary admission diagnosis of acute lower respiratory infection. Accounting for non-inpatient care, including urgent care or emergency departments, and adjusting for non-primary diagnoses when ALRI is not the primary reason for the visit would further improve the estimates produced by this analysis.

In addition to addressing the limitations described, future iterations of this methodology could be expanded to estimates of incidence or prevalence from utilization by accounting for health care access and care-seeking behavior. Furthermore, deeper investigation of goodness-of-fit of the proportion models through out of sample estimation would provide additional validation for the methods proposed here and potentially identify additional areas for refinement of the proportion models.

Conclusions

Because of heterogeneity in coding practices between countries and limited availability of data at sufficient granularity for precise burden estimation, there are few reliable sources of influenza and RSV hospital utilization or incidence that are provided on a global scale. The application of meta-analysis for proportion positive to overall ALRI utilization is a non-traditional means of estimation that shows promise in other applications where direct measurement of ICD diagnoses cannot provide accurate estimates of rates of disease and where surveillance data are not available. However, the method shows considerable uncertainty for influenza in older adults, which could be a function of heterogeneity in ALRI coding between countries (i.e., as primary vs secondary cause) and in the age profile of proportion positivity for influenza and RSV across studies. While this method is appealing because it is based on clinical administrative data that is available from many countries globally, additional refinement of admission processing methodology and inclusion of more data over ages would enable greater comparability to existing influenza and RSV utilization literature.