Background

Respiratory syncytial virus (RSV) has long been recognized as a substantial public health threat [1], with annual epidemics exacting an enormous toll on vulnerable populations and health care delivery systems. RSV is associated with substantial morbidity in children in both the hospitalized and outpatient settings [2-5]. In addition to the toll on the health of the population, this disease imposes a large burden on the health care system in terms of human and material resources. Although no RSV vaccine exists, infants and children with risk factors for severe RSV infection (eg, lung disease or prematurity) can receive monthly doses of palivizumab, a humanized murine anti-RSV monoclonal antibody, during the RSV season. Palivizumab treatment is extremely costly; the cost-effectiveness of this therapy could be improved if treatment were given only during times of high RSV activity. Treatment of vulnerable individuals also improves overall health in the population.

Prediction of seasonal epidemic characteristics including times of high activity and total size would support efficient management of resources and delivery of palivizumab. Health care facilities could forecast requirements for beds, staffing, testing, treatment, and other resources needed to care for sick children. For greatest effectiveness, these predictions should be made early in the RSV season; the authors, including public health practitioners and physicians, hold the expert opinion that these predictions would be useful within the first month of the observed start of the RSV seasonal epidemic.

In some regions, total epidemic size generally follows a biennial cycle from year to year with smaller epidemic seasons followed by larger epidemic seasons [6]. This cycle is currently used to gauge upcoming RSV seasonal epidemic size based on total size of the previous epidemic season. The Centers for Disease Control and Prevention (CDC) researchers using the National Respiratory and Enteric Virus Surveillance System found that the prior epidemic season's data were a relatively imprecise predictor of the epidemic season onset in a given community and that timing of the RSV epidemic season may vary substantially in the same year among communities in close proximity [7]. One goal of this research was to explore year-to-year variation in epidemic seasons using local data. The biennial variation in our seasonal epidemic data was seen in the early exponential growth rates (slope of the cumulative case curves, Figure 1) as well as total epidemic size. We explored the relationship between exponential growth of RSV epidemics and the seasonal epidemic characteristics of total epidemic size, days to peak, and epidemic length to assess predictions made early in the epidemic season.

Figure 1

Weekly observed RSV cases. Weekly observed RSV cases for 7 epidemic years. Data collected from Primary Children's Medical Center in Salt Lake City from July 2001 through June 2008.

Knowledge about viral transmission characteristics and the data derived from surveillance systems can be used to inform novel approaches for estimating characteristics of RSV epidemics through the application of methods rooted in epidemiological models of infectious disease transmission [8, 9]. These methods are being increasingly applied to emerging threats like SARS [10-12] and pandemic influenza, but their application to routine epidemics of common respiratory viruses like seasonal influenza and RSV has only begun to be explored. Weber et al. [8] model RSV transmission to examine how climate and social factors influence transmission in a population. They consider compartmental models using Susceptible-Infected-Recovered-Susceptible (SIRS) with additions to include latency and stages of susceptibility. They find no single best model for RSV epidemics; many "competing" models fit the observed data well. We further explored the variation in seasonal epidemics using compartmental models. The variation in exponential growth could potentially be related to variation in transmission rates, epidemic start dates, or proportions susceptible as well as a host of other factors.

The second goal of this research was to evaluate the ability of a compartmental model based on epidemiologic principles to fit observed data from a series of epidemics and examine the extent to which seasonal variations in epidemics can be accounted for by variation in specific model parameters.

For these analyses, we used daily laboratory data from the major pediatric health care facility in Utah where routine viral testing is a fixture of standard clinical care for children presenting to regional emergency departments. The utility of the data from these surveillance systems for relating final epidemic size and modeling the epidemic curve has not been fully evaluated. We investigated the estimation of seasonal epidemic characteristics using regression of exponential growth across seven epidemic seasons. We also modified the model of Weber et al. to explore the model fits and estimates of epidemic size using variation of parameters within a Susceptible-Exposed-Infected-Infected/Detected-Recovered (SEIDR) model.

Methods

Data

Primary Children's Medical Center (PCMC) is a 250-bed children's hospital that serves both as a community pediatric hospital for Salt Lake County, Utah (2008 population 1 million [13]), and as a tertiary referral center for five states in the Intermountain West (Utah, Idaho, Wyoming, Nevada, and Montana, total 2008 population 8.36 million [14]). Eighty percent of pediatric hospital admissions occurring in Salt Lake County and 73% occurring in the state of Utah are at PCMC.

During the study period, July 2001 through June 2008, direct respiratory sampling (mainly saline-assisted nasopharyngeal aspiration) for respiratory viral testing was performed for about 70% of children evaluated in the PCMC emergency department for respiratory complaints (unpublished data) and was required for all hospitalized children with respiratory symptoms (eg, upper or lower respiratory tract infection, bronchiolitis, asthma, or bacterial or viral pneumonia). In addition, respiratory viral testing was recommended for all febrile infants one to 90 days of age. Test results were used to inform patient cohorting and isolation procedures and to assist with medical management. All samples were initially tested by direct fluorescent antibody staining (DFA). DFA testing was performed three to five times daily depending on the season, with a mean turnaround time of four hours. For all DFA negative specimens, multiplex polymerase chain reaction (PCR) or viral culture was performed.

The data included in our analyses were all positive test results from the above sampling protocols from any of the testing methods during the study period. The practice of testing and test methods did not change appreciably during the study period (unpublished data on percentage of children tested and methods used). The data were used as daily counts by age group, under two and over two years old.

The RSV epidemic year was defined to be from July 1 of one year through June 30 of the following year. This time period was chosen to place the beginning date close to the middle of the inter-epidemic period, approximately six months from the average historical peak of the seasonal epidemic.
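
The July-to-June epidemic year can be made concrete with a small date helper. The Python sketch below is illustrative only; the function names `epidemic_year` and `day_of_epidemic_year` are hypothetical and not taken from the study.

```python
from datetime import date


def epidemic_year(d):
    """Starting calendar year of the July 1 to June 30 epidemic year containing d."""
    return d.year if d.month >= 7 else d.year - 1


def day_of_epidemic_year(d):
    """Days elapsed since July 1 of the epidemic year (July 1 itself is day 0)."""
    return (d - date(epidemic_year(d), 7, 1)).days
```

Under this convention, a case dated 30 June 2004 belongs to the 2003-4 epidemic year and a case dated 1 July 2004 opens the 2004-5 epidemic year.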

This study was reviewed by the Institutional Review Boards of Intermountain Healthcare and the University of Utah and determined by both organizations to be exempt.

Regression analysis

Regression analysis was used to explore the relationship between the initial exponential growth rate and the epidemic season characteristics of size, days to peak, and length using the seven epidemic seasons of RSV data from PCMC. The exponential growth rate, $\lambda_{t_0,t_1}$, for the time interval $t_0$ to $t_1$ was calculated as $\lambda_{t_0,t_1} = [\ln C(t_1) - \ln C(t_0)]/(t_1 - t_0)$, where $C(t_i)$ denotes the cumulative number of cases at time $t_i$, $i = 0, 1$. The exponential growth rate was calculated at four weeks to assess regression predictions made early in the season. For comparison, the exponential growth rate was also calculated at weeks one through six. The total epidemic size was the sum of cases over the epidemic year, including sporadic inter-epidemic cases. An observable seasonal epidemic start date, $t_0$, was defined as the start of the first week of the epidemic year with at least five confirmed RSV cases. This was the definition used by the hospital epidemiologists at PCMC to declare the start of RSV outbreaks during the study period. The term seasonal epidemic refers to the period from the epidemic start date until the epidemic end date, defined as the end of the last week of the epidemic year with at least five confirmed RSV cases. The number of days until the peak for each epidemic season was calculated as the midpoint day of the largest seven-day moving average window minus the epidemic season start day. The length of the epidemic season was calculated as the epidemic season end day minus the epidemic season start day.
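
The season definitions in this subsection can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used in the study: weeks are taken as consecutive seven-day blocks starting on July 1, the growth rate uses an assumed log-difference of cumulative counts (counted from July 1 and including the start day), and the function name `season_characteristics` is hypothetical.

```python
import math


def season_characteristics(daily, threshold=5, growth_weeks=4):
    """Summarize one epidemic year from daily case counts (index 0 = July 1)."""
    n_weeks = len(daily) // 7
    weekly = [sum(daily[7 * w:7 * w + 7]) for w in range(n_weeks)]
    epidemic_weeks = [w for w, total in enumerate(weekly) if total >= threshold]
    if not epidemic_weeks:
        return None  # no week reached the threshold

    start_day = 7 * epidemic_weeks[0]      # first day of first qualifying week
    end_day = 7 * epidemic_weeks[-1] + 6   # last day of last qualifying week

    # Peak: midpoint of the seven-day window with the largest total
    # (equivalently, the largest seven-day moving average).
    window_totals = [sum(daily[d:d + 7]) for d in range(len(daily) - 6)]
    peak_window = max(range(len(window_totals)), key=window_totals.__getitem__)
    peak_day = peak_window + 3

    cumulative, running = [], 0
    for count in daily:
        running += count
        cumulative.append(running)

    # ASSUMPTION: growth rate = difference in log cumulative cases per day.
    t0 = start_day
    t1 = min(start_day + 7 * growth_weeks, len(daily) - 1)
    growth = None
    if cumulative[t0] > 0 and t1 > t0:
        growth = (math.log(cumulative[t1]) - math.log(cumulative[t0])) / (t1 - t0)

    return {
        "size": sum(daily),
        "start_day": start_day,
        "days_to_peak": peak_day - start_day,
        "length": end_day - start_day,
        "growth_rate": growth,
    }
```

The growth rate is returned as `None` when no cases have accumulated by the start day, because the logarithm is undefined there; how the study handled that situation is not stated.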

Relationships between the initial exponential growth rate and seasonal epidemic characteristics were described using the Pearson correlation coefficient and assessed using standard regression statistics. The fits of the regression models were assessed using the percent error of the model fits from the observed values. To combine across seasons, the absolute values of the percent errors were averaged providing the mean absolute percent error for the model.
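
As a sketch of the statistics named above, the following stdlib-only Python computes a Pearson correlation, a simple least-squares line, and the mean absolute percent error. The function names and the numbers in the usage lines are hypothetical illustrations, not values from Table 1 or Table 2.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def mean_absolute_percent_error(observed, fitted):
    """Average of |fitted - observed| / observed across seasons, in percent."""
    errors = [abs(f - o) / o for o, f in zip(observed, fitted)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical growth rates and epidemic sizes, for illustration only.
growth = [0.035, 0.080, 0.040, 0.075, 0.045]
size = [700, 1500, 900, 1300, 1000]
slope, intercept = fit_line(growth, size)
fitted = [intercept + slope * g for g in growth]
mape = mean_absolute_percent_error(size, fitted)
```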

SEIDR model

We modeled the observed RSV cases using an extension of the SIR model that included individuals (c for children and a for adults) that were susceptible (Sc and Sa), exposed (Ec and Ea), infectious (Ic and Ia), infectious and subsequently detected children (D), and recovered combined across children and adults (R). This SEIDR model was applied to a series of seven epidemic years. The population was split into children less than two years old (children) and those two years and older (adults). It has been shown that the initial RSV infection is the most severe and occurs in almost every child in their first two years of life. Transmission is modeled as a function of time using a cosine function to mirror the cyclic nature of epidemics [8]. There is an offset to this cycle (α), which we estimate along with the transmission parameter (β). Births and deaths (μ) are accounted for in the susceptible class only. Achievement of age two is accounted for in all age-separated classes (η). Assumptions of simple compartmental models that we made were as presented in Koopman [15].

Our SEIDR transmission model (Figure 2) was defined using the following system of non-linear differential equations:

Figure 2

Schematic representation of the flow through model compartments.

Here β was the transmission parameter, L the latency period, f the under-two detection fraction, and γ the recovery parameter. All parameters are presented in the next subsection with descriptions, ranges, and reference values from the literature. Solution to the set of differential equations is addressed below.
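
As a rough illustration of how a seasonally forced compartmental model of this kind behaves, the Python sketch below integrates a deliberately simplified, single-age-group version with a fixed-step Euler scheme. It is not the system used in this study: the age split, births and deaths (μ), and aging (η) are omitted, the cosine forcing form is an assumption, detection is represented by counting a fraction f of newly infectious individuals, and the name `simulate_seidr` and the defaults `n_pop`, `s0_frac`, and `i0` are hypothetical.

```python
import math


def simulate_seidr(beta, alpha, f, latency=5.0, gamma=0.1,
                   n_pop=1_000_000.0, s0_frac=0.5, i0=10.0,
                   days=365, dt=0.1):
    """Return (daily detected cases, final (S, E, I, R)) for one epidemic year."""
    s = n_pop * s0_frac - i0
    e = 0.0
    i = i0
    r = n_pop - s - e - i
    steps_per_day = int(round(1.0 / dt))
    daily_detected = []
    for day in range(days):
        detected_today = 0.0
        for k in range(steps_per_day):
            t = day + k * dt
            # ASSUMED forcing: annual cosine cycle peaking at t = alpha days.
            beta_t = beta * (1.0 + math.cos(2.0 * math.pi * (t - alpha) / 365.0)) / 2.0
            new_exposed = beta_t * s * i / n_pop * dt   # S -> E
            new_infectious = e / latency * dt           # E -> I
            new_recovered = gamma * i * dt              # I -> R
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
            # ASSUMED detection: a fraction f of new infectious cases is observed.
            detected_today += f * new_infectious
        daily_detected.append(detected_today)
    return daily_detected, (s, e, i, r)
```

Shifting `alpha` moves the whole epidemic curve earlier or later in the year, while `beta` and `f` change its height, which is one way to see why offset, transmission, and detection can be hard to separate when fitting to detected cases alone.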

Model parameters

To fit the SEIDR model to the empirical epidemic data, three parameters (latency period, birth and death rate, and recovery period) were specified based on the literature. Three parameters associated with variation across epidemic years were estimated: 1) the temporal offset of the epidemic cycle (α), 2) detection fraction (f), and 3) transmission parameter (β). Different models were specified to explore the effect of these three parameters. All combinations of these were considered: models with one parameter allowed to vary across seasons, models with two parameters allowed to vary across seasons, and a model with all parameters allowed to vary across seasons.

Each parameter is described below.

Birth and death rate (μ)

The numbers of daily births and deaths were entered in the model based on census data for Salt Lake County.

Aging rate (η)

It was assumed that 1/365th of the children in each age-separated compartment reached the age of two each day.

Detection fraction (f)

The detection fraction parameter reflected the fraction of the RSV epidemic in children under two years old that was captured in our data set. The detection fraction parameter was estimated as a constant parameter across years and also allowed to vary by epidemic year.

Latency period (L)

The latency period is the time between exposure resulting in transmission and time of infectiousness. The latency period was specified using the median value from Crowcroft [16], five days.

Transmission parameter (β)

The transmission parameter determined the rate of transmission from contacts between infectious and susceptible individuals. We assumed a homogeneous, uniformly mixing population. The transmission parameter was estimated as a constant parameter across years and also allowed to vary by epidemic year.

Recovery parameter (γ)

The recovery parameter specifies the rate of recovery from infectiousness. This was specified as 0.1 per day, which translates to a ten-day recovery period, following the work by Weber [8] and within the range of one to 21 days reported by Hall [17].

Epidemic cycle offset (α)

The final model parameter was the offset of the annual epidemic cycle. A regular annual cycle is thought to vary due to weather and climate conditions. The SEIDR model captures the entire epidemic, detected and not detected. Prior to observing RSV cases, the epidemic cycle started within the undetected population. This offset parameter was estimated as a constant parameter across years and also allowed to vary by epidemic year.

Model fitting

The nonlinear equations were solved using the lsoda function from the odesolve library [18] in R statistical software [19]. The parameters were estimated using a grid search. Two fitting statistics were used. The estimates were the values that minimized the square root of the sum of standardized squared errors (RSE) and/or the square root of the sum of squared standardized errors (RMSE). The RSE was calculated as the square root of the sum of the squared errors between the observed daily cases and the fitted model, each divided by the fitted value, $\mathrm{RSE} = \sqrt{\sum_i (O_i - F_i)^2 / F_i}$, where $O_i$ was the observed number of cases and $F_i$ was the fitted value on day $i$. The RMSE was calculated as the square root of the sum of the squared weighted errors between the observed daily cases and the fitted model, the weight being the fitted value, $\mathrm{RMSE} = \sqrt{\sum_i [(O_i - F_i)/F_i]^2}$. The denominators in these measures adjusted for the magnitude of the epidemic curve to avoid fitting the model mainly to the peak, where differences could over-inflate the fitting statistic and under-value differences during the early and late stages of the epidemic. The RMSE reduces the effect of fit to the peak more than does the RSE.
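
The two fit statistics can be sketched in Python directly from their verbal definitions: squared errors divided by the fitted value for the RSE, and errors divided by the fitted value before squaring for the RMSE. This is an illustration that follows the wording of the text; the function names are hypothetical, and the fitted values are assumed to be strictly positive.

```python
import math


def rse(observed, fitted):
    """Square root of the sum of squared errors, each divided by the fitted value."""
    if any(f <= 0 for f in fitted):
        raise ValueError("fitted values must be positive")
    return math.sqrt(sum((o - f) ** 2 / f for o, f in zip(observed, fitted)))


def rmse_standardized(observed, fitted):
    """Square root of the sum of squared errors, each error divided by the fitted value."""
    if any(f <= 0 for f in fitted):
        raise ValueError("fitted values must be positive")
    return math.sqrt(sum(((o - f) / f) ** 2 for o, f in zip(observed, fitted)))
```

Because the second statistic divides by the fitted value before squaring, a miss of ten cases at a fitted peak of 40 counts far less than a miss of two cases at a fitted value of one in the tail, which is consistent with the observation that it de-emphasizes the peak more strongly.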

A grid search was used starting with an initial wide range of values for f, β, and α. The search grid was repeated with successively narrowing ranges to minimize the RSE. The grid started with the range of reasonable values, 0 - 1 for β and f and one to 200 days for α. The range was reduced and resolution increased iteratively around minimal RSE and RMSE values. The minimum grid resolution was 0.0001 for β, 0.01 for f, and one day for α. The RSEs and RMSEs from the grid search results were used to select the best parameter estimates within each model type (eg, one model type had only transmission rates that varied by epidemic year).
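
A successively narrowing grid search of this kind might look like the Python sketch below. It is a generic illustration, not the procedure used in the study: the number of grid points per axis, the number of rounds, and the rule for shrinking each range to one grid step on either side of the current best point are all assumptions, and `refine_grid_search` is a hypothetical name.

```python
import itertools


def linspace(lo, hi, n):
    """n evenly spaced values from lo to hi inclusive."""
    return [lo + (hi - lo) * k / (n - 1) for k in range(n)]


def refine_grid_search(objective, bounds, points=11, rounds=4):
    """Minimize objective(*params) over a grid that narrows around the best point."""
    original = list(bounds)
    best_params, best_val = None, float("inf")
    for _ in range(rounds):
        axes = [linspace(lo, hi, points) for lo, hi in bounds]
        for candidate in itertools.product(*axes):
            val = objective(*candidate)
            if val < best_val:
                best_params, best_val = candidate, val
        # Narrow each range to one grid step either side of the best value,
        # clipped to the original range of reasonable values.
        new_bounds = []
        for (lo, hi), centre, (lo0, hi0) in zip(bounds, best_params, original):
            step = (hi - lo) / (points - 1)
            new_bounds.append((max(lo0, centre - step), min(hi0, centre + step)))
        bounds = new_bounds
    return best_params, best_val
```

In practice the objective would run the model for a candidate (β, f, α) and return the RSE or RMSE against the observed daily cases; like any grid search, the result can settle on a local minimum if the first grid is too coarse.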

The model with all three parameters allowed to vary by epidemic year was fit as a saturated model to provide a benchmark for RSE and RMSE, along with the Schwarz Criteria described below and the percent error in estimating epidemic size, when evaluating more parsimonious models in which only one of the three parameters was allowed to vary by epidemic year. Multiple measures were used to compare the models, in part because the Schwarz Criteria assume that the residuals are independent and identically distributed, which was not the case; they are, in fact, autocorrelated.

The Schwarz Information Criterion [20] was calculated based on the weighted least squares method used for parameter estimation. There were n = 2555 data points (365 days of case data for each of seven years), and k, the number of parameters estimated, was 28 in the full model (four parameters for seven years) and 16 in each other model (two parameters for seven years and two parameters overall). The Schwarz Criteria were calculated from M, where M represents either the RSE or RMSE fit statistic [21]. The absolute values of the percent error in estimating total epidemic size were summed across seasons for comparison of models.
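
To show how the parameter count enters the comparison, the Python sketch below computes the standard Schwarz penalty term, k ln(n), together with one common least-squares form of the criterion. The penalty term is standard; treating the squared fit statistic M as the residual sum of squares is an assumption made here for illustration, and the exact expression used in the study may differ. Both function names are hypothetical.

```python
import math


def schwarz_penalty(k, n):
    """Standard Schwarz (BIC) complexity penalty: k * ln(n)."""
    return k * math.log(n)


def schwarz_criterion_assumed(m, n, k):
    """ASSUMED least-squares form: n * ln(M**2 / n) + k * ln(n)."""
    return n * math.log(m ** 2 / n) + schwarz_penalty(k, n)


# Extra penalty carried by the saturated model (k = 28) relative to a
# reduced model (k = 16) with n = 2555 daily observations.
extra_penalty = schwarz_penalty(28, 2555) - schwarz_penalty(16, 2555)
```

With these values the saturated model carries an extra penalty of roughly 94 units, so under this form it would have to improve the fit term by more than that amount to be preferred.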

Results

Descriptive Analysis

The number of children with test-positive RSV infection ranged from 682 cases in 2004-5 to 1704 cases in 2007-8 (Table 1). The median size of the annual epidemic was 1113 cases. Overall, 98% of cases were detected between the months of October and April. Larger epidemics alternated with smaller epidemics. The amplitude of this biennial cycle was approximately 600 cases.

Table 1 Observed RSV epidemic size, start date, days to peak, duration, and 4-week exponential growth.

The total number of children (under 18 years of age) tested per epidemic year ranged from approximately 3000 to 7000, with numbers of tests increasing over time. Overall, 21% of these were positive for RSV, varying according to the biennial cycle. Of children tested, 81% were less than three years old and 95% were less than 11 years old. Of children with positive tests, 92% were less than three years old and 99% were less than 11 years old. Of the children tested, 70% were from Salt Lake County, and 77% of children with positive tests were from Salt Lake County.

Regression analyses

Exponential growth rates calculated from cases accumulated for four weeks from the observed epidemic season start ranged from 0.034 to 0.081 (Table 1) across the epidemic seasons. The effective reproductive numbers ranged from 1.27 to 1.49 using a serial interval of seven days [16]. In regression analyses (Table 2), the four-week exponential growth rate exhibited a substantial positive correlation with epidemic size (r = 0.69, p = 0.08) and was negatively correlated with start day (r = -0.43, p = 0.33), days to peak (r = -0.44, p = 0.32), and length of the epidemic (r = -0.58, p = 0.17). The regression models provided estimates of epidemic season characteristics that were on average within 16% of observed epidemic season size, 11% of observed days to peak, and 8% of observed epidemic length. Using exponential growth rates calculated from weeks one through six provided, in general, increasing correlation (Table 3).

Table 2 Results of regression analysis using exponential growth to predict epidemic size, days to peak, and length.
Table 3 Correlations between exponential growth rate (calculated at weeks one through six) with observed RSV epidemic size, start date, days to peak, and duration.

SEIDR model

The saturated SEIDR model was fit to seven epidemic years of observed RSV data with epidemic year-specific RSE values that ranged from 13 to 21, RMSE values that ranged from 0.40 to 0.77 and percent error of total cases that ranged from 1% to 16%. The fit statistics for the models with either transmission parameter or detection fraction estimated as a constant across epidemic year did not differ substantially from those from the saturated model (Table 4). The minimum RSE model with detection fraction held constant across epidemic years had the smallest % error, smallest Schwarz RSE Criterion, and had other fit statistics nearly equal to the saturated model. The minimum RMSE models were, in general, fitting to the tails of the epidemic and resulted in large errors in estimating epidemic size.

Table 4 Fit statistics for models with different sets of parameters allowed to vary across epidemic year.

The pattern of variation in estimates of offset from all models matched the biennial cycle variation in total epidemic size across epidemic years (Figure 3). The variation in estimates of the transmission parameter and detection fraction did not necessarily match this cycle for all epidemic years. The parameter estimates for the transmission parameter were negatively correlated with total epidemic size.

Figure 3

SEIDR model parameter estimates. SEIDR model parameter estimates for three models for each of the 7 seasons. The parameters are transmission parameter (top right), epidemic offset (bottom left), and detection fraction (bottom right). The estimates from the model with 1) all parameters varying by epidemic year are open triangles, 2) transmission parameter constant across epidemic year are plus signs, and 3) detection fraction constant across epidemic year are x's. The data were collected by Primary Children's Medical Center, Salt Lake City, UT from July 2001 through June 2008.

Discussion

The SEIDR model we presented made assumptions that simplified the reality of RSV transmission. We have identified three limitations to the SEIDR modeling effort. First, the population age separation does not take full advantage of differences in interaction among a non-homogenous population. Second, related to this, the parameter values were not allowed to vary within the population. Transmission, for instance, could be age-dependent (due, eg, to hand-washing habits). Third, the grid search method of parameter estimation did not provide estimated standard errors for parameter estimates, which limited the ability to compare models and seasons.

Despite these limitations, this SEIDR model was useful; it modeled the observed RSV cases from PCMC as part of larger unobserved epidemic seasons and provided a framework for investigating the model parameters. The parameters offset and transmission may not be completely identifiable within this framework but more likely represent combined other forces unmeasured here.

Our future work includes addressing these limitations and expanding the complexity of the models. RSV is carried by all age groups but is, in general, only a concern for infants. Thus, an age-stratified model, possibly with different mixing mechanisms, would more closely resemble the true transmission. The biennial cycle of large, early, and short seasonal epidemics followed by smaller, later, and longer seasonal epidemics the next year observed in Utah is similar to other published studies of seasonal RSV epidemics in temperate climates. The theories for this phenomenon include the existence and switching of two RSV disease strains, climate patterns, and waning immunity after infection [6, 8, 9, 22-24]. These and other theories could be investigated in more complex models. It is understood that immunity after infection of RSV is partial, at best. This incomplete immunity and severity of re-infections could be incorporated into more complex models [8, 25]. Finally, future modeling efforts will involve approaches that include measures of uncertainty in parameter estimates, including Bayesian methods [26, 27] and likelihood and other methods [28, 29].

Conclusions

The first main conclusion of this work was that exponential growth was somewhat empirically related to seasonal epidemic characteristics. The variations in epidemic seasons from data collected at PCMC during the seven years of the study can be partially explained by the variation in exponential growth, especially characteristics of epidemic size, peak day, and length of the epidemic. The seven years of data were not sufficient to make conclusive statements on the nature of the relationships. These early findings based on just seven data points can be built upon to explore early prediction of the upcoming RSV epidemic season. These early predictions could be used by hospitals to budget and allocate resources and to coordinate the timing of palivizumab treatment. They can be used by public health to advise clinicians and the public and also to help identify non-standard epidemics earlier in the season. For example, health departments might take specific actions if the number of observed cases during the season greatly exceeds early predictions.

The second main conclusion of this work was that variation of the transmission parameter and the start of the epidemic (offset) over epidemic years could explain the variation in seasonal epidemic size. The three model parameters allowed to vary by epidemic year (detection fraction, transmission parameter, and offset) provided possible rationale for the variation in seasonal epidemic size. The model with detection fraction held constant across epidemic year fits the observed data well with the fewest parameters. The parameter estimates from this model also match the expected biennial pattern of the epidemic years. From the models considered in this study, this one performs best overall (Figure 4).

Figure 4

Observed RSV cases and model predicted epidemic curves. Observed RSV cases (grey dots) collected by Primary Children's Medical Center in Salt Lake City from July 2001 through June 2008, plotted for each season along with fitted SEIDR models.