Background

Starting as early as 2013 [1, 2], the Zika virus (ZIKV) invaded northeast Brazil and began to spread across the Latin America and Caribbean (LAC) region. The subsequent discovery of a cluster of Guillain–Barré syndrome cases and the emergence of severe birth defects led the World Health Organization to declare the outbreak a Public Health Emergency of International Concern in early 2016. The virus has since spread to 49 countries and territories across the Americas in which autochthonous transmission has been confirmed [3].

However, 2017 saw a marked decline in reported Zika cases and its severe disease manifestations [4]. This decline has been widely attributed to the build-up of immunity against ZIKV in the wider human population [5], although it remains unknown how many people have been infected. To date, there has been limited use of population-based surveys to determine the circulation and seroprevalence of ZIKV in LAC, owing to challenges in interpretation of serological tests that cross-react with other flaviviruses (e.g. dengue) [6, 7]. In addition to the reduction in Zika cases, there has also been a marked reduction in incidence of reported dengue and chikungunya cases in Brazil, meaning that the role of climatic and other factors affecting mosquito density or cross-immunity between arboviruses cannot be ruled out.

Whilst the decline in ZIKV incidence is undoubtedly a positive development, it exposes clear gaps in our understanding of the virus's natural history and epidemiology, which limit our ability to plan for, detect and respond to future epidemics. The short duration of the epidemic and the long lead time needed to investigate comparatively rare congenital outcomes mean that maternal cohort studies, in particular, may be statistically underpowered to assess relative risk and the factors associated with ZIKV-related adverse infant outcomes [8]. Evaluation of the safety and efficacy of ZIKV vaccine candidates [9] now also faces a shrinking number of sites with sufficient ZIKV incidence [10, 11].

There is an urgent need to predict which areas in LAC remain at risk of transmission in the near future and to estimate the trajectory of the epidemic. Projections can help public health policymakers plan surveillance and control activities, particularly in areas where disease persists. Researchers, especially those in vaccine and drug development, can also use them to update sample size calculations for ongoing studies so that these reflect the incidence predicted within the time window of planned trials. Findings from a continental analysis of ZIKV in LAC, such as the spatial patterns of spread and the impact of seasonality on incidence, may also be useful should ZIKV emerge in other settings.

Several mathematical and computational modelling approaches have been developed to forecast continental-level ZIKV transmission [5, 11,12,13,14]. Their focus has largely been on estimating which areas are likely to experience epidemic growth, yet it is apparent from the 2017 incidence data that many countries no longer report an increasing incidence of cases. Owing to data being unavailable, or to inaccuracies in the reported number of Zika cases in each country at the time of analysis, these approaches have either not used incidence data at all [15,16,17], fitted models to data on other arboviruses [14], or calibrated their models using selected Zika-related incidence data from particular countries [5, 12, 13, 18,19,20,21]. Additionally, only a small number of studies have validated their model findings, either by comparison with serological surveys or by comparing model outputs with incidence data not used in model fitting [13, 19,20,21]. Considerably more data are now available across LAC, spanning multiple arboviral transmission seasons. This provides a valuable opportunity to examine the nature of ZIKV transmission and the importance of connectivity and seasonality for ZIKV persistence in specific locations throughout LAC.

In this article, we apply a dynamic spatial model of ZIKV transmission in 90 major cities across LAC and fit the model to the latest data from 35 countries. We test several models of human mobility to better understand the impact of human movement on the emergence of ZIKV, and we validate the model using 10-fold cross-validation against the data. We then use the fitted model to quantify the expected number of cases likely to be observed in 2018 and to identify the cities likely to remain at greatest risk.

Methods

Zika case data from LAC

The weekly number of confirmed and suspected Zika cases within each country is reported to the Pan American Health Organization. This analysis makes use of the weekly incidence of Zika cases in 35 countries from January 2015 to August 2017 (Additional file 1: S1). State-level ZIKV incidence data were available for Brazil and Mexico [22]. Confirmed cases are typically identified through a positive real-time reverse transcription polymerase chain reaction (RT-PCR) blood test using ZIKV-specific RNA primers. Suspected cases are based on the presence of a pruritic (itchy) maculopapular rash together with two or more symptoms, including fever, polyarthralgia (multiple joint pains), periarticular oedema (joint swelling), or conjunctival hyperaemia (eye blood vessel dilation) without secretion and itch [23, 24]. Both confirmed and suspected cases were included in this analysis because ZIKV detection may have low sensitivity owing to a narrow window of viraemia, and because many samples, particularly from the earlier phase of the epidemic, remain untested owing to laboratory overload during the epidemic [24]. Including suspected cases may reduce specificity, given the non-specific clinical manifestations of ZIKV and of similar circulating arboviruses, including dengue. The reporting of ZIKV cases varies considerably between settings and is thought to depend on the arbovirus surveillance system already in place, any additional surveillance established specifically for ZIKV and other viruses, and the likelihood of an individual self-reporting with symptoms consistent with ZIKV infection.
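The suspected-case rule paraphrased above can be written as a simple predicate. The sketch below is illustrative only: the function and symptom names are hypothetical, each symptom is treated as a yes/no flag, and the finer clinical qualifiers (for example, hyperaemia without secretion and itch) and the official surveillance wording are not encoded.

```python
# Hypothetical encoding of the suspected-case rule as paraphrased in the text:
# a pruritic maculopapular rash plus at least two of the listed symptoms.
LISTED_SYMPTOMS = frozenset(
    {"fever", "polyarthralgia", "periarticular_oedema", "conjunctival_hyperaemia"}
)


def is_suspected_case(has_pruritic_rash: bool, symptoms: set) -> bool:
    """Return True if a presentation meets the paraphrased suspected-case rule."""
    if not has_pruritic_rash:
        return False
    # Only symptoms named in the case definition count towards the threshold.
    return len(LISTED_SYMPTOMS & set(symptoms)) >= 2


if __name__ == "__main__":
    print(is_suspected_case(True, {"fever", "polyarthralgia"}))   # True
    print(is_suspected_case(True, {"fever", "headache"}))         # False
    print(is_suspected_case(False, {"fever", "polyarthralgia"}))  # False
```

Symptoms outside the listed set (such as headache) are ignored rather than rejected, so the predicate only asks whether enough of the named symptoms are present.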

A mathematical model of ZIKV infection

A deterministic meta-population model was used to represent ZIKV transmission between major cities in the LAC region. Cities with a population larger than 750,000 and large Caribbean islands were included in the model, giving 90 locations in total. Population sizes were extracted from UN estimates for 2015 [25]. Migration between cities was modelled under several scenarios: (1) a simplified gravity model with one estimated parameter; (2) a gravity model in which all three exponents were estimated; (3) a radiation model; (4) a data-driven approach based on flight data; and (5) a model combining local radiation and flight movements. Gravity models assume that movement between two cities is highest when they are close together and both are large. Radiation models instead assume that movement from one city to another depends on the population living within a circle centred on the origin city whose radius is the distance between the two cities (Additional file 1: S2).
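To make the two families of mobility model concrete, the sketch below computes pairwise flows for a handful of cities. It assumes a gravity form flow_ij = k · N_i^alpha · N_j^beta / d_ij^gamma and the standard radiation-model formula; the exponent values, the constant k and the `leave_total` input (number of people leaving each city) are placeholders, not the fitted values or the normalisation used in this analysis.

```python
import numpy as np


def gravity_flows(pop, dist, alpha=1.0, beta=1.0, gamma=2.0, k=1.0):
    """Gravity model: flow[i, j] = k * N_i**alpha * N_j**beta / d_ij**gamma.

    Flows grow with the size of both cities and shrink with distance.
    alpha, beta, gamma and k are placeholder values, not fitted estimates.
    """
    pop = np.asarray(pop, dtype=float)
    dist = np.asarray(dist, dtype=float)
    with np.errstate(divide="ignore"):  # d_ii = 0 on the diagonal
        flows = k * np.outer(pop**alpha, pop**beta) / dist**gamma
    np.fill_diagonal(flows, 0.0)  # no movement from a city to itself
    return flows


def radiation_flows(pop, dist, leave_total):
    """Radiation model: T_i * m_i * n_j / ((m_i + s_ij) * (m_i + n_j + s_ij)).

    m_i and n_j are the origin and destination populations, T_i is the number
    of people leaving city i (an assumed input here), and s_ij is the population
    within distance d_ij of city i, excluding cities i and j.
    """
    pop = np.asarray(pop, dtype=float)
    dist = np.asarray(dist, dtype=float)
    n = pop.size
    flows = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            in_circle = dist[i] <= dist[i, j]
            in_circle[[i, j]] = False  # exclude origin and destination
            s_ij = pop[in_circle].sum()
            m_i, n_j = pop[i], pop[j]
            flows[i, j] = leave_total[i] * m_i * n_j / (
                (m_i + s_ij) * (m_i + n_j + s_ij)
            )
    return flows


if __name__ == "__main__":
    # Three hypothetical cities on a line, 10 distance units apart.
    pop = [100, 200, 300]
    dist = [[0, 10, 20], [10, 0, 10], [20, 10, 0]]
    print(gravity_flows(pop, dist)[0])                  # flows out of city 0
    print(radiation_flows(pop, dist, [60, 60, 60])[0])  # flows out of city 0
```

In the radiation sketch, the intervening city 1 "absorbs" trips from city 0 to city 2, which is why the radiation and gravity models can assign quite different flows to the same pair of cities.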

Within each city, individuals were classified by their infection status as susceptible, pre-infectious, infectious or recovered from ZIKV infection (Fig. 1). Upon infection, individuals were assumed to be pre-infectious for an average of 5 days and then infectious for a subsequent 20 days [26, 27]. Immunity was assumed to be lifelong, and no cross-protection against other flaviviruses was considered. We assumed that infectious individuals would not migrate between cities, owing to possible ZIKV-related symptoms; this assumption was relaxed as part of the model sensitivity analysis. The main vector for ZIKV in LAC is thought to be Aedes aegypti, whilst Aedes albopictus and other species are thought to play a minor role in transmission [28]. The seasonality and scale of ZIKV transmission were assumed to be specific to each city and dependent across cities, using a vectorial capacity modelling approach. To estimate vectorial capacity, we modelled the probability that ZIKV may be transmitted on each day of the year and fed this time-varying probability into the mathematical model (Additional file 1: S3) [29,30,31]. We estimated the time-varying reproduction number R0,i(t) for each city i, defined as the average number of secondary infections resulting from one infected person in a totally susceptible population; it varies in time because of the seasonality in vectorial capacity within each city. The seasonality curves were summarised by the average number of days per year on which R0,i(t) was greater than 1 and by the mean value of R0,i(t) over a typical year.
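The compartmental structure above can be illustrated with a deterministic SEIR meta-population sketch using daily Euler steps. Several simplifications are assumptions of this sketch rather than features of the published model: vector transmission is collapsed into a human-to-human rate beta_i(t) = R0,i(t) x gamma (gamma being 1/20 per day), only pre-infectious (E) individuals move, via a hypothetical daily movement-probability matrix `move`, and the seasonal R0 curves in the usage lines are invented. A helper computes the two seasonality summaries described above.

```python
import numpy as np


def simulate_seir(pop, r0_t, move, days, seed_city=0, seed_n=10.0,
                  latent_days=5.0, infectious_days=20.0):
    """Deterministic SEIR meta-population sketch with daily Euler steps.

    pop  : (n,) city population sizes
    r0_t : callable, day -> (n,) array of R0_i(t)
    move : (n, n) array; move[i, j] is an assumed daily probability that a
           pre-infectious (E) person in city i moves to city j
    Returns daily new infections, shape (days, n), and final (S, E, I, R).
    """
    S = np.asarray(pop, dtype=float).copy()
    n = S.size
    E, I, R = np.zeros(n), np.zeros(n), np.zeros(n)
    S[seed_city] -= seed_n
    E[seed_city] += seed_n
    move = np.array(move, dtype=float)  # copy, so the caller's array is untouched
    np.fill_diagonal(move, 0.0)
    sigma = 1.0 / latent_days        # E -> I rate (mean 5 days pre-infectious)
    gamma = 1.0 / infectious_days    # I -> R rate (mean 20 days infectious)
    incidence = np.zeros((days, n))
    for t in range(days):
        beta = np.asarray(r0_t(t), dtype=float) * gamma  # R0 = beta / gamma
        N = S + E + I + R
        new_inf = beta * S * I / N
        progress, recover = sigma * E, gamma * I
        leaving = E * move.sum(axis=1)   # pre-infectious leaving each city
        arriving = move.T @ E            # pre-infectious arriving in each city
        S = S - new_inf
        E = E + new_inf - progress - leaving + arriving
        I = I + progress - recover
        R = R + recover
        incidence[t] = new_inf
    return incidence, (S, E, I, R)


def summarise_seasonality(r0_daily):
    """Days in a typical year with R0(t) > 1, and the mean R0(t) over that year."""
    r0_daily = np.asarray(r0_daily, dtype=float)
    return int((r0_daily > 1.0).sum()), float(r0_daily.mean())


if __name__ == "__main__":
    # Invented seasonal R0 curves for two hypothetical cities.
    def r0_curve(t):
        season = 1.0 + 0.4 * np.cos(2.0 * np.pi * (t % 365) / 365.0)
        return np.array([1.8, 1.3]) * season

    pop = np.array([2e6, 1e6])
    move = np.array([[0.0, 1e-3], [1e-3, 0.0]])
    inc, (S, E, I, R) = simulate_seir(pop, r0_curve, move, days=3 * 365)
    print("attack rate by city:", R / (S + E + I + R))
    print("city 2 seasonality:",
          summarise_seasonality([r0_curve(t)[1] for t in range(365)]))
```

Because movement only shifts pre-infectious people between cities, the total population summed over all cities is conserved at every step, which is a useful sanity check on any implementation of this kind.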

Fig. 1

Schematic of the meta-population model structure, focusing on the northern part of South America and the Caribbean islands. Each city consists of individuals who are assumed to be susceptible (S), pre-infectious (E), infectious (I) or recovered (R) from ZIKV infection. Movement of pre-infectious individuals between cities is modelled under different population-flow assumptions; a gravity model is illustrated. Movements to cities outside of the plotted area are not illustrated

Owing to the difficulties in ZIKV disease surveillance [23], the weekly incidence of reported cases was unlikely to reflect the true incidence in each setting, so we did not fit the model to weekly incidence data. We instead used summary statistics in the model fitting procedure, focussing on the timing of the peak in incidence and on whether the annual incidence was above 1 case per 100,000 population in each country. The timing of the outbreak peak has previously been shown to be a useful summary statistic for epidemic dynamics [32, 33], and preliminary analysis showed that annual incidence had good discriminatory power for estimating the parameters of the model. Although surveillance quality varies between settings, the timing of the reported peak within a country is less sensitive to systematic error than the reported case counts. A sensitivity analysis confirmed that only a small number of observations were susceptible to large changes in surveillance before April 2016 and after January 2017, so the reported timing of the peak is robust to changes in surveillance (Additional file 1: S4).
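The two summary statistics described above can be computed from a weekly case series as follows. This is a sketch under stated assumptions: years are treated as blocks of 52 weeks (real epidemiological calendars occasionally have 53), any incomplete final year is dropped, and the function name is hypothetical.

```python
import numpy as np


def summary_statistics(weekly_cases, population, weeks_per_year=52,
                       threshold_per_100k=1.0):
    """Summary statistics of a weekly case series for one country.

    Returns (index of the peak week, list of per-year flags saying whether
    annual incidence exceeded `threshold_per_100k` cases per 100,000).
    Assumes 52-week years and drops any incomplete final year.
    """
    weekly_cases = np.asarray(weekly_cases, dtype=float)
    peak_week = int(np.argmax(weekly_cases))  # first week with the maximum
    n_years = weekly_cases.size // weeks_per_year
    annual = weekly_cases[: n_years * weeks_per_year].reshape(
        n_years, weeks_per_year).sum(axis=1)
    flags = [bool(c / population * 1e5 > threshold_per_100k) for c in annual]
    return peak_week, flags


if __name__ == "__main__":
    # Two years of invented data: a quiet first year, then a single spike.
    weekly = [0] * 62 + [50] + [0] * 41
    print(summary_statistics(weekly, population=1_000_000))  # (62, [False, True])
```

Both statistics are insensitive to a constant reporting rate: scaling every weekly count by the same factor leaves the peak week unchanged, which is part of why peak timing is attractive when absolute case counts are unreliable.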

The model estimate of new infections within each city was aggregated to the country level (or state level, for Brazil and Mexico) and scaled to ZIKV cases, enabling comparison with the available data. The maximal value of R0(t) and the best-fitting migration model (including the maximal leaving rate from cities) were estimated in the model fitting procedure. Parameters were estimated using approximate Bayesian computation (ABC)–sequential Monte Carlo methods [34]. ABC methods use summary statistics to estimate model parameters from qualitative epidemic characteristics. The ABC–sequential Monte Carlo procedure allows each model of human mobility to be treated as a parameter. The prior and posterior probabilities of selecting each model were used to estimate Bayes factors, which quantify the evidence in favour of one model over another. The fitting produced multiple parameter sets with equivalent fit, which were used to provide the mean and 95% credible intervals (CI) of the parameter estimates, the numbers infected between 2015 and 2017, the timing of the epidemic peak, and projections of the number of ZIKV cases in 2018. The distribution of the timing of the peak was compared with the data using Bayesian posterior checks. Each check value is the model's cumulative distribution function evaluated at the observed peak timing, i.e. the probability that the model's peak timing is less than or equal to the observed value; values between 0.01 and 0.99 can be interpreted as evidence that the data and model estimate come from the same distribution. For each country, the time series of reported cases was compared with the normalised model incidence. We compared the total number of reported cases with the estimated median (and 95% CI) cumulative number of infections to estimate the country-specific probability of a case being reported per infection.
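The three post-fitting quantities described in this paragraph reduce to short calculations once the ABC output is available. In the sketch below, the posterior probability of each mobility model is assumed to be the share of accepted ABC particles carrying that model's index, and all numbers in the usage lines are invented for illustration; the function names are hypothetical.

```python
import numpy as np


def bayes_factor(prior_a, prior_b, post_a, post_b):
    """Bayes factor for mobility model a over model b: posterior odds / prior odds."""
    return (post_a / post_b) / (prior_a / prior_b)


def posterior_check(simulated_peaks, observed_peak):
    """Model CDF at the observed peak: share of posterior simulations whose
    peak falls at or before the observed peak week."""
    return float(np.mean(np.asarray(simulated_peaks) <= observed_peak))


def reporting_rate(reported_cases, estimated_infections):
    """Estimated probability that an infection is reported as a case."""
    return reported_cases / estimated_infections


if __name__ == "__main__":
    # Invented inputs: equal prior weight on two models; posterior weights are
    # taken here as the share of accepted ABC particles for each model.
    print(bayes_factor(0.2, 0.2, 0.6, 0.1))
    peaks = [54, 55, 55, 56, 57, 58]
    print(posterior_check(peaks, observed_peak=56))  # 4/6, inside (0.01, 0.99)
    print(reporting_rate(3_900, 100_000))
```

Values of `posterior_check` near 0 or 1 mean the observed peak falls in a tail of the simulated distribution, i.e. the model peaks systematically later or earlier than the data.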

A cross-validation approach was used to validate the parameter estimates and model output. Countries were randomly allocated to 10 groups; each group was in turn excluded from the parameter estimation procedure, and the peak timing predicted from the resulting out-of-sample parameter estimates was compared with the data. The 95% CI of the cross-validated estimates were compared with the within-sample peak estimates. For the 2018 projections, we used the parameter values estimated from the data to project the number of cases forward, accounting for the estimated reporting rate and uncertainty in the model output. The variance of the 95% prediction interval was the sum of the variance of the model prediction and the variance of the expected value under a Poisson distribution. Comparison of the 2018 predictions with data was not possible because data from affected countries had not been made publicly available (as of 2 May 2018).
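The fold allocation and the prediction-interval variance described above can be sketched as follows. The variance rule (variance across model runs plus a Poisson variance equal to the mean) follows the text; how the interval bounds were then derived is not stated, so a normal approximation with z = 1.96, truncated at zero, is an assumption of this sketch. Refitting the model within each fold is only indicated, not implemented.

```python
import numpy as np


def country_folds(countries, k=10, seed=0):
    """Randomly allocate whole countries to k folds for cross-validation."""
    rng = np.random.default_rng(seed)
    shuffled = [countries[i] for i in rng.permutation(len(countries))]
    return [shuffled[f::k] for f in range(k)]


def prediction_interval(simulated_cases, z=1.96):
    """Approximate 95% prediction interval for projected case counts.

    Variance = variance across model runs + Poisson variance (equal to the
    mean). A normal approximation, truncated at zero, is assumed here.
    Requires at least two model runs.
    """
    sims = np.asarray(simulated_cases, dtype=float)
    mean = sims.mean()
    var = sims.var(ddof=1) + mean
    half_width = z * np.sqrt(var)
    return max(0.0, mean - half_width), mean + half_width


if __name__ == "__main__":
    countries = [f"country_{i}" for i in range(35)]
    for held_out in country_folds(countries):
        training = [c for c in countries if c not in held_out]
        # A real analysis would refit on `training` and predict `held_out` peaks.
        print(len(training), "training countries,", len(held_out), "held out")
    print(prediction_interval([12.0, 15.0, 9.0, 14.0]))
```

Allocating whole countries to folds, rather than individual weekly observations, keeps each held-out country's data entirely out of the fit, so the comparison genuinely tests out-of-sample peak timing.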

Although there have been numerous reports of sexual transmission of ZIKV, especially among returning travellers [35, 36], whether sexual transmission is an important route of ZIKV transmission remains debatable. Several modelling studies suggest that sexual transmission may be an important transmission route [37, 38], whilst other models have been used to argue that it is not [39, 40]. Counotte et al. [41] provide a living systematic review of the evidence for sexual transmission of ZIKV and conclude that modelling studies indicate that the reproduction number for sexual transmission of ZIKV is most likely to be below 1.00. To better understand the importance of sexual transmission, surveillance that distinguishes between vector and sexual transmission is required but currently lacking. Herein, we exclude sexual transmission as a modelled route of transmission. Owing to currently unexplained variability [42], we do not project the expected numbers of neonatal malformations or neurological disorders, such as microcephaly, associated with ZIKV infection.

Results

A gravity model, which assumes that migration is greatest between large populations located close to one another, provided the best fit to the data (Table 1). We identified substantial spatial heterogeneity in transmission (country summaries are provided in Table 2); the average estimated value of R0 was 1.81 (95% CI 1.74–1.87) and the average number of days per year on which R0(t) > 1 was 253 days (95% CI 250–256 days). The number of days on which R0(t) > 1 ranged from 116 days (Costa Rica) to almost year-round transmission (Belem and Salvador in Brazil, Medellin and Cali in Colombia, and the islands of Aruba and Curacao). The mean value of R0(t) was above 2.0 on many Caribbean islands (Aruba, Bahamas, Barbados, Curacao, Guadeloupe) and was low within Argentinian cities, Costa Rica and French Guiana. The mean estimate of R0(t) was often higher in cities and islands that also had a longer window of transmission with R0(t) > 1. However, several cities (including Boa Vista, Aracaju and Natal in Brazil) were estimated to have maximal R0(t) values above 2.5 with a relatively short window of transmission within the year.

Table 1 Summary of the evidence for each population movement model tested on the Zika data. The prior and posterior probabilities were estimated using the approximate Bayesian computation – sequential Monte Carlo procedure (see Additional file 1 for further details)
Table 2 Reported and estimated statistics for ZIKV in Latin America and the Caribbean. Reported timing of the peak of ZIKV cases; the model estimate of the peak in ZIKV cases; the estimated number of days each year where R0 > 1; the average value of R0 throughout the year, the estimated reporting rate of ZIKV cases and the estimated number of ZIKV cases in 2018

Although the ZIKV epidemic emerged in north-eastern Brazil in early 2015, the incidence of cases remained relatively low in 2015 (Fig. 2d; see Additional file 1: S6 and S7 for plots of Brazilian and Mexican states, respectively). All countries that reported cases in 2015 (Brazil, Colombia, Guatemala, Honduras, Paraguay, Suriname, Cuba, El Salvador, Mexico and Venezuela) except Cuba continued to report cases in 2016 and 2017. For most countries, the largest number of cases was reported in 2016. Belize, Colombia, French Guiana, Honduras, Suriname and several Caribbean islands reported more than 2 cases per 1000 population in 2016. For 28 of the 35 countries in the analysis, the peak in reported disease incidence occurred in 2016. Five countries reported a peak in 2017, and Cuba reported a peak in July 2015 (Fig. 2c).

Fig. 2

Reported Zika incidence (cases per 1000) within Latin America for (a) 2016 and (b) 2017. c Timing of peak incidence. d Total number of cases reported for each country for each calendar year (on a log 10 scale), according to the case classifications submitted by each country

The estimated incidence of ZIKV infections (median and 95% CI) was compared with the reported data to estimate the country-specific reporting rate. The average probability of an infection being reported as a case was 3.9% (95% CI 2.3–8.1%), and this rate was lower in countries that reported only confirmed cases (4 countries) than in those that reported both confirmed and suspected cases (22 countries) (Table 2). Costa Rica, French Guiana and the US Virgin Islands were estimated to have a reporting rate above 20%. The time series of reported cases were compared with the model estimates of incidence (Fig. 3). For all countries, an epidemic was likely to be under way between December 2015 and March 2016 (hereafter, the first phase). The relative scale of the epidemic in the first phase compared with late 2016 (the second phase) varied by country. For many countries, such as Argentina, Bolivia, Ecuador and Paraguay, the epidemic was estimated to be larger during the first phase. In simulations for Antigua, Barbuda, Mexico and Venezuela, the epidemic had a higher incidence during the second phase than during the first. A small number of countries (Belize, Honduras, El Salvador and most Caribbean islands) were estimated to have experienced only one epidemic season. Bayesian posterior checks of the difference in peak timing between the data and the model showed a non-significant difference for 11 countries (highlighted in dark red/dark blue), and the distribution of check values was over-dispersed (Fig. 4a, b). There was a significant correlation (p = 0.035) between the reported and estimated peaks of the country epidemics (Fig. 4c). The locations where the model fits the data well are concentrated in Brazilian states that reported a large number of Zika cases and in the eastern Caribbean islands. The estimated peaks in the cross-validated simulations were correlated (p < 0.001) with the model fit, although their 95% CIs were wider (Fig. 4d).

Fig. 3

Comparisons of the time-series data for all Latin American countries (red) and normalised model output of the number of infections (blue). The countries are ordered by the type of surveillance data available: a Confirmed and suspected, b Confirmed, and c Suspected cases

Fig. 4

Comparison of observed and model-fitted ZIKV peak incidence in the 31 countries in Latin America. a Bayesian posterior checks that the estimated peak timing is consistent with the data; values between 0.01 and 0.99 indicate that the model and data are from the same distribution. b Quantile plot of the Bayesian posterior probabilities. c Comparison of the observed timing of the peak and estimated timing of the peak (with 95% CI). d Comparison of the estimated timing of the peak and the cross-validated estimates of peak timing (with 95% CI on the horizontal and vertical axes)

Projections for 2018 suggest a low incidence of Zika cases in most cities considered in the analysis (Fig. 5 and Table 2). When the country-specific case reporting rate was accounted for, the median number of cases was typically less than 20 in most settings. However, French Guiana was predicted to have between 148 and 1773 cases, owing to a larger pool of susceptible individuals than in other settings. Populous states in Brazil, such as Santa Catarina and São Paulo, were projected to have more than 5 cases, and cases were also predicted to occur in Medellin (Colombia) and San Jose (Costa Rica). The majority of Caribbean countries were predicted to have few cases in 2018. For all cities, the incidence of cases in 2018 was projected to be lower than in 2017. In Colombia, the projected time series of cases for individual cities showed negligible incidence in 2018, except in Medellin, which was expected to experience the tail end of its epidemic in 2018 (Fig. 5c). The projected low incidence of ZIKV was consistent in simulations in which infected individuals were also assumed to move between cities (Additional file 1: S8).

Fig. 5

The estimated probability of Zika cases in each country (and states in Brazil and Mexico). a Probability of more than 10 cases. b Median estimate of Zika cases in 2018. c The estimated time series of Zika cases within the five major cities of Colombia

Discussion

The spread of ZIKV across the LAC region in 2015–2017 has resulted in considerable disease burden, particularly in the children of mothers infected during pregnancy. Both the reported incidence of cases and modelling results from this study suggest that the transmission of ZIKV had continued until herd immunity was reached, despite major efforts to limit its spread through vector control. Whilst the reported and projected reduction in ZIKV cases is undoubtedly good news for affected communities, it is only because substantial numbers of individuals have already been infected. Therefore, it remains vital to maintain surveillance for congenital and developmental abnormalities and provide long-term care for affected people and families [43].
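For intuition on why incidence falls once a large share of the population has been infected, the classical final-size relation for a homogeneous, closed epidemic with constant R0, z = 1 - exp(-R0 z), and the herd immunity threshold 1 - 1/R0 can be evaluated at the average R0 of 1.81 reported in the Results. This is a back-of-the-envelope sketch only: it ignores the seasonality, spatial coupling and heterogeneity in exposure captured by the model, so it should not be read as an estimate of the attack rate in any city.

```python
import math


def final_size(r0, tol=1e-12, max_iter=100_000):
    """Final attack rate z solving z = 1 - exp(-R0 * z) (homogeneous, constant R0 > 1)."""
    z = 0.5  # start away from the trivial root z = 0
    for _ in range(max_iter):
        z_next = 1.0 - math.exp(-r0 * z)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z


def herd_immunity_threshold(r0):
    """Immune fraction above which R0 * (share susceptible) < 1."""
    return max(0.0, 1.0 - 1.0 / r0)


if __name__ == "__main__":
    r0 = 1.81  # average R0 reported in the Results
    print(f"final size ~ {final_size(r0):.2f}, "
          f"threshold ~ {herd_immunity_threshold(r0):.2f}")
```

The final size exceeds the threshold because transmission continues for a while after the susceptible fraction has fallen below 1/R0; once the epidemic has burnt out, the remaining susceptible pool is too small to sustain another large outbreak until births replenish it.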

The aim of this analysis was to assess if cities in LAC were likely to experience ZIKV cases in 2018 to support resource planning and trials. Our modelling results suggest a very low incidence in 2018. This analysis supports the findings of previous mathematical models of ZIKV [5, 11, 13, 14]. In addition, our study provides estimates of incidence and risk for specific cities, estimates of case reporting rates, incorporates parameter uncertainty, includes out-of-sample validation of the model estimates and uses more data than other modelling studies as we incorporate ZIKV case reports alongside ecological data to determine city-specific epidemic trajectories and seasonality curves.

We fitted the model to the timing of the peak in ZIKV cases, then compared the time series of expected cases with reported cases and found a good fit in many countries. We assumed that large cities both drive the spread of Zika and account for the majority of cases. Given that Ae. aegypti is a largely urban-dwelling mosquito and that arboviral diseases have been observed to spread through the movement of infected humans [44, 45], this assumption is likely to be valid. However, whilst we predict the outbreak to be largely over in these large cities, smaller, more remote cities and peri-urban areas may still have susceptible individuals and experience cases. Should additional sub-national data on the timing of the peak become available, the model fitting and projections can easily be updated. Estimated case reporting rates were lower in countries that reported only confirmed cases, and the rates for Brazil, El Salvador, Martinique, Puerto Rico and Suriname align well with estimates obtained using alternative methods [21, 46, 47]. Whilst the fit to the data was good in many countries, in a number of countries the timing of the epidemic peak did not fit the data, as shown by the Bayesian posterior checks. These values were over-dispersed, indicating substantial under- and over-estimation of the peak timing (see Colombia and Peru, for example). Overcoming these poor fits will require more accurate approximations of population movements between locations within LAC and, ideally, surveillance data that are less prone to substantial changes in quality over prolonged periods. A recent comparison of microcephaly reported through birth registrations and confirmed cases of ZIKV in Mexico suggested substantial under-reporting of ZIKV cases, even among pregnant women [48]. If under-reporting is this extensive, it will affect the reported peaks in ZIKV that were used to estimate the model parameters.

Modelling only large cities and Caribbean islands may also over-simplify infectious disease spread across a large geographical area. This was a necessary compromise between model complexity, parsimony and computational time. Further model comparison exercises would help identify the advantages and disadvantages of different modelling approaches [11].

Despite the shortcomings in the available data, we present the most up-to-date and robust predictions of Zika incidence in 2018. Because the projected incidence is consistently low across all model runs, this finding is robust to the variability accounted for in the model. Validation of these findings is necessary through multi-site, population-representative seroprevalence surveys across LAC that monitor seroconversion to ZIKV, such as that of Netto et al. [19]. Reporting of cases within LAC has declined markedly since ZIKV was downgraded from a Public Health Emergency of International Concern to an Ongoing Public Health Challenge (in November 2016) [49]. Consequently, it remains difficult to compare these projections with incidence data for 2018.

This research has highlighted that, within LAC, the spread of ZIKV was better represented by a gravity model than by flight movements. This may seem surprising, as air travel is often cited as a driver of the spread of emerging infections such as ZIKV [50]. However, cars and public transport are used for most journeys, and the movement of people affects the spatial spread of vector-borne diseases [43, 51]. For highly transmissible infectious diseases, movements facilitated by flights may be sufficient to predict the introduction of a pathogen into a new population, but this analysis suggests that triggering a ZIKV outbreak may require more frequent exposure than air travel provides. The migration patterns assumed by each model differ considerably across LAC (Additional file 1: S2), suggesting that studies relying on a single mobility model without testing its fit against alternatives could be prone to errors in the estimated spread of ZIKV. Compared with North America, Europe and Africa, mobility patterns in LAC are not well quantified and require further study.

Major questions on the epidemiology of ZIKV remain unanswered [7]. Whilst the impact of sexual transmission on ZIKV emergence is likely to be minimal [39, 52], it may increase the magnitude of an epidemic [40], and this would be difficult to test using the available surveillance data. There are large differences in the incidence of congenital Zika syndrome across LAC [43], with an epicentre in northeast Brazil, that remain largely unexplained. In particular, the analysis here suggests increased incidence of ZIKV throughout Brazil in 2016, but the expected increase in congenital malformations among newborns was not observed [53]. This and other modelling studies suggest that ZIKV has been widespread, and the geographically variable rates of congenital defects are discordant with the more consistent rates of ZIKV infection predicted by our model. Ferguson et al. [5] developed a model to project when a sufficient number of susceptible individuals would accumulate to permit a resurgence of ZIKV, estimating a period of 25–30 years. We did not make this type of projection, as serological surveys published since then [19, 54] suggest considerable heterogeneity in exposure within cities, and birth rates vary across LAC. Both factors would add considerable uncertainty to long-term projections of ZIKV resurgence, which are consequently outside the scope of this analysis.

We have assumed that the time-varying transmission rate of ZIKV is a function of environmental and vector suitability that has not been reduced by effective vector control. The impact of vector control has been largely unassessed or, where it has been assessed, found to be ineffective [55, 56]. Consequently, our findings are unlikely to be affected by vector control. Should effective wide-scale interventions be developed, the model could be used to assess their impact. The mathematical model was deterministic and, especially for projections, may under-estimate the variability in the number of cases. Additionally, we did not include inter-annual variation in Ae. aegypti vectorial capacity, such as that caused by the 2015–2016 El Niño climate phenomenon, which has previously been shown to be positively associated with increased incidence in 2016 [18]. Instead, we show that the peak incidence in 2016 was likely due to a low incidence of infection in 2015, which allowed optimal transmission in 2016 and led to depletion of the susceptible population, thus limiting incidence in 2017 and 2018. If inter-annual variation in ZIKV transmission were incorporated into our model, our incidence estimates for 2016 would likely increase and the predicted incidence in subsequent years would decrease further.

Conclusions

ZIKV has spread widely across LAC, affecting all cities during 2015–2017 and leading to high population immunity against further infection, thereby limiting the capacity for sustained ZIKV transmission. Seasonality in ZIKV transmission affected the rate of infection, but owing to high connectivity between cities, it had little impact on the eventual depletion of susceptible populations. Looking forward, incidence is expected to be low in 2018. This is encouraging for affected communities, but limits our ability to use prospective studies to better characterise the epidemiology of ZIKV. The continent-wide analysis illustrates considerable commonality between settings, such as the relative annual incidence and the connectivity across LAC, but questions remain regarding the interpretation of the varied data for ZIKV. Ultimately, representative seroprevalence surveys will be most useful for understanding the past spread of ZIKV and the future risk of epidemics in LAC.