Key Summary Points

Shingles is a burdensome viral disease whose main symptoms are rash, which lasts for 1 to 2 weeks, and pain that may last for several weeks or months.

The previous literature was reviewed to estimate the incidence of shingles worldwide in people > 50 years of age and the impact of several variables on this incidence.

From 61 reports conducted in 29 countries, we found that incidence increases with age, is lower in males compared to females, increases over time and is lower in Europe and North America compared to Asia and Oceania.

These estimates may help guide public health immunization policies for herpes zoster (HZ) prevention.

Digital Features

This article is published with digital features, including a graphical plain language summary, to facilitate understanding of the article.


The varicella-zoster virus (VZV) causes two distinct diseases with different clinical presentations: varicella (chickenpox) and shingles (herpes zoster, HZ). Following varicella (usually in childhood), VZV remains dormant in nerve ganglia. Latent VZV can reactivate years later as HZ, as a result of (1) an age-related decline in immunity and/or (2) an immunodeficiency or immunosuppression caused by disease or therapy. The pain associated with HZ has been described as “throbbing, sharp, shooting and stabbing” [1]. Individuals with HZ may also experience altered sensitivity to touch, pain provoked by trivial stimuli and unbearable itching [2]. In the United States of America (US), 99.5% of adults aged > 40 years are infected with VZV and are thus at risk of developing shingles, while approximately 30% of the population will develop HZ during their lifetime [3]. HZ incidence is estimated to have increased over time; combined with population aging, this results in an increasing burden on the health care system [4, 5].

A previous systematic literature review (SLR) conducted in 2014 provided a comprehensive overview of HZ as a significant global health burden [6] and demonstrated substantial heterogeneity across studies. While age, gender, ethnicity and comorbidities are well-known risk factors that may affect incidence rates for HZ [7], other factors such as study design (e.g., prospective versus retrospective), case ascertainment (International Classification of Diseases [ICD] versus prescribed therapeutic codes), study population (e.g., age distribution, year(s) of study, setting [country or region]) and sample size may also influence the results of a study [8,9,10]. Consequently, generalizability and comparability of study results likely require consideration of other factors that potentially influence outcomes. Selecting inputs for both epidemiological and health economic models may prove challenging because of the numerous data sources available in the literature. For this reason, health technology assessment bodies encourage “evidence synthesis for decision making” [11]. Meta-analysis methods are frequently used to summarize results from multiple data sources and obtain single point estimates (with confidence intervals for sensitivity analyses). These statistical methods can be employed to identify the possible causes of heterogeneity and subsequently adjust for the explained variability [12]. In this study, we performed an SLR to provide a comprehensive evidence base on the worldwide incidence of HZ, focusing on individuals ≥ 50 years old. The results of the SLR are presented elsewhere [8]. In this manuscript, we explore the relationship between study covariates and the incidence of HZ. We focus on two generalizations of meta-analysis: random-effects meta-analysis and meta-regression [12].


This article is based on previously conducted studies and does not contain any new studies with human participants or animals performed by any of the authors. We performed an SLR following the guidelines specified in the Cochrane Handbook [13] and Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [13,14,15]. Details of the SLR are provided elsewhere [8]. The methodological quality of each peer-reviewed article was assessed as follows: studies were deemed to be of poor quality if there was no valid case definition for the diagnosis of HZ and/or if the denominator used to calculate HZ incidence was not clearly defined [8].

The HZ incidence in published studies was expressed as either the number of HZ cases per 1000 population (i.e., cumulative incidence) or as number of HZ cases per 1000 person-years (i.e., incidence rate). To allow all studies to be included in the analysis, we assumed that cumulative incidence was assessed over a 1-year study period. We tested the validity of this assumption by exploring whether there was any effect of incidence type (i.e., cumulative incidence versus incidence rate) on incidence estimates.

When the sample size was not provided, it was estimated from the reported precision estimates and confidence intervals, depending on the analytic method used by the authors.
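As an illustration of this kind of back-calculation, the sketch below covers only the simplest case and rests on a stated assumption: that the published interval is a Wald (normal-approximation) 95% confidence interval for a proportion. The function name and the example numbers are hypothetical; studies that used exact or Poisson-based intervals would need a different inversion.

```python
import math

def sample_size_from_wald_ci(incidence_per_1000, ci_low_per_1000,
                             ci_high_per_1000, z=1.96):
    """Approximate the sample size behind a reported incidence and its CI.

    Assumption (not from the source): the interval is a Wald interval,
    p +/- z * sqrt(p * (1 - p) / n), so n = p * (1 - p) / se**2.
    """
    p = incidence_per_1000 / 1000.0
    half_width = (ci_high_per_1000 - ci_low_per_1000) / 2000.0
    if not 0.0 < p < 1.0 or half_width <= 0.0:
        raise ValueError("incidence must be in (0, 1000) and the CI non-degenerate")
    se = half_width / z          # standard error implied by the interval
    return p * (1.0 - p) / se ** 2

# Hypothetical example: 8.0 cases per 1000 with a CI of 7.0 to 9.0 per 1000.
approx_n = sample_size_from_wald_ci(8.0, 7.0, 9.0)
```

The recovered value is only as good as the interval assumption, so it is best treated as an approximate weight for pooling rather than as the true denominator.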

Some studies within the same country reported data (1) for several populations [16,17,18,19] (e.g., general and immunocompetent) and (2) using different case definitions. For those studies, we used data from the general population using ICD codes where available. Some authors presented study data in separate publications [20, 21]. In those situations, the manuscript with the most complete dataset was kept. Some incidence data were presented in graphical format only [22,23,24] and were excluded because numerical values could not be estimated reliably (e.g., because of overlapping lines and/or missing confidence intervals).

The primary objective of the study was to estimate incidence of HZ and explore the relationship and impact of various clinical and methodological factors including age, gender, year of study data and continent. Secondary objectives included exploring the additional predictive and explanatory power of alternative models that account for (individually or collectively) additional covariates and non-linear components for age.

The primary covariates in the random effects meta-analysis model were extracted from the SLR as follows:

  • Age: 50–54, 55–59, 60–64, 65–69, 70–74, 75–79, 80–84, ≥ 85 years.

  • Gender: female (reference group), male, pooled (i.e., not defined, pooled male/female)

  • Continent: Asia, Europe (reference group), North America or Oceania

  • Year of study data: < 1998, 1998–2002, 2003–2007, 2008–2012, ≥ 2013

In meta-regression analysis, age, centered at 50 years (corresponding to the intercept), and year of study data, centered at year 2000, were treated as linear predictors.

Random-effects meta-analysis was applied to account for additional between-study heterogeneity. Between-study variance was estimated from the data and incorporated into the study weights while pooling the estimates across studies for a subpopulation (e.g., Asia). All random-effects meta-analyses were conducted using the rma function in the metafor package, in R 3.5.1, using restricted maximum likelihood estimation for the random-effects component \(\tau^{2}\) [25].
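The weighting idea can be sketched in a few lines. Note that this is not the estimator used in the analysis: the sketch substitutes the closed-form DerSimonian-Laird estimator of \(\tau^{2}\) for restricted maximum likelihood, pools on the logit scale, and uses hypothetical function names and inputs. It is meant only to show how between-study variance enters the study weights.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def pool_random_effects(cases, populations):
    """Pool proportions on the logit scale with DerSimonian-Laird tau^2.

    Assumes every study has 0 < cases < population (no continuity correction).
    """
    if len(cases) != len(populations) or len(cases) < 2:
        raise ValueError("need at least two studies with matching inputs")
    y = [logit(c / n) for c, n in zip(cases, populations)]
    # Approximate sampling variance of a logit-transformed proportion.
    v = [1.0 / c + 1.0 / (n - c) for c, n in zip(cases, populations)]
    w = [1.0 / vi for vi in v]                      # fixed-effect weights
    sum_w = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    c_const = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (len(y) - 1)) / c_const)   # truncated at zero
    # Random-effects weights add tau^2 to each study's sampling variance.
    w_star = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return {
        "tau2": tau2,
        "incidence_per_1000": 1000.0 * inv_logit(mu),
        "ci_low_per_1000": 1000.0 * inv_logit(mu - 1.96 * se),
        "ci_high_per_1000": 1000.0 * inv_logit(mu + 1.96 * se),
    }

# Hypothetical inputs: three studies of 10,000 people each.
example = pool_random_effects([30, 80, 150], [10000, 10000, 10000])
```

Because \(\tau^{2}\) is added to every study's variance, very large studies dominate the pooled estimate less than they would under a fixed-effect model, which matters when many results have extremely small sampling errors.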

The methodology suggested by van Houwelingen et al. was used to model the incidence using meta-regression [12]. Incidence (i) was modeled for each study using a logistic function:
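A generic form of such a model is sketched here as an assumption, since the notation is not given in this text: the indices \(j\) (study) and \(k\) (subgroup within study), the covariates \(x_{m,jk}\), and the random-effect and sampling-error terms are illustrative rather than the authors' own.

\[
\operatorname{logit}(i_{jk}) = \ln\!\left(\frac{i_{jk}}{1 - i_{jk}}\right) = \beta_0 + \sum_{m} \beta_m x_{m,jk} + u_j + \varepsilon_{jk}, \qquad u_j \sim N(0, \tau^{2}), \quad \varepsilon_{jk} \sim N(0, v_{jk})
\]

Under this form, a predicted incidence is recovered from the fitted linear predictor \(\eta\) as \(i = 1 / (1 + e^{-\eta})\). With age centered at 50 years and year of study data centered at 2000, \(\beta_0\) corresponds to the logit incidence at age 50 in the year 2000 for the reference categories.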


Random-effects meta-regressions were performed to extend the meta-analytic model by adjusting for the effect of one or more covariates. In the current study, a multilevel model was used to account for the fact that many studies presented subgroup data (e.g., results for age subcategories, or separate data for males and females or pooled). For each covariate, a test of its additional partial explanatory power was conducted by assessing Qmod, the variance in the logit incidence explained by the covariate. All multivariate meta-regressions were performed using the function in the metafor package, in R 3.5.1. Model fit was investigated by considering the improvement in the Akaike information criterion (AIC) [26]. Outliers among individual incidence estimates (e.g., for specific age and/or gender incidence values) within studies were identified using the Cook’s distance statistic (> upper hinge = Q3 + 3 × IQR [Q3, third quartile; IQR, interquartile range of Cook’s distances]) in the multivariate meta-regression model [27].
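The outlier rule can be sketched as follows, taking the Cook's distances as already computed from a fitted model. The function name and the example values are hypothetical, and the quartile convention (Python's "inclusive" method) is an assumption, since quartile definitions differ slightly between software packages.

```python
import statistics

def flag_cooks_outliers(cooks_distances, multiplier=3.0):
    """Flag estimates whose Cook's distance exceeds Q3 + multiplier * IQR.

    Returns the threshold and the indices of flagged values, in input order.
    """
    q1, _, q3 = statistics.quantiles(cooks_distances, n=4, method="inclusive")
    threshold = q3 + multiplier * (q3 - q1)
    flagged = [i for i, d in enumerate(cooks_distances) if d > threshold]
    return threshold, flagged

# Hypothetical Cook's distances for eight incidence estimates.
distances = [0.01, 0.02, 0.02, 0.03, 0.03, 0.04, 0.05, 0.90]
threshold, flagged = flag_cooks_outliers(distances)
```

A multiplier of 3 is stricter than the more common 1.5 × IQR fence, so only estimates with a very large influence on the fit are flagged.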

Two multivariate regression models were developed: (1) a model that was pre-specified (using “primary covariates”) and (2) a data-driven model. The data-driven model was derived including “secondary covariates” (see below), with additional non-linear components for age (squared and cubic) examined. The data-driven model was derived using a stepwise approach based on differences in AIC. The following “secondary covariates” were explored in the data-driven model:

  • case detection method: general practitioner surveillance (reference group), health insurance, healthcare database, pharma register, sentinel network or survey,

  • case definition: medical record-based (e.g., ICD code) or self-reported (reference group),

  • study design: retrospective passive surveillance (reference group), retrospective active surveillance, unmatched retrospective cohort or prospective active surveillance or prospective passive surveillance,

  • incidence type: cumulative incidence/1000 persons (reference group) or incidence rate/1000 person-years,

  • latitude: distance from the equator,

  • patient type: outpatients or in- and out-patients (reference group).
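The stepwise, AIC-based derivation of the data-driven model can be pictured with the sketch below. It assumes forward selection (the direction of the stepwise search is not stated in the text) and relies on a hypothetical callback, `fit_aic`, that fits a model with the given terms and returns its AIC; the toy callback in the example exists only to make the sketch runnable.

```python
def forward_stepwise_aic(candidates, base_terms, fit_aic, min_improvement=0.0):
    """Greedy forward selection: add the term that lowers AIC most, until none does.

    fit_aic is a user-supplied (hypothetical) function: list of terms -> AIC.
    """
    selected = list(base_terms)
    remaining = [c for c in candidates if c not in selected]
    best_aic = fit_aic(selected)
    while remaining:
        # Try each remaining term and keep the one giving the lowest AIC.
        trial_aic, trial_term = min((fit_aic(selected + [c]), c) for c in remaining)
        if best_aic - trial_aic <= min_improvement:
            break  # no candidate improves the fit enough
        selected.append(trial_term)
        remaining.remove(trial_term)
        best_aic = trial_aic
    return selected, best_aic

# Toy AIC for illustration only: each term has a made-up "gain" and costs 2.
_gains = {"age_squared": 10.0, "age_cubed": 5.0, "latitude": 1.0}

def toy_aic(terms):
    return 100.0 - sum(_gains.get(t, 0.0) for t in terms) + 2.0 * len(terms)

terms, aic = forward_stepwise_aic(list(_gains), ["age"], toy_aic)
```

In the toy example, the term whose gain does not offset the penalty of 2 per parameter is left out, which is the trade-off AIC is designed to capture.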


A total of 69 study publications were included in the SLR [8]. Supplementary text Fig. S1 provides an overview of the records included in the meta-regression analysis. Sixty-one records were included, of which 25 were from Europe, 20 from North America, 11 from Asia and 5 from Oceania. Study methodology was assessed as poor in 26 records (i.e., not a valid case definition, and/or denominator not clearly defined) and good in 35 records. The case detection method was based on general practitioner surveillance (n = 7 records), health insurance (n = 25), healthcare database (n = 18), pharma register (n = 1), sentinel network (n = 4) or survey (n = 6). The case definition was based on medical records (n = 56) and self-report (n = 5). The study design was retrospective passive surveillance (n = 44), retrospective active surveillance (n = 5), unmatched retrospective cohort (n = 3), prospective active surveillance (n = 8) and prospective passive surveillance study (n = 1). The incidence type was cumulative incidence (n = 36) or incidence rate (n = 25) and the patient type was outpatient (n = 27) or in- and out-patient (n = 34). Details of the studies and results of the quality assessment are provided elsewhere [8].

Figure 1 presents a forest plot with individual study effects for the age category 70–74 years as an example of the heterogeneity within age groups and the patterns of HZ data reported across studies. Many results have extremely small sampling errors due to the large sample sizes. Variation across studies, as measured by tau (τ), was 0.305.

Fig. 1 Forest plot of pooled incidences (95% CIs): meta-analyses for age 70–74 years. CI confidence interval, I² percentage of variance due to true heterogeneity, tau between-study (data) standard deviation, Qp the p value in Cochran’s Q test of homogeneity, k number of studies being pooled

Meta-analysis results across univariate subgroups of age classes, gender, continent and year of study data are presented in Fig. 2. The results show that incidence increases with age, is lower in males than in females and tended to be higher in Asia and Oceania than in Europe and North America; there was also a trend for an increase from the period up to 2002 to the period after 2002. For the four primary covariates of interest (age, gender, continent and year of study data), meta-analyses of aggregated data found that age was the strongest predictor of variation across strata; the meta-analytic incidence for the youngest individuals (50–54; 5.15/1000) was less than half that for the oldest (≥ 85; 11.27/1000). The total variability when age is ignored is τ² = 0.174, which is reduced to an average of approximately 0.10 within the individual age categories (a reduction of around 40%). The incidence appeared to be lower in studies with data collected in the years 2008–2012 and ≥ 2013 compared to data collected in 2003–2007. However, the five studies with data collected in or after 2013 included three studies from Europe and one from the US (i.e., continents where the incidence was lowest, see Fig. S3). Similarly, incidence appeared to be higher when data were not broken down by gender (i.e., pooled category, see Fig. 2). However, older studies in North America presented data broken down by gender, whereas more recent studies presented data with gender pooled (see supplemental Fig. S4).
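The two comparisons in this paragraph can be checked directly from the reported figures, using the rounded value of 0.10 for the average within-category variance:

\[
\frac{11.27}{5.15} \approx 2.19, \qquad \frac{0.174 - 0.10}{0.174} \approx 0.43
\]

The first ratio confirms that the pooled incidence in the oldest group is slightly more than double that in the youngest; the second shows that stratifying by age removes roughly two fifths of the between-study variance on the logit scale.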

Fig. 2 Forest plot of pooled incidences (95% CIs): meta-analyses for age, gender, continent and year of study data. CI confidence interval

Regression estimates for the pre-specified model univariable and multivariable regressions are presented in Tables 1 and S1. Figure 3 shows predicted incidences from the pre-specified model of all four main covariates for combinations of geographic area and gender (male and female) across age. Heterogeneity within continent was high for all four continents, but especially for the Asian analysis (τ = 0.529). Note that estimated incidences for Asia ranged from 3.43 to 19.46 per 1000 person-years (see Fig. S5). While heterogeneity was high in both relative and absolute terms, no individual studies appeared as substantial outliers. That said, individual incidence estimates (e.g., for specific age and gender incidence values) within studies were identified as outliers using the Cook’s distance statistic in the following regions (see supplemental Fig. S6): Europe (Italy [28], Sweden [29]); North America (US [5, 30]); Asia (China [31,32,33], Japan [10, 34, 35], South Korea [36]); Oceania (Australia [37], New Zealand [38]). The outliers for Europe and North America tended to be low incidence values. The three studies from China [31,32,33] had three of the five lowest incidence values across all publications (the other two being Di Legami et al. [28] and Krishnarajah et al. [30]), and the study conducted by Kim et al. in South Korea had notably high incidences [36]. Without these studies, the Asia results are approximately as heterogeneous as those seen in any other continent. In fact, the sensitivity analysis excluding low-quality studies had the greatest impact on the parameter estimate for Continent: Asia, whereby the incidence estimates were higher after exclusion of low-quality studies (see supplementary Table S2).

Table 1 Estimates of coefficients (effects) of different factors in models of primary covariates separately (univariable regression) and in combination (multivariable regression): pre-specified meta-regression model
Fig. 3 Estimated incidence based on the pre-specified model for the year 2000 and 2020 by region, gender and age

Additional covariates were explored to determine their potential explanatory power for HZ incidence. Figure S7 presents the meta-analysis results across univariate subgroups of classes for case detection, case definition, study design, incidence type and patient type. Although there were apparent differences between classes at the univariate level for these additional covariates, none of case detection, case definition, study design, incidence type and patient type was significant when included in the multivariate model, nor were latitude or the interaction term for age by year of study data. However, the age × gender interaction was highly significant (P < 0.001). The results suggest that the difference in incidence rates between males and females is greatest at younger ages (e.g., 50–54), whereas in the oldest age groups (e.g., ≥ 80) the incidence rates are similar in males and females (see Fig. 4).

Fig. 4 Predicted incidence based on the data-driven model for the year 2020 by region, gender and age

Exploration of all covariates and the changes in model fit (based on differences in AIC) leads to the proposal of a final data-driven model that adds components for squared and cubic terms for age as well as an age × gender interaction term (see Table 2). The AIC of the final data-driven model was 17080.8 compared with 25998.2 for the pre-specified model. Table S3 and Fig. 4 provide estimates of predicted incidence values for males and females by age and continent, using the final data-driven model.

Table 2 Data-driven model of age, gender, continent, year of study data, age squared, age cubed and age-gender interaction


This analysis provides the most recent synthesis of the worldwide incidence of HZ in the general population aged ≥ 50 years. For the pre-specified analysis, we focused on four study characteristics, i.e., incidence rate by continent, age, gender and year of study data collection. This was to ensure that the model was robust and in line with broad statistical recommendations [39] with respect to the number of parameters estimated and the number of data points available for analysis.

Age was the most important factor influencing incidence rate. As HZ incidence rates increase with age and population demographics are changing over time, the study period and age distribution of the study population should be kept in mind when interpreting data. Interestingly, the data-driven model suggested that the incidence was higher in females and that the difference in incidence rates between males and females was greatest in younger age groups and smaller in older adults. In this analysis of data synthesized from many studies with much heterogeneity in study methodology and outcomes, incidence rates tended to be lowest in Europe, followed by North America, Asia and Oceania. However, the same trend was observed in the placebo groups of two large randomized clinical trials of an HZ vaccine, where the case definition and study methodology were consistent across all study participants and continents [40].

A trend was observed for an increase in incidence over time. Data from several independent studies demonstrated a steady increase in HZ incidence over time (see Fig. 4 from van Oorschot et al. [8]). There was no significant effect of incidence type (i.e., cumulative incidence versus incidence rate) on incidence rate in the meta-regression model, supporting our hypothesis that studies assessing cumulative incidence had a study duration of 1 year. Most of the studies were passive surveillance studies that utilized a retrospective design. These studies used either electronic medical databases or health insurance databases and utilized diagnostic codes to identify HZ cases. Because so few studies fell into the other categories, the statistical power to detect significant differences for the variables case detection, case definition, study design and patient type was likely to be low.

Several studies with outliers were identified, in particular for the Asian region. Notably, high incidence rates were observed by Kim et al. in South Korea [36]. The authors also noted this observation and suggested that the high incidence values could be explained by the medical service system (i.e., high insurance coverage and low medical costs) and the high public awareness of herpes zoster. Three studies from China [31,32,33] had three of the five lowest incidence rates of all 61 records. These studies were all conducted as self-report, house-to-house surveys, followed by face-to-face interviews with those having reported HZ. Self-report studies are subject to many biases and are therefore generally considered low quality.

The estimated incidence values from this study may be used in cost-effectiveness (CE) models for countries where there are no specific incidence data available or where data are outdated (see Figs. 3, 4 and Table S3). In epidemiological and economic models, some data may be considered “transferable”, i.e., data from other sources outside that country may be used. To ensure the reliability of the model, the analyst must determine whether data are transferable or not. For example, the Guidance Document: Global Pharmacoeconomic Model Adaption Strategies noted that where country-specific epidemiologic data are not available, random-effect meta-analysis may be used to derive estimates for inputs for CE models [41].

Table S3 and Fig. 4 provide the predicted incidence estimates for age, gender and continent for the year 2020. These data could be used as inputs in modeling exercises of the public health impact and cost-effectiveness of vaccination. Using these incidence rates from the data-driven model in conjunction with population estimates by continent, we estimated that 14.9 million HZ cases occurred in 2020 worldwide in individuals aged ≥ 50 [42]. Because populations are aging worldwide, assuming no increase in age-specific incidence rates from 2020 and in the absence of vaccination, the total number of HZ cases in individuals aged ≥ 50 would be expected to increase to 17.0 million in 2025 and 19.1 million by 2030. These estimates may be conservative, as they do not account for under-reporting. Using incidence rates estimated in the placebo group of the ZOE-50 study, where individuals were followed up prospectively to monitor HZ cases, > 20 million HZ cases would be estimated to have occurred worldwide in 2020 (see Table S4). There are also limitations to using data from a clinical trial, but this comparison does suggest that our estimate of 14.9 million cases worldwide in 2020 may be conservative. Further HZ vaccine recommendations and increases in vaccination uptake could lead to a decrease in HZ incidence over time.
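The projection logic amounts to multiplying stratum-specific incidence by the matching population and summing. The sketch below shows that arithmetic under stated assumptions: the stratum keys, incidence values and population counts are hypothetical placeholders rather than values from Table S3, and only the three case totals (14.9, 17.0 and 19.1 million) are taken from the text.

```python
def expected_cases(incidence_per_1000, population):
    """Sum incidence (per 1000 person-years) x population over matching strata."""
    if incidence_per_1000.keys() != population.keys():
        raise ValueError("incidence and population must cover the same strata")
    return sum(incidence_per_1000[k] / 1000.0 * population[k]
               for k in incidence_per_1000)

def relative_increase(baseline, projected):
    """Fractional growth of projected cases over the baseline."""
    return (projected - baseline) / baseline

# Hypothetical strata keyed by (continent, gender, age band).
incidence = {("Europe", "female", "50-54"): 6.0,
             ("Europe", "male", "50-54"): 4.0}
people = {("Europe", "female", "50-54"): 1_000_000,
          ("Europe", "male", "50-54"): 500_000}
cases = expected_cases(incidence, people)

# Growth implied by the totals reported in the text (millions of cases).
growth_2025 = relative_increase(14.9, 17.0)
growth_2030 = relative_increase(14.9, 19.1)
```

With age-specific rates held constant, all of the projected growth comes from changes in the population inputs, which is why the reported totals rise by roughly 14% by 2025 and 28% by 2030.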

Several limitations of this review are worth noting in the interpretation of the overall findings. First, we assumed that studies assessing cumulative incidence had a study duration of 1 year. Although incidence type had no significant effect in the multivariate model, which supports this assumption, it could not be verified directly. Second, in most studies the proportion of subjects with comorbidities and immunocompromising conditions was unknown. When reporting study results, it is important whenever possible to provide granular data in tabular format, e.g., in the supplementary texts of manuscripts. Granular data on incidence broken down by influential covariates not only help the reader to interpret the data and assess potential biases but also allow for further synthesis of the data in meta-regression analysis. The fitted models can be useful for prediction. However, the interpretation of the estimated parameters is complex and should be done with care. For example, life expectancy is increasing over time, and gender and age are correlated (females live longer than males). One strength in terms of prediction is that, although the COVID-19 pandemic has affected the incidence rates of many infectious diseases, HZ is caused by reactivation of latent VZV (i.e., VZV remains dormant in nerve cells) rather than by person-to-person transmission, so even individuals who are isolating remain at risk of HZ.


A plain language summary of the context and main findings of this article is presented below the abstract. In this synthesis of HZ incidence rates worldwide in the general population over the age of 50 years, age was shown to be the most important factor predicting HZ incidence. The world’s population is continuing to age, and the oldest-old group, in which herpes zoster rates are highest, is continuing to grow [43]. Effective vaccines to prevent HZ and the associated disease burden are available and, if employed, could contribute to the reduction of the future global burden of HZ [4, 44]. The continent-specific incidence estimates may help guide public health immunization policies for HZ prevention.