Excess death estimates from multiverse analysis in 2009–2021

Excess death estimates have great value in public health, but they can be sensitive to analytical choices. Here we propose a multiverse analysis approach that considers all possible different time periods for defining the reference baseline and a range of 1 to 4 years for the projected time period for which excess deaths are calculated. We used data from the Human Mortality Database on 33 countries with detailed age-stratified death information on an annual basis during the period 2009–2021. The use of different time periods for reference baseline led to large variability in the absolute magnitude of the exact excess death estimates. However, the relative ranking of different countries compared to others for specific years remained largely unaltered. The relative ranking of different years for the specific country was also largely independent of baseline. Averaging across all possible analyses, distinct time patterns were discerned across different countries. Countries had declines between 2009 and 2019, but the steepness of the decline varied markedly. There were also large differences across countries on whether the COVID-19 pandemic years 2020–2021 resulted in an increase of excess deaths and by how much. Consideration of longer projected time windows resulted in substantial shrinking of the excess deaths in many, but not all countries. Multiverse analysis of excess deaths over long periods of interest can offer an approach that better accounts for the uncertainty in estimating expected mortality patterns, comparative mortality trends across different countries, and the nature of observed mortality peaks. Supplementary Information The online version contains supplementary material available at 10.1007/s10654-023-00998-2.


Introduction
Calculation of excess deaths is considered to be a very useful tool for estimating patterns of mortality changes over time in different countries and the impact of major events, such as pandemics [1][2][3]. Excess deaths are meant to capture the composite sum of perturbations in disease incidence and other factors, including social, health care, lifestyle and natural catastrophes that may shape population fatalities in a given year. However, excess death calculations can lead to controversy with different teams of researchers generating markedly different estimates for the same country and year(s) [4][5][6]. The reason is that the calculation of excess deaths requires making analytical choices for which there is no consensus. Specifically, one needs to select a reference baseline period (a time window in the past that will be used for extrapolating how many deaths would be expected in subsequent years) and a projected period (the time window for which an excess death estimate is made by comparing the observed versus expected number of deaths based on the past experience). Moreover, one should decide whether there are any time patterns and what is the form of these time patterns (e.g. whether overall mortality should be declining or increasing over time and, if so, in what form, e.g. linear or spline fit). Empirical work and simulations [4][5][6][7][8][9][10] have shown that these choices can make a substantial difference in the obtained excess death estimates.
Analyses may range from a few dozen to several million different options (e.g. in selecting covariate sets in regressions) [15,17]. Different terminology has been used for such approaches that generalize the concept of sensitivity analysis. Commonly used terms are "multiverse analysis" [11][12][13][14], "vibration of effects" [16][17][18] and "multi-analyst analysis" [19,20] (when multiple researchers are each asked to select independently their preferred analysis). Here, we propose a multiverse approach for excess deaths. Instead of making unavoidably arbitrary choices in selecting reference baseline and projected periods, we consider all possible reference baseline periods and projected periods in adjacent year time windows during a lengthy period of interest. Instead of prespecifying time patterns, this multiverse approach allows the data to demonstrate what might be the time patterns and how sensitive the results are to different analytical choices. All possible choices are considered for reference baseline periods (extending as far back as 2009). The multiverse approach also allows us to understand to what extent excess death estimates may shrink when longer projected periods are considered, in the range of 1-4 years. If perturbations lead to excess deaths increases due to the demise of individuals with limited life expectancy [21], then excess death peaks that are seen with short projected periods (e.g. 1 year) will diminish or even disappear when longer projected periods are considered. People who died at some point due to the perturbation would have died very soon anyhow. Conversely, if perturbations result in mortality peaks due to deaths of people who had long life expectancy, extending the projected period window will not have the same attenuating impact.
We applied this approach to 33 high-income countries studied before [6] and which have the most reliable data for mortality according to age-stratified groups for the extended period 2009-2021. Our aim here is to propose the multiverse method, illustrate its application, and see how it can offer insights about evolving relative patterns of mortality over many years in each country and how these patterns compare across countries. The multiverse approach focuses on relative comparisons rather than on obtaining absolute estimates of excess deaths during a specific given pandemic period. However, we have also used it to generate absolute estimates of excess deaths during the pandemic period, by considering different types of down weighting of older reference years as opposed to newer reference years.

Variability of excess death estimates according to reference baselines
The absolute value of excess death estimates can vary substantially depending on the selection of reference years used for baseline. We considered all 66 possible time windows of whole consecutive calendar years (1-11 years long) in the years 2009-2019 as representing baseline values. Table 1 shows the average, standard deviation, minimum, maximum and range for estimates of relative excess deaths (expressed as percentage of expected deaths) for the two-year pandemic period 2020-2021 for each of the 33 countries. The average value is highly correlated with either the maximum or minimum value but not with standard deviation or range (correlation coefficients of 0.96, 0.95, − 0.22 and − 0.15, respectively). Table 1 also shows the average excess death estimates across 66 possible time windows when different down weighting is applied for older years in the reference range. Figure S1 shows that the average multiverse percentage relative excess death values are highly correlated to the corresponding values calculated with the previously used single reference period of years 2017-19 [6]. The multiverse values with the dw3 weighting scheme are very similar to those with the 2017-19 baseline. The multiverse values averaged with equal weights for all 66 baselines are lower by 4.6 percentage points.

Stability of relative ranking for the pandemic years' excess deaths across 33 countries
The estimates of relative excess deaths (as percentage of expected deaths) can be used to compare different countries in a given time period. Despite large variabilities in the absolute estimates, the relative ranking of the 33 countries for a given period of interest was largely unperturbed, regardless of what reference baseline years were chosen. Figure 1 shows the ranking of relative excess death estimates (as percentage of expected deaths) for the pandemic years 2020-2021 in all 66 analyses with different reference baseline windows. The USA had the highest estimates of relative excess deaths among all 33 countries in 50 of 66 analyses, the second highest in 15 analyses and the fourth highest in 1 analysis. Conversely, South Korea had the lowest estimates in 59 of 66 analyses, the second to lowest in 5, the third to lowest in 1, and the sixth to lowest in 1.
Eastern European and Balkan countries closely followed the USA in the top excess death ranks consistently. Scandinavian countries, Australia, and New Zealand consistently were placed among the lowest excess death ranks next to South Korea. Other Western European countries typically occupied middle ranks. Figures S2A, S2B and S2C show that the distribution of country ranks for projected periods of 1 year, 3 years and 4 years are similar to that shown in Fig. 1 for 2020-2021; summing over more years does blur the ranking of middle-ranked countries. Figure 2 maps the emerging time patterns for mortality in each of the analyzed countries for the average of the 66 analyses using different reference baseline periods and the range of maximum and minimum estimates. Although the range of estimates of relative mortality for each given year is large, the rank of different years for a particular country is generally the same for the 66 different sets of reference years ( Figure S3). Time patterns across different countries show large variability as well. Differences exist both in the presence and magnitude/steepness of time  For 23 countries, the most common occurrence is on the diagonal. For the 6 countries between Austria and Germany, the average rank is between 16.2 and 19.9 and the rank order is ambiguous. For 21 countries the most common rank occurrence is on the diagonal Norway, Denmark and Australia). Figures S4A, S4B, and S4C map the time patterns shown in Figure S2 for periods of 1, 3 and 4 years, respectively and show how longer projected periods reduce fluctuations. Table S2 and Tables S1A, B, C, D present data on the worst years. Table S1A shows that the worst single year with the highest mortality was 2021 for 10 countries (Slovakia, Poland, United States, Latvia, Lithuania, Hungary, Croatia, Czechia, Chile and Greece), 2020 was the highest for 4 countries (United Kingdom, Italy, Spain and Belgium), 2010 was worst for Luxembourg and 2009 was worst for all other 18 countries. When considering 2-year periods, in 25 of the 33 countries, 2009 + 2010 were the worst pair of years (Table S2). In 9 of the 33 countries (Chile, Czechia, Greece, Hungary, Italy, Lithuania, Poland, Slovakia and United States) the pandemic years 2020 + 2021 were the worst, and in all of them the years 2009 + 2010 were the second worst. (Table S1A & Table 2). In 16 countries, the pandemic years were not among the three worst years, which were always years between 2009 and 2016. When considering 3 or 4 year periods, in 31 of the 33 countries 2009-2011 and 2009-2012, were the worst, respectively. Only in Poland or the United States were period 2019-2021 and 2018-2021, which include the pandemic years, the worst, respectively, (data from Tables S1C & S1D). Table 2 shows the effect of changing the width of the projected period of interest from 1-4 years for the most recent years (2021 alone, 2020 alone, 2020-2021, 2019-2021, 2018-2021). As shown, there is substantial attenuation of the relative excess mortality between the single worse pandemic With increasing projected periods, both the mean and standard deviation of the relative excess mortality declined substantially. For 2021, 2020-2021, 2019-2021, 2018-2021, the mean was 2.6%, 1.5%, − 1.0%, and − 1.7%, respectively. The standard deviation was 9.3%, 7.2%, 5.2%, and 4.1%, respectively.

Discussion
Our application of a multiverse approach to excess death data shows that consideration of different periods for reference baseline resulted in major variability in the absolute magnitude of the exact excess death estimates, but it did not affect substantially the relative ranking of different countries compared to others for specific years. Moreover, there have been distinct time patterns across different countries during 2009-2021. Countries differed markedly on whether they had a substantial decrease over time or not during 2009-2021, on whether they had a peak during the 2020-2021 pandemic years, and, if so, how high, and in the relative contribution of 2020 and of 2021 to this peak. With longer time windows for the projected period of interest (1 to 4 years), the range of excess deaths across different countries in the pandemic years and the 2 years preceding the pandemic shrank substantially and excess death estimates became less variable across countries. This suggests that it would be inappropriate to dwell too much on small or modest differences between countries, as these are highly model-dependent. However, peaks did not disappear and for the USA in particular, excess deaths remained prominent even with long projected periods of interest.
In the multiverse literature from other fields, some analytical choices may be considered more meaningful or relevant than others. When researchers are asked to select independently what analysis mode they feel is most sensible, not all analytical choices are selected [19,20] and some types of choices may seem to make more sense. This may apply also for excess death calculations. E.g. it may seem not so appropriate to use a reference window of 2009 alone for projecting mortality in 2021. The baselines created by each of the 66 different windows may have less or more relevance to the current situation. In principle, baselines using more recent years may be more informative for the current time. Common choices include using the last 3 years or the last 5 years.
The obvious heterogeneity of time patterns across different countries suggests that selection of specific time trends in modeling excess deaths may be a situation where one size does not fit all. Selection of specific anticipated time trend patterns may markedly affect the results in ways that are not verifiable for their appropriateness. E.g. selecting a model that anticipates a marked decrease in mortality over time makes it difficult for a country not to have excess deaths even if it does very well in a given year-but still falls short of an anticipated stellar improvement over time. It should be acknowledged that age-adjusted mortality rates usually have decreased over time in most countries in the last several decades. Standard methods for forecasting future mortality rates and life expectancy such as the Lee-Carter forecasts [22,23] and other methods that use time series approaches end up using some linear trends in the modeling. However, it has been observed [24] that changes in mortality rates may differ markedly in different years even in the same country/location and they may also differ across different age and gender groups in the same country and same year. In the presence of major perturbation events such as pandemics or wars and natural disasters, such modeling estimates will deviate from observed deaths in proportion to the magnitude of the catastrophe. More importantly, there is no guarantee that mortality rates should continue declining, let alone markedly decline, over time with medical and other progress, even in the absence of major negative perturbation events [6]. For advanced economies with aging populations, accumulating frailty and disease burden and restrictions or ceilings to progress and available resources, the typical trends for decreased mortality that were documented in the previous decades may not be sustainable for the future. Furthermore, countries that have already reached very high life expectancies may have less room for improvement than others that are lagging behind. The multiverse approach, when applied to multiple countries, allows a comparative assessment of the trajectory of different countries. This may be preferable and it may offer some genuine insights about which countries do well (short-term and long-term) and which do poorly-in comparison.
In this regard, some stark differences stand out for both long-term trends and for the pandemic years. The USA consistently performed very poorly with both stagnation in mortality during the pre-pandemic years and a sharp increase during the pandemic. Eastern European and Balkan countries showed sharp decreases during the pre-pandemic years and a sharp increase during the pandemic. Most western European countries had sharp decreasing trends with modest disruption during the pandemic. All Scandinavian countries, Australia, New Zealand, and South Korea have had largely unperturbed declining mortality patterns. The markedly different patterns may reflect a combination of social, health care, and pandemic factors. The USA has an ailing health system with approximately 30 million uninsured people [25], large inequalities [26], many people with poor access to care [27], and major ongoing non-infectious epidemics, including obesity [28], opioid abuse and overdose [29], and violent deaths [30]. More detailed data are needed to understand which of the policies and actions during the pandemic or the pre-existing problems were more important for shaping the poor performance in 2020 and beyond. Eastern European and Balkan countries have limited resources for their healthcare systems and lower social welfare than other European countries [31] and some countries like Greece have long suffered from austerity [32]. The best performers are excelling in social welfare and health system functionality and resources, even if there are differences across countries. Exceptions may occur within circumscribed populations and adverse settings even in countries with overall excellent trajectories. For example, the dysfunctional consequences of privatization in nursing homes in countries like Sweden or Canada [33,34] translated to peaks of excess deaths during circumscribed periods in the long-term care settings [35].
Consideration of longer projected period time windows diminished substantially the range of excess deaths in some countries, but not in others. Overall, when longer periods are considered, differences between most countries become less pronounced. However, larger windows had minimal effect in the USA, and this may reflect that the problems that lead to unfavorable mortality patterns in the USA reflect chronic dysfunctions that might have been accentuated by the pandemic but pre-existed and which affect also people with long life expectancy. Poverty, marginalization, homelessness, inequalities, drug overdoses, and violence affect indeed young and middle-aged populations. We have shown previously that the USA has had 40% of excess deaths contributed by the < 65 age stratum, a higher percentage than all other highly developed countries [6]. Conversely, in many other countries, large time windows for the projected period shrank substantially the excess death fluctuations. This suggests that in these countries excess deaths temporarily affect mostly people with relatively limited life expectancy [21].
Europe, while not a country, has historically aggregated excess death data in the EuroMOMO data base (https:// www. eurom omo. eu/) to include data from 21 countries in Europe plus Israel [36]. If one were to aggregate data for the 19 of these 21 countries for which we have data (excluding Cyprus & Ireland), the fictional country composite that includes Austria, Belgium, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Israel, Italy, Luxembourg, Netherlands, Norway, Portugal, Slovenia, Sweden, Switzerland, and United Kingdom has a similar population to the USA (410 million versus 330 million) and the relative excess death of this European composite is only 2.46% for the pandemic years (2020 + 2021), which is in stark difference to the USA figures. It is also less than two-year totals for 2009 + 2010 and 2010 + 2011, with values of 5.57% and 3.30% caused apparently by elevated influenza pandemics [37]. One may examine also the excess deaths according to the EuroMOMO model, but the model has been criticized for low baseline values which may lead to overestimation of excess mortality in some countries [38].
Some limitations should be acknowledged. First, there are some additional sources of analytical flexibility that can be considered in excess death calculations. These include the choice of age bins for age adjustment, and the use of additional adjustments for modeling the population profile over time. For example, socioeconomic profile variables would be very useful to incorporate [39], but these are not routinely available and standardized across many countries. Such additional adjustments would add additional variation with more multiverse options, but probably would not invalidate the major patterns that we observed. Second, we only modeled data from 33 countries that are the ones with the most reliable data. Extrapolations to other countries would be precarious, given the unreliability of the mortality information. Time patterns observed in the 33 countries may not necessarily apply to the remaining countries around the globe and local circumstances may make a difference. Third, we considered yearly interval increments so as to capture all 4 seasons in the unit of time, but in theory, the multiverse process can be applied for smaller units of time as well. Fourth, data on population and population structure in each country on a yearly basis are typically inferred from census data collected on more sparse timing, therefore they carry some uncertainty. Fifth, the pandemic impact and its consequences as well as the consequences of aggressive measures that were taken has continued more prominently in 2022 in some countries than others [40]. It would be interesting to see whether differences across countries get further attenuated and/or some countries continue to stand out prominently when longer pandemic and post-pandemic periods (e.g. 2020-2022 and 2020-2023) are considered. Preliminarily results based on the first 8 months of 2022, it seems that several countries with death deficit in 2020-2021 (e.g., Australia, New Zealand and South Korea), had considerable excess deaths in 2022, while some others continued to have limited deaths (e.g. Sweden) and some hard-hit countries like USA and Greece continued to do very poorly [6,40]. Sixth, we did not consider in the multiverse analyses any superimposed modeling of time trends, specifically because we wanted to allow the data to show whatever trends of patterns existed. If one were to add in the modeling also all the possible functions that might be used to capture time trends (e.g. linear, higher power, splines, and so forth), the analytical options would multiply far more. This explains why mortality forecasting is so difficult and uncertain, and why there is no consensus on the best method on how to do it [41]. Mortality forecasting becomes even more difficult and uncertain when perturbation events such as pandemics occur. The multiverse approach helps understand why obtaining accurate absolute estimates of excess deaths is precarious. However, our approach using down weighted older reference years may be considered, if absolute estimates are desirable (as opposed to relative performance across years and compared with other countries). One may also down weigh previous reference years based on other features, e.g. severe flu seasons or major heat waves.
In conclusion, a multiverse approach to excess death calculations may offer bird's eye views on mortality patterns in comparative assessments of a large number of countries. These patterns may be more reliably informative than efforts to obtain isolated single-country estimates of excess deaths, which are subject to substantial uncertainty even in countries with the best-collected data. It may be best to avoid pre-specifying time patterns and to allow the data to show what time patterns may be emerging. Finally, observed time patterns may not necessarily continue into the future and multiverse analyses can be updated accordingly for additional years moving forward.

Data
All data comes from the Human Mortality Database (HMD) [42][43][44]. The data for the most recent years comes from the Short-Term Mortality Fluctuation file stmf.csv downloaded from https:// www. morta lity. org/ File/ GetDo cument/ Public/ STMF/ Outpu ts/ stmf. csv. (last updated 6 February 2023). The data for earlier years extending back to 2009 was downloaded as the HMD archive file (see Supplementary Links to Data). We considered data from 2009-2021 so as to analyze 13 years including also the years of the 2009-2020 pandemic. We focused on the 33 high-income countries with highly reliable death registration systems, excluding Bulgaria as done in previous work [6]. The most recent data in the file stmf.csv is per week and uses five standard agebands: 0-14, 15-64, 65-74, 75-84 and Over 85; we sum the data over the weeks assigned to each year as done before [6]. The older data in the HMD archive (downloaded on May 25, 2022) uses 1-year age bands for annual all-cause deaths and annual populations are available for all 33 countries. We sum these 1-year bands to give the same five standard age bands used in stmf.csv.

Excess death calculations
In order to be able to compare different countries and different time periods we focus on relative excess deaths expressed as the number of excess deaths divided by the number of expected deaths. Specifically, the relative excess death p% is the actual all-cause death count, D, minus the estimated death count, E, expressed as a percentage of the estimated death count or p% = (D-E)/E.

Systematic Variation of Assumptions for Multiverse Analyses
We consider all possible reference baseline periods and projected periods in consecutive year time windows during a lengthy period of interest. Instead of prespecifying time patterns, this multiverse approach allows the data to demonstrate how sensitive the results are to different analytical choices. Moreover, by considering the average of the multiverse results, one can reveal time patterns of decreases or increases of mortality during the covered time period.
We consider all possible reference baseline spans of consecutive years in the period 2009-2019. This gives 11 + 10 + ... + 1 = 66 different spans of length 1 to 11 years for the 66 reference baselines. For each reference period, we average the mortality of each of the five age bands. These averaged mortalities are then used to get the expected deaths in any year by multiplying the mortality of a particular age band by the population of that age band and then summing the estimated values over the five age bands to give the total estimated death count.
Projected time periods are also considered in all possible options of length 1 to 4 years, again considering consecutive calendar years. The mortality in the pandemic years 2020 and 2021 is never considered when calculating excess death. Similarly, when calculating excess deaths for projected periods 2018 + 2019 + 2020 + 2021 (or 2019 + 2020 + 2021), the years 2018-2019 (or 2019) are not considered as baselines. For analyses with different assumptions, we present the maximum, minimum, median and IQR or mean and standard deviation, as appropriate).

Analyses with down weighing for older years
Estimates of excess deaths during the pandemic years that are averaged against all possible reference periods may be misleading in absolute magnitude, since very early years such as 2009 may not be as relevant as more recent years like 2019. Therefore, we also rerun the analyses for all 66 possible combinations of reference years with various weights: (a) with weights decreasing linearly by 10% for each year before 2019 (i.e. 100% weight for 2019, 90% weight for 2018, …, 10% weight for 2010, 0% weight for 2009); (b) with weights decreasing by 5% for each year before 2019 (i.e. 100% weight for 2019, 95% weight for 2018, …, 55% for 2010, 50% for 2009); (c) with weights decreasing by half for each year (i.e. 100% weight for 2019, 50% weight for 2018, 25% weight for 2017, 12.5% weight for 2016, 6.25% weight for 2015, 3.125% weight for 2014, 1.563% for 2013, 0.781% for 2012, 0.390% for 2011, 0.195% for 2010, 0.098% for 2009). In all 3 weighting schemes, the weighted average of the 66 options is obtained, weighting each option by the average weight of the reference years that it contains. For example, the 2018-2019 reference years option is weighed by a factor of 1.9/2 = 0.95, 1.95/2 = 0.975, and 1.50/2 = 0.75, for each of the three weighting schemes above, respectively. The 2017-2019 reference years option is weighed by a factor of 2.7/3 = 0.9, 2.85/3 = 0.95, and 1.75/3 = 0.58, respectively. Data availability All data are in the manuscript, tables, and supplementary tables and in the publicly available databases listed in Supplementary Links to Data and deposited online at https:// zenodo. org/ record/ 70957 53.

Conflict of interest The authors declars that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.