1 Introduction

First identified in Wuhan, China, in December 2019, the Coronavirus disease 2019 (COVID-19) caused by the SARS-CoV‑2 virus developed into a worldwide pandemic during the spring of 2020 (Velavan and Meyer 2020). One of the challenges for scientists has been to evaluate its impact in terms of life loss across different countries and regions of the world. A possible way to do this is through directly looking at the number of people who died while they were confirmed to be infected. This measure, often defined as COVID-19-associated mortality, is certainly more robust than other pandemic-related quantities such as, for example, the number of reported COVID-19 cases, for which it has become clear that there is a non-negligible discrepancy between cases detected through tests and the number of individuals who were infected (Lau et al. 2021; Schneble et al. 2021). Nonetheless, the raw number of COVID-related fatalities can also be subject to interpretative issues and biases due to underreporting and misclassification. In particular, this number might be biased downwards, as COVID-19 cases can still remain unreported until and after the point of death. Moreover, it is not always straightforward to identify if COVID-19 was the primary cause of death: Some patients might have a SARS-CoV‑2 infection, but the actual contribution of the virus to the death might be minimal (Vincent and Taccone 2020). To deal with these issues, comparing all-cause mortality is generally considered a more robust alternative for assessing the damage done by the pandemic, and to compare its impact between regions or countries. A first look at this matter for Germany was provided by Stang et al. (2020), who looked at data from the first wave ranging from calendar weeks 10 to 26 in 2020. The authors came to the conclusion that a moderate excess mortality was observable for this period in Germany, in particular for the elderly. Morfeld et al. (2021) consider regional variation in mortality in Germany during the first wave (see also Morfeld et al. 2020). A calculation of the years of life lost over the course of the pandemic in Germany in 2020 was pursued by Rommel et al. (2021). International analyses on excess mortality due to COVID-19 include e.g. Krieger et al. (2020) looking at data from Massachusetts, Vandoros (2020) who focuses on England and Wales, and Michelozzi et al. (2020) investigating mortality in Italian cities. Global analyses in this direction were pursued by Karlinsky and Kobak (2021) and Aburto et al. (2021).

Monitoring excess mortality has a long tradition as part of analysing the impact of pandemics (Johnson and Mueller 2002; Simonsen et al. 2013). With the EuroMOMO project, Europe also runs an early-warning system specifically dedicated to mortality monitoring (Mazick et al. 2007). However, no unified methodological definition exists for deciding if the currently observed death counts are higher than what would be expected. A very simple approach is to compare the currently observed deaths for a selected time-period with the average of death counts for a similar period in previous yearsFootnote 1. Alternatively, the expected value can be computed by an underlying time-series model based on past values, e.g. including seasonality and excluding past phases of excess, as done in the EuroMOMO project (see e.g. Vestergaard et al. 2020; Nørgaard et al. 2021). These approaches, however, do not come without problems, as the age structure within a population can change significantly over time. Given that both general and COVID-related mortality are heavily dependent on age (Dowd et al. 2020; Levin et al. 2020), comparisons between different years based only on raw data will often lead to biased estimates. More specifically, using such techniques will lead to overestimating excess mortality for aging populations (such as those, e.g., in western Europe), and underestimating it for populations that get progressively younger. More sophisticated approaches thus need to adjust for different or changing age structures in the population. The latter point is of particular relevance when looking at aging populations (Kanasi et al. 2016) and the infectious risks for the elderly (Kline and Bowdish 2016). Such age-adjustments have a long tradition in demography when comparing mortality across different regions with different age-structure (Keiding and Clayton 2014; Kitagawa 1964). A general discussion on aging populations and mortality can be found in Crimmins and Zhang (2019).

In this paper, we build on existing methodology to propose two ways of calculating expected mortality taking age into account, respectively at the weekly and at the yearly level. These methods are compared to the existing benchmarks on data from Germany over the years 2016–2019, for which age-stratified information is available. We furthermore apply those methods to assess age group-specific excess mortality in Germany during the COVID-19 pandemic in 2020. The remainder of the manuscript is structured as follows. In Sect. 2 we look at yearly expected mortality, while the weekly view is pursued in Sect. 3. Sect. 4 ends the paper with some interpretative caveats and concluding remarks.

2 Yearly Excess Mortality

We first look at yearly data, and tackle the question of whether there was excess mortality in Germany in 2020. In order to obtain an age adjustment for mortality data, we calculate expected deaths based on official life tables. Life tables give the probability \(q_{x}\) of a person who has completed \(x\) years of age to die before completing their next life-year, i.e. before their \(x+1^{th}\) birthday. In our analysis we consider the life table provided for the year 2017/2019 by the Federal Statistical Office of Germany (Destatis 2020). The calculation of a life table, as simple as it sounds, is not straightforward, and is an age-old actuarial problem. First references date far back, to Price (1771) and Dale (1772). A historical digest of the topic is provided by Keiding (1987). Over the last decades, the calculation of the German life-tables made use of different methods proposed in Becker (1874), Raths (1909) and Farr (1859). We will come back to this point and demonstrate that further adjustments are recommendable to relate the expected number of deaths to recently observed ones. In particular, with increasing life expectancy, the average age of the German population has been steadily increasing (see e.g. Buttler 2003), and this has an effect on the validity of life tables, as discussed in Dinkel (2002). Generally, an aging population leads to increasingly high yearly death tolls (see e.g. Klenk et al. 2007). To quantify excess mortality one therefore needs to account for age effects, leading to the computation of standardized quantities such as the standardized mortality ratio (SMR, see e.g. Rothman et al. 2008). The SMR is defined as the ratio of observed death counts over expected deaths, and thus allows for an age adjusted view, meaning that instead of pure death counts one takes the (dynamic) age structure into account.

Fig. 1
figure 1

Lexis Diagram indicating the different quantities to be estimated

Calculating excess mortality on a yearly basis requires to calculate expected fatalities using life tables provided by the relevant statistical bureau. We make use of data provided by the Federal Statistical Office of Germany (Destatis 2020). A straightforward way of obtaining the expected number of deaths for age group \(A\) in year \(y\) is to calculate

$$\begin{aligned}e_{A,y}=\sum_{x\in A}q_{x}P_{x,y}\end{aligned}$$
(1)

where \(P_{x,y}\) is the population size of individuals aged \(x\) years at the beginning of year \(y\) and \(q_{x}\) are the age-specific death probabilities, e.g. those found in the most recent German life table from the years 2017/19, calculated following Raths (1909). More specifically, let \(D_{x}\) be the cumulated number of individuals that died at \(x\) years old, i.e. before their \(x+1\)-th birthday in the considered years 2017 to 2019. Let \(P_{x,y}\) denote the population size of \(x\) year old individuals on December 31st in year \(y\in\{2016,2017,2018,2019\}\). \(q_{x}\) provided in the German life-tables is then defined as

$$\begin{aligned}q_{x}=\frac{D_{x}}{\displaystyle\sum_{y=2016}^{2018}\frac{P_{x,y}+P_{x,y+1}}{2}+\frac{D_{x}}{2}}\end{aligned}$$
(2)

We label (1) in combination with (2) as Method 1 below. We now show that this quantity is biased for estimating the expected number of deaths of \(x\) year old people in year \(y\). To motivate this we look at the Lexis diagram in Fig. 1, and for simplicity we replace the calculation in (2) by looking at a single year only, i.e from \(y=t\) to \(y=t+1\). This leads to \(D_{x}=I+\textit{II}\), where \(I\) and II refer to the observed deaths in the two triangles in Fig. 1. Note that following the calculation principle (2) of the Statistisches Bundesamt we would obtain \(q_{x}\) as

$$\begin{aligned}q_{x}=\frac{D_{x}}{\displaystyle\frac{P_{x,t}+P_{x,t+1}}{2}+\frac{D_{x}}{2}}\end{aligned}$$
(3)

where \(P_{x,t}\) and \(P_{x,t+1}\) are the population sizes of \(x\) year olds indicated in Fig. 1. That is \(q_{x}\) is the probability of dying in triangles \(I\) and II. Let us define with \(\tilde{q}_{x}\) the probability of an individual aged \(x\) years at the beginning of year \(t\) (i.e. on December 31st in year \(t-1\)) to die before year \(t+1\) starts. In other words \(\tilde{q}_{x}\) is the probability of dying in triangles II and III. In fact, this is the probability we are interested in. It is easy to see that \(\tilde{q}_{x}\neq q_{x}\). Assuming that the probability of dying in triangle \(I\) is roughly equal to the probability of dying in triangle II, and assuming the same relationship for triangles III and IV holds, we can conclude the approximate equivalence

$$\begin{aligned}\tilde{q}_{x}=\frac{1}{2}q_{x}+\frac{1}{2}q_{x+1}\end{aligned}$$
(4)

which leads to the expected number of deaths

$$\begin{aligned}\tilde{e}_{A,y}=\sum_{x\in A}\tilde{q}_{x}P_{x,y}.\end{aligned}$$
(5)

We label (5) as Method 2 below. The adjustment is still not complete, and in fact it can be shown that (5) is still biased for delimited age groups (see Hartz et al. 1983). This is because individuals dying in triangle III count as \(x+1\) years old, so that part of the deaths contributes to an age group that is different from the target. We may now assume for simplicity that the probability of dying in triangles II and III is roughly the same, which leads to the following calculation. Let \(A=[a_{l},a_{r}]\),

$$\begin{aligned}\hat{e}_{A,y}=0.5\cdot\tilde{q}_{a_{l-1}}P_{a_{l-1},y}+\sum_{x=a_{l}}^{a_{r-1}}\tilde{q}_{x}p_{x,y}+0.5\cdot\tilde{q}_{a_{r}}P_{a_{r},y}\end{aligned}$$
(6)

where \(A\) is the age group, \(a_{l}\) and \(a_{r}\) refer to the left and right age boundaries of the group, \(\tilde{q}_{-1}=\tilde{q}_{0}\), and \(P_{-1,y}=P_{0,y}\) gives the approximation for the youngest age group. Accordingly, for \(a_{r}=\max(x)\) we take the full fraction of the last year, that is we add an additional \(0.5\cdot\tilde{q}_{a_{r}}p_{a_{r},y}\) to the formula above. We label (6) as Method 3 below.

Based on this method we can now compare expected and observed fatalities over the last years using the same 2017/2019 life table as basis. Note that, when looking at different years, one may more accurately also consider different life tables to account for changing life expectancy. We omit this point for simplicity since we only look at five years, and changes in life expectancy over this short period were moderate (Wenau et al. 2019). This is equivalent to implicitly assuming constant age-specific hazards over the last five years (while we still, of course, account for the changing age structure). Fig. 2 gives a first overview of the results for all age groups combined. In the figures, alongside Method 3, we also show the results obtained with Methods 1 and 2. This is to demonstrate how impactful their previously underlined biases, which may seem small on paper, can be in practice. We plot the observed death counts as black dots, and we represent the expected death counts based on the different methods as dashed lines. We can see that Method 1, which uses (1), clearly underestimates the expected death counts. Method 2 and Method 3 perform equally well, which is not surprising, since we here do not take an age-specific view. The latter is carried out in Fig. 3 for all different age groups available from the data. This age-specific view shows how Methods 2 and 3 differ, and that overall Method 3 shows the better fit. We can quantify the empirical discrepancy between the three methods by calculating the mean absolute percentage error for the different age groups, where we explicitly exclude year 2020 due to the COVID-19 pandemic. The results of this can be found in Table 1.

Fig. 2
figure 2

Expected deaths computed by calender year with the three different methods described, for all age groups combined. Realized fatalities are shown as black dots. Note that Methods 2 and 3 are visually indistinguishable, as here all age groups are pooled together

Fig. 3
figure 3

Expected deaths by calendar year and age group computed with the three different methods described. Realized fatalities are shown as black dots

Table 1 Age-specific mean absolute percentage error for expected yearly fatalities calculated with different methods over the years 2016 to 2019. Year 2020 is excluded due to the COVID-19 pandemic. The smallest value for each age group is highlighted in bold

Having seen that Method 3 empirically outperforms the other two over recent years, we can use the expected number of fatalities computed with this method for 2020 to quantify excess mortality during the first calendar year of the COVID-19 pandemic in Germany. Table 2 contains expected and observed mortality figures for all age groups in 2020, as well as the absolute and percentage differences between the two. From the table we can see that, for the entire population, the age-adjusted excess mortality was in the order of 1% in 2020. We stress that these results in terms of COVID-19 impact need to be interpreted with utmost care: We here focus on the methodological aspects, and defer the subject-matter discussion of the results to Sect. 4. Also note that, while this section focuses the attention on the difference between observed and expected mortality, one could also easily obtain the yearly SMRs by simply taking the ratio of those two quantities. We believe the (percentage) differences to be more interesting when looking at the data at the yearly level. Nonetheless, the real insight lies in estimating the expected number of deaths in a given period; once that is calculated, one can use any preferred method to quantify the excess.

In this Section we approached the problem of excess mortality from a yearly standpoint. A natural follow up would be to zoom into a monthly or weekly view. A way to move in this direction would be to divide the expected yearly mortality by the total number of weeks in a year, and computing a weekly “SMR” using weekly observed deaths. The main issue with this type of approach is that it does not allow to take within-year seasonality into account for the expected deaths. In the following section we therefore follow a different approach based on standardization, which can account for seasonality and is more model-free.

Table 2 Expected and observed yearly mortality in 2020 for each of the six age groups, computed with Method 3

3 Weekly Excess Mortality

To tackle the question of weekly excess mortality, classical standardization approaches such as direct and indirect standardization can be used to adjust the observed values for age effects, see e.g. Kitagawa (1964). We will focus on indirect standardization, but given an appropriate choice of reference population, direct standardization approaches are straightforward adaptations.

Fig. 4
figure 4

Weekly mortality rate per 100.000 for the different age groups as well as the range (min-max) of the corresponding mortality rates of the past four years and their mean

Let \(q_{t,x}\) be the mortality probability specific to age \(x\) and time period \(t\). In what follows, the considered time period will be one International Organization for Standardization (ISO) week, but other intervals (e.g. months) are also imaginable. We estimate \(q_{t,x}\) by dividing the number of observed deaths at age \(x\) during time period \(t\), defined as \(D_{t,x}\), by the corresponding population at the beginning of the time period, i.e. \(P_{t,x}\). To be specific, we define

$$\begin{aligned}\hat{q}_{t,x}=\frac{D_{t,x}}{P_{t,x}}.\end{aligned}$$
(7)

Since the age-stratified population is only available as a point estimate for December 31st of each year, we use linear interpolation to estimate \(P_{t,x}\). The corresponding estimates of weekly mortality probabilities (7) are shown in Fig. 4. We see that in the age groups \(\geq 50\) years old a substantial weekly excess mortality is observable from week 45 on, with more pronounced excess mortality for the elderly. Also note that the official 2020 population data are available since June 2021 and, hence, were used for the present retrospective analysis. However, when performing analyses in real time, recent population data might not (yet) be available, and projections would therefore be needed (see e.g. Ragnitz 2021; Höhle 2021).

A weekly SMR-based excess mortality measure for the entire year 2020 can now be computed as follows. Let \(t\) denote a specific ISO week in 2020, i.e. this will serve as notational shorthand for ISO week 2020‑W\(t\), where \(t=1,\ldots,53\). We form the expected age-time mortality probability for this week by computing the average of the mortality of the same week over the last 4 years, i.e.

$$\overline{q}_{t,x}=\frac{1}{4}\sum_{y=2016}^{2019}\hat{q}_{y\text{-W}t,x},\quad t=1,\ldots,53.$$

Because the years 2016–2019 do not have an ISO week 53, we define \(y\text{-W}53\) for \(y=2016,\ldots,2019\) as \(\frac{1}{2}(q_{y\text{-W}52}+q_{(y+1)\text{-W}01})\). The indirect standardization now computes the expected number of deaths for week \(t\) as

$$\overline{e}_{t,x}=\overline{q}_{t,x}\cdot P_{t,x}$$

This corresponds to the expected number of deaths in week \(t\) at age \(x\), if the current population would have been subject to the average death probability calculated over the past four years. Since fatalities are not given with exact ages but rather by age group, we indicate this by using \(q_{t,A}\), \(P_{t,A}\) and \(e_{t,A}\), where \(A\) denotes the age classes. Fig. 4 shows \(\hat{q}_{t,A}\) as well as \(\overline{q}_{t,x}\) for Germany for the available age classes. Also note that this computation is equivalent to computing, for each reference year \(y\), the expected number of deaths for the relevant week in 2020, and then taking the average of the expected deaths. In other words: by applying the mortality probabilities for the same week of the reference year \(y\) to our study population (i.e. 2020‑W\(t\)) and then averaging the four expected fatalities, we get:

$$\overline{e}_{t,x}=\frac{1}{4}\sum_{y=2016}^{2019}q_{y\text{-W}t,x}\cdot P_{t,x}.$$

One can now define the absolute excess mortality in week \(t\) and age-group \(A\) as \(D_{t,A}-e_{t,A}\). Instead of focusing on absolute differences, it is better in terms of interpretation to look at relative estimates of excess mortality given by the standardized mortality ratio (SMR)

$$\begin{aligned}\operatorname{SMR}_{t,A}=\frac{D_{t,A}}{\overline{e}_{t,A}}.\end{aligned}$$
(8)
Fig. 5
figure 5

Weekly SMR estimates for the different age groups

We plot the corresponding weekly estimate resulting from (8) for all age groups in Fig. 5. As already seen in the incidence plots, we note that in the older age groups the first approx. 10 weeks of the year had a rather low SMR, followed by a small increase consistent with the first COVID-19 wave. Furthermore, substantial increases are observed in the \(\geq 50\) years old age groups starting from week 45, coinciding with the 2nd wave, and reaching up to 40% more deaths than expected in certain weeks. Note that it would also be possible to aggregate the weekly numbers to generate yearly excess-mortality statements similar to those in Table 2. All in all, the results of the two methods at the yearly resolution are similar. We don’t include the results of this aggregation here, but refer to Höhle (2021) for comparison.

4 Discussion

The COVID-19 pandemic posed numerous challenges to scientists. One of those challenges lies in estimating the number of fatalities brought upon by the pandemic. To tackle this issue, we pursued an approach based on comparing observed all-cause mortality in 2020 with the number of fatalities that would have been expected in the same year without the advent of COVID-19. Building on existing methodology, we proposed two simple ways of computing expected mortality, respectively at the yearly and at the weekly level. We then put those methods to work to obtain estimates for excess mortality in 2020 in Germany. The two approaches yield similar results at the aggregate level, and highlight how 2020 was characterized by an overall excess mortality of approximately 1%. The light excess mortality was apparently driven by a spike in fatalities related to COVID-19 at the end of the year in the older age groups.

Interpreting COVID-19 mortality has become a politically sensitive issue, where the same underlying data are used to either enhance or downplay the consequences of COVID-19 infections. We therefore stress that our interests are methodological, and that the presented results are restricted to the calendar year 2020 for Germany as a whole. Altogether, the mild mortality in the older age groups during the first weeks (e.g. due to a mild influenza season) balanced the excess in the higher age groups which came later in the year. Clearly noticeable is the second wave during Nov-Dec 2020, which also continued in the early months of 2021. To better account for such seasonality, excess mortality computations for influenza are often pursued by season instead of calendar year, i.e. in the northern hemisphere for the period from July in Year \(X\) to June in Year \(X+1\) (Nielsen et al. 2011). Similarly, the impact of COVID-19 cases and fatalities was not only temporally, but also spatially heterogeneous, with strong peaks in Dec 2020 in the federal states of Saxony, Brandenburg and Thuringia (Höhle 2021). Hence, using mortality aggregates over periods and regions only provides a partial picture of the impact of COVID-19. Furthermore, the mortality figures observed in 2020 naturally incorporate the effects of all types of pandemic management consequences, which include changes in the behavior of the population (voluntary or due to governmental interventions). Disentangling the complex effects of all-cause mortality and the COVID-19 pandemic is a delicate matter, which takes experts in several disciplines (demographers, statisticians, epidemiologists) to solve. Timely analysis of all-cause mortality data is just one building block of this process; Nevertheless, the pandemic has shown the need to do this in near real-time based on sound data while adjusting for age structure.

Our analysis was motivated by the fact that many of the methods that have been applied to tackle this issue so far fail to take the changing age structure of the population into account. This can lead to biased results, and especially so for the rapidly aging developed countries. In the case of Germany, for example, the absolute number of people aged 80 or more increased by approximately 20% from 2016 to 2020. Such a remarkable increase will naturally have an effect on overall mortality, and as such direct comparisons in the number of casualties across different years will lead to significant overestimation of the excess mortality. Our approaches are instead robust to such changes in population structure, and can be used regardless of the demographic context. Note that, for both of our approaches, it would also be possible to obtain confidence intervals through imposing distributional assumptions. This would, however, not be straightforward, for several reasons. First of all, the residual variability is well beyond what would be explainable through a Poisson distributional assumption. To solve this, one could, in principle, replace the Poisson distribution with a Negative Binomial one, or adopt an approach based on quasi-likelihood (see McCullagh 1983) and incorporate an additional overdispersion parameter. But in addition to this, stating confidence intervals would also require an understanding of which (super-)population parameters the confidence intervals make statements about. Since the all-cause excess mortality estimates are for the entire population of interest (the German population), some kind of repeated sampling setting would have to be assumed. For those reasons we refrain from pursuing this, and leave it for future research on the subject. The same methodologies could also be used to pursue a similar analysis for any country in which mortality data and a mortality table are available, for any given year. A natural use for the proposed methodology would also be to assess the overall damages caused by the pandemic when it will be finally considered a thing of the past. All in all, we hope the proposed methods will help shedding light on the issue of computing the expected number of fatalities, and consequently in the assessment of (potential) general excess mortality.