Modeling and forecasting age-specific drug overdose mortality in the United States

Drug overdose deaths continue to increase in the United States for all major drug categories. Over the past two decades the total number of overdose fatalities has increased more than fivefold; since 2013 the surge in overdose rates is primarily driven by fentanyl and methamphetamines. Different drug categories and factors such as age, gender, and ethnicity are associated with different overdose mortality characteristics that may also change in time. For example, the average age at death from a drug overdose has decreased from 1940 to 1990 while the overall mortality rate has steadily increased. To provide insight into the population-level dynamics of drug overdose mortality, we develop an age-structured model for drug addiction. Using an augmented ensemble Kalman filter (EnKF), we show through a simple example how our model can be combined with synthetic observation data to estimate mortality rate and an age-distribution parameter. Finally, we use an EnKF to combine our model with observation data on overdose fatalities in the United States from 1999 to 2020 to forecast the evolution of overdose trends and estimate model parameters.


I. INTRODUCTION
The number of drug-overdose fatalities in the United States has been steadily increasing over the past 20 years [1,2].Between 1999 and 2020 more than 900,000 drug-overdose deaths were reported in the US.In 2020 alone, almost 100,000 people died from injury or poisoning from drugs of abuse (mainly opioids and psychostimulants), constituting a 32% rise over 2019.According to provisional mortality data [3], this trend has continued throughout 2021.
A study [1] that examined the exponential growth in drug overdose deaths between 1979 and 2016 in the US reveals that the drug types causing these rises have changed over time.During the 1980s and 1990s, the majority of fatal drug overdoses were due to illegal substances such as heroin and cocaine.Successive overdose waves were driven by prescription opioids in the 2000s, followed briefly by heroin in 2010, and, beginning in 2013, by synthetic opioids.The synthetic opioid wave persists to this day as the majority of US overdose deaths are due to fentanyl and its derivatives.There is also substantial variability in the demographic patterns of drug-overdose deaths.While cocaine and prescription drugs mostly led to increased mortality among 40-to-50-year-olds, current fentanyl use is accompanied by fatality rate increases among 20-to-40olds.In addition to age, factors such as gender, race, and place of residence are also associated with variations in drug-overdose risk [4].
The majority of studies analyzing the spatio-temporal evolution of overdose mortality are mainly descriptive and rely on data visualization and statistical analysis of * l.boettcher@fs.depast data.In this work, we instead use an age-structured model [5,6] to mechanistically study the drug epidemic in the US.The model is then used in conjunction with empirical data to forecast the short-term evolution of overdose mortality through an ensemble Kalman filter (EnKF), a data assimilation technique [7][8][9].
Age-structured models (also known as Kermack-McKendrick models) can be used to mathematically describe the evolution of distinct population categories (e.g., susceptible and dead), where the dynamics and interactions among categories may depend on the distribution of age in the population.Different variants of agestructured models have been developed and applied to model heroin addiction as an epidemic [10][11][12][13][14][15][16][17].Such models have also been applied to mechanistically describe cellular processes [18,19] and population dynamics associated with social interactions [20], birth control policies [21], and COVID-19 mortality [22][23][24].
The ensemble Kalman filter, which we use to combine observation data with an age-structured model of overdose mortality, originated from research activities in the geophysical sciences and has found various applications in problems that require combining highdimensional dynamical systems with observation data [25].Kalman filtering and related data assimilation methods (e.g., Bayesian Markov chain Monte Carlo) have been used in computational biology and medicine to estimate model parameters [26][27][28][29], identify patients with antibiotic-resistant bacteria in hospital wards [30], and develop risk-dependent contact interventions in epidemic management [31].Within computational social science, data assimilation methods have proven useful in combining mechanistic models with survey data, e.g., to study the evolution of political polarization in the US [32].
In Sec.II we present a general age-structured model that includes age-dependent addiction and age-specific mortality.We also discuss approximations that admit analytical solutions.The basic concepts underlying the EnKF are outlined in Sec.III.In Sec.IV we adapt our general age-structured model to describe a population suffering from substance use disorder (SUD).We then describe the available drug-overdose data and illustrate how the EnKF is applied to our model and dataset.Finally, we conclude our work with a discussion and future outlook in Sec.V.

II. A GENERAL AGE-STRUCTURED MORTALITY MODEL
Our starting point is the general age-structured model where n(a, t)da is the number of individuals ( i.e., people with SUD in our application) with age between a and a + da at time t.We assume this population dies at rate µ(a, t), and that there is an influx rate p(a, t).
The initial conditions at t = t 0 and a = 0 are specified via n(a, t = t 0 ) = ρ(a) and n(a = 0, t) = g(t), where ρ and g are non-negative functions such that g(t → t 0 ) = ρ(a → 0).We specifically set g(t) = 0, implying that no population of age a = 0 exists at any time.In the context of modeling overdose mortality, this means that the number of addicted newborns is assumed to be zero.Note that this model is different from the original McKendrick model [5] in which an age-dependent birth rate generates newborns through a self-consistent boundary condition on n(a, t).To solve Eq. ( 1) analytically we use the method of characteristics and distinguish the two cases a ≥ t − t 0 and a < t − t 0 .For a ≥ t − t 0 , the characteristic will begin at t = t 0 and n(a, t) will remain constant along a = t − t 0 .When a < t − t 0 the characteristic will begin at a = 0 and n(a, t) will remain constant along t = a + t 0 .The formal solution to Eq. ( 1) can then be expressed as As a specific example we set the initial time t 0 = 0, fix the initial condition ρ(a) = 0, and impose a constant death rate µ(a, t) = µ.We further assume an immigration rate p(a, t) = p(a) = ae −λa which has a maximum at age λ −1 > 0. Equations ( 2) and (3) become The function p(a) = ae −λa describes an influx of people of mean age 2λ −1 that suffer from an SUD.Using this functional form, the number of SUD cases that are much younger/older than 2λ −1 is small compared to the number of SUD cases with an age of about 2λ −1 .The distribution of overdose cases in the US population follows a qualitatively similar trend [33].We use this analytically tractable example in Sec.III to explain how age-structured models of the form presented in Eq. ( 1) can be combined with Kalman filters to learn model parameters from noisy observations.In Sec.IV, we describe p(a) by a more general linear combination of two gamma distributions to connect our model of drug-overdose mortality with corresponding data from the CDC WONDER database.
Observe that n(a, t) in Eqs. ( 4) and ( 5) is continuous for a = t and that the maxima of Eq. ( 4) are located along the trajectory where a max (t) > t is an increasing function of time.The steady state form of n(a, t → ∞) is given by the timeindependent term in Eq. (5).

III. ENSEMBLE KALMAN FILTER
In the first part of this section we describe the basic definitions and update rules in the EnKF [7].We use the standard state-space representation of a physical system and distinguish between state, output, and input (i.e., control) variables.Outputs are quantities that can be ob-served or measured (e.g., the number of overdose deaths), while other quantities such as age-specific mortality rates and the number of individuals suffering from SUD are state variables that are not known and have to be estimated.As a first application example, we use the EnKF to estimate the rates µ and λ that arise in Eqs. ( 4) and ( 5) of the simple model presented in Sec.II.

A. Basic definitions
To outline the main steps associated with the application of an EnKF to the age-structured PDE model in Eq. ( 1), we primarily follow the notation of Refs.[8,9]; the EnKF implementation that we use in this work is instead based on Ref. [34].
The evolution of the system state x(t) and observed state z(t) is described by where Q(t) and R(t) denote the covariance matrices associated with the Gaussian process noise N (0, Q(t)) and Gaussian observation noise N (0, R(t)) at time t, respectively.We assume the quantities Q(t) and R(t) to be known.The function f (•) describes the dynamics of the system state x(t), while h(•) maps x(t) to a measurable quantity.Both functions can be non-linear.
In the context of the age-structured model (1), element x j (t) of the state vector x(t) corresponds to n(a j , t) ≡ n(a 0 + j∆a, t) (0 ≤ j ≤ N a − 1), the density of individuals whose age lies within the [a 0 +j∆a, a 0 +(j +1)∆a) interval at time t.That is, We use N a and ∆a to denote the number of discretizations of the age interval and the corresponding age discretization step, respectively.For the numerical solution of Eq. ( 7), we later also discretize the simulation time interval [0, T ] into N t equidistant intervals of duration ∆t = T /N t .If we wish to estimate model parameters such as µ and λ introduced in Sec.II, we can augment the state to obtain An example of an inference problem with an augmented state (9) will be provided in Sec.III B. At every time point t, the goal of filtering is to determine the state posterior distribution given all prior observations.Before producing EnKF state predictions, we generate an initial ensemble [χ The quantities x0 and P 0 denote the given initial state and covariance estimates, respectively.
We now outline the two main EnKF steps: (i) forecasting the evolution of the system state and (ii) updating the predicted state estimates using observation data.To do so, we discretize the time evolution of the system state and use the shorthand notation y k ≡ y(t k ) to refer to a quantity y at time Here and in the remainder of the manuscript, we assume that t 0 = 0.
The basic idea behind forecast and update iterations is that one first uses state estimates χ k+1 at time t k+1 .These predicted estimates are the combined with observational data to obtain an updated state estimate k+1 is used to distinguish the predicted (i.e., prior) state estimates from the updated (i.e., posterior) state estimates.
(i) Forecast Step: For each ensemble member, we calculate the predicted state estimate χ where For the sake of computational efficiency, we avoid discretizations of the partial derivative of n(a, t) with respect to a in the EnKF simulations.In all numerical experiments, we first derive closed-form expressions of the rate of change of n(a, t) to compute predictions χ (i)− k+1 according to Eq. (10).The ensemble mean of the predicted state, x− k+1 , and the corresponding covariance matrix, (P − xx ) k+1 , are given by The covariance matrix (P − xx ) k+1 is not required in the EnKF iteration, but it can be used to estimate (ii) Update Step: We begin with deriving the ensem-ble mean of the predicted observation as well as the corresponding covariances The Kalman gain is For a given observation z k+1 , the state update of ensemble member i is where η k+1 ∼ N (0, R k+1 ).Finally, the updated state estimate and the corresponding covariance matrix are given by

B. Estimating model parameters
As a first example of estimating model parameters with the help of an EnKF, we focus on the analytically solvable case from Sec. II for which closed-form analytical solutions of n(a, t) can be obtained.Our goal is to estimate µ and λ in Eqs. ( 4) and (5).We thus augment the state (8) by µ, λ to obtain In accordance with Eqs. ( 4) and ( 5), the evolution of n(a, t) is described by The evolution of the first N a components of the augmented state ( 18) is described by Eq. (19).We assume that we can observe perturbed versions of n(a, t) but not µ, λ (i.e., the measurement function is h(x(t)) = [n(a 0 , t), . . ., n(a Na−1 , t)] ).To avoid sign changes in the estimates of µ, λ during the EnKF iterations, we apply an exponential transform to render both estimates positive.
In our simulations, we consider an age interval of [0, 120] years.
We generate unperturbed observation data from the model using µ = 0.08/year and λ = 0.2/year.The perturbations that we add to n(a, t) are normally distributed with zero mean and variance 10 −4 .Our goal is, given the randomized n(a, t), to estimate the underlying µ and λ with an EnKF and verify the degree of accuracy of our estimates compared to the original values.In real-world applications, new observation data may not be available for each prediction.To account for this potential lack of observation data, we perform update steps (i.e., integrate observation data into our predictions) every five prediction periods.
Figure 1(a) shows the evolution of both the true solution n(a, t) for which µ, λ are known (dashed black lines) and of the corresponding EnKF estimates that use the augmented state (18).Grey-shaded regions indicate 3σ intervals of the EnKF predictions [see Eq. ( 12)].We observe that the EnKF produces estimates of λ and µ that are very close to the true solution after t > ∼ 2 years and t > ∼ 7 years, respectively.

IV. APPLICATION TO DRUG-OVERDOSES
We now use an EnKF to combine the model in Eq. ( 1) with corresponding empirical data taken from the CDC WONDER database.Here different causes of death are classified according to the 10th revision of the International Statistical Classification of Diseases and Related Health Problems (ICD-10).We selected ICD-10 codes T40 (poisoning by narcotics and psychodysleptics) and T43.6 (psychostimulants with abuse potential) and all drug-induced deaths, including unintentional death, suicide, homicide, and death by an undetermined cause.We extracted data for the period 1999 until 2020.
In order to interface drug-overdose data with the analytical setup given in Eq. ( 1), we identify n(a, t) da as the number of people with SUD (w.r.t.any drug) of ages between a and a + da at time t.We also associate the influx into the SUD population with an addiction rate of the non-SUD population: r(a, t)[N (a, t) − n(a, t)], where N (a, t) is the overall population density at time t from which we subtract n(a, t), the density of people with an existing SUD.Finally, the prefactor r(a, t) represents an age-and time-dependent addiction rate, which might be modeled [35] or estimated from additional data such as surveys.Including these elements, the model in Eq. ( 1) is adapted to Equation ( 20) can be recast in the same form as Eq. ( 1) via Upon comparing Eq. ( 21) to Eq. ( 1) we can identify µ(a, t) → µ(a, t) + r(a, t) and p(a, t) → r(a, t)N (a, t), so that the analytical solutions to Eq. ( 21) can be written through the proper substitutions in Eqs. ( 2) and (3).Apart from n(a, t), Eq. ( 21) contains the functions N (a, t), r(a, t), µ(a, t).Here we introduce some possible forms for them, based on available data and realistic assumptions.We begin with the entire population density N (a, t).Because of different population-level dynamics such as aging and immigration, the population growth in specific age classes is non-monotonic.In principle, it is possible to use interpolation methods and non-linear functions to construct an age-stratified N (a, t) based on empirical population data that is usually available for 5 or 10-year age windows.However, for the sake of analytical tractability, we will assume an age-independent quantity with N (t) and focus on the more general form N (a, t) in future work.To account for the quasi-linear US population growth in the past two decades, we set where N 0 = 274.9× 10 6 and ∆N = 2.3 × 10 6 /year.22)], respectively.
To model r(a, t), we assume a linear combination of two gamma distributions f 1 (a; α 1 , β 1 ), f 2 (a; α 2 , β 2 ), each peaked at different ages, to describe possible variations in the age dependence of the addiction rate.We do this to account for changes in the prevalence of drug type and societal consumption patterns over the 21 year time frame we examine.The quantities α 1 , β 1 and α 2 , β 2 denote shape and rate parameters of the two distributions f 1 and f 2 , respectively.We also assume that r(a, t) does not depend on time and write where r 0 is a base modulating rate.
The numerical results that we discuss in the following paragraphs show that a linear combination of two gamma functions allows us to capture the double-peaked distribution of age-stratified overdose deaths [see Fig. 3(a-c)].Finally, for analytical tractability of the double integrals arising from the solutions of Eq. ( 21), we retain the constant mortality rate assumption µ(a, t) = µ.To combine the mechanistic model in Eq. ( 21) with empirical data on overdose deaths, we augment the system state x(t) [see Eq. ( 8)] by where D(a j , t) is the cumulative number of overdose deaths in the age interval [a j , a j+1 ) up to time t.We also augment the system state x(t) by the model parameters µ, r 0 , α 1 , β 1 , α 2 , β 2 that we wish to estimate.As a result, the final augmented system state is x(t) = n(a 0 , t), . . ., n(a Na−1 , t), D(a 0 , t), . . ., D(a Na−1 , t), µ, r 0 , α 1 , β 1 , α 2 , β 2 .

(25)
We derive the corresponding rate of change ∂n(a, t)/∂t for the EnKF updates in Appendix A.
For an accurate numerical evaluation of the rate of change of n(a, t), we use a sufficiently small discretization that is associated with age windows that are more granular than the available overdose data.We thus have to coarse-grain the modeled overdose death densities to be able to relate them to observation data.The CDC WONDER data that we use in this work is based on 22 age groups with a 0 = 0, a 22 = 120 years and ∆a 1 = 1, ∆a 2 = 4, ∆a 3 = 5, . . ., ∆a 21 = 5, ∆a 22 = 20 years.We use a superscript to distinguish the age discretization in the observation data from the age discretization in the underlying model.
We combine the modeled quantities D(a j , t) with corresponding observation data by numerically integrating D(a j , t) over age windows [a −1 , a ) (1 ≤ ≤ 22) to obtain the corresponding number of deaths D(a , t) in this age interval at time t.Here, a = a 0 + m=1 ∆a m for ≥ 1.Based on the described mapping of D(a j , t) to D(a , t), the measurement function becomes In our simulations, we set the initial values n(a j , 0) = D(a j , 0) = 0.The initial values of µ, r 0 , α 1 , β 1 , α 2 , β 2 are 7 × 10 −4 /year, 0.04/year, 15, 1/(3 year), 15, and 1/(3 year), respectively.We have chosen the initial value of r 0 in accordance with corresponding empirical data on the number of substance initiates [33].The initial mean of both gamma distributions is equal to α 1 /β 1 = α 2 /β 2 = 45 years.To ensure that the parameters µ, r 0 , α 1 , β 1 , α 2 , β 2 stay positive during EnKF iterations, we use the same exponential transform as in Sec.III B. All initial covariances are set to 10 −4 , except for the diagonal elements associated with µ, r 0 , α 1 , β 1 , α 2 , β 2 , which are set to 10 −2 .The process and observation noise covariances are as in Sec.III B. We use a small process noise and a relatively large initial model parameter variance to (i) let the dynamics evolve according to the mechanistic drugoverdose model without too much additional noise in n(a, t), D(a, t) and (ii) let the filter explore different trajectories associated with appreciable variations in the underlying model parameters.We have also performed simulations for larger process noise values associated with n(a, t) and D(a, t).For example, we set the corresponding diagonal elements of Q to values between 1 and 100 without observing substantial differences in the simulation results.In the measurement process, we divide D(a, t) by 10 3 to work with numerical values of O(1) when comparing predicted and observed overdose fatalities.A measurement variance of 10 −4 (i.e., a standard deviation of 10 −2 ) corresponds to about 10-100 overdose fatalities in the simulated data.Although the exact measurement noise is difficult to estimate given the unknown number of undocumented fatal drug-overdose cases, we used a standard deviation of about 10-100 as a reasonable modeling choice.
In our simulations, we use a relatively large ensemble size of M = 10 4 to minimize the effect of sampling errors that occur during the Monte Carlo approximation of the system state evolution in the prediction and update steps.Our simulations start in 1998 and we use a timestep of ∆t = 0.1 years.
Figure 3(a-c) shows reported drug-overdose deaths (dashed black lines) for the years 2008, 2013, and 2018.Solid red lines represent EnKF predictions that are based on updates that involved observation data from all previous years since 1999.No additional observation data were available between two subsequent years.That is, for predictions that were made for, e.g., 2008, the most recent observation data that was available to the EnKF was from 2007.Still, the EnKF predictions in Fig. 3(a-c) are closely aligned with the reported overdose deaths.For almost all age classes, predicted overdose fatalities lie within the shown 3σ regions (grey shaded regions).For applications in real-time monitoring of overdose fatalities, one may also include provisional data that becomes available during the course of a year to further refine forecasts.
The evolution of μ and r0 [see Fig. 3(d,e)] suggests that Finally, in Fig. 4, we show forecasts of overdose mortality in the US for the years 2021, 2022, and 2023.The latest observation data that is available for these forecasts is from 2020 (dashed black lines in Fig. 4).The predicted overdose mortality in 2021 is slightly smaller than in 2020 in age groups between 30-60.In 2022 and 2023, the predicted overdose mortality in many age groups exceeds that of 2020.Since the overall overdose mortality has increased unsteadily more than five-fold in the past two decades (with a particularly steep increase between 2019 and 2020), the variance in the shown forecasts is relatively large.

V. DISCUSSION AND CONCLUSIONS
We have developed an age-structured model of drugoverdose mortality.Our model accounts for age and timedependent addiction and mortality rates.It can readily be extended to account for multiple drug classes and different ways of stratifying the population.In a simple example, we have shown how age-structured models can be combined with data-assimilation methods such as an ensemble Kalman filter (EnKF) to forecast the evolution of fatalities and estimate model parameters.
Combining our age-specific overdose model with empirical data on overdose fatalities in the US, we have provided a proof-of-principle set of methods that can be useful for estimating parameters governing drug addiction and mor-tality and for forecasting the evolution of population-level overdose dynamics.
In addition to developing a framework to include provisional overdose data and retrospective updates of observation data, possible future work includes the study of how regularization terms can help smooth Kalman filter updates [36], or how other ensemble-based Kalman filters, such as ensemble adjustment Kalman filters [37], may help improve numerical stability and forecast accuracy.For applications of the proposed methodology to small population sizes, it might be worthwhile to update the age-stratified population using a Poisson-process model, where the Gaussian noise term only affects the underlying model parameters and not the population numbers themselves.Since we focused on drug-overdose forecasting over the past two decades, we decided to use the EnKF in a forward mode and not use backward passes/smoothing (i.e., not use future observations from times t > t at time t).As noted by Evensen and van Leeuwen [38], in forecasting mode, the EnKF and ensemble Kalman smoother (EnKS) produce the same state estimate at the latest time.However, using backward passes and an EnKS (or lagged versions) can help improve earlier parameter estimates, which is also an interesting direction for future research.Another possible direction for future work is to extend the presented data assimilation framework to account for age-dependent death rates µ(a, t) and age-dependent population data N (a, t) in Eq. ( 21).• Conflict of interest/Competing interests: The au-

FIG. 1 .
FIG. 1. State and parameter estimation with an EnKF.(a) The population with SUD with age between a and a + da at times t = 0.1, 2.0, 4.5.(b,c) EnKF estimates λ, μ of the rate parameters λ, µ [see Eqs.(4) and (5)].In all panels, the true solution is represented by a dashed black line.Solid red lines and grey-shaded regions indicate EnKF solutions and corresponding 3σ intervals.The shown results are based on M = 500 ensemble members.
Figure 2 shows the linear model N (t) and the corresponding population data for the period 2000-2020.

FIG. 2 .
FIG. 2. Population growth in the US between 2000 and 2020.The dashed black and solid red lines show the population growth in the US during 2000-2020 and a linear population-growth model [see Eq. (22)], respectively.

FIG. 3 .
FIG. 3. Forecasting overdose mortality and estimating model parameters with an EnKF.(a-c) Reported (dashed black lines) and predicted (solid red lines) numbers of overdose deaths in 2008, 2013, and 2018.Empirical data has been collected from the CDC WONDER database.(d-g) Evolution of estimated mortality rate μ, base modulating rate r0, and gamma function means α1/ β1, α2/ β2.In all panels, solid red lines and grey-shaded regions indicate EnKF solutions and corresponding 3σ intervals.The shown results are based on M = 10 4 ensemble members.Observation data for the previous year becomes available in the beginning of every year.

FIG. 4 .
FIG. 4. Predicted overdose mortality in 2021, 2022, and 2023.Solid red lines show EnKF predictions of numbers of overdose deaths in (a) 2021, (b) 2022, and (c) 2023.As a reference, dashed black lines show overdose deaths in 2020.Empirical data has been collected from the CDC WONDER database.Grey-shaded regions indicate corresponding 2σ intervals.The shown results are based on M = 10 4 ensemble members.The latest observation data that was available to generate the shown predictions is from 2020.