Epidemic modeling with heterogeneity and social diffusion

We propose and analyze a family of epidemiological models that extend the classic Susceptible-Infectious-Recovered/Removed (SIR)-like framework to account for dynamic heterogeneity in infection risk. The family of models takes the form of a system of reaction–diffusion equations given populations structured by heterogeneous susceptibility to infection. These models describe the evolution of population-level macroscopic quantities S, I, R as in the classical case coupled with a microscopic variable f, giving the distribution of individual behavior in terms of exposure to contagion in the population of susceptibles. The reaction terms represent the impact of sculpting the distribution of susceptibles by the infection process. The diffusion and drift terms that appear in a Fokker–Planck type equation represent the impact of behavior change both during and in the absence of an epidemic. We first study the mathematical foundations of this system of reaction–diffusion equations and prove a number of its properties. In particular, we show that the system will converge back to the unique equilibrium distribution after an epidemic outbreak. We then derive a simpler system by seeking self-similar solutions to the reaction–diffusion equations in the case of Gaussian profiles. Notably, these self-similar solutions lead to a system of ordinary differential equations including classic SIR-like compartments and a new feature: the average risk level in the remaining susceptible population. We show that the simplified system exhibits a rich dynamical structure during epidemics, including plateaus, shoulders, rebounds and oscillations. Finally, we offer perspectives and caveats on ways that this family of models can help interpret the non-canonical dynamics of emerging infectious diseases, including COVID-19.


Objectives of this paper
In this paper we propose, derive, and study a family of epidemiological models that extend the classic Kermack-McKendrick models of epidemics by structuring populations according to individual-level infection risk. The infection process modulates both the number of individuals in different compartments (e.g., susceptibles, infected, or recovered) as well as the infection risk amongst individuals who remain susceptible. The family of epidemiological models also includes a social diffusion mechanism by which infection risk changes both during and in the absence of epidemics. In doing so, the present work extends a previous model framework (Berestycki et al. 2021), proposed by a subset of the present author group. This prior article introduced a reaction-diffusion system to model epidemics in populations structured by heterogeneous infection risk, yielding non-canonical dynamics including plateaus, shoulders, and rebounds. However, this prior article only considered a particular framework with constant diffusion and no drift and did not provide a derivation of the model. Furthermore, the prior article did not include restorative mechanisms by which infection risk could stabilize during and after epidemics-as we will show, this feature provides the basis for both generalizations and useful simplifications in real-world contexts.
The scope of the study is as follows. First, in Sect. 2, we derive a general family of epidemiological models including heterogeneity in dynamic infection risk, extending the model proposed in Berestycki et al. (2021). Our initial analysis focuses on the evolution of the distribution of the risk parameter in the susceptible population. For this purpose, we represent the effect of the epidemic as a transport phenomenon that drives down the mean risk parameter through a process we term "sculpting", consistent with similar proposals in Rose et al. (2021). The distribution is also sculpted by a mechanism of social diffusion that we represent as random variation with a reversion drift that are associated to a drift-diffusion stochastic process. This SfIR-model describes the evolution of macroscopic (or bulk) quantities: S(t), I (t) and R(t), as in the classical framework, coupled with a microscopic description of the probability distribution function of risk behavior in the susceptible population that we denote f (t, a), where a is a risk factor variable. We are thus led to a system coupling SIR dynamics with a Fokker-Planck equation for the distribution f . The SfIR-model admits a unique equilibrium distribution of risk in the absence of an epidemic. Our analysis begins by showing in Sect. 3 that the risk distribution will converge from an arbitrary initial distribution to this unique equilibrium distribution after the passage of an epidemic (Sect. 3). This section focuses on mathematical features of convergence, both in a general case and in the case of Gaussian profiles. The key take-away from an epidemiological perspective is that there is a unique equilibrium for the distribution f associated with fixed points in which the disease is not circulating, i.e., I * = 0. This unique equilibrium distribution is structured by parameters of social diffusion-balancing dynamic variations in infection risk with a tendency for individuals to reduce their infection risk subject to a minimum constraint. Precisely because the PDE model involves several parameters it is desirable to identify a reduced number of governing parameters to aid analysis and to apply model insights to real-world outbreaks. Hence, we look for parsimonious versions of the family of models that still retain key dynamical features of the full PDE model.
In Sect. 4, we derive such a parsimonious ODE system by assuming a Gaussian initial distribution of infection risk in the population. The number of free parameters in the ODE system is then considerably reduced with respect to the PDE model. This model reduction is achieved by looking for self-similar solutions of the system. The resulting ODE system includes the fraction of individuals in different disease compartments as in classical models (e.g., susceptibles, infectious, and recovered/removed), while augmenting these models with a state variable that represents the mean infection risk of individuals who remain susceptible (similar to an approach in the absence of social diffusion shown in Rose et al. 2021). In this way, the ODE system includes a statistical property of the complete distribution modeled in the PDE system. We show in Sect. 4.4 that this ODE system gives rise to a rich variety of dynamical behaviors. Remarkably, while much simpler, the ODE system exhibits hallmark qualitative features of dynamical complexity as found in the full PDE model-including plateaus, shoulders, and oscillations. We also prove some key mathematical properties of this reduced system.
In Sect. 6 we further derive higher order models to account for more general initial distributions beyond the Gaussian lol. This is achieved by using a spectral method with the eigenfunctions of the harmonic oscillator for a better approximation of the PDE system.
As we show, the SfIR model generates non-canonical features of epidemic dynamics as a direct result of incorporating changes in susceptibility that take place on the time scale of disease spread. In future work we plan to study extensions of the model developed here to include the dynamics of multiple variants, spatial propagation, feedback between awareness and behavior change, and to improve model-data integration through connections with surveillance data including wastewater-based epidemiology.

Background
SIR and related epidemic models categorize individuals in a population according to their disease status, e.g., susceptible, infectious, and recovered/removed. There is a long history of expanding such models to incorporate population-level heterogeneity in the risk of infection, transmission, and/or outcome, e.g., by further differentiating individuals based on age, job type, race, gender, and/or socioeconomic status, etc. (Auchincloss and Diez Roux 2008;Stroud et al. 2007;Ibuka 2016;Leung 2017;Zhang 2020;Brauer 2005;Prem et al. 2017). Extensions to SIR models that include such heterogeneity assume that relevant epidemic parameters and interactions vary across categories (e.g., between ages as in Prem et al. 2021). The SARS-CoV-2 pandemic has accelerated a diversification of modeling approaches to examine the basis for and consequences of heterogeneity for the emergence of an outbreak, the speed of population-level spread and the size and severity of the outbreak at the populationscale. A key motivating factor was the early observation of plateaus and shoulders in SARS-CoV-2 case data-shapes that are seemingly incompatible with classic predictions of SIR-like models in which qualitative changes in dynamics are driven by susceptible depletion (Weitz et al. 2020a). As a result, multiple categories of models have been proposed to reconcile apparent slowdowns in spread, plateaus, shoulders, and oscillations in epidemic dynamics. In doing so, we distinguish three main classes of approaches: (i) bottom-up structured epidemic models; (ii) top-down behavioral models; (iii) mesoscopic models.
Structured epidemic models differentiate individual characteristics, including interactions and behavior, allowing for the inclusion of impacts on transmission from different types of interventions (e.g., school closing, case isolation, physical distancing) (Ferguson et al. 2020;Di Domenico et al. 2020). In the absence of interventions, structured epidemic models are expected to exhibit a single epidemic peak followed by a susceptible decline and elimination, whereas interventions can lead to delays, oscillations, rebounds, and/or asymmetric peaks. Even in the absence of interventions, classic outcomes predicted from SIR models can change when populations are structured. A key early example along these lines extended SIR-like models to account for the impacts of age-and activity-structure on the size of the ongoing SARS-CoV-2 outbreak (Britton et al. 2020). Numerical simulations revealed that herd immunity thresholds and final sizes of simulated outbreaks were smaller than that expected from classical theory when accounting for age-dependent mixing and activity differences. Other papers used more detailed features of social contacts such as social distance, indoor/outdoor environment, and the cumulative duration of contacts (Béraud et al. 2015). Likewise, a series of previous works described population heterogeneity in terms a finite number of transmission rate coefficients in the contact matrix in a SIR or SEIR modeling framework. (See for instance Arino et al. 2005;Turinici 2021, 2020;Magal et al. 2018.) In all of these cases, non-homogeneous interactions in a structured population leads, in effect, to changes in transmission rates relative to those in homogeneous populations.
In contrast, top-down behavioral models neglect individual differences; instead relevant epidemic parameters change over time as a result of time-dependent rates or explicit feedback between spread, awareness, and behavior change. The merits and limits of these contrasting approaches to epidemic modeling are listed in Sukumar and Nutaro (2012). For example, in epidemic models with awareness-based feedback increasing fatalities leads to a decrease in interactions that correspondingly decreases transmission (Weitz et al. 2020a). In effect, the transmission rate β is directly mod-ulated by the number of cumulative (or daily) deaths. Depending on the degree of awareness, this model class can yield plateaus, "shoulder" like patterns and oscillations for the dynamics of infectious individuals-initial peaks are unrelated to susceptible depletion. Whether or not a population experiences, oscillations, a single peak, or sustained plateau in new infections may depend on the extent to which individuals become fatigued or substitute actions (e.g., mask-wearing or increasing ventilation) even as mobility increases. Similarly, other models have assumed that individuals face a behavioral trade-off described as a mathematical utility function (Arthur et al. 2021). In the absence of an epidemic, people try to improve their utility though social interactions-working, attending school, or socializing-so as to reach an ideal level of social contact. In the presence of an epidemic, interactions becomes risky. People then cut back their contacts to a level that balances the benefit of interactions with costs associated with catching the disease. The model is expressed as a SIS system where the transmission rate depends on the optimal contact rate. In this framework, there is a theoretical endemic equilibrium, which means that multiple fluctuation waves are possible for a long time around equilibrium values.
Finally, mesoscopic models represent an effort to bridge the gap between bottom-up agent-based modeling and top-down imposition of behavioral feedback in an otherwise homogeneous population. Mesoscopic models structure populations via an explicit representation of a distribution of traits that play a role in the infection process. Thus, these mesoscopic models are of an intermediate type in terms of complexity and dimensionality relative to the bottom-up and top-down modeling frameworks.
For example, in Rose et al. (2021), the authors incorporate population-level heterogeneity in infection susceptibility. They show that behavioral variation strongly influences the rate of infection, while the infection process simultaneously sculpts the susceptibility distribution. In doing so, they show how the sculpting process can drive distributions towards a characteristic stable shape (e.g., gamma-type susceptibility distributions). Precisely because individuals with greater susceptibility are infected earlier, the remaining population tends to be less susceptible than initially. As a result, the epidemic slows down, leading naturally to power-law behavior in the strength of infection, where the power-law like behavior is generic while the power-law exponent depends on the shape of the heterogeneous distribution. This work suggests that first-order epidemic models that are parameterized in the exponential-like phase may systematically over-estimate the final size and severity of the epidemic, similar to findings by Brauer (2011Brauer ( , 2019 and by Eksin et al. (2019). However, because individuals do not change their behavior, the model in Rose et al. (2021) does not generate oscillations or long-term plateaus. (See Tkachenko et al. 2021 for an effort to link dynamical heterogeneity to the finding of plateaus and multiple waves.) There are multiple other examples-including Almeida et al. (2021) who introduced social diffusion together with behavior heterogeneity as a multidimensional risk variable x in a SIR-type framework. In this model, social diffusion appears as a linear diffusion of susceptible and infected populations (or only infected populations) in terms of x. Such models follow the kinetic modeling approach where the SIR (or SEIR) dynamics are coupled with Boltzmann or Fokker-Planck type equations and local equilibria being expressed as gamma-type distributions.
One may view the present work in the spirit of mesoscopic models. As such, we provide here a comprehensive and consistent derivation of a family of epidemiological models of reduced complexity encompassing both individual level risk heterogeneity and variability of individual behaviors (rather than variability in aggregate behaviors of the population as a whole). We term this a SfIR family of models. Such models describe the evolution of the distribution of population level quantities (S, I , and R) and the dynamics of a microscopic distribution f that captures variability in susceptibility to infection at the individual scale.

Stochastic framework
The model we introduced in Berestycki et al. (2021) involves a risk trait variable a ∈ (0, 1). More generally, here we will assume that a varies either in A = R + or in A = (0, 1). This parameter structures the population of susceptibles S so that S = S(t, a). We can think of a as a lumped variable characterizing the relative vulnerability of individuals to infection. Low levels of a are associated with cautious behaviors while high values correspond to increasing risk of infection. Here, cautious behavior involves both the number of contacts and the probability of infection in a given contact. However, in our approach, we do not explicitly include the effect of cautious individuals reducing contacts with others, perceived as more at risk. This effect of preferential mixing, described in the work of Feng (2014), which also appears in the work of Weitz et al. (2020b), would lead to a different term for the force of infection. The lumped variable a influences both the transmission rate factor profile a → β(a) and the initial distribution of susceptible individuals: a → S 0 (a) = S(0, a). The effect of a is to yield high levels of transmission rates β(a) for high values of a and small ones when a is small. Thus we always assume that β(a) is increasing with a. Because behaviors fluctuate in time, it is natural to consider a as a random variable.
Our purpose in this section is to provide a detailed derivation of the generalized SIR system in the framework of a population structured by the variable a. The dynamics of the distribution of the variable a in the susceptible population results from two joint effects. First, by the very nature of the risk factor, individuals with high a become infected at a higher rate than those with a low level of a. Thus, the epidemics has a dynamic effect on the probability distribution of a. Second, owing to the stochastic nature of the parameter a, it is subject to its own stochastic evolution which we describe below by the stochastic differential equation (9). Thus, our generalized model describes the coupled evolution of behavior and epidemic dynamics.

Dynamics of risk parameter a induced by heterogeneity: transport equation and Lagrangian viewpoint
We first derive the purely epidemic evolution of a. It can be considered as deterministic, in the framework of heterogeneous population with respect to disease transmission rate a → β(a). In particular we aim at determining the corresponding transport equation for the sculpting of the distribution of susceptibles.
In an heterogeneous population, stratified by the risk parameter a ∈ A, (where A denotes either (0, 1) or R + ), the purely epidemic depletion of S(t, a) is given by the first equation of the SIR model: Here, we only consider the stratification of the population of susceptibles. As was already mentioned in Berestycki et al. (2021), one can also envision a situation in which the population of infected is structured by a parameter b and write I = I (t, b). We leave this for further studies. We introduce the probability density or distribution f (t, a) of the population of susceptibles as a function of the level a at time t. That is, we write: Thus, S(t) denotes the total susceptible population at time t. From the heterogeneous SIR equation (1) above, we get and The meaning of (3) is clear. The effect of the epidemic is to deplete the fraction of population with high a and to increase the fraction of population a whose β = β(a) is below the average β in the population. Let F denote the cumulative distribution function, that is Integrating the right hand side of Eq. (3) with respect to a yields: where so that denoting β(t) the average transmission rate at time t, one has Equation (4) is a transport equation. It describes how the distribution of population S(t, a) reorganizes itself under the effect of the epidemics. Note that in the neighborhood of a = 0, one has μ ep (t, a) ∼ a(β(0) − β(t))I (t)/N . Then, to represent the solutions of the transport Eq. (4), we use the method of characteristics. To this end, we introduce the associated Lagrangian flow (t, α) → a(t, α) that follows the dynamics As a matter of fact, one has Hence, introducing the initial probability distribution f 0 and denoting a −1 (t, ·) the inverse of a(t, ·) for all given t > 0, one gets This means that the probability measure of susceptibles P(t) (associated with the density f (t, ·)) at time t is the pushforward of the initial probability measure P 0 by the flow a(t, ·): That is, for all G ⊂ A, the measure of G at time t is deduced from the measure of the image of G by the inverse of the flow a −1 (t, ·):
As in the SIR model, using the dynamics of R compartment (d R/dt = γ I ), we deduce that the Lagrangian paths associated with (5) can be naturally expressed in terms of R instead of t: In the case of homogeneous transmission rate β(a) ≡ β, the epidemic drift vanishes μ ep (t, a) ≡ 0: epidemics do not create heterogeneity if there is none initially.
Let us finally give a computable example of non-zero epidemic drift. Starting from the exponential distribution f 0 (a) = exp(−a) over A = (0, +∞), β(a) = β 1 a for some positive constant β 1 , one gets: so that the dynamical system (8) becomes This example illustrates the aforementioned monotonicity of t → a(t): in this case, traits decrease with the same rate regardless of their initial value. Then, since the transport equation satisfied by the distribution f (t, a) rewrites as

Derivation of the model
In order to account for random fluctuations of the distribution of the variable a, we consider that it is governed by a stochastic process that we write as a t . We will assume that, in the absence of epidemics, its distribution results from the following stochastic differential equation or Itô drift diffusion process: where W t is a standard Wiener process, σ (t, a) > 0, and μ b (t, a) represents background volatility and drift coefficients. This equation corresponds to a Lagrangian point of view and is the stochastic analogue of the deterministic part (7). We impose reflecting boundary conditions at the endpoints of the interval A, i.e. at a ∈ {0, 1} when A = (0, 1) or at a = 0 when A = (0, ∞).
Let us now consider a population stratified by the variable a subject to the combined effect of two effects: the stochastic differential equation (SDE) (9) on one hand and depletion caused by the epidemics driven by transport equation (4) (or equivalently) by the ordinary differential equation (ODE) (7) on the other hand. We assume that the superposition principle applies to these two effects. Then, under this joint effect, the evolution of a includes a drift or transport term with coefficient μ ep + μ b , whereas the stochastic part remains the same. Thus, the random process is given by the following SDE: where: • μ b is an additional drift and the diffusion σ ≥ 0 both possibly depending on t and a. The two variables μ b and σ need to comply with boundary conditions: • W t is a Wiener process with reflecting boundary conditions.
Then, the dynamics of the associated probability density f (t, a) is given by the following Fokker-Planck equation (or forward Kolmogorov equation): For the sake of completeness, we sketch the proof. This will allow us to derive the boundary condition. Let a → φ(a) be a smooth function over A. Then Itô's formula leads to: Then, taking the time derivative of the expectation of φ(t, a) yields on the one hand and on the other hand from (10) Hence, one has for all smooth enough φ so that integrating by parts the right hand side of the above equation leads to Taking test functions φ compactly supported in A, one therefore gets the dynamics of the probability density Given the above assumptions on μ and σ on the boundary ∂ A, the following boundary conditions lead to a consistent weak formulation In other words, one has Using (2), which is unchanged (the effect of random variations reshuffle a in a conservative way), we deduce that On the other hand, the dynamics of the total infected is not modified by random fluctuations of a: We conclude by the expression of the full PDE system of epidemiology with heterogeneous and variable behaviors: supplemented with boundary conditions where μ and σ are given functions satisfying the boundary conditions We recover in particular the model we studied in our earlier work (Berestycki et al. 2021) from the case μ b ≡ 0 and σ constant. Note that we mentioned there the more general form of the equation when σ = σ (a) depends on the risk trait a but we did not explain precisely how to derive it.

General SfIR epidemiologic model
To summarize, we rewrite the general system describing the dynamics of S, I and f variables The model parameters are assumed to satisfy In spite of its simple appearance, the evolution equation for f in (12) is rather involved. In fact, this equation is non-linear, not only because of the term f I (as in the quadratic term in the SIR model), but also owing to the term β f since β is linear in f . Note also that because of this term, the equation is non-local.

Convergence to equilibrium
There is a vast literature related to the convergence to equilibrium of solutions to the Fokker-Planck equations in the whole space R d , d ≥ 1 starting with the seminal results of Bakry and Émery (1985). We refer the reader to Markowich and Villani (1999) for a survey as well as to the book of Bakry et al. (2014). These works about convergence to equilibrium chiefly rely on a so-called "hyper-contractivity" condition of D. Bakry and M. Emery. We describe below this condition on the coefficients. In vague terms, it guarantees the existence of a "spectral gap" for the Fokker-Planck operator, which is shown to yield exponential convergence to equilibrium distribution in L 1 . A key quantity is the "relative entropy" E involving the probability distribution f and the equilibrium distribution that we also describe here. Then, one shows that E is bounded by its dissipation rate by using the logarithmic Sobolev inequality that we state below. In Sect. 3.1, we establish the convergence to equilibrium of the distribution f (t, ·) solution of the coupled SfIR system (12) under general assumptions on the coefficients and initial data. Among these, we assume a hyper-contractivity condition which is sufficient for a logarithmic Sobolev inequality to hold. Actually, we might have considered alternative hyper-contractivity conditions involving the Ricci curvature of the Riemannian structure underlying the Fokker-Planck operator as in Arnold et al. (2001) without epidemic coupling. We leave this for future work.
Then, in Sect. 3.2, we consider a more specific framework for the coefficients β, μ b and D in which the equilibrium distribution is of Gaussian type. It allows for much simpler proofs, making the argument more transparent. Furthermore, it allows one to derive stronger results such as the convergence of the average transmission rate β(t). Besides these advantages, this Gaussian framework has interesting mathematical properties such as the existence of exact self-similar solutions. Searching for such selfsimilar solutions leads to low complexity system of ordinary differential equations (ODEs) instead of the PDE system (12). We develop this point of view in Sect. 4 at first order and then extend it to higher order approximations in Sect. 6.1 using spectral decomposition of the distribution f .

The general case with time-independent coefficients
From now on, we assume that A = R + and we consider the SfIR system (12) in the case when the drift and diffusion coefficients, μ b and D = σ 2 /2, do not depend on time.
We consider here solutions S, f , I and R of (12) with the following assumptions on the model coefficients After the epidemic has peaked, one expects that the number of infected individuals I (t) will go to zero and that the probability density f (t, ·) will converge to the equilibrium distribution. This distribution denoted f ∞ is the unique stationary solution of System (12) in the absence of epidemic. It satisfies: Let us introduce a → α(a) defined up to an additive constant by We assume that a → exp(−α(a)) is integrable over R + and we set the constant in such a way that +∞ 0 exp(−α(a)) da = 1.
Then, we deduce from (14) that Associated to this equilibrium probability density, we define the measure ν by dν(a) = f ∞ (a) da and related L p functional spaces We now state some assumptions on the coefficients β, μ b and D that we shall use later on to study the convergence to equilibrium. The first one requires that the average initial and asymptotic transmission rates are finite: Next, we assume that the diffusion coefficient is non-degenerate, that is: In order to keep the average transmission rate bounded over time, we also require the following technical assumption: there exist positive constants C 1 and C 2 such that We will see that it is verified in the case we discuss in the next subsection.
Finally, the following Assumption (H 4) is the Bakry-Emery hyper-contractivity condition (Bakry and Émery 1985;Bakry et al. 2014;Bakry 1994;Arnold et al. 2001;Courtade and Fathi 2020) which allows one to control the convergence rate to equilibrium of solutions of the Fokker-Planck equations in the absence of epidemic. It simply writes in terms of a → α(a) as: Equivalently, we formulate (H 4) in terms of the D and μ b parameters as follows for all a ∈ R + : A classical result stated in Bakry et al. (2014), Courtade and Fathi (2020) is the following Sobolev type estimate: As a consequence, for the probability density distributions f over R + , we obtain the following Logarithmic Sobolev inequality: This inequality is easily derived from (15) by extending f /2 and f ∞ /2 as even functions on the whole real line, and considering g = √ f / f ∞ . It holds true as long as g ∈ L 2 (R + ; ν).
The left hand side of (16) is called the relative entropy 1 . This inequality allows one to estimate this entropy by what turns out to be its dissipation rate through the dynamics of Fokker-Planck equations in the right hand side.
On the one hand, in the classical Fokker-Planck equation case, that is, in the absence of coupling with epidemic model, this Logarithmic Sobolev inequality leads to a convergence of f (t, ·) to f ∞ in L 1 (R + ) norm at speed exp(−λ 1 t) (see Bakry and Émery 1985;Bakry 1994;Bakry et al. 2014). On the other hand, in the classical SIR model, in the absence of social diffusion, the quantities S(t), I (t) also converge exponentially to their limits as t → ∞.
Here, the novelty lies in the coupling of the Fokker-Planck equation with the epidemiological system which is not covered by previous results. Still, we are able to prove the following result.
Let us explain why (17) leads to the convergence in L 1 (R + ) of f (t, ·) to f ∞ as t goes to +∞. As we shall see in the first step of the proof of Theorem 2, the convergence of S(t) (resp. I (t)) to S ∞ ∈ [0, S(0)) (resp. 0) follows from straightforward monotonicity arguments. Then, given T > 0 and splitting the integral over (0, t) in the right hand side of (17) We also observe that Estimate (17) means that the rate of convergence of the probability density f to equilibrium is determined by the smallest decay between that of I (t) and the exponential decay associated with the convergence rate to equilibrium solutions of the Fokker-Planck equations. As a matter of fact, using Fatou's lemma, the convergence of f (t, ·) to f ∞ in L 1 (R + ) leads to: Therefore, we deduce from the convergence of which requires β ∞ S ∞ < γ , then for all δ ∈ (0,δ), there is some C δ > 0 such that Then, L 1 (R + ) convergence to equilibrium occurs at exponential speed The case when β ∞ S ∞ = γ , however, is more intricate regarding the convergence speed of I (t) to 0. Still, Theorem 2 holds even if I (t) were to converge more slowly to zero than exponential decreasing functions.
We now turn to the proof of Theorem 2.
Proof In order to prove the large time convergence of solutions (S, I , f ) of (12) to some equilibrium (S ∞ , 0, f ∞ ), we need to adapt the Lyapunov function introduced in the classical works in the absence of epidemic (Frank 2001). The so-called relative entropy plays the role of such a Lyapunov function. The first step consists in proving that (S(t), I (t)) converge. The second one is to prove that the average transmission rate t → β(t) is bounded. The third step is to derive relative entropy estimates in terms of its dissipation rate and the rate of infected individuals I (t)/N . Then logarithmic Sobolev inequality (16) and Csiszár-Kullback-Pinsker inequality will allow us to conclude.

Step 1: Convergence of (S(t), I(t))
From the evolution equations of S and I , we deduce that S(t) ≥ 0 and I (t) ≥ 0 for all t ≥ 0. If follows that S(t) is non-increasing with time t so that it converges to some S ∞ ∈ [0, S(0)). Introducing the recovered population which is bounded from above by N and non-decreasing, we see that R(t) converges to some R ∞ . We conclude that Step 2: Bounds on t →ˇ(t) Recalling the epidemiological model, it is natural to expect that the average transmission rate β(t) should remain bounded as time goes to +∞. Integrating the evolution Then, one has The above computations are formal. Their rigorous justification requires sufficient regularity and decay at infinity of solutions f . These aspects are proved in Appendix C.
Step 3: Relative entropy estimate Next, we observe that the partial differential equation satisfied for f can be expressed in the following way involving f ∞ : Using the fact that ∂ a ψ(t, 0) = 0 and defining the non-negative function x → ξ(x) for positive x as ξ(x) = x log x − x + 1, integration by parts yields Here we have used Assumption (H 1). Again, we made formal computations requiring sufficient regularity and decay of f that are proved in Appendix C. We then infer from (3.1) that Therefore, we infer that Hence, for some positive constant C 3 , using the fact that R ∞ = γ +∞ 0 Going back to (3.1), we deduce that there exists a positive constant C * such that Step 4: Logarithmic Sobolev inequality In the absence of epidemic, the following quantity is usually viewed as the relative entropy dissipation rate In order to estimate it, we make use of the Logarithmic Sobolev inequality stated in (16) and the lower bound on D(a) from Assumption (H 2). Thus, we get where η = 2D * λ 1 . Step

5: Conclusion
In order to conclude, we rely on the Csiszár-Kullback-Pinsker inequality. It allows us to estimate the squared L 1 distance of the probability distribution f (t, ·) to the equilibrium f ∞ in terms of the relative entropy (Csiszár 1967;Kullback 1967;Pinsker 1964). R.
Theorem 3 (Csiszár-Kullback-Pinsker) Let f and g be two non-negative real functions in L 1 (R + ) with f 1 = g 1 = 1. Then, the following inequality holds: For the reader's convenience and because most of the related references are stated in R and not in the half space R + , we recall a simple proof due to J.A. Canizo (https:// canizo.org/page/28) in Appendix B. Using the above estimate (21), we conclude that so that (20) allows us to complete the proof of Theorem 2 up to a change of C * by a factor 2.

Additional results and comments
Given the convergence in L 1 (R + ) of the distribution f (t, ·) to f ∞ , it is natural to expect that the average transmission rate β(t) will converge to β ∞ . From the L 1 convergence property, Fatou's Lemma yields : However, additional assumptions seem necessary to infer the convergence of β(t) to β ∞ . For instance, assuming C 1 ≤ C 2 β ∞ yields this convergence where C 1 , C 2 are the constants in condition (H 3). Indeed, in that case, we infer from (18) that and therefore lim t→+∞ β(t) = β ∞ . Such an apparently ad hoc assumption nevertheless applies to some interesting situations described in Sect. 3.2.

Open problems
The analysis of long term behavior for the system SfIR we have provided here opens the way to many mathematical questions.
1) It is of interest to relax the (H 2) condition and address the case of degenerate diffusion functions a → D(a). 2) Condition (H 4) to estimate the relative entropy in terms of its dissipation rate is not optimal. Following Arnold et al. (2001) in the absence of epidemiological coupling, one may replace (H 4) by the following assumption (H 5) leading to a more accurate convergence to equilibrium associated with the spectral gap of the uncoupled Fokker-Planck dynamical system: there exists λ 2 > 0 such that Note that in this assumption we do not require that D be non-degenerate. We expect to be able to derive the convergence Theorem 2 under this condition.

The Gaussian case
The previous subsection is rather technical and involves some intricate assumptions in order to prove the large time convergence of the probability distribution f (t, ·) to equilibrium. In order to make the arguments of Theorem 2 more transparent, we provide a separate comprehensive proof in the Gaussian case. That is, we consider henceforth the following framework. where β 0 ≥ 0 and β 1 > 0 are two constants. (iii) The initial probability distribution f 0 has a finite second moment, that is: In the absence of epidemic, Assumption (i) means that the trait a satisfies the Ornstein-Uhlenbeck Stochastic Differential Equation (SDE) with reflecting boundary condition at a = 0. Denoting again D = σ 2 /2, the corresponding Fokker-Planck equation then writes: with initial condition f 0 ≥ 0 such that In the presence of epidemic propagation, (22) is replaced by Then, using (6), the coupled model (12) in terms of the variables S(t), I (t), f (t, a) has now μ b (t, a) = −μ 0 a, D(a) = D and β(a) = β 0 + β 1 a 2 . In this framework, the stationary (i.e. time independent) distribution f ∞ of (12) is given by the Gaussian probability distribution truncated over R + f ∞ (a) = exp(−α(a)), where α(a) = μ 0 a 2 2D Since α (a) = μ 0 /D the logarithmic Sobolev inequality (16) writes in this case: The statement of Theorem 2 in this case writes with a further result about convergence of the average transmission rate.
Theorem 4 Under Assumptions (i), (ii) and (iii), t → I (t) converges to 0 and t → S(t) converges to S ∞ as time t goes to +∞ for some S ∞ such that 0 ≤ S ∞ < S(0). Moreover, as soon as f 0 log Moreover, the average transmission rate β(t) converges to as t goes to +∞.
Under Assumptions (i), (ii) and (iii), the requirements (H 1) (H 2) (H 3) and (H 4) are satisfied. Therefore, Theorem 4 is contained in Theorem 2 above. Nonetheless, we provide here a detailed proof of Theorem 4 since it is simpler. It follows the lines of the proof of Theorem 2 with some algebraic and technical simplifications that make the whole argument more transparent. It also yields a stronger results regarding the convergence of the average transmission rate.

Proof
Step 1: Convergence of (S(t), I (t)) This is unchanged with respect to the previous proof.
Step 2: Dynamics of t → β(t) The evolution equation of f multiplied by β(a) yields after integration over a variable: Then, introducing β ∞ = β 0 +Dβ 1 /μ 0 , using the results of Appendix C and integration by parts lead to It follows that so that β(t) is bounded and lim sup t→+∞ β(t) ≤ β ∞ .
Step 3: Entropy estimate As before, we write the evolution equation of f in terms of so that multiplying the above equation by ψ leads to Integrating over R + , using the homogeneous Neumann boundary conditions on ψ, and introducing the non-negative function ξ defined for positive x by ξ( The first term of the right hand side vanishes: for a = 0 because of the homogeneous Neumann boundary condition on ψ(t, ·) = f (t, ·)/ f ∞ , and for a = +∞ because f (t, ·) decays like f ∞ as proved in Appendix C. We therefore end up with Then, denoting the relative entropy we deduce that Finally, using the logarithmic Sobolev inequality (23), we obtain Gronwall type lemma then leads to for some positive constant C * . Estimate (4) then follows from Csiszár-Kullback-Pinsker's inequality (6) up to a change of C * by a factor 2.
The simpler assumptions above also turn out to exhibit interesting properties regarding self-similar solutions. Such a viewpoint is developed in Sects. 4 and 6.

The parsimonious model in the case of a Gaussian distribution
In order to obtain a parsimonious model, i.e. with as few parameters as possible, we look for self-similar solutions of (11). Dimarco et al. (2021) introduced a similar approach combining behavioral heterogeneity with social diffusion in the framework of kinetic theory and using gamma-type distributions of the susceptible population. Heterogeneity of susceptibles is described by a trait b ∈ R + with transmission rate expressed as power laws of b.
In this section, we focus on Gaussian distribution profiles truncated on A = R + , which as explained later is connected to the Dimarco et al. (2021) modeling approach.

Assumptions
In order to compute the reduced complexity model arising from self-similar solutions, we make the following assumptions: • Truncated Gaussian distribution The initial distribution f 0 (a) = S 0 (a)/S 0 of the susceptible population is described in terms of the continuous risk variable a ∈ R + f 0 (a) = 1 λ 0 φ a λ 0 where λ 0 > 0 and φ(a) = 2 π exp − a 2 2 .
The approach in Dimarco et al. (2021) consists in considering Maxwellians in the kinetic theory framework expressed as gamma type distributions. The two approaches are connected through a change of variables in the probability measures: introducing trait b as b = a 2 /2, one has which is nothing but a gamma distribution with parameter 1/2. Let us also observe that the expression as truncated Gaussian distributions allows to derive more general models representing solutions of the original PDE model (11) in a more accurate way. The basic idea is to expand solutions (t, a) → S(t, a) along even eigenfunctions of the underlying linear operator which turns out to coincide with the Schrödinger operator associated with harmonic oscillator potential. We provide a detailed derivation of the corresponding high order model in Appendix B, Sect. 6, together with numerical simulations illustrating the convergence of high order model to solutions of the full partial differential equation system.
• Constant diffusion coefficient with respect to a, possibly time dependent σ 2 2 = D(t) for some positive function t → D(t).
Note that this expression differs from Dimarco et al. (2021), which does not include the constant coefficient β 0 .
Note that one may replace μ 0 (t) by any function of time only, related or not to epidemic variables (it may be used for instance to account for the influence of public health policy).
• The susceptible population S(t, a) is assumed to be expressed as a self-similar profile for some functions of time t → S(t) and t → λ(t) such that S(0) = S 0 and λ(0) = λ 0 .

Derivation
We deduce from the first equation of (11) that We therefore require that the two time dependent functions on the left and right hand side of (27) are equal to zero: hence substituting (27) into (27), one gets on the one hanḋ On the other hand, the second equation of (11) leads to As a consequence, the derived model writes as follows with initial conditions S(0) = S 0 , λ(0) = λ 0 , I (0) = I 0 and R(0) = 0. Let us observe that the average of the risk parameter a is linked to the λ parameter: (t, a)) + ∂ 2 ∂a 2 (D f (t, a)).

Dynamics of the average transmission rateĽ et us recall the equation governing the probability density function f (t, a) = S(t, a)/S(t) of susceptible individuals
Then, we introduce the notation Then, inserting the self similar profiles into (27), f (t, a) = λ(t) −1 φ(a/λ(t)), one gets where we used the fact that A 2 φ = φ + φ and A 4 φ = (A 2 φ + 3φ ) + 3φ. The variance Var(β)(t) can then be expressed as It means that the dynamics of t → λ(t) is directly linked to the average value of the β parameter. Therefore, the equation on λ can be replaced by the dynamics of the average transmission rate β in System (27) dβ with initial conditions We have thus reduced the PDE system (11) to an ODE system with four unknowns. It is worth emphasizing that this ODE system, albeit simple, retains the memory of the PDE system, through the diffusion D and the drift μ 0 .
This system is a simple extension of the SIR model involving a parameter for social diffusion D and an additional equation for the dynamics of the average transmission rate β(t). This last effect is indeed a major characteristic of the evolution of the epidemics. Indeed, from the second equation in System (27), we see that, on the one hand, β(t) has a tendency to decrease under the effect of the epidemics sculpting the distribution of a by removing relatively more high a values, while, on the other hand, the social diffusion parameter tends to offset this tendency.

A rich variety of dynamical behaviors
We now show that this reduced form still yields a wealth of various dynamical behaviors. In fact, we recover nearly all the dynamical features we reported in our earlier work (Berestycki et al. 2021) for the full PDE model (11). All the figures we show here are obtained from numerical simulations of System (27) over a period of 18 months with realistic values of epidemic parameters, initialized with a very small sub-population of infected individuals in an otherwise susceptible population.
In Fig. 1, we consider the case in which R = 4 given an average infectious period of 10 days (full parameters included in the caption). The initial value of f is set to the equilibrium value expected in the absence of an epidemic. We then evaluate dynamics given an initial infected population fraction of 10 −6 . We observe an exponential growth, followed by a decay and then a smaller shoulder pattern. This is a representative sequence observed in epidemic data. Note that during the exponential growth phase, the average susceptibility remains close to the equilibrium level. However, when a sufficiently large number of individuals are infected, then the average susceptibility drops, leading to a decrease in susceptibility which then relaxes over time, enabling the re-emergence of a longer term shoulder. Over time, the infections decay and the susceptibility relaxes slowly towards β * (over a multi-year period given  Fig. 1 Epidemic dynamics leading to a single peak followed by a shoulder. Dynamics shown are that of infected fraction, I (t) (black, solid) in logarithmic scale, and Average susceptibility, β(t) (red, dashed). Parameters are β(0) = 0.4 days −1 , β 0 = 0 days −1 , μ 0 = 5 × 10 −3 day −1 , γ = 0.1 days −1 , 2Dβ 1 = 0.004 days −2 , R 0 = 4, I (0) = 10 −6 Fig. 2 Epidemic dynamics leading to multiple peaks. Dynamics shown are that of infected fraction, I (t) (black, solid) in logarithmic scale, and Average susceptibility, β(t) (red, dashed). Parameters are: β(0) = 0.8 day −1 , β 0 = 0.0 day −1 , I (0) = 1E − 6, 2Dβ 1 = 0.008 day −2 , μ 0 = 5E − 3 day −1 , γ = 0.2 day −1 this parameter set). We emphasize that, though much simplified with respect to the full PDE model, this system of ODEs can generate complex features.
As we already pointed out in our previous article (Berestycki et al. 2021) rebounds appear as part of the intrinsic dynamics of the epidemics even without variants. This is illustrated in Fig. 2 with larger amplitudes of oscillations. In this case however, as should be expected, further rebounds reach lower maximal levels than the first peak because of the dissipative nature of the underlying PDE model. As before, the β(t) decreases as infections increase and then revert back towards the equilibrium, enabling the second oscillation. The initial decrease in susceptibility reflects how the ODE retains the sculpting feature of the PDE while the rebound is a result of the drift back to the mean of the unique equilibrium distribution in f . (27) also exhibits the typical SIR behavior of an exponential growth followed by a (nearly) exponential decay. We perform more systematic comparison of PDE and ODE based dynamics in Sect. 6.1.

Properties of the ODE system
This section is devoted to deriving several mathematical properties of the ODE system (27). Our purpose here is to show that this reduced system displays natural features of the heterogeneous system. We mainly discuss here the large time behavior of (S(t), β(t), I (t), R(t)) in terms of the parameters μ 0 , D and β(0). The results and their dependence on the parameters are summarized in Table 1 at the end of the subsection.
Throughout this section, without loss of generality, we normalize the population in such a way that N = 1.

Consistency with SIR model
We first observe that (27) reduces to the standard SIR model in the absence of heterogeneity i.e. β 1 = 0, when β(0) = β 0 . Clearly, in this case there holds β(t) = β 0 for all t ≥ 0 and we get the solution of the corresponding SIR system.
Proof We argue by contradiction. In the case D > 0, suppose there is a first time t * where β(t * ) = β 0 . Then, from the equation we get β (t * ) > 0 which is a contradiction. In the case D = 0, the equation for x = β − β 0 is of the formẋ(t) = g(t, x)x(t), so that x(t) > 0 when x(0) > 0.

Asymptotic limit of (S, I, R)
As in the conventional SIR model, the solution converges to a constant state in the large time limit.
The case 0 > 0 Proposition 3 In the case when μ 0 > 0, the average transmission coefficient β(t) remains bounded and converges to β ∞ = β 0 + Dβ 1 /μ 0 as t goes to +∞. Moreover, the limits satisfy Proof First one has and thus remains bounded: there exists M > 0 such that β(t) ≤ M for all t ∈ R + . Then, taking β ∞ as defined in the above proposition, one also has Given ε > 0, for T large enough, and t > T one has Hence, taking T large enough and, say, t > 2T , we see that the right hand side of (29) is less than ε. This proves the convergence of β(t) to β ∞ as t → +∞. Finally, in view of the equation satisfied by I , it is straightforward to see that inequality (28) holds.
Notice that β(t) converges to the same limit β ∞ as in the case of solutions of the full PDE system (12) in the Gaussian case (compare the above Theorem 4).
It is interesting to observe that if initially I (0) = I 0 > 0, and β(0) = β ∞ , then t → β(t) is decreasing in the early stages of the epidemic. Indeed, one has Thus, if initially the epidemic depletes more the fraction of the population with large a, eventually, for large time, this trend will be reversed and β(t) will increase to converge to its equilibrium value β ∞ . We also established this property in the full PDE model (compare Theorem 4).
Let us observe that the case μ 0 = 0 is singular, since in that case all susceptibles become infected at the end of the epidemic. This property is related to the fact that Model (27) derives from the underlying partial differential equation model (11) where the risk trait a covers the unbounded domain R + .
Proof First, we observe that τ = 1/(β − β 0 ) satisfies Using the fact that τ ≥ 0, the integral on the left hand side converges since I itself is integrable over R + . As a consequence, τ (t) converges to some finite value τ ∞ ≥ 0 as t goes to +∞. The convergence of the integral of τ 2 requires that τ ∞ = 0, i.e. β(t) tends to +∞ as t goes to +∞. Finally, assuming that S ∞ > 0 the dynamics of I would be exponentially growing since β(t)S(t) − γ would tend to +∞, which does not make sense. We conclude that S ∞ = 0, so that R ∞ = 1.

The case D = 0
We derive here comparisons with the classical SIR model. To this end, we introduce the notation S S I R,α (t) for the solution of the SIR model with transmission rate β = α (constant) and a fixed parameter γ . We write S S I R,α ∞ for the final size of the susceptible population.
In the case when μ 0 > 0 and D = 0, the following properties hold Proposition 5 When D = 0, the average transmission rate t → β(t) is non-increasing and converges to β 0 as t goes to +∞. Moreover, if β(0)S 0 > γ and I 0 > 0, then t → I (t) has a unique maximum as in the homogeneous SIR model. Finally, the limit susceptible populations satisfies Proof As in the case of the homogeneous SIR model, the dynamics of susceptibles S in terms of R instead of time leads to the expression Here R ∞ is the final size of the R-population. It is given by the relation: We introduce the notations: From the equation for β in System (27), we see that β(t) is decreasing (aside from the obvious case β(t) ≡ β 0 ). Thus, in view of Proposition 1, we know that for all t ≥ 0, one has β(0) ≥ β(t) ≥ β 0 . Therefore, we infer that: From the classical final size relation in the SIR model, we deduce that

Expression in terms of R in the general case D ≥ 0
As in the case of SIR model, we observe that we can express all the variables S, β, I in terms of R: From the first equation of (30), we infer so that the infected write as The second equation then leads to Without social diffusion D = 0 and without drift 0 = 0 In the absence of social diffusion D = 0 and drift μ 0 = 0, the system describes a heterogeneous population without random fluctuations of the risk factor. It turns out that in this case, we can solve explicitly System (27) in terms of R to get: i.e.
The infected then write in terms of R as When β(0) = β 0 , one recovers the classical formulas of the conventional SIR model. Furthermore, at the initial stage, when R is close to R 0 , we get This formula generalizes the classical criterion R 0 > 1 for epidemic expansion. Here, it is expressed in terms of the parameter R 0 = β(0)S 0 /γ .

Entropy estimates when 0 > 0 and D > 0
Let us translate Estimate (24) of the relative entropy (25) in the framework of selfsimilar solutions as developed in Sect. 4. Denoting we infer from self similar expression (26) of f (t, ·) that whereas its dissipation rate rewrites as the simple equation Then, Estimate (24) rewrites in terms of θ as We observe that the Logarithmic Sobolev inequality is then associated to the property that for all Hence, we obtain the exponential convergence of the relative entropy which provides another proof of convergence of λ(t) to λ ∞ (hence proving the convergence of β(t) to β ∞ ). We summarize the previous results in the following synthetic Table 1.
The new systems we have introduced here beg for further study. These concern both the PDE system (12) and the ODE system (27). They lead to several open problems among which we mention the following three.
• What is the dependence of S ∞ in the PDE system (12) with respect to model parameters μ 0 , D, β 1 etc.? • Derive the asymptotics for the PDE model (12) when μ 0 → +∞. We conjecture that the probability distribution f (t, a) converges to a Dirac distribution δ a=0 , and β(t, a) → β 0 for any t > 0. Therefore, in the limit, we should get the SIR model with β 0 as the transmission rate. The same question is also relevant for the ODE model. • Derive the asymptotics when D → ∞ in the ODE and PDE modeling frameworks.
Here we conjecture that this situation should be analogous to the case μ 0 = 0 and that β(t) → ∞, I (t) → 0, S(t) → 0 for any fixed t > 0.

Oscillations: heuristic arguments
In the unfolding of epidemics, when there is a plateau, one often observes oscillatory behavior of the level of infected. Therefore, it is useful to check the ability of a model to reproduce this stylized fact. We provide here heuristic evidence that oscillations may indeed occur in the simplified model (27). We only consider the case μ 0 = 0, and β 0 = 0. As for other epidemic features, we find it more convenient to work in logarithmic scale, that is, with the variable log I rather than I . We seek some constant value I p that we will choose adequately later on, such that there is a plateau at the level I I p . We set X = log(I /I p ). Again, we assume that N = 1 for simplicity. Introducing η = 2Dβ 1 , the system (27) takes the form: Hence, computing the second derivative of X with respect to R we get Denoting ρ = β S/γ − 1, we reformulate the above ODE as follows In a plateau, one expects the right hand side of this equation to be very small. It is then natural to introduce the value of I that makes the right hand side vanish. We denote it byĨ p :Ĩ Note thatĨ p varies in time like S. Thus, Eq. (32) rewrites as The idea now is to seek a constant plateau value I p close toĨ p for which we can find oscillations around it. To this end, since I = I p exp(X ), it is convenient to reformulate Eq. (33) in the following manner: This ODE is of the form We then consider ranges of R such thatĨ p ∈ (0, 1) which requires that We interpret this relation as a necessary condition for a plateau to exist. Assume moreover that |ρ| 1 and I p is such that Then, the complex eigenvalues of M are purely imaginary numbers which means that I oscillates around I p in logarithmic scale. Thus we find oscillations in this plateau regime.
For the above heuristic arguments to make sense, we surmise that in a plateau regime, S varies more slowly than the oscillations of X . Further mathematical developments need to be achieved to rigorously justify this reasoning.

Model derivation
An intermediate complexity model, which covers the case of the aforementioned Gaussian based model, can be derived using mathematical properties of the eigenvalues and eigenfunctions of the linear operator associated with Model (12). Incidentally, such operator coincides with the Schrödinger operator associated with the harmonic oscillator potential: The eigenfunctions write in terms of the so called Hermite polynomials:
Because we restrict to A ∈ (0, +∞), we only consider the case of even integers n = 2k, k ∈ N. As a matter of fact, ψ 2k (resp. ψ 2k+1 ) are even functions (resp. odd functions): in order to comply with homogeneous Neumann boundary conditions on A = 0, n needs to be even. Let us also observe that Then, in order to represent solutions of (11) we decompose S(t, a) as a finite sum of (ψ 2k ) k∈N ) eigenfunctions: .
We provide detailed computations below (where A = a/λ) Therefore, one has Identifying for each k ∈ {1 . . . K } the coefficients in front of A 2 ψ 2k (A) then giveṡ On the other hand, the dynamics of the K coefficients is given bẏ for k = 0, . . . , K − 1 and for k = K Hence for k = 0, . . . , K − 1 and for k = KṠ In other words, one has The dynamics of infected follows from the computation Note that contrary to the case K = 0, it is more convenient to keep λ as a variable rather than β. To summarize, denoting where β is defined by (36) and the dynamics of (S k ) k=0...K is given by (35). The initial conditions read as: In such a way that

We claim that
Theorem 5 Let (S k ) k=0...K , I and R be solution of (37), (36), (35) with initial conditions (38). Assume that μ 0 > 0 and D > 0. Then, as time t goes to +∞, Proof Since most of the arguments are similar to the case K = 1, we only sketch the proof. First the convergence of S, I and R to limit values S ∞ , 0 and R ∞ follows from the same arguments. Next, the convergence of λ(t) to λ ∞ is also identical to the case K = 1. Then, the convergence of S K (t) to 0 follows from (34). The case k = 0 . . . K − 1 follows from the following lemma: for some c(t) > 0 and b(t) such that (c(t), b(t)) → (c ∞ , b ∞ ) with c ∞ > 0. Then x(t) → 0 as t goes to +∞.
Sketch of proof: replacing x by x − b ∞ /c ∞ , one may assume that b ∞ = 0. Then, The first term in the above right hand side converges to zero because of asymptotic properties of c(t), and the second one can be estimated for instance by splitting the integral over (0, T ) and (T , t) for t ≥ T > 0. Let us finally observe that the limit transmission rate satisfies which is the same as the average limit transmission rate β ∞ associated with solutions of the partial differential equation model.

Numerical simulations
We make comparisons between solutions of the full partial differential equations model and solutions obtained from Ordinary Differential Equations of order K as derived in the previous Sect. 6.1. The initial profile of susceptible individuals in terms of trait variable a ∈ R + is taken as follows: and γ = 0.25 days −1 , i.e. R = 2.89. The simulation is performed over time interval [0,200] (in days) for both PDE or ODE approaches. Space discretization indicates that 200 points in space are enough to achieve reasonable convergence of solutions of the full PDE system. Simulations of the ODE system associated with K = 0, 1, 2, 3, 4, 7 are illustrated in Fig. 3 above. 2 We observe a somehow monotonous convergence behavior when K becomes large enough, in particular for the maximum level of I (t) and time for such maximum and the period of oscillations. Further investigations will be needed to assess the epidemiological relevance of such higher order approximation models, which allows us to consider general initial probability densities f 0 beyond truncated Gaussian distributions.

Main results
We have derived here a system describing the evolution of epidemics taking into account the behavioral heterogeneity and variability of the population of susceptibles. Individual behaviors are characterized by a compound risk variable a ∈ R + associated with the susceptibility with respect to the epidemic. Larger values of a correspond to increasing exposure to contagion, whereas low values are associated with little exposed behaviors. Thus, the population of susceptibles is structured by this variable a: S = S(t, a), and the classical transmission coefficient β = β(a) is an increasing function of a. Rather than working with S(t, a), we found it more revealing to write the equations in terms of the total population S(t) = ∞ 0 S(t, a)da and the probability distribution of the trait a in the population: f (t, a) = S(t, a)/S(t).
We first expressed the effect of the epidemics on the distribution f (t, a) as a transport equation. We also assumed that this distribution is subject to random fluctuations according to a drift-diffusion stochastic process. Combining these two effects, we derived a Fokker-Planck equation governing the evolution of the distribution f . This equation is coupled with the evolution equations for S(t) and I (t), the total numbers of infected. We then derived a system, that we call the SfIR system, standing for the variables S(t), f (t, a), I (t) and the population of recovered R(t). This new system is non-linear and non-local and combines a partial differential equation of Fokker-Planck type and ordinary differential equations for S and I . The SfIR model is a natural extension of the classical SIR compartmental model to populations when considering heterogeneity and variability of individual behaviors.
Numerical simulations of this system showed that it exhibits some of the trademark dynamics of the unfolding of epidemics such as the current COVID-19 epidemic. Namely, we observed plateaus, shoulders, oscillations and rebounds. These aspects do not emerge in the classical SIR model and can be viewed as a consequence of including individual behaviors, heterogeneity, and variability.
We then studied the question of large time behavior for this system. There is a long-standing literature devoted to this question in the case of the scalar Fokker-Planck equation by itself which turns out to be rather delicate. This new context of a coupled non-linear and non-local system further complicates matters and requires new developments on this question. Under some assumptions on the model coefficients, i.e. transmission rate, drift and diffusion, we were able to prove the convergence of the probability distribution f (t, ·) to a unique equilibrium distribution f ∞ . This result extends the classical convergence to equilibrium results for solutions to Fokker-Planck equations in the absence of epidemic coupling, under classical Bakry-Emery (Bakry and Émery 1985) hyper-contractivity assumptions on drift and diffusion coefficients.
We paid particular attention to the case of constant diffusion D, linear dependence of the drift with respect to a and affine dependence of the transmission rate with respect to a 2 . We showed that this framework is associated with an Ornstein-Uhlenbeck process for the fluctuations of a. In this case, we further derived a model of reduced complexity. This ODE system was obtained by looking for self-similar solutions of the equation governing the distribution f involving Gaussian probability distributions truncated over R + . We showed that this ODE system even though quite simple still captures epidemic patterns such as rebounds, shoulders, and plateaus.
We also obtained higher order ODE models approximating the solutions of the PDE system through spectral type methods involving the eigenfunctions of some associated operator akin to the quantum harmonic oscillator. Furthermore, we illustrated the convergence to the PDE solution in the presence of oscillations as the model order increases.
Finally, we stated and proved some mathematical properties of the reduced complexity ODE system such as large time behavior depending on various assumptions on the model coefficients.

Perspectives
The derivation of the SfIR model, and our first analytical results naturally open many perspectives, both in the epidemiological modeling and in the mathematical analysis of these systems. The present paper is a first stage of a research program on epidemiological modeling in this vein.
The new system we introduced here raises a number of open mathematical questions. To start with, the existence and uniqueness of the solution to the Cauchy problem associated with this system is still open. As already noted, this system combines the difficulties of being non-linear and non-local. The extension of the convergence to equilibrium result by relaxing the assumptions and allowing as general a framework as possible is a natural question. In particular, deriving regularity and integrability of solutions for a wide class of initial data is open. It would be interesting to fur-ther describe the total number of infected individuals after the epidemic has subsided and how this number depends on the various parameters. More generally, it would be desirable to better understand the qualitative properties of dependence of solutions and their final states with respect to the parameters. Further mathematical developments are needed to rigorously justify this reasoning and to explore the qualitative basis for and quantitative levels of oscillations.
From an epidemiological point of view, one is naturally led to investigate extensions of the approach developed here. First, our model here rested on the assumption of homogeneous mixing. It would be interesting to expand our approach to the more complex and more realistic framework of preferential mixing (Feng 2014). Other natural extensions of the SIR model take into account additional epidemiological phenomena, e.g., loss of immunity and subsequent reinfection (Lavine et al. 2021) and the inclusion of an explicit exposed period before contagion as in SEIR models or via the use of realistic generation interval distributions (Park et al. 2022).
Second, as observed in the current COVID-19 pandemic, the succession of variants plays a major role in the unfolding of epidemics. It is therefore natural to use the SfIR methodology in the presence of variants. Such an approach could be used to model competition phenomenon between variants characterized by different parameters regarding transmission rate β and average infection duration γ −1 . In all these frameworks, it is natural to seek models of reduced complexity.
Third, the recent COVID-19 pandemic also showed the importance of "awareness" and "fatigue" phenomena, which were modeled in Weitz et al. (2020a) via a top-down approach. In the spirit of that work, embedding state-and/or time-dependence in the PDE framework or in the ODE model parameters is likely to enrich the capacity of SfIR modeling approach to capture more complex global phenomena. Specifically, we note that in the SfIR PDE framework, awareness-driven changes in behavior can dynamically shift susceptibility distributions in a way that is different than contagiondriven sculpting-evaluating the combination of both sculpting and shifting could lead to productive avenues for both theory and model-data integration.
Finally, combining the SfIR modeling approach with spatial diffusion of infection is undoubtedly a challenging but very relevant issue to address in the future. One may contemplate combining the models we have introduced here for a single location to include spatial diffusion (Roques et al. 2020) or coupling between locations that may differ in their response policies (Kortessis et al. 2020). In doing so, recent measurement campaigns of SARS-CoV-2 in wastewater in the framework of surveillance networks throughout the world is a motivation to extend our modeling approach to Wastewater Based Epidemiology-as a means to link ongoing changes in local incidence to forecasting and mitigation.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

A Appendix: Parsimonious model with more general probability distribution profiles
Section 4 focused on Gaussian distribution profiles. We consider here more general profiles of susceptible individuals for which self similar solutions exist.

A.1 Assumptions
We make the following assumptions: • The initial distribution f 0 of susceptible population is described in terms of the continuous risk variable a ∈ R + where for some given real k > 0 where denotes the classical Gamma function (z) = +∞ 0 x z−1 exp(−x)dx.
• Diffusion coefficient depending on a and t σ 2 2 = D(t)a 2−k for some positive function t → D(t).
• The transmission rate function β is a power function of a β(a) = β 0 + β 1 a k .
Note again that μ 0 (t) may be any non-negative function of time only, and that (unless k = 2) an extra diffusion dependent drift is also needed. • The susceptible population S(t, a) is assumed to be expressed as a self-similar profile for some functions of time t → S(t) and t → λ(t) such that S(0) = S 0 and λ(0) = λ 0 .
Let us observe at this stage that φ (A) = −A k−1 φ(A).

A.2 Derivation
We deduce from the first equation of (11) that Observing that We therefore require that the two time dependent functions on the left and right hand side of (27) are equal to zero: hence substituting (27) into (27), one gets on the one hanḋ On the other hand, the second equation of (11) leads to As a consequence, the derived model writes as follows Then, we introduce the notation (t, a) da, so that one has Then, inserting the self similar profiles into (4.3), f (t, a) = λ(t) −1 φ(a/λ(t)), so that one gets The variance V ar(β)(t) can then be expressed as V ar(β)(t) = kβ 2 1 λ 2k (t) = k(β − β 0 ) 2 It means that the dynamics of t → λ(t) is directly linked to the average value of the β parameter. Therefore, the equation λ can be replaced by the dynamics of the average transmission rate β in System (27) dβ Therefore, (27) becomes with initial conditions I (0) = I 0 ∈ (0, N ), S(0) = N − I 0 , R(0) = 0 and β(0) = β 0 ≥ β 0 .

B Appendix: Csiszár-Kullback-Pinsker inequality
One of the key estimate in the proof of convergence to equilibrium of the distribution f in Theorems 2 and 4 is the estimate of the L 1 (R + ) distance of two probability distributions in terms of the associated relative entropy. This is known as Csiszár-Kullback-Pinsker's inequality: (where here and in the following C denotes a generic positive constant). Lastly, the equilibrium f ∞ satisfies: f ∞ (a) f ∞ (a) = α (a) ≤ C(1 + a) for all a ∈ R + .
In this framework, we have the following a priori estimate.
Then, by the parabolic maximum principle in R + , since v(0, a) ≤ M it follows that v(t, a) ≤ M on R + for all t > 0. Indeed, we use the fact that for some positive constant C f ∞ (a) f ∞ (a) ≤ C(1 + a) and β(a) ≤ C(1 + a 2 ).
Let us observe that it means that the probability distribution f (t, a) is bounded from above by a distribution proportional to f ∞ , i.e.
which leads to the justification of formal computations performed in Sect. 3 since S(t) decreases with t and remains positive. Note also that we are able to obtain H 1 type regularity properties on f or more precisely on v = S/ f ∞ Theorem 8 Assuming (H 2) and (H 4), and that the initial profile of susceptibles S 0 is such that Then, the solutions v of Eq. (50) satisfy v ∈ L ∞ (R + ; L 2 (R + ; ν)), ∂ a v ∈ L 2 (R + ; L 2 (R + ; ν)) and We leave open the derivation of such estimates on the behavior at infinity in the more general case of System (12).