Set-membership estimations for the evolution of infectious diseases in heterogeneous populations

The paper presents an approach for set-membership estimation of the state of a heterogeneous population in which an infectious disease is spreading. The population state may consist of susceptible, infected, recovered, etc. groups, where the individuals are heterogeneous with respect to traits, relevant to the particular disease. Set-membership estimations in this context are reasonable, since only vague information about the distribution of the population along the space of heterogeneity is available in practice. The presented approach comprises adapted versions of methods which are known in estimation and control theory, and involve solving parametrized families of optimization problems. Since the models of disease spreading in heterogeneous populations involve distributed systems (with non-local dynamics and endogenous boundary conditions), these problems are non-standard. The paper develops the needed theoretical instruments and a solution scheme. SI and SIR models of epidemic diseases are considered as case studies and the results reveal qualitative properties that may be of interest.


Introduction
The role of heterogeneity of a population for the evolution of infectious diseases is well recognized in the existing literature, see e.g. Diekmann et al. (1990Diekmann et al. ( , 2012, Coutinho et al. (1999). Various kinds of models have been developed to take into account heterogeneity with respect to immune system, contact rates and other traits, including cellular automata (Schneckenreither et al. 2006), random networks (Miller 2007;Volz 2008), distributed integro-differential systems (Novozhilov 2008(Novozhilov , 2012Diekmann et al. 1990;Veliov 2005), etc. For a more comprehensive bibliography see the recent paper (Hickson and Roberts 2014). Theoretical results about the influence of heterogeneity on the basic reproduction number in such models are available (e.g. Katriel 2012;Margheri et al. 2015), as well as results about intervention strategies, e.g. vaccination in heterogeneous populations (Rodrigues et al. 2009;Katriel 2012). A substantial limitation for utilization of most of these models is that they require detailed information about the distribution of the population along the numerical values of the traits, that is, about the h-state (heterogeneous state) of the individuals in the population (Diekmann et al. 2012). Such detailed information is usually not available. The available information is vague and even reliable statistical characteristics are often not known. One way to overcome this difficulty is to pass to aggregated models that require less information. This approach is stressed in Diekmann et al. (2012) and we mention the papers (Dushoff 1999;Karev 2005Karev , 2010Veliov 2005;Novozhilov 2008Novozhilov , 2012 developing aggregation techniques for certain special classes of heterogeneous models defined by integro-differential systems. In the present paper we employ an alternative approach, in which the distribution of the population among the h-states is uncertain, but set-membership information is available (possibly together with certain aggregated data). The set-membership information may be given in the form of lower and upper bounds for the number of susceptible, infected, recovered, etc. individuals at each h-state. The aggregated information is typically about the total number of susceptible, infected, etc. individuals at the initial time. This information is used to obtain set-membership estimations (shortly: set-estimates) for the evolution of the disease. The set estimations at a given time t contain all aggregated states (total number of susceptible, infected, etc. individuals) that are consistent with the available initial information and the model describing the dynamics of the population system. The set-estimation approach is well known and widely used (see e.g. Bertsekas 1995;Milanese and Vicino 1996;Kurzhanski and Varaiya 2011), but in the present epidemiological context there are important points that had to be developed.
The investigation is carried out for a rather general model of a heterogeneous multi-group population, which consists of a distributed first order differential system complemented with integral relations. This model covers heterogeneous versions of SI, SIR, and many other standard epidemiological models. At this level of generality we present our set-estimation approach. In the set-estimation theory for evolutionary systems one can distinguish two different groups of methods. In the first, set-estimations of Markovian type are sought, where the set-estimation at a given time t determines the future set-estimations (the minimal set-estimation has this property). The advantage is, that it is sometimes possible to obtain infinitesimal (even differential) equations for the evolution of Markovian set-estimations in a prescribed family of sets (polyhedrons, ellipsoids, etc., see Kurzhanski and Varaiya 2011). The drawback is, that such estimations are usually too "pessimistic" , that is, too large, compared with the minimal set-estimation. Our approach belongs to the second group: at each time the set-estimation is obtained independently of the previously obtained estimations. Technically, finding such estimations (even minimal ones) can be done by solving families of auxiliary dynamic optimization problems. In our case these optimization problems are non-standard, because they involve constraints in the form of first order distributed differential systems and integral relations. Therefore, the first main goal of the paper is to present a technique for solving such optimization problems.
The second goal of the paper is to show that the set-estimation technique may give useful information about the spread of infectious diseases under uncertainty of data (we focus on uncertainty of the h-state-distribution of the initial population). In many cases the population has certain dissipativity property that makes the setestimations not much expanding, even shrinking to a point or to a reasonably small set, when the time progresses. Thanks to this, one can perform various kinds of comparative analysis. For example, we investigate the effect of various scenarios of interventions (vaccination or prevention programs) applied prior to the outburst of the disease.
We mention that our previous work (Veliov and Widder 2015) allows to determine the exact asymptotics of the aggregated states of a class of heterogeneous SI-models, depending on the initial h-state-distribution of the population. This allows to obtain a set-estimation for the asymptotic state of the disease for this particular SI-model in an alternative way. The comparison with the results obtained by the general approach in the present paper, which turns out to be the same, serves as a verification test.
The plan of the paper is as follows. Section 2 explains the aim of this paper in terms of a simple SI model used later as benchmark. The general model, the assumptions, and the formulation of the set-estimation problem are given in Sect. 3. In Sect. 4 we present the set-membership technique and some technicalities needed to adapt it to the present framework. Section 5 is devoted to numerical analysis of certain SI and SIR heterogeneous models by the set-estimation techniques. Some conclusions and lines of future research are presented in Sect. 6. Some technical proofs are given in the Appendices 1 and 2.

A benchmark SI model
To present our main motivation, we introduce below a particular case of the problem we investigate in this paper, which involves a heterogeneous version of the known SI model in mathematical epidemiology. The whole population is divided into two groups-susceptible individuals and infected individuals. The individuals are heterogeneous, in the sense that a scalar ω ∈ Ω ⊂ R is associated with each individual, indicating specific traits relevant to the particular disease, e.g. the intensity of risky contacts, the state of the immune system, etc. The parameter ω is called heterogeneous state (shortly h-state) of the individual, see e.g. Coutinho et al. (1999), Diekmann et al. (1990) or textbooks such as Diekmann et al. (2012).
The following model is a particular case of the one in Veliov (2005): Here S(t, ω) and I (t, ω) represent the size of the susceptible and of the infected population with h-state ω at time t, respectively. The parameter κ is the net population growth rate of the susceptible population, γ is the net mortality rate of the infected population, σ (ω) is the susceptibility, meaning the probability that a risky interaction between a susceptible and an infected individual results in infection of the susceptible individual (it may incorporate also the immune status of the susceptible individual), and p(ω) and q(ω) denote the participation rate of susceptible/infected individuals of h-state ω in risky interactions (i.e. the contact rate). The aggregated state variables K (t) and J (t) represent the total amount of susceptible/infected individuals, weighted with their respective risky behaviour, while J (t)/(K (t) + J (t)) is the weighted prevalence of the disease at time t (see e.g Veliov 2005; Veliov and Widder 2015 for more detailed explanations). At the initial time t = 0, the distribution of the initial susceptible and infected sub-populations along the h-states, ω ∈ Ω, is given by the functions u 1 (ω) and u 2 (ω), respectively. In fact, the main quantities of practical interest are the total size of the susceptible and infected populations: (2) Solving system (1) is not problematic, provided that all data involved are known. However, in reality the information about the distribution of individuals along the heterogeneous space Ω is vague. That is, the functions u 1 and u 2 are not precisely known. A relatively reliable information about these functions is provided by the aggregated values since measurements of S(0) and I (0) are feasible. Statistical information for higher integral moments of u 1 and u 2 (in the form of equalities or inequalities) may also be available, and its incorporation in our subsequent considerations is a matter of technical work that we avoid for more transparency. Additional information about u 1 and u 2 may be given in terms of bounds: Any pair of measurable functions (u 1 , u 2 ) satisfying (3) and (4) (that is, consistent with the available information) will be viewed as possible (sometimes called admissible) realizations of the uncertainty for the h-distribution of the initial population. Due to the uncertainty of the initial data (u 1 , u 2 ), the issue of obtaining a setmembership estimation, E(t), of the aggregated state (S(t), I (t)) does naturally arise. This means that sets E(t), t ≥ 0, have to be determined, such that whatever the admissible initial functions (u 1 , u 2 ) are, where (S(t), I (t)) is the corresponding solution of system (1) enhanced with (2). The main goal of this paper is to present a computationally implementable approach for obtaining set-membership estimations as in (5). Such an approach is developed in the next section for a general system with a structure similar to (1), (2).

Formulation of the problem and preliminaries
Having in mind the set-membership estimation problem in the previous section, below we formulate a more general problem that covers heterogeneous versions of a variety of models in mathematical epidemiology and in other areas.
Let The dynamics is given by the equationṡ where f : D × R m × R n → R m and g : D × R m → R n are given functions and the upper "dot" means differentiation with respect to t, so thaṫ x(t, ω) := ∂ x(t, ω)/∂t. The initial data u : Ω → R n is uncertain, and the available information about it is given by the following constraints: The inclusion in (9) is understood component-wise: . . , ϕ m j ), j = 0, 1; the vector c ∈ R m and the functions ϕ 0 and ϕ 1 are given.
We consider every function from the set as an admissible (possible) realization of the uncertain function u. Before formulating the estimation problem in the spirit of the previous section, we give the necessary assumptions and clarify the meaning of solution of the above model. Assumptions: in ω, both are locally essentially bounded, differentiable in (x, y) with locally Lipschitz partial derivatives, uniformly with respect to (t, ω) ∈ D; (ii) the functions ϕ 0 , ϕ 1 : Ω → R m are continuous and satisfy the inequalities By definition, solution of (6)-(8) is any pair of measurable and bounded functions (x(·, ·), y(·)) on D and [0, T ] respectively, such that for a.e. ω ∈ Ω the equation holds on [0, T ] and (8) holds for a.e. t ∈ [0, T ].
Notice that according to Assumption (i) x(·, ω) is (uniformly) Lipschitz continuous for a.e. ω and y is continuous.
Existence and uniqueness of the solution (x [u], y[u]) on [0, T ] for every u ∈ U is assumed in (iii) above in order to make the exposition less technical. In fact, the existence can be proved by a fixed point argument similarly as in Veliov (2008, Theorem 1) under an additional growth condition, which is not necessary in the rest of this paper. The uniqueness is standard due to the Lipschitz continuity in Assumption (i). Denote That is, R(t) the set of all aggregated states y that result from some admissible realization of the uncertainty, u ∈ U. In this sense, R(t) is the exact (meaning minimal) set-membership estimation of the aggregated state at time t. In the next section we present a method of obtaining estimates Even more, the method allows to obtain outer approximations of arbitrary accuracy of the convex hull co R(t).
Sometimes not all components of y are of interest (being included just to complete the model). Therefore, for a given subspace L ⊂ R n we will obtain estimations of the projections of y(t) on L: where pr L is the projection operator on L.
In the epidemiological problems which serve as prototypes for the above problem (cf. Dushoff 1999;Hickson and Roberts 2014;Veliov 2005;Veliov and Widder 2015), the dimension m may equal 2 (in SI and SIS models), 3 (in SIR models), etc. The aggregated state y has usually a higher dimension than x. In the benchmark model (1), (2) considered in Sect. 2 we have m = 2 and n = 4: . However, estimating the pair (S(t), I (t)) is of primal interest, thus L := (0, 0, R, R).

The set-membership estimation
In this section we focus on the approximation of the exact set-membership estimation R(t). The procedure described in the first subsection is well known in control theory, while the second subsection is devoted to the main technical tool which is specific for the model presented in the previous section. The numerical scheme is briefly described in the third subsection.

The approach
Let us fix a time τ ∈ (0, T ] and a unit vector l ∈ R n . Consider the optimization problem sup subject to the constraints (6)-(10). (Here and below ·, · denotes the scalar product.) Let y l = y[u l ](τ ) be an ε-solution, in the sense that If we find y l i for a number of unit vectors {l 1 , . . . , l k } =: Λ, then we have that Thus, E Λ (τ ) is a set-membership estimation of y(τ ). It is an easy exercise to show that the Hausdorff distance H (co Y Λ (τ ), E Λ (τ )) decreases with ε and converges to zero if the set Λ is a δ-net on the unit sphere and δ and ε converge to zero. Thus, we can obtain inner and outer approximations of any accuracy to the convex hull of the exact set-membership estimation R(τ ).
If only a set-membership estimation on a subspace L ⊂ R n is needed (see (12)), then it is enough to take collections of unit vectors Λ belonging to the space L (which makes problems of high dimension tractable, provided that the dimension of L is low-1, 2 or 3).
The approach described above requires multiple solving of problem (13), (6)-(10). This is not an easy task, since we deal with a distributed system with non-local dynamics (due to the presence of the aggregated states y) and constraints on the variable u. We employ a gradient projection method in the space L ∞ (Ω) for the variable u ∈ U. This means that the objective function in (13) has to be maximized on the set U. Then a standard gradient projection method can be implemented-for more details see Sect. 4.3 below. However, there is an auxiliary problem that arises: to determine the gradient (meaning the Fréchet derivative) of J . This problem will be addressed in the next subsection.

The gradient in problem (13)
Let u ∈ U and let (x, y) be the corresponding solution of system (6) where τ ∈ (0, T ] is the number fixed in the previous subsection. We shall obtain a representation of the Fréchet derivative of the functional J in (14) Here the superscript means transposition, and the meaning of solution is similar to that of the initial-value problem (6)-(8).
Lemma 1 For every u ∈ U and corresponding solution (x, y) of system (6) The proof is similar to that of Proposition 1 in Veliov (2008). Therefore, we only sketch it. The solution of the linear equation (15) (with ν regarded as given) can be represented by the Cauchy formula, involving the fundamental matrix solution, which of course, depends on ω. After inserting the resulting expression in (17) and changing the order of integration, we obtain a Volterra integral equation of the second kind for ν, which has a unique solution. The uniform boundedness follows from the same property of (x, y) (see Assumption (iii)).

Proposition 1
The functional J : L ∞ (Ω) −→ R is Fréchet differentiable and its derivative at u has a representation in L ∞ (Ω) given by where λ is defined by (15), complemented with (16), (17). Even more, there exists a constant d such that where ·, · is the scalar product in L 2 (Ω).
The proof of this proposition is given in Appendix 1.

Implementation of the gradient projection method
Below we briefly describe, first at a conceptual level, our implementation of the gradient projection method for finding an "approximate" solution of problem (13), (6)-(10). Given u k ∈ U at iteration k we define the next iteration as where pr U is the projection operator on the set U with respect to the L 2 -norm, and ρ k > 0 is a step along the gradient J (u k ). (Notice that the projection in L 2 exists and is unique.) Although there are various reasonable ways of choosing the step size ρ k > 0, we formulate a convergence result where the choice of ρ k is (to some extend) flexible.
Proposition 2 Let d > 0 be the number in Proposition 1 and let the real numbers ρ 0 > 0 and ε > 0 satisfy the inequality ερ 0 < 1 − dρ 0 . Then for any choice of ] the sequence {u k } generated by the gradient projection method starting from an arbitrary u 0 ∈ U satisfies This statement of the proposition is known in principle, but due to the infinite dimensionality and the specificity in Proposition 1, which does not claim Fréchet differentiability in L 2 , we present the proof (repeating usual arguments) in Appendix 2.
We point out that (20) represents an approximate version of the necessary optimality condition that a solution u * has to satisfy: J (u * ), u − u * ≤ 0, u ∈ U. Stronger convergence results can hardly be obtained in our infinite dimensional setting without additional assumptions of convexity and weak upper semi-continuity of J , which do not need to hold for our problem.
In the numerical implementation of the above conceptual method we pass to a finite-dimensional space by introducing meshes {ω i } M i=1 and {t j } N j=0 in Ω and [0, T ], respectively. Then the systems (6)-(8) and (15)-(17) are replaced with discretizations obtained by application of a second order Runge-Kutta scheme (the Heun scheme) for the differential equations and the trapezoidal quadrature formula for integration over Ω. The approximation of the constraining set U has the polyhedral form where α i are the coefficients of the quadrature formula. In this way we obtain an approximating mathematical programming problem. The relation between the solution(s) of the obtained discretization problem and the solution(s) of the original problem (13), (6)-(10) is a subject of a separate investigation, similarly as in the recent paper (Veliov 2015), considering essentially the same (even more general) system, but in different class of controls.
Solving the discretized problem by the gradient projection method (with the gradient calculated by using the discretization of the adjoint equation) involves projection on the set U M . (Observe that U M is non-empty for a sufficiently dense mesh {ω i } M i=1 due to Assumption (ii).) There is a huge literature and available software for this kind of projection problems, for both see e.g. Hager and Zhang (2015) and the references therein. For details about the implementation of the gradient projection method (including the choice of the step length ρ) see e.g. (Polak 1971, Chapter 4).
We mention that the known convergence results for the gradient projection method applied to the discrete problem are of the same kind as Proposition 2: claiming convergence to a "critical point". In our numerical analysis we run the optimization solver for various initial guesses u 0 . In the experiments with SI and SIR models (see the next section) we have never encountered convergence to a local (and non-global) maximizer.
We remind that to obtain a good approximation E L (τ ) of the minimal convex setmembership estimation pr L (co R(τ )) for a given τ it is necessary to solve problem (13), (6)-(10) for many unit vectors l in the space of interest, L. Even more, in order to predict the evolution of state y(t) by means of the estimation E L (t) we need to do this for a number of time instances τ . Naturally, the obtained (approximate) maximizer u for given τ and l can be used as an initial guess for neighbouring instances τ and vectors l, which makes the overall estimation procedure tractable on a commercial PC.

Numerical analysis
In this section we present numerical results and analysis of versions of SI and SIR heterogeneous models.

SI-model without population growth
Here we deal with the system (1), (2) with κ = 0, that is, the disease-free population has zero growth rate. We consider this special case for the following reason: the asymptotics of the minimal set-membership estimation R(t), t → +∞, can be determined in an alternative way, and can be compared with the estimation E(t) obtained by the approach in the present paper. This is a test for the performance of the set-estimation techniques. Let us briefly describe this alternative way.
From (1) it is apparent that S(t, ω) is monotonically decreasing and positive, and thus convergent. This easily implies thatṠ(t, ω) → 0. Sinceİ (t, ω) = −Ṡ(t, ω) − γ I (t, ω), we obtain in a standard way that I (t, ω) converges to 0, provided that γ > 0. Thus also I (t) → 0. In our paper (Veliov and Widder 2015) it is shown how to determine the asymptotics of S(t) for given initial data (S(0, ·), I (0, ·)) = (u 1 , u 2 ). There, it is assumed that p(ω) = q(ω) > 0, σ (ω) = σ > 0 is constant, and the set of those ω ∈ Ω, for which S(0, ω) > 0 and γ > σ p(ω), has positive measure. Then Veliov and Widder (2015, Section 4.2) claims that where F * is the unique positive solution of the equation Solving the two optimization problems involved in the last formula again requires a numerical algorithm, but now we deal with a completely static problems (differential equations are not involved). Again a gradient projection method is applied, since the  (22). For t < ∞ the estimation E(t) is calculated by using 20 equidistant unit vectors l ∈ R 2 Fréchet derivative of S * can be analytically represented. We skip the details of this procedure.
The essence of the above paragraph is that now we have two different methods for approximation of the limit of R(t): by the way mentioned just above, and by using the general technique presented in this paper for approximating R(t), applied for large t. The comparison is clearly seen in Figs. 1 and 2, obtained for the data specifications described below.
The initial size of the population is normalized to one: S(0) + I (0) = 1. Moreover, Ω = [0, 1], δ = 0.15 and σ = 0.1. The weight functions p(ω) and q(ω) are linear: p(ω) = q(ω) = 0.5 + ω (the constant term means that all individuals have risky contacts). In order to define the lower and the upper bounds ϕ 0 (·) and ϕ 1 (·) of u(·) we assume that the initial distribution of trait ω among the susceptible and infected populations is close to a normal distribution ϕ(·) (called further "base distribution") with mean 0.5 and variance 0.3 truncated to the unit interval and normalised there. More precisely, its deviation from ϕ is at most 20 %. This leads to bounds (23) Figure 1 shows the evolution of the estimation E(t) obtained by using 20 equidistant unit vectors l ∈ R 2 . It converges to the limit set R(+∞) calculated as in (22). Thus the two different ways to approximate the limit set-estimation are consistent with each other. This can be seen even better in Fig. 2 (left plot), where the dotted lines represent the interval in the right-hand side of (22), while the solid lines represent the evolution of pr S (E(t)). The convergence of pr I (E(t)) to zero is seen on the right plot in Fig. 2 (right plot).
We remind that obtaining a set-estimation E(t) requires solving the auxiliary problem (13) for various unit vectors l (in the present model case l ∈ R 2 ). A feasible u that solves this problem is called extremal realization of the uncertainty in the initial data, or merely extremal, in direction l. A comprehensive analysis of the structure of the extremal u is a complicated task, seemingly not tractable, in general, although it may give useful information about "worst case" realizations of the uncertainty. Our numerical experiments with the SI model in Sect. 5.1 give evidence that the extremal u has a bang-bang structure. More precisely, for an extremal u there exists a subset A ⊂ Ω such that u(ω) = ϕ 1 (ω) for ω ∈ A and u(ω) = ϕ 0 (ω) for ω ∈ Ω\A. Of course, the set A depends on u, hence on the estimation time t and the direction l. In the experiments with the present SI-model the set A always consists of a single interval. Figure 4 presents the extremal initial data u 2 (ω) = I (0, ω) for l = (sin(1.4π), cos(1.4π)) and various values of t. For t = 1, . . . , 27 the set A stays the same, A = [0, 0.5), and the corresponding u 2 is depicted on the left plot of Fig. 4. For t = 28, . . . , 40 we obtain A = (0.5, 1] and the corresponding u 2 is depicted on the right plot of Fig. 4. Thus the structure of the extremal data may abruptly change when the estimation time changes. In the rest of this subsection we investigate the effect of intervention (prevention) polices implemented prior to or around the outburst of the disease at t = 0. Such a policy may influence either the individual susceptibility, σ (ω), (say, by vaccination) or the individual contact rate, p(ω) (by educational or other prevention programs). Assuming that the resource for intervention is limited, the question arises how to allocate it among individuals, regarding their h-state ω. As mentioned in Sect. 2, exact information about the h-state of individuals is not available, therefore a complex intervention policy that targets specific sections of the population with particular hstates cannot be enforced in practice. However, it may be feasible to identify groups of high-level and groups of low-level risk.
In view of the above, we consider two scenarios: applying the intervention to high risk individuals (here we mean those with high values of p(ω)) or applying it to low risk individuals (i.e. those with low values of p(ω), respectively). Even though this way of modelling of interventions is crude, it qualitatively answers the question which As we see below, the answer is not evident.
To be specific, we assume that for a third of the population we can decrease susceptibility σ (ω) or the contact rate p(ω) by 50 %. The question is, what will the effect of the intervention be if it is applied to the one third of the population at higher risk versus the same fraction of the population at low risk. The effect of intervention is measured by the set-membership estimation of the prevalence. Let us clarify the last notion. If we have obtained a set-estimation E(t) for (S(t), I (t)), then the corresponding set-estimation for the prevalence I (t)/(S(t) + I (t)) is the interval In Fig. 5 we show the progress of the set-estimation of the prevalence in three scenarios: no intervention (the dotted lines), intervention applied to low risk individuals (dashed lines), and intervention applied to high risk individuals (solid lines). On the left plot the intervention decreases the susceptibility σ (ω) of the treated individuals, while on the right plot-the contact rates, p(ω). Comparing the figures we see that in both cases interventions are productive and that the intervention applied to high risk individuals is significantly more efficient.
However, it is not always the case that intervention is more efficient if applied to the individuals with highest risk. To show this we consider the above SI model with only the value of σ (ω) increased from 0.105 to 0.3, i.e. we assume higher susceptibility. On the left plot of Fig. 6 we see that the prevalence approaches value 1 (and actually the population becomes extinct, asymptotically). We consider again an intervention that reduces σ (ω) by 50 %. On the right plot of Fig. 6 we see the result of this intervention when applied to the high risk and to low risk individuals, respectively. Again both

No intervention
Intervention on σ(ω) Low risk

Minimal prevalence
High risk Maximal prevalence Fig. 6 The left plot shows the evolution of the set-estimation E p (t) of the prevalence in case of no intervention: the whole population becomes (asymptotically) infected. The right plot shows E p (t) with the intervention applied to the low risk (dashed lines) and high risk (solid lines) individuals. The intervention targeting the low risk individuals is now significantly more efficient and, in particular, prevents extinction interventions yield an improvement. However, now the intervention targeting the low risk individuals is more efficient and prevents extinction.

Comparison with other distributions of the initial data
In Sect. 5.2 we assume that the base information, ϕ, about the initial data is a normal distribution truncated to the interval [0, 1] and normalised so that 1 0 ϕ(ω) dω = 1. Now, we make a comparison with results for other choices of ϕ. We consider a uniform distribution, i.e. ϕ(ω) = 1, and two distributions skewed to either side of the interval [0, 1], namely, ϕ(ω) = C 1 (e ω − 1) and ϕ(ω) = C 2 (e −ω+1 − 1), where C 1 and C 2 are appropriate normalising constants. We will refer to the first of these two functions as "right-hand exponential" and to the second one as "left-hand exponential". As before (see (23)), a 20 % deviation of the real data from the base ones is considered as possible.
The different choices of ϕ, including the normal distribution used before, are shown in Fig. 7 . Figure 8 presents the set-membership estimations corresponding to uniform, rightshifted and left-shifted base distributions. It should be viewed in parallel with Fig. 3, which corresponds to a normal base distribution ϕ. Figure 9 shows the separate estimates for S(t) and I (t) on a long time-horizon.
The results correspond to the common sense. The more to the right is shifted the base distribution ϕ (meaning that more individuals have higher contact rates p(ω) = 0.5+ω), the more are the set-estimates shifted to higher number of infected individuals. It is interesting to observe that the size of the set-estimations increases along the line "uniform" → "normal" → "left-hand exponential" → "right-hand exponential" distribution of the base data ϕ. This is apparently related to differences in the stability of the model (1) for different initial data.

SIR-model
In this subsection we consider the following heterogeneous SIR model: where The new variable R(t, ω) represents the "number" of individuals who have recovered from the infection. Now the parameter γ has to be interpreted as the recovery rate, and κ denotes both the birth rate and the mortality rate. The function N (ω) = S(t, ω) + I (t, ω) + R(t, ω) describes the total population and is constant in time, as can be seen by summing up all three equations in (24). Thus in this model the disease has no influence on the mortality of infected individuals. Furthermore, newborn individuals are assumed to be susceptible and reproduction is not affected by being infected or recovered. Notice that the denominator of the weighted prevalence (compare with the SI model (1)) is missing. The reason is, that here we assume (for a simplification that is actually not necessary) that the susceptible, the infected, and the recovered individuals of h-state ω have the same contact rate p(ω). Then the weighted prevalence is given by Normalizing the denominator in the rightmost expression in (25) to 1 we obtain the simplified model (24). In Fig. 10 we show the progress of the set-estimation of (S(t), I (t)) for σ = 0.25, κ = 0.004, γ = 0.1, and p(ω) = 0.5 + ω. We assume that at t = 0 there are no recovered individuals, i.e. u 3 (ω) = 0, and the bounds on u 1 (·) and u 2 (·) are the same as in Sect. 5.1. We see that the set-estimation exhibits an oscillatory behaviour, in contrast with the SI-model. The size of the set-estimation varies with time, but remains reasonably small.
It is interesting to mention that the structure of the extremal initial data (u 1 (ω), u 2 (ω), 0) (see Sect. 5.2) in the SIR-model is much more complicated than that in the SI-model. As seen in Fig. 11, for a given unit vector l ∈ L := R 2 × 0, the extremal initial data u 2 (ω) = I (0, ω) may change its structure several times when the estimation time progresses: for time instances t = 20 and t = 220 the lower bound ϕ 2 0 (ω) is active for small ω and the upper bound ϕ 2 1 (ω) is active for large ω, while for time instances t = 120 and t = 320 the opposite happens. Extremal initial data u 2 (ω) = I (0, ω) corresponding to l = (0, 1) and times t = 20, 120, 220, 320

Conclusions and perspectives
In this paper we demonstrate the tractability and applicability of the set-membership estimation approach for prediction of the evolution of infectious diseases in heterogeneous populations, using distributed differential models under uncertainty about the individual traits relevant for the disease. The available information is in the form of two-sided bounds for the distribution of the initial population along the space of heterogeneity (the h-states), possibly together with some aggregated data. Although the numerical illustrations of the developed estimation technique involve only SI and SIR heterogeneous models, the technique is applicable to more complex models, pro-vided that the evolution of only 2 or 3 aggregated states (such as the total number of susceptible, infected, recovered, etc. individuals) have to be estimated. However, the presented general model has the drawback that the individuals do not change their h-state (that is, their individual traits) over time. If the trait comprises the contact rate, this means that individuals keep their contact rate constant, independently of the evolution of the disease. Change of the contact rate may happen only if an individual becomes infected. This assumption is not realistic, and models and corresponding estimation techniques aimed to cope with variable individual traits are a subject of current work.
Another line of research is to involve in the presented model framework dynamic intervention policies (not only prevention prior to the outburst of the disease, as in Sect. 5.2 of the present paper). The uncertainty about the h-states of the population brings into consideration the problem to control the evolution of set-membership estimations by prevention or medication policies. This problem is profoundly investigated in other, mainly engineering, contexts (see the recent book Kurzhanski and Varaiya 2014 and the numerous references therein). (t, ω,x(t, ω),ȳ(t)) − f x (t, ω)]Δx(t, ω) dω dt (t, ω,x(t, ω),ȳ(t)) − f y (t, ω)]Δy(t) dω dt (t, ω) lie between x(t, ω) and x(t, ω) + Δx(t, ω) for (t, ω) ∈ D andȳ(t) lies between y(t) and y(t) + Δy(t) for t ∈ [0, τ ].
We next estimate the four terms in the remainder r .

Lemma 3 The remainder r satisfies the estimate
where C is a constant.
Using the estimations (36), (37) and (38), taking into account Assumptions (i) and (iii) and the boundedness of λ (Lemma 1) we obtain that each of the terms in the definition of the remainder r can be estimated as in the statement of the lemma.
To finish the prof of Proposition 1 we observe that all constants that appear above can be chosen independent of u and Δu, provided that both u and u + Δu belong to U. This is due to the uniformity in Assumption (iii) and in Lemma 1.
On the other hand, we obtain from Proposition 1 that Thus the sequence {J (u k )} is monotone increasing, and due to the uniform boundedness in Assumption (iii) this sequence is also bounded, hence convergent. Then the last inequality implies that the sequence ε k := u k+1 − u k L 2 (Ω) converges to zero. From (39) we obtain that for every u ∈ U wherec is the constant from Lemma 1. From here we obtain where C is the constant in the parentheses. The convergence of ε k to zero implies the claim of the proposition.