A simple planning problem for COVID-19 lockdown: a dynamic programming approach

A large number of recent studies consider compartmental SIR models to study optimal control policies aimed at containing the diffusion of COVID-19 while minimizing the economic costs of preventive measures. Such problems are non-convex, so standard results need not hold. We use a Dynamic Programming approach and prove some continuity properties of the value function of the associated optimization problem. We study the corresponding Hamilton–Jacobi–Bellman equation and show that the value function solves it in the viscosity sense. Finally, we discuss some optimality conditions. Our paper represents a first contribution towards a complete analysis of non-convex dynamic optimization problems within a Dynamic Programming approach.


Introduction
Following the COVID-19 outbreak, a large number of papers have been written combining elements of both epidemiology and economics. One important motivation for these papers is that the pandemic confronted policymakers with the challenge of keeping the diffusion of the virus under control without suffocating economic activity (see, e.g., Acemoglu et al. 2021; Alvarez et al. 2021; Ash et al. 2022; Atkeson 2020; Eichenbaum et al. 2021; Farboodi et al. 2021; Fabbri et al. 2023; Favero 2020; Federico and Ferrari 2021; Federico et al. 2022; Goenka and Liu 2020; Gollier 2020). From a mathematical perspective, this motivation leads to the formulation of suitable dynamic optimization problems that can be tackled with different techniques. A common issue is that, even in simple settings, without considering heterogeneity of viral transmission or uncertainty about infection mortality, these problems are mathematically involved because they may fail to be convex.
In the typical setup of these problems, the state dynamics are given by so-called compartmental models, where the state variables are the epidemic compartments, such as the Susceptible, the Infected, and the Recovered in the SIR model. A peculiar feature of the state equations is an interaction term between some of these classes, usually the product of the numbers of susceptible and infected individuals. The state dynamics provide the constraint for the optimization problem, and the interaction term makes the Hamiltonian function associated with the control problem non-convex. In this situation, the classical sufficiency results for the Pontryagin Maximum Principle, like the Arrow or Mangasarian conditions, cannot be applied, as they require convexity of the Hamiltonian function. Likewise, the lack of convexity hinders the application of numerical methods, which are often employed in the analysis of these dynamic optimization problems.
This paper studies a specific family of non-convex problems to show that the Dynamic Programming approach can be profitably applied to analyze them. In particular, Dynamic Programming allows us to characterize the value function of the optimization problem as the unique viscosity solution of a suitable Hamilton-Jacobi-Bellman (HJB) equation. It also identifies optimality conditions which are sufficient for global optima independently of any convexity assumption (see, e.g., Bardi and Capuzzo-Dolcetta 1997).
We illustrate our approach with an application to the simple model of Alvarez et al. (2021), one of the recent epi-econ papers featuring the type of non-convexities in the epidemic propagation terms discussed above. The main results of the paper establish: the continuity, and in some cases the Lipschitz continuity, of the value function of the optimization problem; the fact that the value function is the unique viscosity solution of the HJB equation associated with the control problem and that it is also a bilateral viscosity solution; a sufficient optimality condition in terms of the semidifferentials of the value function. More details are given in Sects. 1.1 and 1.2 below. It is important to highlight that our results can be proved (with straightforward changes) for other models dealing with an optimal control problem in the presence of an SIR setup, such as those presented in Acemoglu et al. (2021), Piguillem and Shi (2022), Eichenbaum et al. (2021), Jones et al. (2021), Pollinger (2023), Zaman et al. (2017), Bolzoni et al. (2017), Balderrama et al. (2022), Elhia et al. (2013), and Ketcheson (2021).
We emphasize that, due to the difficulty of the problem, some issues remain open. In particular, we cannot prove that the value function is differentiable everywhere. This also means that we cannot rule out a singular behaviour (e.g., discontinuities) of optimal strategies in some regions of the state space. This fact may represent an issue, for instance, in numerical simulations for the type of optimization problems that we consider. To the best of our knowledge, there are no results in the mathematical literature that provide a numerical scheme suited to them; in addition, the extension of known numerical schemes for viscosity solutions to the kind of first-order Hamilton-Jacobi-Bellman equations that we treat here does not seem straightforward, due again to the non-convexity of the state dynamics and of the Hamiltonian.

Technical issues and selected related contributions
Optimal control problems that are non-convex either in the objective functional or in the state dynamics (or both) are notoriously difficult to study with a Maximum Principle approach. Indeed, the standard sufficiency conditions, like the Arrow or the Mangasarian conditions, do not hold. Nonetheless, Goenka et al. (2021) and Goenka et al. (2022) analyzed epi-econ SIS and SIR models with a Maximum Principle approach, providing sufficient conditions for local extrema under weaker assumptions than those of the Arrow or the Mangasarian conditions. The Maximum Principle approach was applied also in Goenka et al. (2014) to an SIS model complemented with economic variables, proving the existence of optimal strategies. In Aspri et al. (2021), where a SEIARD model is studied, the authors establish the existence of optimal strategies in a specific class of controls by exploiting the convexity of the objective function. The results proved in all these papers rely on the structure of the problems, in particular on the convexity of the objective function and, for sufficient conditions, on some ideas first given in Leitmann and Stalford (1971). Unfortunately, these results cannot be directly applied to our case, since our objective function may be non-convex.
As we previously anticipated, the Dynamic Programming approach (if applicable) presents clear advantages in treating non-convex problems. One of these advantages is the possibility of characterizing the value function of the optimization problem as the unique viscosity solution¹ of the HJB equation associated with the optimal control problem. It is important to stress that proving such a characterization can motivate the study of numerical algorithms suited to the class of optimization problems that we study here. These numerical schemes could be used to approximate the value function.² In our setting, the main issue that we face in proving this property of the value function is the presence of positivity state constraints, which may be a hindrance to showing uniqueness, see, e.g., Soravia (1999a, b). This problem is solved because the so-called interior cone condition holds. This condition was first introduced in Soner (1986) and allows us to prove uniqueness of the solution to the HJB equation in the viscosity sense (see Theorem 4.4 below).
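As an illustration of the kind of numerical scheme alluded to here, the following toy sketch (entirely our own: the one-dimensional problem, grid sizes, and parameter values are hypothetical and unrelated to the model studied in this paper) applies semi-Lagrangian value iteration to a discounted control problem.

```python
import numpy as np

# Toy illustration of a semi-Lagrangian value-iteration scheme for a
# 1-d discounted control problem (NOT the model of this paper):
#   minimize  int_0^infty e^{-r t} f(x_t, a_t) dt,   dx/dt = b(x, a).
# The scheme iterates
#   V(x) <- min_a { dt * f(x, a) + (1 - r * dt) * V(x + dt * b(x, a)) },
# which is a contraction with factor (1 - r * dt).
r, dt = 0.5, 0.01
xs = np.linspace(-1.0, 1.0, 201)        # state grid
actions = np.linspace(-1.0, 1.0, 21)    # control grid

f = lambda x, a: x**2 + 0.1 * a**2      # running cost (illustrative)
b = lambda x, a: a                      # controlled drift (illustrative)

V = np.zeros_like(xs)
for _ in range(2000):                   # fixed-point (value) iteration
    updates = []
    for a in actions:
        x_next = np.clip(xs + dt * b(xs, a), xs[0], xs[-1])
        Va = np.interp(x_next, xs, V)   # piecewise-linear interpolation
        updates.append(dt * f(xs, a) + (1 - r * dt) * Va)
    V_new = np.min(updates, axis=0)
    if np.max(np.abs(V_new - V)) < 1e-12:
        break
    V = V_new
```

For this symmetric toy problem the computed value function is symmetric and vanishes at the costless state x = 0; schemes of this type for constrained, non-convex problems such as (P) are precisely what remains to be developed.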
Another advantage of the Dynamic Programming approach is the possibility of identifying optimality conditions which are sufficient for global optima independently of any convexity assumptions (more details on this in, e.g., Bardi and Capuzzo-Dolcetta 1997, Chapter III, Section 2.5 and Fabbri et al. 2017, Section 2.5). These conditions are usually obtained through so-called Verification Theorems, and the main issue is to show that the value function is continuously differentiable in the interior of the state space. This is rather problematic in our setting because, in general, value functions are not continuously differentiable, i.e., they admit singularities (see, e.g., Fleming and Soner 2006, Section II.2). There are quite general conditions that imply continuous differentiability of the value function, namely, its semiconcavity and the strict convexity in the costate variables of the Hamiltonian function (see Cannarsa and Soner 1989; Cannarsa and Sinestrari 2004, Section 5.3 and also Bardi and Capuzzo-Dolcetta 1997, Chapter II, Section 5). Unfortunately, in our case these conditions do not hold or are difficult to verify. For this reason, based on the ideas of Bardi and Capuzzo-Dolcetta 1997, Chapter III, Sections 2.3−2.4, we establish a Backward Dynamic Programming Principle. This is key to proving that the value function is a bilateral solution of the corresponding HJB equation (see Theorem 4.7 below) and to stating a weak form of sufficient optimality condition (see Theorem 4.8).
We mention that similar techniques were used successfully in other economic applications, see, e.g., Bambi and Gozzi (2020), Freni et al. (2006), Freni et al. (2008). However, in those problems homogeneity and semiconcavity allowed the authors to apply the method of Cannarsa and Soner (1989), which cannot be used in our setting.

Overview of the main results
From a technical perspective, we can single out three main contributions of our paper.

¹ The concept of viscosity solutions was introduced by Crandall and Lions (1983) (see, e.g., Crandall et al. 1992 for a synthesis of viscosity solution theory) to cope with the fact that, in many optimal control problems, the value function is not differentiable everywhere and the associated HJB equation may not have classical (i.e., differentiable) solutions, even in simple cases. Using this more general solution concept, it is possible to prove existence and uniqueness of solutions that are merely continuous (or even discontinuous) and to apply suitable algorithms to compute the value function numerically.

² We refer the reader to Bardi and Capuzzo-Dolcetta (1997, Appendix A) for an introduction to numerical schemes for viscosity solutions to HJB equations.
• First, we prove that the value function is uniformly continuous and, for a sufficiently large discount rate, Lipschitz continuous on its domain (see Proposition 3.2).
• Second, we establish dynamic programming principles for our problem (the standard one and the backward one, see Propositions 3.3 and 3.9, respectively). This allows us to characterize the value function as the unique viscosity solution to the associated HJB equation (28) satisfying a suitable boundary condition (i.e., being a supersolution at the boundary), and to prove that it is also a bilateral solution (Theorem 4.7).
• Third, we show an optimality condition (see Theorem 4.8) that allows us to characterize the optimal strategies. In particular, we show that (except for trivial cases) the optimal strategy is a laissez-faire policy as long as the ratio between the rate of newly infected people and the population that can be put in lockdown does not exceed a threshold, which depends on the difference between the marginal cost of the infected and the marginal cost of the susceptibles. As this ratio increases, the lockdown is tightened, up to a full lockdown when a second threshold is passed.
The paper is organized as follows. In Sect. 2 we introduce the optimal control problem for the SIRD model that we aim to analyze and we provide some preliminary results. In Sect. 3 we establish continuity properties of the value function (Sect. 3.1) and the dynamic programming principles (Sect. 3.2). In Sect. 4 we study the HJB equation and, in particular, provide the explicit expression of the Hamiltonian function; in Sect. 4.1 we prove that the value function is a viscosity solution, in a suitable sense, of the HJB equation; Sect. 4.2 contains some optimality conditions, with which we are able to provide an economic interpretation of optimal policies. In Sect. 5 we draw some conclusions and present some ideas for future work on the subject.

The optimal control problem
In this section we introduce the optimization problem for the SIRD model that we study. We denote by S, I , R, D the classes of susceptible, infectious, recovered, and dead individuals, respectively. We assume that there are no newborns and that people either die from the infection or live forever; this is clearly unrealistic, but it is compatible with the duration of the pandemic/endemic phase, which is shorter than the average life span.
The dynamics of the population introduced above are influenced by a planner, who may enforce a lockdown by choosing its intensity, i.e., the percentage L t of the population that is forced to be locked down at each time t ≥ 0. This percentage can be chosen up to some fixed threshold L̄ ≤ 1, that is, L t ∈ [0, L̄], for each t ≥ 0. However, the lockdown is assumed to be less effective than planned, because people may fail to respect the lockdown measures and the virus can still circulate; the lockdown intensity is thus damped by a factor θ ∈ (0, 1), i.e., θ L t is the fraction of the population that is actually in lockdown. Lockdown applies only to susceptible and infectious individuals, since we assume that recovered ones cannot get infected again; this is possible because we assume that testing is available, and hence the planner knows who is infected and who has recovered.
The model we consider is specified as follows. The epidemic dynamics are given, for all t ≥ 0, by the system of controlled ordinary differential equations (1). The lockdown intensity function t → L t is chosen in the set L of admissible control strategies. The parameters appearing in (1) have the following meaning: β > 0 is the number of susceptible agents per unit of time to whom an infected agent can transmit the virus, among those who are not in lockdown; γ > 0 is the fraction of infected agents that recover; ϕ(i) is the rate per unit of time at which infected agents die, when the fraction of infected agents is i.
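Since system (1) is not reproduced here, the sketch below spells out one plausible reading of the dynamics just described, in the spirit of Alvarez et al. (2021); the functional form of the interaction term, the parameter values, and the function ϕ are all assumptions made for illustration, not the paper's actual specification.

```python
# Hypothetical sketch of the controlled SIRD dynamics described above; the
# exact form of system (1) is an assumption (in the spirit of Alvarez et
# al. 2021).  A fraction theta*L_t of susceptible and infectious agents is
# locked down, so the S-I interaction is scaled by (1 - theta*L)^2.
beta, gamma, theta = 0.20, 0.10, 0.5   # illustrative parameter values
phi = lambda i: 0.01 + 0.04 * i        # hypothetical death rate: positive,
                                       # bounded, Lipschitz (Assumption 2.2)

def simulate(L_policy, s0=0.97, i0=0.03, T=400.0, dt=0.01):
    """Forward-Euler integration of the (assumed) SIRD system."""
    S, I, R, D = s0, i0, 0.0, 0.0
    for k in range(int(T / dt)):
        L = L_policy(k * dt, S, I)                      # planned intensity
        new_inf = beta * (1 - theta * L) ** 2 * S * I   # interaction term
        dS = -new_inf
        dI = new_inf - (gamma + phi(I)) * I             # exits from I
        dR = gamma * I                                  # recoveries
        dD = phi(I) * I                                 # deaths
        S, I, R, D = S + dt * dS, I + dt * dI, R + dt * dR, D + dt * dD
    return S, I, R, D
```

Under these assumed numbers, the total population S + I + R + D is conserved along the flow, and a permanent full lockdown pushes the effective transmission rate β(1 − θ L̄)² below the exit rate from the infected class, so the epidemic dies out.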

Remark 2.1
The case where the highest possible intensity of lockdown is equal to 1, i.e., L̄ = 1, corresponds to allowing the possibility of a full lockdown. This is not realistic, as basic activities related, for example, to energy production and the distribution of fundamental goods must remain functional. Nonetheless, we will not introduce the restriction L̄ < 1, since it has no particular effect on the mathematical results presented below.
The following assumption ensures existence and uniqueness of a solution to (1), for any given L ∈ L . This can be easily shown with standard methods (see, e.g., Bardi and Capuzzo-Dolcetta 1997, Chapter III, Section 5).

Assumption 2.2
The function ϕ, appearing in (1), is positive, bounded, and Lipschitz continuous: there exists a constant M ϕ > 0 such that, for all i, ĩ ∈ [0, 1], |ϕ(i) − ϕ(ĩ)| ≤ M ϕ |i − ĩ|. Recall that γ denotes the fraction of infected agents that recovers.

Remark 2.3
In our model, the mortality rate is not constant but given by the function ϕ, and thus depends on the number of infected people. This choice is motivated by some empirical studies (see, e.g., Ciminelli and Garcia-Mandicó 2020; Favero 2020). Various papers in the literature deal, instead, with a constant mortality rate (lower than γ ), a case covered by our model. However, specializing our results to a constant mortality rate would not yield deeper or more refined statements: as highlighted in the Introduction, the technical difficulties lie in the fact that the epidemic dynamics (1) feature an interaction term between the numbers of susceptible and infected individuals.
In our setting, we account for the possibility of a vaccine and a cure being discovered (for simplicity, at the same time) at a random time τ , which we assume to be defined on some probability space (Ω, F, P) and exponentially distributed with intensity ν > 0. Our model also describes the worst-case scenario in which a vaccine or a cure is never discovered: in this case, we set τ equal to +∞ P-almost surely.
We assume that the planner maximizes the quantity (5) over all admissible control strategies, where E denotes the expectation with respect to the probability P, N t is the total population (including deaths) at time t ≥ 0, i.e., N t := S t + I t + R t + D t , r > 0 is a fixed discount rate, w > 0 is the output produced by each agent alive who is not in lockdown, and χ > 0 is an extra cost, in units of output, for each agent who dies as a consequence of the infection. Hence, the planner aims at maximizing the present value of the total production output, net of the cost of fatalities.
In particular, if we consider the case in which a vaccine and a cure are discovered at an exponentially distributed random time τ , the planner disregards what happens to the epidemic dynamics after τ , as she/he considers that all the individuals who survived the epidemic up to time τ , i.e., N τ − D τ , are infinitely lived and productive. This fact is accounted for in the second term inside the expectation appearing in (5). If, instead, we consider the worst-case scenario in which a vaccine or a cure is never discovered, i.e., τ = +∞ P-almost surely, then (5) reduces to the corresponding deterministic quantity.

Preliminary results
Let us fix, for the time being, an arbitrary admissible control L ∈ L . For simplicity, we normalize the initial population so that N 0 = s 0 + i 0 + r 0 + d 0 = 1.
From (1), we have Ṅ t = 0, thus N t = N 0 = 1, for all t ≥ 0. Therefore, S t + I t + R t + D t = 1, for every time t ≥ 0 and any initial condition. This fact is consistent with the assumption that there are no newborns and that people either die from the infection or live forever. Moreover, to determine uniquely the solution to (1) it is enough to provide the triplet (s 0 , i 0 , r 0 ) as initial condition, with s 0 , i 0 , r 0 ≥ 0 and s 0 + i 0 + r 0 ≤ 1, and to set d 0 = 1 − s 0 − i 0 − r 0 . From now on, we will specify only such a triplet, unless stated otherwise.
Since t → D t is clearly nondecreasing, the number of people alive at time t ≥ 0, i.e., N t − D t = S t + I t + R t , is nonincreasing over time. Therefore, for all t ≥ 0 and any initial condition (s 0 , i 0 , r 0 ) as above, the state of the system (S t , I t , R t , D t ) remains in the unit simplex. Note that, if i 0 = 0, then (S t , I t , R t , D t ) = (s 0 , 0, r 0 , d 0 ) for every t ≥ 0, i.e., the dynamics is constant and not affected by the choice of the control strategy. If s 0 = 0, the dynamics is not constant but, as before, it is not affected by the choice of the control strategy.
Recall that the planner maximizes, over all admissible control strategies L ∈ L , the functional J̃, which depends on the given initial condition s 0 , i 0 , r 0 ≥ 0, with s 0 + i 0 + r 0 ≤ 1, for (1). Using the dynamics of the processes (S, I , R, D) and the law of the random variable τ , we can rewrite the functional J̃ as follows.
Proof We can explicitly compute the expectation in (6) using the law of the random time τ and the dynamics of (S, I , R, D). As noted previously, if τ = +∞ P-almost surely, we obtain (8). If, instead, τ is an exponential random variable with intensity ν > 0, applying the Fubini–Tonelli theorem we get (9). Observe that (8) is a special case of (9), with ν = 0. Finally, noting that d/dt (N t − D t ) = d/dt (S t + I t + R t ) = −ϕ(I t )I t and integrating by parts, recalling that N 0 = 1, we obtain (7).
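The key step in this proof, namely that an independent exponential horizon only adds its intensity ν to the discount rate, can be checked numerically. The sketch below is our own illustration: g is an arbitrary bounded path standing in for the discounted payoff flow, and all names and values are ours.

```python
import numpy as np

# Numerical check of the discounting identity behind Lemma 2.5: for an
# independent tau ~ Exp(nu) and a bounded g >= 0,
#   E[ int_0^tau e^{-r t} g(t) dt ] = int_0^infty e^{-(r+nu) t} g(t) dt,
# since P(tau > t) = e^{-nu t} and Fubini-Tonelli applies.
rng = np.random.default_rng(0)
r, nu = 0.05, 0.10
g = lambda t: 1.0 / (1.0 + t)          # arbitrary bounded deterministic path

dt, T = 0.01, 400.0
ts = np.arange(0.0, T, dt)

# Right-hand side: deterministic integral with killed discount rate r + nu.
rhs = float(np.sum(np.exp(-(r + nu) * ts) * g(ts)) * dt)

# Left-hand side: Monte Carlo average of int_0^tau e^{-r t} g(t) dt.
taus = rng.exponential(1.0 / nu, size=20_000)
cum = np.cumsum(np.exp(-r * ts) * g(ts) * dt)    # running integral on grid
idx = np.minimum((taus / dt).astype(int), len(ts) - 1)
lhs = float(np.mean(cum[idx]))
```

Up to Monte Carlo and discretization error, `lhs` and `rhs` agree, which is exactly the reduction from (6) to (9).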

Remark 2.6
It is worth emphasizing that the proof of Lemma 2.5 shows once more that our optimization problem includes the worst-case scenario in which a vaccine or a cure is never discovered, i.e., in which τ = +∞ P-almost surely: indeed, it suffices to take ν = 0, and all of the results of this paper remain valid in this case.

Remark 2.7
The only state variables appearing on the right-hand side of (7) are S and I , i.e., the numbers of susceptible and infectious individuals. If we consider the dynamics (10) of this pair of variables, then, for each L ∈ L , the solution to (10) is completely determined by any given initial condition (s 0 , i 0 ) ∈ T . Moreover, since the map t → S t + I t is nonincreasing, we get that S t + I t ≤ s 0 + i 0 . Therefore, for any t ≥ 0, any (s 0 , i 0 ) ∈ T , and any L ∈ L , the state (S t , I t ) remains in T . Clearly, specifying only (s 0 , i 0 ) ∈ T is not enough to determine the solution to the complete system (1), as r 0 is also needed. However, as a consequence of the discussion above and of Lemma 2.5, the functional J̃ does not depend on r 0 , and hence our optimization problem depends only on the state variables S and I .
Equation (7) also shows that the optimization problem introduced at the beginning of this section is equivalent to the minimization problem (P) defined, for any given (s, i) ∈ T , as inf L∈L J (L, s, i), where J is given in (12). Thus, from this point onward, we consider problem (P). As usual, we introduce the value function V of this minimization problem, defined in (13). To conclude this section, we provide a brief comparison between the optimization problem studied here, i.e., problem (P), and the one analyzed in Alvarez et al. (2021). Problem (P) is equivalent to the one studied in Alvarez et al. (2021), Eq. (7). We slightly modified the setup of Alvarez et al. (2021) by including a class D of dead people, that is, we consider an SIRD model; the dynamics of the classes S, I , and R are left unchanged, so our formulation is completely equivalent to that of Alvarez et al. (2021), with the advantage that the population remains constant in our setting. We stress once more that we consider the situation where testing is available and quarantine is not enforced. We make some remarks on other possible extensions of this model further below (see Remark 4.10).

Remark 2.8
It is also worth noting that Lemma 2.5 shows that optimization problem (P) is equivalent to the maximization one presented in Alvarez et al. (2020), Eq. (5). This provides a rigorous foundation to the arguments given in Alvarez et al. (2020), p. 10, regarding the equivalence of these two problems.

Properties of the value function and dynamic programming principles
In this section we begin our analysis of problem (P) with the dynamic programming approach. We derive in Sect. 3.1 a regularity result for the value function of this problem, defined in (13); then, in Sect. 3.2 we establish the forward and backward dynamic programming principles, that are used in Sect. 4.

Properties of the value function
As shown in the previous section, the state variables for problem (P) are the number of susceptible and infected individuals, whose dynamics are given in (10). To show some regularity results for the value function V (see Proposition 3.2 below), we need to provide, first, a useful estimate concerning the unique solution to this system of ordinary differential equations.
In what follows, we set for convenience X t = (S t , I t ), t ≥ 0, and we use the notation X L,x 0 t , S L,s 0 ,i 0 t , I L,s 0 ,i 0 t to stress the dependence of the solution to (10) on the control strategy L ∈ L and on the initial condition x 0 = (s 0 , i 0 ) ∈ T . We also define the vector field b(s, i, l) as in (14). In this way, we can write the system (10) in compact form or, equivalently, in integrated form. We have the following lemma.
Lemma 3.1 Let X and X̃ be the two solutions to (10) corresponding to initial conditions x 0 , x̃ 0 ∈ T and strategies L, L̃ ∈ L , respectively. Then the estimate (16) holds. In particular, if L = L̃, then (17) holds. Proof It is easy to show that the vector field b, introduced in (14), is bounded on T × [0, L̄] and Lipschitz continuous in (s, i) ∈ T , uniformly with respect to l ∈ [0, L̄]; more precisely, b satisfies the Lipschitz estimate (19) with M b := 2β + γ + M ϕ , where M ϕ is the Lipschitz constant appearing in (4). Thanks to (19), we deduce an integral inequality for |X t − X̃ t |, and hence a simple application of Gronwall's lemma (see, e.g., Bardi and Capuzzo-Dolcetta 1997, Chapter III, Section 5, or Fleming and Rishel 1975, Appendix A) yields (16). Setting L = L̃, we immediately deduce (17).
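Since the displays (16)–(17) are not reproduced here, the following is our own reconstruction of the standard Gronwall step for the case L = L̃, using only the Lipschitz bound (19):

```latex
|X_t - \tilde X_t|
  \le |x_0 - \tilde x_0|
      + \int_0^t \big| b(S_u, I_u, L_u) - b(\tilde S_u, \tilde I_u, L_u) \big|\, du
  \le |x_0 - \tilde x_0| + M_b \int_0^t |X_u - \tilde X_u|\, du
\;\Longrightarrow\;
|X_t - \tilde X_t| \le e^{M_b t}\, |x_0 - \tilde x_0|,
```

which is the form that estimate (17) presumably takes.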
Let us introduce the running cost function f appearing inside the functional J given in (12). It is not hard to show that f is non-negative and bounded on T × [0, L̄] and that it is Lipschitz continuous in (s, i) ∈ T , uniformly with respect to l ∈ [0, L̄]; more precisely, the bounds (21) and the Lipschitz estimate (22) hold, where M f := 2 L̄ w + w r + χ (γ + M ϕ ) and M ϕ is the Lipschitz constant appearing in (4). From these facts, we deduce the following result.

Proposition 3.2 The value function V given in (13) is non-negative, bounded, and uniformly continuous on T . If, moreover, r + ν ≥ M b , where M b is the constant appearing in (19), then V is Lipschitz continuous on T .
Proof The value function V is clearly non-negative, because f is. Boundedness easily follows from (21): indeed, for all L ∈ L and all (s, i) ∈ T , the discounted integral defining J (L, s, i) is bounded by the sup bound of f divided by r + ν. To prove uniform continuity it is enough to show that V is continuous on T , because T is a compact subset of R 2 . Let us fix ε > 0 and (s, i), (s̃, ĩ) ∈ T , and denote by (S, I ), (S̃, Ĩ ) the corresponding solutions to (10), for any given admissible control. Consider an ε-optimal control for the minimization problem with initial data (s̃, ĩ), that is, L ε ∈ L such that J (L ε , s̃, ĩ) ≤ V (s̃, ĩ) + ε. Then, for a constant T > 0 to be chosen later, using (21), (22), and (17), we can estimate V (s, i) − V (s̃, ĩ) by a term proportional to |(s, i) − (s̃, ĩ)| plus a remainder that vanishes as T → +∞. The second term on the right-hand side of the resulting inequality can be made smaller than ε by choosing T large enough, while the first one can be made smaller than ε by choosing an appropriate δ > 0 such that |(s, i) − (s̃, ĩ)| < δ. Exchanging the roles of (s, i) and (s̃, ĩ), we get that V is continuous on T . Finally, if r + ν ≥ M b , we can take T → +∞ and ε → 0 + in the same inequality and conclude that V is Lipschitz continuous on T .
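The omitted chain of inequalities presumably looks as follows (our reconstruction, writing \(\bar M_f\) for the sup bound of f in (21) and assuming, for the Lipschitz conclusion, the strict inequality r + ν > M b):

```latex
V(s,i) - V(\tilde s, \tilde i)
  \le J(L^{\varepsilon}, s, i) - J(L^{\varepsilon}, \tilde s, \tilde i) + \varepsilon
  \le \varepsilon
      + M_f\, |(s,i) - (\tilde s, \tilde i)| \int_0^T e^{(M_b - r - \nu)t}\, dt
      + \frac{2 \bar M_f}{r + \nu}\, e^{-(r+\nu)T}.
```

Letting T → +∞ and ε → 0⁺ would then yield the Lipschitz bound with constant M f /(r + ν − M b).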

Dynamic programming principles
In this subsection we provide the dynamic programming principles for optimization problem (P). We start with the following standard result, stating that the value function V satisfies the Dynamic Programming Principle. The proof is analogous to that of Bardi and Capuzzo-Dolcetta 1997, Proposition III.2.5 and is thus omitted.

Proposition 3.3 For all (s, i) ∈ T and all T > 0, the value function V verifies
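The display omitted from Proposition 3.3 presumably takes the standard form (cf. Bardi and Capuzzo-Dolcetta 1997, Proposition III.2.5):

```latex
V(s,i) = \inf_{L \in \mathcal{L}}
  \left\{ \int_0^T e^{-(r+\nu)t}\, f\big(S^{L,s,i}_t, I^{L,s,i}_t, L_t\big)\, dt
          + e^{-(r+\nu)T}\, V\big(S^{L,s,i}_T, I^{L,s,i}_T\big) \right\},
```

with f the running cost in (12) and (S, I ) the solution of (10).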
The following facts can be easily deduced from the Dynamic Programming Principle (for a slightly different approach, see Fleming and Rishel 1975, Theorems 3.1, 3.2).

Proposition 3.4 For every (s, i) ∈ T and every L ∈ L , the function t → ∫ 0 t e −(r+ν)u f (S u , I u , L u ) du + e −(r+ν)t V (S t , I t ) is non-decreasing, and it is constant if and only if L is optimal.
As a result of the previous proposition, we obtain the following useful regularity result for the value function V evaluated at optimal trajectories of the system (10).
Corollary 3.5 Let L̂ ∈ L be an optimal control and denote by (Ŝ, Î ) the corresponding optimal trajectory of the system (10). Then, for almost every t ≥ 0, the derivative d/dt V (Ŝ t , Î t ) exists and satisfies d/dt V (Ŝ t , Î t ) = (r + ν)V (Ŝ t , Î t ) − f (Ŝ t , Î t , L̂ t ). Proof We follow closely the arguments given in the proof of Proposition 4.13 in Freni et al. (2008). Let us consider the function g defined in (23). Since L̂ is an optimal control, we know from Proposition 3.4 that g is constant.
Moreover, g is differentiable at all Lebesgue points of L̂, which implies that g ′ (t) = 0, for almost all t ≥ 0. From (23) we deduce that, for almost all t ≥ 0, the derivative d/dt V (Ŝ t , Î t ) exists and satisfies the claimed identity.
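Writing (23) presumably as g(t) = ∫ 0 t e −(r+ν)u f (Ŝ u , Î u , L̂ u ) du + e −(r+ν)t V (Ŝ t , Î t ) (a reconstruction consistent with Proposition 3.4), the computation behind the last step reads:

```latex
0 = g'(t)
  = e^{-(r+\nu)t} \Big[ f(\hat S_t, \hat I_t, \hat L_t)
      - (r+\nu)\, V(\hat S_t, \hat I_t)
      + \tfrac{d}{dt} V(\hat S_t, \hat I_t) \Big]
\;\Longrightarrow\;
\tfrac{d}{dt} V(\hat S_t, \hat I_t)
  = (r+\nu)\, V(\hat S_t, \hat I_t) - f(\hat S_t, \hat I_t, \hat L_t).
```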
We want to show, next, that a result analogous to Proposition 3.3 holds for the backward trajectories of the system (10), given by the solution Y of (24), for any initial condition y 0 = (s 0 , i 0 ) ∈ T and any Borel-measurable [0, L̄]-valued control function L. To denote the solution to (24) we will use the notation Y L,y 0 t or Y L,s 0 ,i 0 t , t ≥ 0, to stress its dependence on L and y 0 = (s 0 , i 0 ) ∈ T . We have to restrict the set of admissible controls for the backward equation (24), as there is no guarantee that the backward trajectories remain in the state space T . Thus, we define, for any given (s, i) ∈ T , the set L − (s, i) of admissible backward controls.

Remark 3.6
It is important to note that, if (s, i) ∈ T is such that s + i = 1, with i ≠ 0, then L − (s, i) = ∅. Indeed, from (10) we deduce that S t + I t = s + i + ∫ t 0 (γ + ϕ(I u ))I u du, t < 0, and hence S t + I t > 1, for all t < 0.
We need also the following definition.
Definition 3.7 We say that a point (s, i) ∈ T is optimal if there exist (s 0 , i 0 ) ∈ T , t * > 0, and an optimal strategy L * ∈ L (i.e., a strategy such that V (s 0 , i 0 ) = J (L * , s 0 , i 0 )) whose corresponding trajectory satisfies X L * ,s 0 ,i 0 t * = (s, i). We denote the set of optimal points by O.

Remark 3.8 The definition above can be rephrased by saying that the controller can drive the system from (s 0 , i 0 ) to (s, i) in finite time with an optimal strategy L * .
We are now ready to state the Backward Dynamic Programming Principle. Its proof is very similar to the one given in Freni et al. 2008, Proposition 4.7 (see also, Bardi and Capuzzo-Dolcetta 1997, Proposition 2.25).

Proposition 3.9
For every (s, i) ∈ T , every t > 0, and every L ∈ L − (s, i), the value function V satisfies the backward dynamic programming identity.

Remark 3.10 Since in the second part of Proposition 3.9 it is assumed that (s, i) ∈ O, the supremum in the last equality is attained, and hence we can replace it with a maximum.

The Hamilton-Jacobi-Bellman equation
In this section we study the HJB equation for the optimization problem (P), given in (25). In Sect. 4.1 we characterize the value function as the unique viscosity solution, in a sense to be made precise later, of the HJB equation (25). Then, in Sect. 4.2, we give some optimality conditions that allow us to characterize optimal policies.
We give a preliminary result that allows us to write (25) in a more explicit form. Let us define, for all (s, i, p, q, l) ∈ T × R 2 × [0, L̄], the current value Hamiltonian H CV and the Hamiltonian H as in (27), so that (25) can be written as (28). We also define the set of minimizers for (27). The following proposition shows that H and the set of minimizers can be explicitly computed.
Proposition 4.1 Let us define the sets C I , C S , A 0 , A 1 , A 2 , A 3 , A 4 , which form a partition of T × R 2 . For any (s, i) ∈ T and ( p, q) ∈ R 2 , the Hamiltonian H defined in (27) can be computed explicitly on each element of this partition, and so can the set of minimizers (s, i, p, q). Moreover, the Hamiltonian H is continuous on T × R 2 and, for each fixed (s, i) ∈ T , the function ( p, q) → H (s, i, p, q) is concave on R 2 .
Proof We note, first, that the continuity of H is a straightforward consequence of the fact that the current value Hamiltonian H CV is continuous on T × R 2 × [0, L̄] and that [0, L̄] is a compact subset of R.
We also note that, for each fixed (s, i, p, q) ∈ T × R 2 , the set of minimizers (s, i, p, q) coincides with the set of minimizers of a quadratic expression H 0 in the variable l. We divide our proof according to the different possible cases. Clearly, H 0 (l; 0, 0, p, q) = 0, for all l ∈ [0, L̄] and any ( p, q) ∈ R 2 . Moreover, for each fixed (s, i, p, q) ∈ C I ∪ C S ∪ A 0 , with (s, i) ≠ (0, 0), the minimum of l → H 0 (l; s, i, p, q) over [0, L̄] is attained at l = 0; thus, still considering (s, i) ≠ (0, 0), the set of minimizers is {0} on C I ∪ C S ∪ A 0 . Next, we study the case where s ≠ 0, i ≠ 0, and q ≠ p. We observe that, for each fixed (s, i, p, q) ∈ A 1 ∪ A 2 ∪ A 3 ∪ A 4 , the minimizer is determined by the abscissa of the vertex of the parabola x → H 0 (x; s, i, p, q), projected onto [0, L̄]; in particular, when the vertex lies beyond L̄, the minimum is attained at l = L̄ and H (s, i, p, q) = H CV (s, i, p, q, L̄). Putting together all these facts, we get the explicit expressions for H and the set of minimizers.
It is not difficult to show that the expression of H on the set A 3 is, for each fixed (s, i) ∈ T , concave in ( p, q), while the remaining expressions are linear in ( p, q); since H is the minimum over l of functions that are affine in ( p, q), the map ( p, q) → H (s, i, p, q) is concave.
Remark 4.2 Note that the function ( p, q) → H (s, i, p, q), for fixed (s, i) ∈ T , is not strictly concave, as all the expressions given above for H (except for the third one) are clearly linear in ( p, q).
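The case analysis in the proof of Proposition 4.1 boils down to minimizing a scalar quadratic over the interval [0, L̄]: the minimum is attained at an endpoint or, when the parabola is convex, at its vertex projected onto the interval. A small self-contained check of this elementary fact (the function name, coefficients, and L̄ value below are ours, not the paper's):

```python
import numpy as np

# Minimizing l -> a*l**2 + b*l + c over [0, Lbar]: the minimum is always
# attained at an endpoint or, if the parabola is convex (a > 0), at the
# vertex -b/(2a) clamped to the interval.
def min_quadratic(a, b, c, Lbar):
    candidates = [0.0, Lbar]
    if a > 0:                                  # convex: clamped vertex
        candidates.append(min(max(-b / (2 * a), 0.0), Lbar))
    values = [a * l * l + b * l + c for l in candidates]
    k = int(np.argmin(values))
    return candidates[k], values[k]

# Brute-force comparison on random coefficients.
rng = np.random.default_rng(1)
grid = np.linspace(0.0, 0.9, 10001)
for _ in range(1000):
    a, b, c = rng.uniform(-2.0, 2.0, size=3)
    brute = float(np.min(a * grid**2 + b * grid + c))
    _, val = min_quadratic(a, b, c, 0.9)
    assert val <= brute + 1e-9                 # candidate set attains the min
```

The concave case (a < 0), where the minimum sits at l = 0 or l = L̄, is what produces the bang-bang structure of the minimizers in Proposition 4.1.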

The value function is the unique viscosity solution of the HJB equation
We now show that the value function V is the unique solution to the HJB equation (28) in the sense of viscosity solutions, introduced by Crandall and Lions (1983). In what follows, if g : R n → R is a continuously differentiable function, Dg(x) denotes the gradient of g at x ∈ R n , i.e., the vector of its partial derivatives. We denote by int T the interior of T . We need the following definitions.

Definition 4.3 A continuous function v : T → R is called a constrained viscosity solution of (28) on T if it is both
• a viscosity subsolution of (28) on int T, i.e., if, for every test function φ ∈ C^1(int T) and every local maximum point (s, i) ∈ int T of v − φ,
(r + ν)v(s, i) − H(s, i, ∂_s φ(s, i), ∂_i φ(s, i)) ≤ 0;
• a viscosity supersolution of (28) on T, i.e., if, for every test function φ ∈ C^1 and every local minimum point (s, i) ∈ T of v − φ,
(r + ν)v(s, i) − H(s, i, ∂_s φ(s, i), ∂_i φ(s, i)) ≥ 0.
If u is continuous on an open set Ω ⊂ R^n and x ∈ Ω, we define the superdifferential of u at x as the set
D^+u(x) := { p ∈ R^n : lim sup_{y→x} (u(y) − u(x) − p · (y − x)) / |y − x| ≤ 0 }
and the subdifferential of u at x as the set
D^-u(x) := { p ∈ R^n : lim inf_{y→x} (u(y) − u(x) − p · (y − x)) / |y − x| ≥ 0 }.
It follows that a continuous function v : Ω → R is a viscosity subsolution of (28) on Ω if and only if (r + ν)v(s, i) − H(s, i, p, q) ≤ 0 for every (s, i) ∈ Ω and every (p, q) ∈ D^+v(s, i), and it is a viscosity supersolution of (28) on Ω if and only if (r + ν)v(s, i) − H(s, i, p, q) ≥ 0 for every (s, i) ∈ Ω and every (p, q) ∈ D^-v(s, i).
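For intuition, the one-sided differentials can be checked numerically on the standard example u(x) = −|x|, for which D^+u(0) = [−1, 1] and D^-u(0) = ∅ (a textbook example, not specific to this model). The sketch below approximates the lim sup / lim inf conditions on a finite grid around the point:

```python
def in_superdifferential(u, x, p, h=1e-3, n=200):
    """Grid check that (u(y) - u(x) - p*(y - x)) / |y - x| <= 0
    for sampled y near x, approximating the lim sup condition."""
    tol = 1e-9
    ys = [x + h * (k - n // 2) / (n // 2) for k in range(n + 1) if k != n // 2]
    return all((u(y) - u(x) - p * (y - x)) / abs(y - x) <= tol for y in ys)

def in_subdifferential(u, x, p, h=1e-3, n=200):
    """Mirrored check (>= 0), approximating the lim inf condition."""
    tol = 1e-9
    ys = [x + h * (k - n // 2) / (n // 2) for k in range(n + 1) if k != n // 2]
    return all((u(y) - u(x) - p * (y - x)) / abs(y - x) >= -tol for y in ys)

u = lambda x: -abs(x)
print(in_superdifferential(u, 0.0, 0.5))  # slope in [-1, 1]: True
print(in_superdifferential(u, 0.0, 2.0))  # slope outside [-1, 1]: False
# D^-u(0) is empty: no candidate slope passes the mirrored test
print(any(in_subdifferential(u, 0.0, p) for p in [-2, -1, 0, 1, 2]))  # False
```

The grid test is of course only a heuristic stand-in for the genuine limit conditions, but it reproduces the expected memberships on this example.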
We establish, first, the following uniqueness result.

Theorem 4.4 The value function V is the unique constrained viscosity solution to the HJB equation (28).
Proof By Proposition 3.2, V is a bounded and uniformly continuous function on T and by Proposition 3.3 it satisfies the Dynamic Programming Principle. Therefore, reasoning as in Soner (1986, Theorem 2.1) (see also Calvia 2018, Theorem 4.10), we deduce that V is the unique constrained viscosity solution to (28).
Using the Backward Dynamic Programming Principle, we can say something more about the value function as a viscosity solution to (28). We need the following definition.
Definition 4.5 Let Ω ⊂ R^2 be open and let v : Ω → R be continuous. We say that v is a bilateral viscosity subsolution of (28) on Ω if (r + ν)v(s, i) − H(s, i, p, q) = 0 for every (s, i) ∈ Ω and every (p, q) ∈ D^+v(s, i), and a bilateral viscosity supersolution of (28) on Ω if the same equality holds for every (s, i) ∈ Ω and every (p, q) ∈ D^-v(s, i). Finally, we say that v is a bilateral viscosity solution of (28) on Ω if it is both a bilateral subsolution and a bilateral supersolution on Ω.

Remark 4.6
Recall that, in general, a viscosity solution to either (30) or (31) is not a viscosity solution to the other one.
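A classical one-dimensional illustration of this asymmetry (standard in the viscosity-solutions literature, unrelated to the specific H of this paper) is the eikonal equation |u'| − 1 = 0 on (−1, 1) with zero boundary data: u(x) = 1 − |x| is a viscosity solution, while v(x) = |x| − 1 is not, although both satisfy the equation wherever they are differentiable. The decisive check happens at the kink x = 0:

```python
# u(x) = 1 - |x| solves |u'| - 1 = 0 on (-1, 1) in the viscosity sense;
# v(x) = |x| - 1 does not, failing the supersolution test at the kink.
def F(p):            # left-hand side of the eikonal equation |u'| - 1 = 0
    return abs(p) - 1.0

D_plus_u_at_0 = [k / 10 for k in range(-10, 11)]   # samples of D^+u(0) = [-1, 1]
# subsolution test: F(p) <= 0 for every p in D^+u(0); D^-u(0) is empty,
# so the supersolution test for u holds vacuously at the kink
print(all(F(p) <= 0 for p in D_plus_u_at_0))        # -> True

D_minus_v_at_0 = D_plus_u_at_0                      # samples of D^-v(0) = [-1, 1]
# supersolution test for v: F(p) >= 0 for every p in D^-v(0) -- fails at p = 0
print(all(F(p) >= 0 for p in D_minus_v_at_0))       # -> False
```

Flipping the sign of the equation swaps the roles of u and v, which is exactly why a solution of one of the two sign conventions need not solve the other.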
Theorem 4.7 The value function V is a bilateral viscosity supersolution to (28) in the interior of T; in particular, for every (p, q) ∈ D^-V(s, i), with (s, i) in the interior of T,
(r + ν)V(s, i) − H(s, i, p, q) = 0.    (32)
Moreover, V is a (non-bilateral) viscosity subsolution to (28) in the interior of T; equivalently, for every (p, q) ∈ D^+V(s, i), with (s, i) in the interior of T,
(r + ν)V(s, i) − H(s, i, p, q) ≤ 0.
Finally, for any (s, i) on the boundary of T and any (p, q) ∈ D^-V(s, i),
(r + ν)V(s, i) − H(s, i, p, q) ≥ 0,
and, for any (s, i) ∈ C ∪ O, where O is the set of optimal points given in Definition 3.7, and any (p, q) ∈ D^+V(s, i),
(r + ν)V(s, i) − H(s, i, p, q) ≤ 0.
Proof We provide a sketch of the proof. By Theorem 4.4, V is the unique constrained viscosity solution to (28), and hence V is a viscosity solution of (28) on the interior of T. More precisely, for any (s, i) in the interior of T and any (p, q) ∈ D^-V(s, i),
(r + ν)V(s, i) − H(s, i, p, q) ≥ 0,    (33)
while the subsolution inequality holds on D^+V(s, i); moreover, the supersolution inequality (33) holds also for any (s, i) on the boundary of T and any (p, q) ∈ D^-V(s, i). Using the fact that, by Proposition 3.9, V satisfies also the Backward Dynamic Programming Principle, and arguing as in Bardi and Capuzzo-Dolcetta (1997, Corollary III.2.28), we find that V is a supersolution of −(r + ν)V(s, i) + H(s, i, ∂_s V(s, i), ∂_i V(s, i)) = 0 on the interior of T, i.e., for any (s, i) in the interior of T and any (p, q) ∈ D^-V(s, i),
(r + ν)V(s, i) − H(s, i, p, q) ≤ 0,    (34)
and that, for any (s, i) ∈ O and any (p, q) ∈ D^+V(s, i), the same inequality holds. Combining (33) and (34), we get (32). We are, thus, left to show that the last inequality in the statement of the theorem holds for all (s, i) ∈ C and all (p, q) ∈ D^+V(s, i). Let us define, for all (s, i) on the boundary of T, the set
U(s, i) := { l ∈ [0, L̄] : there exist L ∈ 𝓛⁻(s, i) and τ > 0 such that L_t = l for all t ∈ (−τ, 0] }.
Note that, by Remark 3.6, we have U(s, i) ≠ ∅ for all (s, i) ∈ T such that s + i = 1, with i ≠ 0.
Moreover, a simple inspection of (10) reveals that if i₀ = 0, then the backward dynamics is also constant, regardless of the choice of L, and that if s₀ = 0, i₀ = 1, then for any l ∈ [0, L̄] one can find a time τ > 0 and an admissible backward strategy constantly equal to l on (−τ, 0].
(ii) If L ∈ 𝓛 is an optimal control strategy, then for almost every t > 0 and every (p, q) ∈ D^±V(X_t) condition (36) holds.
Remark 4.9 The condition in Theorem 4.8 above is also sufficient for optimality if the set D^+V(X_t) coincides with the Clarke differential of V at X_t for almost every t. For instance, this is the case if we restrict to constant control strategies, because the value function is then the infimum over a compact set of smooth functions with uniform bounds. The above condition on D^+V is also satisfied if the value function happens to be differentiable everywhere.
Thanks to the explicit computations carried out in Proposition 4.1, we can reformulate condition (36) in Theorem 4.8 as follows; we assume x₀ ≠ (0, 0), L ∈ 𝓛 optimal, t > 0, and p = (p, q) ∈ D^±V(X_t).
Furthermore, if we assume that the value function V, given in (13), is differentiable everywhere, then we can interpret optimal strategies and the partition of T × R^2 appearing in Proposition 4.1 as follows. Assume that x₀ ≠ (0, 0), that L ∈ 𝓛 is optimal, and consider t > 0.
• If I_t = 0, then there is no epidemic. In this case, (S_t, I_t, ∂_s V(S_t, I_t), ∂_i V(S_t, I_t)) ∈ C_I and, clearly, L_t = 0.
• If S_t = 0, then the epidemic dies out without any need for a lockdown. In this case, (S_t, I_t, ∂_s V(S_t, I_t), ∂_i V(S_t, I_t)) ∈ C_S and L_t = 0.
• If S_t ≠ 0 and I_t ≠ 0, then the value at time t of the optimal policy L depends also on the derivatives of the value function and, in particular, on the sign of ∂_i V(S_t, I_t) − ∂_s V(S_t, I_t). More precisely, if ∂_i V(S_t, I_t) − ∂_s V(S_t, I_t) ≤ 0, i.e., if the marginal cost of the infected is not higher than the marginal cost of the susceptibles, then the optimal policy at time t is a laissez-faire policy. In this case, (S_t, I_t, ∂_s V(S_t, I_t), ∂_i V(S_t, I_t)) ∈ A_0 ∪ A_1. If, instead, ∂_i V(S_t, I_t) − ∂_s V(S_t, I_t) > 0, i.e., if the marginal cost of the infected is higher than the marginal cost of the susceptibles, then:
• The optimal policy at time t is a laissez-faire policy whenever the ratio between the rate of newly infected people and the population that can be put in lockdown, β S_t I_t / (S_t + I_t), is not higher than the threshold K^(1)(S_t, I_t). In this case, (S_t, I_t, ∂_s V(S_t, I_t), ∂_i V(S_t, I_t)) ∈ A_2;
• A fraction of the population, smaller than L̄, is put in lockdown at time t whenever the ratio β S_t I_t / (S_t + I_t) lies between the two thresholds K^(1)(S_t, I_t) and K^(2)(S_t, I_t). In this case, (S_t, I_t, ∂_s V(S_t, I_t), ∂_i V(S_t, I_t)) ∈ A_3;
• The highest possible fraction of the population, i.e., L̄, is put in lockdown at time t whenever the ratio β S_t I_t / (S_t + I_t) is higher than the threshold K^(2)(S_t, I_t). In this case, (S_t, I_t, ∂_s V(S_t, I_t), ∂_i V(S_t, I_t)) ∈ A_4.
Remark 4.10 We observe that our main results can be applied to other similar epi-econ models which display the same structure, in particular when:
• the state equations are a controlled modification of compartmental models in epidemiology, such as the SIR model or similar ones;
• the cost functional to minimize is not strictly convex or, possibly, non-convex.
This is the case, for instance, of the model discussed in Acemoglu et al. (2020). All the results above hold also in that context, with the required adaptations. A more general setting in which the techniques shown in this paper may be applied is the optimal control of age-structured SIR-type models. To the best of our knowledge, the study of HJB equations in this context has not yet been carried out completely (see, e.g., Fabbri et al. 2021).
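Under the differentiability assumption above, the case structure reads as a feedback rule: compare the transmission ratio β S_t I_t / (S_t + I_t) with two state-dependent thresholds. The sketch below mirrors that structure only qualitatively; the thresholds K1 and K2 and the interior lockdown level are hypothetical placeholders, since the explicit formulas of Proposition 4.1 are not reproduced here:

```python
def lockdown_feedback(S, I, dV_s, dV_i, beta, L_bar, K1, K2, interior_level):
    """Hypothetical feedback rule mirroring the case structure above;
    K1, K2 and interior_level stand in for the explicit state-dependent
    expressions, which are not reproduced here."""
    if I == 0 or S == 0:
        return 0.0                  # no epidemic, or it dies out on its own
    if dV_i - dV_s <= 0:
        return 0.0                  # marginal cost of infected not higher: laissez-faire
    ratio = beta * S * I / (S + I)  # newly infected per lockdownable person
    if ratio <= K1:
        return 0.0                  # below the lower threshold: laissez-faire
    if ratio <= K2:
        return interior_level       # partial lockdown, strictly between 0 and L_bar
    return L_bar                    # above the upper threshold: maximal lockdown

# illustrative call with made-up numbers (ratio = 0.018 / 0.7 > K2)
print(lockdown_feedback(S=0.6, I=0.1, dV_s=1.0, dV_i=2.0,
                        beta=0.3, L_bar=0.7, K1=0.01, K2=0.02,
                        interior_level=0.35))  # -> 0.7
```

All numerical inputs are arbitrary; the point is only the branching logic, which follows the bullet points above.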
It is important to note that some issues remain open in the analysis of the model discussed in this paper and of those cited above. For instance, little is known about optimal strategies; a deeper study in this direction is needed and calls for different ideas and proofs.

Conclusion
This paper makes a first step towards a complete analysis of the dynamic programming approach for a class of epi-econ models that have been formulated and studied in recent years. From a technical point of view, such models are difficult to study mainly because both the dynamics and the cost lack convexity. Existing numerical methods for viscosity solutions of HJB equations are not suitable (nor can they be straightforwardly adapted) to simulate the value function of our optimization problem; such simulations, in the absence of a closed-form expression for the value function, would provide further insight into its behavior. Other important aspects that we could not analyze with the results presented here are the existence and uniqueness of an optimal strategy, possibly in feedback form, and the behavior of optimal trajectories, that is, the evolution of the epidemic under the action of an optimal control. Nevertheless, we think that our results provide solid ground for further research. More precisely, the next steps will be:
• to characterise the set where the value function is differentiable and where singularities in its gradient may arise;
• to use the sufficient optimality conditions proved here to characterise the optimal strategies;
• to extend or adapt existing numerical schemes to the non-convex case, in order to cover at least some of the examples mentioned herein.