Dynamic programming approach to the numerical solution of optimal control with paradigm by a mathematical model for drug therapies of HIV/AIDS
DOI: 10.1007/s11081-012-9204-4
- Cite this article as:
- Guo, BZ. & Sun, B. Optim Eng (2014) 15: 119. doi:10.1007/s11081-012-9204-4
Abstract
In this paper, we present a new numerical algorithm for finding the optimal control of general nonlinear lumped systems without state constraints. The dynamic programming-viscosity solution (DPVS) approach is developed, and numerical solutions of both the approximate optimal control and the corresponding trajectory are produced. To show the effectiveness and efficiency of the new algorithm, we apply it to an optimal control problem of two types of drug therapies for human immunodeficiency virus (HIV)/acquired immune deficiency syndrome (AIDS). The quality of the obtained control-trajectory pair is checked by comparing its cost with the costs under arbitrarily selected admissible controls. The results illustrate the effectiveness of the algorithm.
Keywords
Optimal control · Viscosity solution · Dynamic programming · Numerical solution

1 Introduction
Optimal control has long been one of the main topics in modern control theory, which has developed along two lines: abstract theory and computational methods. From the computational point of view, on the one hand, there are essentially three types of approaches for seeking the numerical solution of optimal control for a continuous control system (Sargent 2000). The first uses a necessary condition of optimality, such as the Pontryagin maximum principle, to formulate a two-point boundary value problem, which is then solved mainly by the multiple shooting method. The second converts the original continuous problem into a finite-dimensional nonlinear programming problem through full discretization. The last parameterizes the control trajectory to obtain a nonlinear programming problem and then solves it by appropriate methods (von Stryk and Bulirsch 1992). All these methods have their own drawbacks. The Pontryagin maximum principle provides only a necessary condition for the optimal control, and it is difficult to translate into feedback form. Moreover, for the two-point boundary value problem obtained from the necessary condition of optimality, the shooting method requires a "guess" of the initial data to start the iterative numerical process, which is difficult in general (Stoer and Bulirsch 1993). The only exceptional case is that, when the control has a valid bound, the forward-backward sweep iterative method presented in Hackbusch (1978) works for any initial guess. For the other two approaches, the simplification of the original problem reduces reliability and accuracy, and when the degrees of discretization and parameterization are high, the computational burden becomes prominent and the solving process suffers from the "curse of dimensionality" (Bryson 1996). On the other hand, from the application point of view, two effective approaches for solving the optimal control problem have not received enough attention (Lin and Arora 1994). The first is based on the solution of the optimality conditions obtained by using the calculus of variations. The second is the dynamic programming approach, based on the principle of optimality. By this principle, the Hamilton-Jacobi-Bellman (HJB) equation satisfied by the value function of the optimal control problem can be obtained. Furthermore, the differential dynamic programming approach is developed in a special way based on the HJB equation. Its basic idea is to successively approximate the solution using the first- and/or second-order terms of the Taylor expansion of the performance index about the nominal control and state trajectories (Jacobson and Mayne 1970; Mayne and Polak 1975).
In this paper, different from any of the numerical approaches mentioned above, a method based on dynamic programming is constructed and elaborated, with a mathematical model for drug therapies of HIV/AIDS as a paradigm, to obtain the approximate (or sub-) optimal control. It is generally considered that finding the optimal feedback control is the Holy Grail of control theory (Ho 2005). Bellman's dynamic programming approach can provide a way of finding the optimal feedback control. However, it has been known since Pontryagin's time that, by dynamic programming, the value function of the optimal control system satisfies a nonlinear partial differential equation called the HJB equation. If the value function and its gradient are known, then the optimal feedback control can be obtained analytically. Unfortunately, no matter how smooth the coefficients of the HJB equation are, a classical solution may not exist. Moreover, even if a classical solution exists, it may not be unique. That is why, for a long time, dynamic programming played no important role in finding the optimal control.
The situation changed when the viscosity solution was introduced by M.G. Crandall and P.-L. Lions in the early 1980s. By the viscosity solution theory (Bardi and Capuzzo-Dolcetta 1997), the value function is usually the unique viscosity solution of the associated HJB equation. But finding an analytical viscosity solution of the HJB equation is usually impossible, so the numerical solution is almost the best choice for finding the optimal control.
The objective of this paper is to demonstrate how to develop a completely new algorithm for finding numerical solutions of the approximate optimal control based on dynamic programming, without computing the value function everywhere. Moreover, our numerical solution is a single closed-loop solution starting from the initial state (Richardson and Wang 2006).
Different from existing methods, in this paper a novel algorithm based on the dynamic programming approach is developed for a rather general class of optimal control problems. To show its effectiveness and efficiency, the new algorithm is applied, as a paradigm, to an optimal control problem of two types of drug therapies for HIV/AIDS to obtain the approximate optimal control strategy. Although the solution obtained is a single closed-loop solution starting from the initial state (Richardson and Wang 2006), and hence cannot be considered a complete feedback form, the method has the potential to synthesize the optimal feedback control if we are able to obtain the value function everywhere instead of only the discrete point values searched along some direction. From the viewpoint of general control theory, feedback control has merits such as disturbance attenuation and robustness, and it can decrease errors caused by measurement and modeling (Zurakowski and Teel 2006). For instance, in the investigation of HIV control, it has been shown that feedback control can overcome unplanned treatment interruptions, inaccurate or incomplete data, and imperfect model specification (David et al. 2011).
We proceed as follows. In Sect. 2, which consists of two subsections, some preliminaries related to the theoretical background of the DPVS approach are provided, and then the new algorithm for finding the approximate optimal control is constructed step by step for optimal control problems of general nonlinear lumped systems without state constraints. As a paradigm, in Sect. 3, the algorithm is applied to an optimal control problem of two types of drug therapies for HIV/AIDS, which indicates the effectiveness of the new algorithm. Finally, in Sect. 4, the contributions of this work are briefly summarized and the potential of the algorithm for attacking other optimal control problems in higher dimensions is addressed.
2 The DPVS approach
2.1 Preliminary
The following two propositions show the important role of the value function in characterizing the optimal feedback law (Bardi and Capuzzo-Dolcetta 1997).
Proposition 1
Proposition 2
By Propositions 1 and 2, we have the following Theorem 1, which leads to the construction of the feedback law via the value function.
Theorem 1
We see from Theorem 1 that, in order to find the feedback control law, not only the value function V itself but also its gradient ∇_{x}V is needed.
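Since the statements of (2.1)-(2.3), Propositions 1-2, and Theorem 1 are not reproduced in this excerpt, the display below is only a hedged reconstruction of the standard finite-horizon setting they presumably refer to; the symbols f, L, ψ, U, \(\mathcal{U}\) and the primed tags are our notation, not the paper's.

\begin{align}
  &\dot{x}(s) = f(x(s),u(s)), \quad s\in(t,T], \qquad x(t)=z, \tag{2.1$'$}\\
  &V(t,z) = \inf_{u(\cdot)\in\mathcal{U}[t,T]} \Bigl\{ \int_t^T L(x(s),u(s))\,\mathrm{d}s + \psi(x(T)) \Bigr\}, \tag{2.2$'$}\\
  &V_t(t,z) + \inf_{u\in U}\bigl\{ \nabla_x V(t,z)\cdot f(z,u) + L(z,u)\bigr\} = 0, \qquad V(T,z)=\psi(z), \tag{2.3$'$}\\
  &u^{*}(t,z) \in \operatorname*{arg\,min}_{u\in U}\bigl\{ \nabla_x V(t,z)\cdot f(z,u) + L(z,u)\bigr\}. \notag
\end{align}

In this setting, Theorem 1 presumably states that a control built from the last relation is an optimal feedback law, which is why both V and ∇_{x}V are required.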
Equation (2.3) generally has no classical solution, regardless of the smoothness of the functions f and L. Fortunately, under some basic assumptions on f and L, the value function V is the unique viscosity solution of (2.3). However, it is usually not possible to find an analytical solution of (2.3) for general nonlinear functions f and L. It therefore becomes very important to solve (2.3) numerically, particularly in applications. Indeed, several difference schemes have been proposed to find viscosity solutions (Fleming and Soner 1993; Huang et al. 2000, 2004; Wang et al. 2003).
Once a viscosity solution of (2.3) is obtained numerically, we are able to construct a numerical solution of the feedback law by Theorem 1.
2.2 Algorithm of finding optimal feedback law
In this subsection, we follow Theorem 1 to construct an algorithm for finding numerical solutions of optimal feedback control and optimal trajectory pair. The algorithm consists of two coupled discretization steps. The first step is to discretize the HJB equation (2.3) to find the feedback law and the second one is to discretize the state equation (2.1) to find the optimal trajectory.
In the last two decades, many different approximation schemes have been developed for the numerical solution of (2.3), such as the upwind finite difference scheme (Wang et al. 2000), the method of vanishing viscosity (Crandall and Lions 1984), and the parallel algorithm based on the domain decomposition technique (Falcone et al. 1994), to name just a few. As for the numerical solution of the state equation (2.1), there are numerous classical methods available, such as the Euler method, the Runge-Kutta method, and the Hamming algorithm (Stoer and Bulirsch 1993).
It is pointed out that the above approximation (2.4) brings obvious advantages to the algorithm presented in this paper. If we tried to work out the viscosity solution of (2.3) first, we would most likely suffer from "the curse of dimensionality" for high-dimensional problems, since we would have to obtain data for all grid points in the whole region. Perhaps that is why the numerical experiments on the viscosity solution of (2.3) reported in the literature are mostly limited to 1-D or 2-D problems (e.g., Wang et al. 2000). On the other hand, since our scheme searches the optimal control only along the direction of f, not in the whole region, the new algorithm involves much less data. This idea is also applicable to infinite-dimensional systems (Guo and Sun 2005). To the best of our knowledge, there has been no effort along this direction to find the optimal control by dynamic programming.
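The approximation (2.4) itself is not reproduced in this excerpt. A plausible form, consistent with the statement that the optimal control is searched only along the direction of f, is the following directional-derivative approximation (an assumption; Δ denotes a small step, e.g. the time step of the scheme):

\begin{equation}
  \nabla_x V(t,z)\cdot f(z,u)
  \;\approx\;
  \frac{V\bigl(t,\,z + \Delta\, f(z,u)\bigr) - V(t,z)}{\Delta},
  \qquad 0 < \Delta \ll 1.
  \tag{2.4$'$}
\end{equation}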
Based on the above discussion, we now construct the algorithm for the numerical solutions of the optimal feedback control-trajectory pairs.
It is worth noting that the focus of the above algorithm is not to solve the HJB equation, not even to obtain the value function itself. This is different from most works in the literature (e.g., Huang et al. 2000, 2004; Wang et al. 2003), which focus on solving the HJB equation. Our ultimate aim is to find the numerical solutions of both the optimal feedback control and the corresponding optimal trajectory. The whole algorithm consists of solving the state equation once and the HJB equation N times. According to our numerical experiments reported in Guo and Sun (2005, 2007) and the example presented in the next section, the mesh point sequence in space generated by the recursive relation (2.6) does not cause oscillation of the space variable in subregions of the given space, even when f changes its sign. This is because (2.4) is the more natural definition of the directional derivative, which allows us to search the optimal control along its natural direction f.
- (a)
To begin with, evaluate \(V^{0}_{i}, i = 0, 1, \ldots, M\), using the terminal condition and obtain \(u^{0}_{i}\) by Eq. (2.7). For any given initial \(\tilde{u}\), we set one initial state x^{0}=z to initiate the computation. By the finite difference scheme (2.5) for the HJB equation (2.3), we obtain \(\{\{V^{j}_{i}\}^{M}_{i=0}\}_{j=0}^{N}\) and \(\{\{u^{j}_{i}\}^{M}_{i=0}\}_{j=0}^{N}\) (or, in the continuous case, V(t,z) and u(t,z)). Then \((u^{N}_{0}, x^{0}) = (u(0), z)\) is the first optimal control-trajectory pair.
- (b)
Then, substitute u(0), z into the finite difference scheme for the state equation to obtain the new state y^{1} at time t_{N−1}. Replace \(\tilde{u}, z\) in (a) by u(0), y^{1} to obtain \(u^{N-1}_{0}\) using the difference scheme (2.5). We thus obtain the second optimal control-trajectory pair \((u^{N-1}_{0}, y^{1}) = (u(t_{N-1}), y(t_{N-1}))\).
- (c)
Proceed with the computation until all (u(t_{N−j}),y(t_{N−j})), j=0,1,…,N, are obtained; these constitute the optimal control-trajectory pairs. At this point the computation is complete.
Note that during the execution of the whole algorithm, the finite difference scheme for the HJB equation is called once every time a node is updated in the difference scheme for the state equation. After the algorithm finishes running, we have called the state equation (actually, the corresponding finite difference scheme) once and the HJB equation (also the corresponding finite difference scheme) N times overall. The computation is initiated by the initial state z, and the solution we obtain starts from exactly this initial datum. Only after we apply the same algorithm and finish the computations for all initial states in the domain of definition is the feedback map for the optimal control problem completely constructed. A sketch of this procedure on a simple one-dimensional example is given below.
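The following Python sketch illustrates, on a scalar toy problem, the coupled pattern described in steps (a)-(c): a backward-in-time HJB recursion in which the gradient term is evaluated along the direction f, followed by one forward Euler step of the state equation with the minimizing control. It is only a minimal sketch under assumed forms of (2.4)-(2.7); the toy dynamics, cost, grids, and the function `hjb_feedback_control` are illustrative choices, not the paper's scheme.

```python
import numpy as np

# --- toy problem (assumed, purely for illustration) -------------------------
# minimize  J = \int_0^T (x^2 + u^2) dt + x(T)^2,  subject to dx/dt = x + u,
# with the control constraint |u| <= 1.
T_final, N = 1.0, 50                 # horizon and number of time steps
dt = T_final / N
U = np.linspace(-1.0, 1.0, 41)       # discretized admissible control set

f = lambda x, u: x + u               # dynamics (stands in for f in (2.1))
L = lambda x, u: x**2 + u**2         # running cost (stands in for L)
psi = lambda x: x**2                 # terminal cost


def hjb_feedback_control(t_index, z, x_grid):
    """March the HJB recursion backward in time on a 1-D grid around z and
    return the minimizing control at (t_index, z).  The gradient term is
    approximated along the direction f: V is read off at x + dt*f(x, u)."""
    V = psi(x_grid)                                 # values at the final time
    u_at_z = 0.0
    for j in range(N - 1, t_index - 1, -1):         # backward time sweep
        V_new = np.empty_like(V)
        u_new = np.empty_like(V)
        for i, x in enumerate(x_grid):
            x_next = x + dt * f(x, U)               # move along f for each u
            cand = dt * L(x, U) + np.interp(x_next, x_grid, V)
            k = int(np.argmin(cand))
            V_new[i], u_new[i] = cand[k], U[k]
        V = V_new
        if j == t_index:                            # control at the current time
            u_at_z = u_new[int(np.argmin(np.abs(x_grid - z)))]
    return u_at_z


# --- forward march of the state equation with the computed feedback ---------
z = 1.0                                             # initial state
xs, us = [z], []
for j in range(N):
    grid = np.linspace(z - 2.0, z + 2.0, 101)       # local grid around z
    u = hjb_feedback_control(j, z, grid)            # feedback control at (t_j, z)
    us.append(u)
    z = z + dt * f(z, u)                            # explicit Euler state step
    xs.append(z)

cost = sum(dt * L(x, u) for x, u in zip(xs[:-1], us)) + psi(xs[-1])
print("approximate optimal cost:", cost)
```

In this sketch the HJB recursion is restarted from the terminal condition at every forward step, so the state equation is stepped once while the HJB recursion is invoked N times, mirroring the operation count noted above.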
3 A paradigm on optimal control of HIV/AIDS
In this section, to show how to utilize the new algorithm, we apply it to an optimal control problem of two types of drug therapies for HIV/AIDS and obtain the approximate optimal control strategy. With the aid of this paradigm, we illustrate the validity and effectiveness of the new algorithm.
The use of mathematical methods to understand the dynamics of HIV/AIDS has been studied by many researchers; examples can be found in Nowak and May (2000), The INSIGHT-ESPRIT Study Group and SILCAAT Scientific Committee (2009). For the use of mathematics to understand HIV immune dynamics, we refer to Kirschner (1996). In order to understand the dynamics of HIV-1 infection in vivo, Perelson and Nelson (1999) study how dynamical modeling and parameter estimation techniques can be used to discover important features of HIV pathogenesis. In Nowak and May (2000), virus dynamics is discussed and the mathematical principles of immunology and virology are investigated. The works in Craig and Xia (2001), Craig et al. (2004) introduce HIV/AIDS education into the curriculum of a university, and the model of HIV/AIDS is analyzed from the control engineering point of view. Using control theory, the question of when to initiate HIV therapy is studied in Jeffrey et al. (2003), where it is concluded that therapy is best initiated when the viral load is easier to control. By the Pontryagin maximum principle, some optimal chemotherapy strategies are obtained in Butler et al. (1997), Felippe de Souza et al. (2000), Kirschner et al. (1997). Several methods of stable control of the HIV-1 population using an external feedback control are developed in Brandt and Chen (2001), where it is shown, by a feedback control approach, how the immune system components can be bolstered against the virus.

An interesting study of dynamic multi-drug therapies for HIV is presented in Wein et al. (1997), where a dynamic but not optimal policy is proposed. Adopting the model predictive control approach, Zurakowski and Teel (2006) derive the treatment schedules of the HIV therapy, which is a closed-loop control solution of the modified Wodarz-Nowak model. A neighboring-optimal control policy is presented in Stengel et al. (2002). Based on linearization of the nonlinear model at the steady state, Radisavljevic-Gajic (2009) proposes a control strategy for the HIV-virus dynamics and, further, for the linear-quadratic optimal control problem; the controller is obtained by minimizing the square of the error between the actual and desired (equilibrium) values. For a fractional-order HIV-immune system with memory, Ding et al. (2012) discuss the necessary conditions for the optimality of a general fractional optimal control problem; the fractional-order two-point boundary value problem is solved numerically and the effects of mathematically optimal therapy are demonstrated. In Adams et al. (2007), the researchers fit a nonlinear dynamical mathematical model of HIV infection to longitudinal clinical data for individual patients, and a statistically based censored-data method is combined with inverse problem techniques to estimate dynamic parameters. The important works Adams et al. (2004, 2005) investigate the optimal control strategy of HIV by the Pontryagin maximum principle. Since the number of elements of the control set Λ in Adams et al. (2004, 2005) is finite, the optimal structured treatment interruption control problems are considered first via a crude direct search approach involving simple comparisons, then a 5-day segment strategy to reduce the number of iterations, and finally a subperiod method to further alleviate the computational burden. The suboptimal structured treatment interruption therapy protocols are derived without mentioning the HJB equation or a concrete algorithm.
The optimal controls in the aforementioned works are obtained through the Pontryagin maximum principle and are hence inherently open-loop controls. The Pontryagin maximum principle is a necessary condition, and most often "those sophisticated necessary conditions rarely give an insight into the structure of the optimal controls" (Rubio 1986). Moreover, the open-loop control characterized by the Pontryagin maximum principle solves (when it does!) the problem only for specified initial data, yet the optimal control problem usually needs a solution for a multitude of initial data. In addition, some simple examples in Lenhart and Workman (2007) show that the optimal controls and trajectories corresponding to different initial data may have different structures. This brings difficulties in finding the numerical solution (Miricǎ 2008). Indeed, it is remarked in Boltyanskii (1971) that "the use of the Pontryagin maximum principle to solve concrete problems is very difficult since it contains two different simultaneous mathematical problems: integration of the canonical differential system and, simultaneously, the maximization of the Hamiltonian".
This simple model of HIV progression considers only the uninfected CD4^{+} T cell population and the free virus population interacting in the plasma. Currently, there are a few newer models available as HIV research develops, which include more state variables, such as the class of infected CD4^{+} T cells. Some of them reflect recent findings on IL-2 coadministration for HIV therapy, such as that in The INSIGHT-ESPRIT Study Group and SILCAAT Scientific Committee (2009), which refers to almost all the recent advances in this field. However, even from the biological point of view, the adopted model can show that the dynamics of HIV progression in the plasma can be captured by simple assumptions about the interactions of uninfected CD4^{+} T cells and free virus. Since there is extensive data for these two populations during the progression, the model simulations can be compared with data. This mathematical model (3.8) of HIV/AIDS progression was first presented in Kirschner and Webb (1998) to characterize the interaction of CD4^{+} T cells and HIV particles in the immune system, where one can find more details on its biological background. Joshi (2002) utilized the Pontryagin maximum principle to investigate the optimal control problem of the model. In this paper, we aim to develop a completely new algorithm to find the numerical solutions of approximate optimal control based on dynamic programming without computing the value function everywhere; the problem of drug combination in virus treatment is given as an application. The simple HIV/AIDS model adopted in this paper, due to its mathematical properties, is well suited to illustrating our purpose. The proposed algorithm has the ability to treat more complex and practical models like those considered in Rong et al. (2007), Rong and Perelson (2009), where drug resistance, immune response and HIV latency become critical issues leading to HIV persistence in the presence of long-term effective therapy.
It should be pointed out that the difference in magnitude between A_{1} and A_{2} is chosen to balance the sizes of the corresponding terms. The control bound for u_{1} here is [0,2.0×10^{−2}], whose square lies in [0,4.0×10^{−4}], and the control bound for u_{2} is [0,0.9], whose square lies in [0,0.81]. The controls u_{1}, u_{2} denote the drug administration schedules. The value u_{1}=0.02 means that the drug u_{1} is administered in full scale per day. Similarly, the drug u_{2} is administered in full scale per day if u_{2}=0.9.
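The model (3.8) itself is not reproduced in this excerpt. The sketch below assumes the two-state Kirschner-Webb type model studied in Joshi (2002), with T the uninfected CD4^{+} T cell count, V the free virus load, u1 the immune-boosting drug and u2 the viral-suppressing drug; the functional forms, parameter names and the running cost are assumptions based on that reference, not a transcription of (3.8).

```python
# hiv_model_sketch.py -- a hedged sketch of the two-state HIV/AIDS model used
# as the paradigm.  The functional forms follow the Kirschner-Webb type model
# studied by Joshi (2002); they are assumptions, not a transcription of (3.8).
import numpy as np

def hiv_rhs(state, u1, u2, p):
    """Right-hand side (dT/dt, dV/dt) of the assumed two-state model."""
    T, V = state
    dT = (p["s1"] - p["s2"] * V / (p["B1"] + V)    # source of CD4+ T cells
          - p["mu"] * T                            # natural death
          - p["k"] * V * T                         # infection by free virus
          + u1 * T)                                # boost from drug u1
    dV = (p["g"] * V * (1.0 - u2) / (p["B2"] + V)  # viral production, cut by u2
          - p["c"] * V * T)                        # clearance by immune response
    return np.array([dT, dV])

def running_cost(state, u1, u2, A1, A2):
    """Assumed running cost: reward CD4+ T cells, penalize drug usage,
    mirroring the weights A1, A2 discussed in the text."""
    T, _ = state
    return -T + A1 * u1**2 + A2 * u2**2
```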
It should be pointed out that although the results we obtained seem not to be optimal (the optimal control should be continuous by Theorems 6.1 and 6.2 of Fleming and Rishel 1975), they are approximately optimal (or suboptimal) in the sense that they are very effective in suppressing the HIV particles and boosting the function of the immune system. Even from the point of view of numerical analysis, our results are satisfactory. To check this point, we carried out some numerical experiments, comparing, under the same initial condition, the cost of the approximate optimal control-trajectory pair with the costs of arbitrarily selected admissible controls and their corresponding trajectories.
Table 1 Costs corresponding to different control-trajectory pairs

| Control (u_{1},u_{2}) | J value with A_{3}=100 | J value with A_{3}=0 |
|---|---|---|
| Case (a, A) | −19037.364346 | −19037.364346 |
| Case (b, B) | −16091.030761 | −19455.373985 |
| Case (b, C) | −18927.367136 | −18927.595354 |
| Case (a, D) | −18365.038145 | −19997.793922 |
| Case (c, B) | −13315.540793 | −16752.069485 |
| Case (c, C) | −16250.916681 | −16251.276633 |
| Case (d, A) | −14479.391625 | −14479.394837 |
| Case (d, D) | −5464.061856 | −15709.633929 |
| Case (e, E) | −18057.659758 | −18094.224090 |
| Approximate optimal \((u^{*}_{1}, u^{*}_{2})\) | −19308.169228 | −20177.695872 |
Remark 1
It is clearly seen from Table 1 that the cost corresponding to the approximate optimal control-trajectory pair is less than that of any other control (u_{1},u_{2}), no matter what value A_{3} takes. In particular, when A_{3}=100, the cost corresponding to the control (d,D) is maximal; this corresponds to the case without any treatment. When A_{3}=0, the maximal cost occurs in the case (d,A), which tells us that the viral-suppressing drug u_{2} alone cannot give a good therapeutic effect if the immune-boosting drug u_{1} is absent. When A_{3}=100, the case (a,A) is the second smallest, which corresponds to the case of full therapy; namely, the patient takes both drugs at full scale over the whole observed period. This tells us that receiving full therapy is also a good treatment option if no better scheme is available. In both cases (d,A) and (d,D), where the immune-boosting drug u_{1} is absent, the corresponding costs are very high whether or not the viral-suppressing drug u_{2} is administered. So the immune-boosting drug is very important for achieving the ideal therapeutic effect.
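As an illustration of how a comparison like Table 1 could be set up, the following sketch integrates the assumed model above under a few constant drug schedules and accumulates the cost. The paper's cases (a)-(e) and (A)-(E), its weights A_1, A_2, and its terminal cost are not reproduced in the excerpt, so the schedules, parameter values and the placeholder terminal cost −A_3 T(t_f) below are illustrative assumptions only.

```python
# cost_comparison_sketch.py -- how a comparison like Table 1 could be set up,
# reusing hiv_rhs and running_cost from the sketch above.  Parameter values,
# schedules and the terminal cost -A3*T(t_f) are illustrative assumptions.
import numpy as np
from hiv_model_sketch import hiv_rhs, running_cost

def evaluate_cost(u1_fun, u2_fun, x0, p, A1, A2, A3, t_f=50.0, n=5000):
    """Integrate the state with explicit Euler and accumulate the cost J."""
    dt = t_f / n
    x = np.array(x0, dtype=float)
    J = 0.0
    for i in range(n):
        t = i * dt
        u1, u2 = u1_fun(t), u2_fun(t)
        J += dt * running_cost(x, u1, u2, A1, A2)
        x = x + dt * hiv_rhs(x, u1, u2, p)
    return J - A3 * x[0]              # placeholder terminal cost (assumption)

# placeholder parameters (illustrative only, not the paper's values)
p = dict(s1=2.0, s2=1.5, B1=14.0, B2=1.0, mu=0.002, k=2.5e-4, g=30.0, c=0.007)
x0 = (400.0, 3.5)                     # initial (T, V), as used in Sect. 3
schedules = {
    "full therapy": (lambda t: 0.02, lambda t: 0.9),   # both drugs at full scale
    "no therapy":   (lambda t: 0.0,  lambda t: 0.0),
}
for name, (u1, u2) in schedules.items():
    print(name, evaluate_cost(u1, u2, x0, p, A1=1.0, A2=1.0, A3=100.0))
```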
Remark 2
In Joshi (2002), it is also said that "the format of the optimal controls do agree with those in the papers Brandt and Chen (2001), Bryson (1996), Craig et al. (2004) with only one control". Because we do not know the initial values used in Joshi (2002) (the optimal control depends on the initial value), we are not able to reproduce exactly the same graph as in Joshi (2002). However, we used the initial value (400.0,3.5) for the 50-day simulation as in Joshi (2002). The first control is almost the same as that in Joshi (2002), and the shape of the CD4^{+} T cell curve does agree with that in Joshi (2002) (see Fig. 3: Comparison with existing results). As for the second control and the HIV particles, our results are more like Fig. 4 of Kirschner et al. (1997) on p. 789.
4 Concluding remarks
In this paper, a new algorithm is presented to find the optimal single closed-loop solutions starting from the initial state (Richardson and Wang 2006) by the dynamic programming approach. The algorithm is based on two observations: (a) the value function of the optimal control problem considered is the viscosity solution of the associated Hamilton-Jacobi-Bellman (HJB) equation; and (b) the gradient of the value function appears in the HJB equation in the form of a directional derivative. The algorithm proposes a discretization method for seeking approximate optimal control-trajectory pairs based on a finite-difference scheme in time, solving the HJB equation and the state equation. We apply the algorithm to an HIV/AIDS optimal control problem. The results illustrate that the method is valid and effective. Furthermore, the algorithm can be applied to other optimal control problems in higher dimensions.
From the HIV/AIDS model example, our numerical computations show that the influence of different initial conditions is not remarkable. What can make a big difference in the results is the terminal cost. It seems that adding the terminal cost to the cost functional is necessary, which is the major difference between the results presented in this paper and those in the existing literature.
Finally, we indicate that the solution we obtained might be a local minimum due to the choice of the starting control. This can be ruled out, for a practical problem, through comparison with "arbitrarily chosen" admissible controls. Further investigation of this problem is needed.
Acknowledgements
The authors would like to thank the Editor and the anonymous reviewers for their valuable comments and suggestions, which improved the paper substantially.