Optimization and Engineering, Volume 15, Issue 1, pp 119–136

# Dynamic programming approach to the numerical solution of optimal control with paradigm by a mathematical model for drug therapies of HIV/AIDS


DOI: 10.1007/s11081-012-9204-4

Guo, BZ. & Sun, B. Optim Eng (2014) 15: 119. doi:10.1007/s11081-012-9204-4

## Abstract

In this paper, we present a new numerical algorithm to find the optimal control for general nonlinear lumped systems without state constraints. The dynamic programming-viscosity solution (DPVS) approach is developed, and numerical solutions of both the approximate optimal control and the trajectory are produced. To show the effectiveness and efficiency of the new algorithm, we apply it to an optimal control problem of two types of drug therapies for human immunodeficiency virus (HIV)/acquired immune deficiency syndrome (AIDS). The quality of the obtained optimal control-trajectory pair is checked by comparing its cost with the costs under different, arbitrarily selected controls. The results illustrate the effectiveness of the algorithm.

### Keywords

Optimal control · Viscosity solution · Dynamic programming · Numerical solution

## 1 Introduction

Optimal control has long been one of the main topics in modern control theory, which has developed along two lines: abstract theory and computational methods. On the computational side, there are essentially three types of approaches to the numerical solution of optimal control for a continuous control system (Sargent 2000). The first uses a necessary condition of optimality, such as the Pontryagin maximum principle, to obtain a two-point boundary value problem, which is solved mainly by the multiple shooting method. The second converts the original continuous problem into a finite-dimensional nonlinear programming problem through full discretization. The third parameterizes the control trajectory to obtain a nonlinear programming problem and then treats that problem further (von Stryk and Bulirsch 1992). All these methods have drawbacks. The Pontryagin maximum principle provides only a necessary condition for the optimal control, and it is difficult to translate into feedback form. Moreover, for the two-point boundary value problem obtained from the necessary condition of optimality, the shooting method has the difficulty of finding a “guess” for the initial data to start the iterative numerical process (Stoer and Bulirsch 1993). The only exception is that when the control has a valid bound, the forward-backward sweep iterative method presented in Hackbusch (1978) works for any initial guess. For the other two approaches, simplifying the original problem reduces reliability and accuracy, and when the degree of discretization or parameterization is very high, the computational work becomes heavy and the solution process suffers from the “curse of dimensionality” (Bryson 1996). From the application point of view, on the other hand, two effective approaches to solving the optimal control problem have not received enough attention (Lin and Arora 1994). The first is based on solving the optimality conditions obtained from the calculus of variations. The second is the dynamic programming approach, which is based on the principle of optimality. By this principle, the Hamilton-Jacobi-Bellman (HJB) equation satisfied by the value function of the optimal control can be obtained. Furthermore, the differential dynamic programming approach is developed in a special way based on the HJB equation. The basic idea is to successively approximate the solution using the first- and/or second-order terms of the Taylor expansion of the performance index about the nominal control and state trajectories (Jacobson and Mayne 1970; Mayne and Polak 1975).

In this paper, unlike any of the numerical approaches mentioned above, a method based on dynamic programming is constructed and elaborated, with a mathematical model for drug therapies of HIV/AIDS as a paradigm, to obtain the approximate (or sub-) optimal control. It is generally considered that finding the optimal feedback control is the Holy Grail of control theory (Ho 2005). Bellman’s dynamic programming approach could provide a way of finding the optimal feedback control. However, since Pontryagin’s time, it has been known that, by dynamic programming, the value function of the optimal control system satisfies a nonlinear partial differential equation called the HJB equation. If the value function and its gradient are known, then the optimal feedback control can be obtained analytically. Unfortunately, no matter how smooth the coefficients of the HJB equation are, a classical solution may still not exist. Moreover, even if a classical solution exists, it may not be unique. That is why, for a long time, dynamic programming played no important role in finding the optimal control.

The situation changed when the viscosity solution was introduced by M.G. Crandall and P.L. Lions in the early 1980s. By the viscosity solution theory (Bardi and Capuzzo-Dolcetta 1997), the value function is usually the unique viscosity solution to the associated HJB equation. But finding the viscosity solution of the HJB equation analytically is usually impossible, so a numerical solution is almost always the best choice for finding the optimal control.

The objective of this paper is to demonstrate how to develop a completely new algorithm that finds numerical solutions of the approximate optimal control based on dynamic programming without computing the value function everywhere. Moreover, our numerical solution is a single closed-loop solution starting from the initial state (Richardson and Wang 2006).

Unlike the existing methods, the novel algorithm developed in this paper is based on the dynamic programming approach and covers a rather general class of optimal control problems. To show its effectiveness and efficiency, the new algorithm is applied, as a paradigm, to an optimal control problem of two types of drug therapies for HIV/AIDS to obtain the approximate optimal control strategy. Although the solution obtained is a single closed-loop solution starting from the initial state (Richardson and Wang 2006), which cannot be considered to be in complete feedback form, the method has the potential to synthesize the optimal feedback control if we are able to obtain the value function everywhere instead of only the discrete point solutions that we search along one direction. From the viewpoint of general control theory, feedback control has merits such as disturbance attenuation and robustness, and it can reduce some errors caused by measurement and modeling (Zurakowski and Teel 2006). For instance, in the investigation of HIV control, it has been shown that feedback control can overcome unplanned treatment interruptions, inaccurate or incomplete data and imperfect model specification (David et al. 2011).

We proceed as follows. In Sect. 2, which consists of two subsections, some preliminaries on the theoretical background of the DPVS approach are provided, and then the new algorithm for finding the approximate optimal control is constructed step by step for optimal control problems of general nonlinear lumped systems without state constraints. As a paradigm, in Sect. 3, the algorithm is applied to an optimal control problem of two types of drug therapies for HIV/AIDS, which indicates the effectiveness of the new algorithm. Finally, in Sect. 4, the contributions of this work are briefly summarized, and the potential of the algorithm for attacking other optimal control problems in higher dimensions is addressed.

## 2 The DPVS approach

### 2.1 Preliminary

Consider the following control system:
$$\begin{cases} y^{\prime}(t)=f(y(t),u(t)), & t\in(0,T],T>0,\\ y(0)=z, \end{cases}$$
(2.1)
where $$y(\cdot)\in\mathbb{R}^{n}$$ is the state, $$u(\cdot)\in\mathcal{U}[0,T]= L^{\infty}([0,T]; U)$$ is the admissible control with compact set $$U\subset\mathbb{R}^{m}$$ and z is the initial value. Assume that $$f:\mathbb{R}^{n}\times U\to\mathbb{R}^{n}$$ is continuous in its variables.
Given a running cost L(t,y,u) and a terminal cost ψ(y), the optimal control problem for the system (2.1) is to seek an optimal control $$u^{*}(\cdot)\in\mathcal{U}[0,T]$$, such that
$$J\bigl(u^*(\cdot)\bigr)=\inf_{u(\cdot)\in\mathcal{U}[0,T]} J\bigl(u(\cdot)\bigr),$$
(2.2)
where J is the cost functional given by
$$J\bigl(u(\cdot)\bigr)= \int^T_0 L\bigl(\tau, y(\tau), u(\tau)\bigr)d\tau+ \psi\bigl(y(T)\bigr).$$
The dynamic programming principle due to Bellman is fundamental in modern optimal control theory. Instead of considering the single optimal control problem (2.1)–(2.2), the principle proposes to deal with a family of optimal control problems originating from (2.1)–(2.2). That is, consider the optimal control problem for the following system for any $$(t,x)\in[0,T)\times\mathbb{R}^{n}$$:
$$\begin{cases} y^{\prime}_{t,x}(s)=f( y_{t,x}(s),u(s)),& s\in(t, T],\\ y_{t,x}(t)=x, \end{cases}$$
with the cost functional
$$J_{t,x}\bigl(u(\cdot)\bigr)= \int^T_t L\bigl(\tau, y_{t,x}(\tau), u(\tau)\bigr)d\tau+ \psi \bigl(y_{t,x}(T)\bigr).$$
Define the value function
$$V(t,x)=\inf_{u(\cdot)\in\mathcal{U}[t,T]} J_{t,x}\bigl(u(\cdot)\bigr), \quad \forall\ (t,x)\in[0,T)\times\mathbb{R}^n$$
with the terminal value
$$V(T,x)=\psi(x) \quad \mbox{for all } x\in\mathbb{R}^n.$$
It is well known that if V is smooth enough, say $$V\in C^{1}([0,T]\times\mathbb{R}^{n})$$, then V satisfies the following HJB equation (Bardi and Capuzzo-Dolcetta 1997):
$$\begin{cases} V_t(t,x)+\inf_{u\in U} \{f(x,u)\cdot\nabla_x V(t,x) +L(t,x,u) \}=0, \quad (t,x)\in[0,T)\times\mathbb{R}^n,\\ V(T,x)=\psi(x), \quad x\in\mathbb{R}^n, \end{cases}$$
(2.3)
where $$\nabla_x V(t,x)$$ stands for the gradient of V in x.

The following two propositions show the important role of the value function in characterizing the optimal feedback law (Bardi and Capuzzo-Dolcetta 1997).

### Proposition 1

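Let $$V\in C^{1}([0,T]\times\mathbb{R}^{n})$$ be the value function. Then if there exists a control $$u^{*}(\cdot)\in\mathcal{U}[0,T]$$ such that
$$f\bigl(y^*(t),u^*(t)\bigr)\cdot \nabla_x V\bigl(t,y^*(t)\bigr) +L\bigl(t,y^*(t),u^*(t)\bigr) = \inf_{u\in U} \bigl\{f\bigl(y^*(t),u\bigr)\cdot \nabla_x V\bigl(t,y^*(t)\bigr) +L\bigl(t,y^*(t),u\bigr) \bigr\},$$
then $$u^{*}(\cdot)$$ is an optimal control, where $$y^{*}$$ is the state corresponding to $$u^{*}$$.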
As usual, we denote the optimal control as (Bardi and Capuzzo-Dolcetta 1997)
$$u^*(t)\in\arg\inf_{u\in U} \bigl\{f\bigl(y^*(t),u\bigr)\cdot \nabla_x V\bigl(t,y^*(t)\bigr) +L\bigl(t,y^*(t),u\bigr) \bigr\}$$
for almost all t∈[0,T].

### Proposition 2

Let $$V(t,x)\in C^{1}([0,T]\times\mathbb{R}^{n})$$ be the value function. Then $$(u^{*}(\cdot),y^{*}(\cdot))$$ is an optimal control-trajectory pair in feedback form if and only if
$$V_t\bigl(t,y^*(t)\bigr)+f\bigl(y^*(t),u^*(t)\bigr)\cdot \nabla_x V\bigl(t,y^*(t)\bigr) +L\bigl(t,y^*(t), u^*(t)\bigr)=0$$
for almost all $$t\in[0,T)$$.

By Propositions 1 and 2, we have the following Theorem 1, which leads to the construction of the feedback law via the value function.

### Theorem 1

Let $$V(t,x)\in C^{1}([0,T]\times\mathbb{R}^{n})$$ be the value function. Suppose u(t,x) satisfies
$$f\bigl(x,u(t,x)\bigr)\cdot\nabla_x V(t,x)+L\bigl(t,x,u(t,x)\bigr) = \inf_{u\in U} \bigl\{f(x,u)\cdot\nabla_x V(t,x)+L(t,x,u) \bigr\}.$$
Then
$$u_{z}^*(t)=u\bigl(t,y_{z}(t)\bigr)$$
is the feedback law of the optimal control problem (2.1)–(2.2), where $$y_{z}(t)$$ satisfies the following
$$y_{z}^{\prime}(t)=f\bigl(y_{z}(t),u \bigl(t,y_{z}(t)\bigr)\bigr), \quad y_{z}(0)=z, \ t\in[0, T].$$

We see from Theorem 1 that in order to find the feedback control law, not only the value function V itself but also its gradient $$\nabla_x V$$ is needed.
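The pointwise minimization in Theorem 1 can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the compact set U has been replaced by a finite list of candidate controls, and the function name `feedback_control` and the one-dimensional demo problem are hypothetical.

```python
def feedback_control(t, x, grad_V, f, L, controls):
    """Return the candidate u minimizing f(x,u) . grad_V + L(t,x,u).

    `controls` is a finite sample of the compact set U (an assumption made
    for illustration; the theorem itself minimizes over all of U).
    """
    def hamiltonian(u):
        drift = f(x, u)
        return sum(fi * gi for fi, gi in zip(drift, grad_V)) + L(t, x, u)

    return min(controls, key=hamiltonian)


# Hypothetical 1-D demo: f(x,u) = u, L = u^2, and a given gradient V_x = 2.
# The quantity 2u + u^2 takes the values 0, -1, 0, 3 on the candidates below.
u_star = feedback_control(
    0.0, [1.0], [2.0],
    lambda x, u: [u],
    lambda t, x, u: u * u,
    controls=[-2.0, -1.0, 0.0, 1.0],
)
```

In this demo the minimizer is the candidate `-1.0`, which is also the unconstrained minimizer of 2u + u², so the finite sample happens to contain the exact answer; in general a finer sample of U is needed.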

Equation (2.3) generally has no classical solution regardless of the smoothness of the functions f and L. Fortunately, under some basic assumptions on f and L, the value function V is the unique viscosity solution to (2.3). However, it is usually not possible to find an analytical solution of (2.3) for general nonlinear functions f and L. It therefore becomes very important to solve (2.3) numerically, particularly in applications. In fact, several difference schemes have been proposed to find the viscosity solutions (Fleming and Soner 1993; Huang et al. 2000, 2004; Wang et al. 2003).

Once a viscosity solution of (2.3) is obtained numerically, we are able to construct a numerical solution of the feedback law by Theorem 1.

### 2.2 Algorithm of finding optimal feedback law

In this subsection, we follow Theorem 1 to construct an algorithm for finding numerical solutions of optimal feedback control and optimal trajectory pair. The algorithm consists of two coupled discretization steps. The first step is to discretize the HJB equation (2.3) to find the feedback law and the second one is to discretize the state equation (2.1) to find the optimal trajectory.

In the last two decades, many different approximation schemes have been developed for the numerical solution of (2.3) such as the upwind finite difference scheme (Wang et al. 2000), the method of vanishing viscosity (Crandall and Lions 1984), and the parallel algorithm based on the domain decomposition technique (Falcone et al. 1994), to name just a few. As for numerical solution of the state equation (2.1), there are numerous classical methods available such as the Euler method, the Runge-Kutta method, the Hamming algorithm (Stoer and Bulirsch 1993), etc.

Notice that when we use Theorem 1 to find the optimal feedback law, it is the directional derivative $$\nabla_x V\cdot f$$, not the gradient $$\nabla_x V$$ itself, that is needed. This fact greatly facilitates our search for numerical solutions. The key step is to approximate $$\nabla_x V\cdot f$$ by its natural definition as follows:
$$\nabla_x V(t,x)\cdot f(x,u)\approx \frac{V \bigl(t, x+\eta\frac{f(x,u)}{1+\|f(x,u)\|} \bigr)-V(t,x)}{\eta}\cdot\bigl(1+\bigl\|f(x,u)\bigr\|\bigr),$$
(2.4)
where η>0 is a small number and ∥⋅∥ denotes the Euclidean norm in $$\mathbb{R}^{n}$$. Based on observation (2.4), the HJB equation (2.3) can thereby be approximated by a finite difference scheme on a mesh in time and space:
$$\begin{cases} \frac{V^{j+1}_i - V^j_i}{\Delta t}+\frac{V^{j}_{i+1} - V^j_i}{\eta}\cdot(1+\|f^j_i\|) + L^j_i=0,\\ u^{j+1}_i \in\arg\inf_{u\in U} \{ \frac{V^{j+1}_{i+1} - V^{j+1}_{i}}{\eta} \cdot(1+\|f(x_i,u)\|) + L(t_{j+1},x_i,u) \} \end{cases}$$
(2.5)
for i=0,1,…,M and j=0,1,…,N−1, where $$V^{j}_{i}=V(t_{j},x_{i}), f^{j}_{i}= f(x_{i},u^{j}_{i})$$, $$L^{j}_{i}= L(t_{j},x_{i},u^{j}_{i})$$. Meanwhile, it is assumed that
$$\frac{|\Delta t|}{\eta}\Bigl(1+ \max_{i,j}\bigl\|f^j_i\bigr\| \Bigr)\leq1,$$
which is a necessary and sufficient condition for the stability of the finite difference scheme (Press et al. 2002).
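The step-size restriction above is easy to check before running the scheme. The helper names below are hypothetical, and the bound on ‖f‖ is assumed to be known or estimated in advance; this is a sketch of the arithmetic only.

```python
def satisfies_step_condition(dt, eta, f_norm_max):
    """Check |dt| / eta * (1 + max ||f||) <= 1 for given step sizes."""
    return abs(dt) / eta * (1.0 + f_norm_max) <= 1.0


def largest_admissible_dt(eta, f_norm_max):
    """Largest |dt| allowed by the condition for a given eta and bound on ||f||."""
    return eta / (1.0 + f_norm_max)


# Illustrative numbers only: eta = 0.01 and ||f|| <= 4 on the mesh.
ok_small = satisfies_step_condition(-0.001, 0.01, 4.0)   # ratio 0.5
ok_large = satisfies_step_condition(-0.005, 0.01, 4.0)   # ratio 2.5
dt_max = largest_admissible_dt(0.01, 4.0)
```

With these illustrative numbers the first step size satisfies the condition and the second violates it. Because Δt is negative in the backward partition, the check uses its absolute value.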

We point out that the approximation (2.4) brings clear advantages to the algorithm presented in this paper. If we tried to work out the viscosity solution of (2.3) first, we would most likely face “the curse of dimensionality” for high-dimensional problems, since we would have to obtain data at all grid points in the whole region. Perhaps that is why the numerical experiments on the viscosity solution of (2.3) in much of the literature are mostly limited to 1-D or 2-D problems (e.g., Wang et al. 2000). In contrast, since our scheme searches for the optimal control only along the direction of f and not in the whole region, the new algorithm involves much less data. This idea is also applicable to infinite-dimensional systems (Guo and Sun 2005). To the best of our knowledge, there has been no previous effort in this direction to find the optimal control by dynamic programming.

Based on the above discussion, we now construct the algorithm for the numerical solutions of the optimal feedback control-trajectory pairs.

Step 1: Initial partition on time and space. Select two positive integers N and M. Let $$t_{j}=T+j\Delta t$$, j=0,1,…,N, be a backward partition of [0,T], where Δt=−T/N. For any given initial $$\tilde{u}\in U$$, let the initial state be $$x_{0}=z$$ and
$$x_i=x_{i-1}+\eta \frac{f(x_{i-1},\tilde{u})}{1+\|f(x_{i-1},\tilde{u})\|}, \quad i=1,2, \ldots, M.$$
(2.6)
Step 2: Initialization of value function and control. Let
$$\begin{cases} V^0_i =\psi(x_i),\\ u^{0}_i \in\arg\inf_{u\in U} \{ \frac{V^{0}_{i+1} - V^{0}_{i}}{\eta} \cdot(1+\|f(x_i,u)\|) + L(T,x_i,u) \}, \quad i=0,1,\ldots, M. \end{cases}$$
(2.7)
Step 3: Iteration for HJB equation. By (2.5) and Step 2, we obtain all $$\{\{V^{j}_{i}\}^{M}_{i=0}\}_{j=0}^{N}$$ and $$\{\{u^{j}_{i}\}^{M}_{i=0}\}_{j=0}^{N}$$:
$$\begin{cases} V^{j+1}_i= (1+\frac{\Delta t}{\eta} (1+\|f^j_i\|) )V^j_i - \frac{\Delta t}{\eta}(1+\|f^j_i\|) V^j_{i+1} - \Delta t L^j_i,\\ u^{j+1}_i \in\arg\inf_{u \in U} \{ \frac{V^{j+1}_{i+1} - V^{j+1}_{i}}{\eta} \cdot(1+\|f(x_i,u)\|)+ L(t_{j+1},x_i,u) \}. \end{cases}$$
Here, $$(u_{0}^{N}, y^{0})=(u_{0}^{N}, y(0))=(u(0),z)$$ is the first optimal feedback control-trajectory pair.
Step 4: Iteration for state equation. For j=0,1,2,…,N−1, solve the state equation:
$$\dfrac{y^{j+1}-y^j}{-\Delta t}=f\bigl(y^j,u(t_{N-j})\bigr),$$
to obtain $$y^{j+1}=y(t_{N-j-1})$$. Replace $$(\tilde{u},z)$$ in Step 1 by $$(u(t_{N-j}),y^{j+1})$$ and go to Step 2 and Step 3 to obtain $$u_{0}^{N-j-1}$$. Then $$(u_{0}^{N-j-1},y^{j+1})=(u(t_{N-j-1}),y(t_{N-j-1}))$$ is the (j+2)-th optimal feedback control-trajectory pair. Continue the iteration to obtain all $$(u(t_{N-j}), y(t_{N-j}))^{N}_{j=0}$$.
After Steps 1–4, we finally get all the desired optimal feedback control-trajectory pairs:
$$\bigl(u(t_{N-j}), y(t_{N-j})\bigr), \quad j=0,1,\ldots, N.$$
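Steps 1–4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' C programme: U is replaced by a finite list `controls`; the space mesh is given exactly one more node than the number of remaining time levels, and the index range shrinks by one node per time level so that $$V^{j}_{i+1}$$ is always available (the text lets i run up to M without saying how the last node is treated); all function names and the demo problem are hypothetical.

```python
import math


def norm(v):
    return math.sqrt(sum(c * c for c in v))


def build_mesh(z, u_tilde, f, eta, M):
    """Step 1: mesh x_0 = z, ..., x_M generated by recursion (2.6)."""
    mesh = [list(z)]
    for _ in range(M):
        x = mesh[-1]
        fx = f(x, u_tilde)
        scale = eta / (1.0 + norm(fx))
        mesh.append([xi + scale * fi for xi, fi in zip(x, fx)])
    return mesh


def hjb_control(z, u_tilde, f, L, psi, T, N, levels, eta, controls):
    """Steps 2-3: sweep scheme (2.5) over `levels` time levels starting at
    t_0 = T and return the control selected at mesh node 0."""
    dt = -T / N                                    # backward step of Step 1
    mesh = build_mesh(z, u_tilde, f, eta, levels + 1)
    V = [psi(x) for x in mesh]                     # V^0_i = psi(x_i)

    def argmin_u(V, i, t):
        slope = (V[i + 1] - V[i]) / eta
        return min(controls,
                   key=lambda u: slope * (1.0 + norm(f(mesh[i], u)))
                   + L(t, mesh[i], u))

    t = T
    u = [argmin_u(V, i, t) for i in range(len(V) - 1)]      # (2.7)
    for j in range(levels):
        a = [dt / eta * (1.0 + norm(f(mesh[i], u[i]))) for i in range(len(u))]
        V = [(1.0 + a[i]) * V[i] - a[i] * V[i + 1] - dt * L(t, mesh[i], u[i])
             for i in range(len(u))]               # one node fewer per level
        t = T + (j + 1) * dt
        u = [argmin_u(V, i, t) for i in range(len(V) - 1)]
    return u[0]


def closed_loop(z, u_tilde, f, L, psi, T, N, eta, controls):
    """Step 4: alternate an explicit Euler step of (2.1) with a new sweep."""
    h = T / N                                      # equals -dt
    y = list(z)
    u = hjb_control(y, u_tilde, f, L, psi, T, N, N, eta, controls)
    pairs = [(0.0, u, y)]
    for j in range(N):
        y = [yi + h * fi for yi, fi in zip(y, f(y, u))]
        u = hjb_control(y, u, f, L, psi, T, N, N - j - 1, eta, controls)
        pairs.append(((j + 1) * h, u, y))
    return pairs


# Hypothetical demo: y' = u with u in {0, 1}, no running cost, psi(y) = -y.
# Pushing y up as fast as possible is optimal, so u = 1 throughout.
pairs = closed_loop([0.0], 1.0, lambda x, u: [u], lambda t, x, u: 0.0,
                    lambda x: -x[0], T=1.0, N=10, eta=0.5, controls=[0.0, 1.0])
```

In the demo the step sizes satisfy the stability condition (0.1/0.5 · 2 = 0.4), every selected control equals 1, and the final state is approximately z + T = 1. Each entry of `pairs` is a triple (time, control, state), matching the pairs $$(u(t_{N-j}), y(t_{N-j}))$$ listed above.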

It is worth noting that the focus of the above algorithm is not to solve the HJB equation, nor even to obtain the value function itself. This differs from most of the literature in the field (e.g., Huang et al. 2000, 2004; Wang et al. 2003), which focuses on solving the HJB equations. Our ultimate aim is to find numerical solutions of both the optimal feedback control and the corresponding optimal trajectory. The whole algorithm consists of solving the state equation one time and the HJB equation N times. According to our numerical experiments reported in Guo and Sun (2005, 2007) and the example presented in the next section, the mesh point sequence in space generated by the recursive relation (2.6) does not cause oscillation of the space variable in subregions of the given space even when f changes its sign. This is because (2.4) is the more natural definition of the directional derivative, which allows us to search for the optimal control along its natural direction f.

If only the solution of the HJB equation (2.3) is of concern, for instance computing the values of V in the polyhedron $$\prod_{i=1}^{n}[a_{i},b_{i}]$$ of $$\mathbb{R}^{n}$$, we have to produce a monotone mesh point sequence in space from the recursive relation (2.6). For this purpose, we have to change the search direction forcibly. In this case, we suggest using the following approximation of the directional derivative instead of the approximation (2.4). Specifically, for every fixed (x,u), $$x=(x^{1},x^{2},\ldots,x^{n})$$, $$f=(f^{1},f^{2},\ldots,f^{n})$$, where
$$\operatorname {sgn}(\wp) = \begin{cases} 1, & \textrm{if } \wp\geq0,\\ -1, & \textrm{if }\wp< 0, \end{cases}$$
$$I_{p}$$ is the n-dimensional unit vector whose p-th component is 1, and $$V_{x^{p}}$$ is the partial derivative of the value function V with respect to $$x^{p}$$.
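The ingredients named here (sgn, $$I_{p}$$ and $$V_{x^{p}}$$) point to a coordinate-wise forward difference. One form consistent with the component-wise mesh recursion for $$x_i^p$$ is sketched next; the step $$h_p$$ is notation introduced only for this illustration, and the formula is a hedged reading rather than a quotation of the authors' expression:

$$\nabla_x V(t,x)\cdot f(x,u)=\sum_{p=1}^{n}V_{x^{p}}(t,x)\,f^{p}(x,u)\approx\sum_{p=1}^{n}\frac{V(t,x+h_{p}I_{p})-V(t,x)}{\eta}\,\operatorname{sgn}\bigl(f^{p}\bigr)\bigl(1+\bigl\|f^{p}\bigr\|\bigr),\qquad h_{p}=\eta\,\frac{\operatorname{sgn}(f^{p})\,f^{p}}{1+\|f^{p}\|}.$$

Each term follows from $$V_{x^{p}}\approx\bigl(V(t,x+h_{p}I_{p})-V(t,x)\bigr)/h_{p}$$ multiplied by $$f^{p}$$, using $$f^{p}/h_{p}=\operatorname{sgn}(f^{p})(1+\|f^{p}\|)/\eta$$ when $$f^{p}\neq0$$. Since $$h_{p}\geq0$$ for every p, each coordinate of the mesh can only increase, which is the monotonicity asked for above.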
In this way, the possible oscillation caused by sign change of the function f can be avoided when we use the formula (2.6) to perform the spatial mesh partition. Accordingly, the mesh point sequence in space generated by the recursive relation (2.6) now becomes
$$x_i^p=x_{i-1}^p+\eta \frac{\operatorname {sgn}(f^p(x_{i-1},\tilde{u}))f^p(x_{i-1},\tilde{u})}{1+\| f^p(x_{i-1},\tilde{u})\|}, \quad p=1,2,\ldots,n, \ i=1,2,\ldots, M,$$
where $$x_{k}=(x_{k}^{1},x_{k}^{2},\ldots,x_{k}^{n}), k=i-1,i$$.
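The component-wise recursion above can be sketched as follows. The function name and the demo dynamics are hypothetical; the code uses the fact that sgn(f^p) f^p = |f^p| and that the norm of the scalar f^p is |f^p|.

```python
def monotone_mesh(z, u_tilde, f, eta, M):
    """Mesh whose p-th coordinate advances by eta * |f^p| / (1 + |f^p|)."""
    mesh = [list(z)]
    for _ in range(M):
        x = mesh[-1]
        fx = f(x, u_tilde)
        mesh.append([xp + eta * abs(fp) / (1.0 + abs(fp))
                     for xp, fp in zip(x, fx)])
    return mesh


# Hypothetical dynamics: the first component of f changes sign at x^1 = 0.05,
# the second is constantly -1.
def f_demo(x, u):
    return [0.05 - x[0], -1.0]


mesh = monotone_mesh([0.0, 0.0], None, f_demo, eta=0.1, M=4)
non_decreasing = all(b[p] >= a[p]
                     for a, b in zip(mesh, mesh[1:]) for p in range(2))
```

Every coordinate is non-decreasing regardless of the sign of f, and the second coordinate advances by exactly η/2 per node because |f²| = 1.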
To end this section, we point out that by the algorithm, the solution we obtained is a single closed-loop solution starting from the initial state (Richardson and Wang 2006). This fact can be seen from the execution process of the algorithm.
(a) In the beginning, evaluate $$V^{0}_{i}, i = 0, 1, \ldots, M$$, using the terminal condition and obtain $$u^{0}_{i}$$ by Eq. (2.7). For any given initial $$\tilde{u}$$, we set one initial state $$x_{0}=z$$ to start the computation. By the finite difference scheme (2.5) for the HJB equation (2.3), we obtain $$\{\{V^{j}_{i}\}^{M}_{i=0}\}_{j=0}^{N}$$ and $$\{\{u^{j}_{i}\}^{M}_{i=0}\}_{j=0}^{N}$$ (or, in the continuous case, V(t,z) and u(t,z)). Then $$(u^{N}_{0}, x^{0}) = (u(0), z)$$ is the first optimal control-trajectory pair.

(b) Then, substitute u(0), z into the finite difference scheme for the state equation and obtain the new state $$y^{1}$$ at time $$t_{N-1}$$. Replace $$\tilde{u}, z$$ in (a) by $$u(0), y^{1}$$ to obtain $$u^{N-1}_{0}$$ using the difference scheme (2.5). Now we obtain the second optimal control-trajectory pair $$(u^{N-1}_{0}, y^{1}) = (u(t_{N-1}), y(t_{N-1}))$$.

(c) Continue the computation until we obtain all $$(u(t_{N-j}),y(t_{N-j}))$$, j=0,1,…,N, which constitute the optimal control-trajectory pairs. At this point, the computation is complete.

Note that during the execution of the whole algorithm, the finite difference scheme of the HJB equation is called once each time a time node is updated in the difference scheme for the state equation. By the time the algorithm finishes, we have solved the state equation (actually, its finite difference scheme) one time and the HJB equation (again, its finite difference scheme) N times overall. This computation is started from the initial state z, and the solution we obtain starts from this initial data only. The feedback map for the optimal control problem is completely constructed only after the same algorithm has been run for all initial states in the domain of definition.

## 3 A paradigm on optimal control of HIV/AIDS

In this section, to show how to utilize the new algorithm, we apply it to an optimal control problem of two types of drug therapies for HIV/AIDS and get the approximate optimal control strategy. By the aid of this paradigm, we try to elaborate the validity and effectiveness of the new algorithm.

Mathematical methods for understanding the dynamics of HIV/AIDS have been studied by many researchers; examples can be found in Nowak and May (2000) and The INSIGHT-ESPRIT Study Group and SILCAAT Scientific Committee (2009). For the use of mathematics to understand HIV immune dynamics, we refer to Kirschner (1996). To understand the dynamics of HIV-1 infection in vivo, Perelson and Nelson (1999) study how dynamical modeling and parameter estimation techniques can be used to discover some important features of HIV pathogenesis. In Nowak and May (2000), virus dynamics are discussed and the mathematical principles of immunology and virology are investigated. The works of Craig and Xia (2001) and Craig et al. (2004) introduce HIV/AIDS education into the curriculum of a university, and the model of HIV/AIDS is analyzed from the control engineering point of view. Using control theory, the question of when to initiate HIV therapy is studied in Jeffrey et al. (2003), where it is concluded that therapy is best initiated when the viral load is easier to control. By the Pontryagin maximum principle, some optimal chemotherapy strategies are obtained in Butler et al. (1997), Felippe de Souza et al. (2000) and Kirschner et al. (1997). Several methods of stable control of the HIV-1 population using an external feedback control are developed in Brandt and Chen (2001), where it is shown, by a feedback control approach, how the immune system components can be bolstered against the virus. An interesting study of dynamic multi-drug therapies for HIV is presented in Wein et al. (1997), where a dynamic but not optimal policy is proposed. Adopting the model predictive control approach, Zurakowski and Teel (2006) derive treatment schedules for HIV therapy, which form a closed-loop control solution of the modified Wodarz-Nowak model. A neighboring-optimal control policy is presented in Stengel et al. (2002). 
Based on linearization of the nonlinear model at the steady state, Radisavljevic-Gajic (2009) proposes the control strategy for the HIV-virus dynamics and further for linear-quadratic optimal control problem. The controller based on minimization of the square of the error between the actual and desired (equilibrium) values is obtained. For a fractional-order HIV-immune system with memory, Ding et al. (2012) discusses the necessary conditions for the optimality of a general fractional optimal control problem. The fractional-order two-point boundary value problem is solved numerically and the effects of mathematically optimal therapy are demonstrated. In Adams et al. (2007), the researchers fit a nonlinear dynamical mathematical model of HIV infection to longitudinal clinical data for individual patients. And a statistically-based censored data method is combined with inverse problem techniques to estimate dynamic parameters. The important works Adams et al. (2004, 2005) investigate the optimal control strategy of HIV by the Pontryagin maximum principle. Since the number of elements of the control set Λ in Adams et al. (2004, 2005) is finite, the optimal structured treatment interruptions control problems are considered via firstly the crude direct search approach involving the simple comparisons, then 5 day segment strategy to reduce the number of iterations, finally the subperiod method to further alleviate the computational burden. The suboptimal structured treatment interruptions therapy protocols are derived without mentioning the HJB equation and the concrete algorithm.

The optimal controls in the aforementioned works are obtained through the Pontryagin maximum principle. They are hence inherently open-loop controls. The Pontryagin maximum principle is a necessary condition. Most often, “those sophisticated necessary conditions rarely give an insight into the structure of the optimal controls” (Rubio 1986). Moreover, the open-loop control characterized by the Pontryagin maximum principle solves (when it does!) the problem only for specified initial data, yet the optimal control problem usually needs a solution for a multitude of initial data. In addition, some simple examples in Lenhart and Workman (2007) show that the optimal controls and trajectories corresponding to different initial data may have different structures. This makes it difficult to find the numerical solution (Miricǎ 2008). In fact, it is remarked in Boltyanskii (1971) that “the use of the Pontryagin maximum principle to solve concrete problems is very difficult since it contains two different simultaneous mathematical problems: integration of the canonical differential system and, simultaneously, the maximization of the Hamiltonian”.

In contrast to the maximum principle, in this paper we apply the DPVS approach to an optimal control problem of two types of drug therapies for HIV/AIDS to obtain the approximate optimal control strategy. We now introduce the HIV/AIDS therapy model investigated in this section. Let $$x_{1}$$ represent the concentration of the uninfected CD4+ T cells and $$x_{2}$$ the free infectious virus particle population. Then for any $$t_{f}>0$$, the HIV/AIDS model with two types of drug treatments can be described by the following system of ordinary differential equations (Joshi 2002; Kirschner and Webb 1998):
$$\begin{cases} x^{\prime}_1(t) = s_1 - \frac{s_2 x_2(t)}{B_1 + x_2(t)} - \mu x_1(t) - k x_2(t) x_1(t) + u_1(t) x_1(t), \quad t\in(0,t_f),\\[3pt] x^{\prime}_2(t) = \frac{g (1 - u_2(t)) x_2(t)}{B_2 + x_2(t)} - c x_2(t) x_1(t), \quad t\in(0,t_f), \\ (x_1(0), x_2(0))=z = (z_1, z_2), \end{cases}$$
(3.8)
where $$u(\cdot) = (u_{1}(\cdot), u_{2}(\cdot))\in\mathcal{U}[0, t_{f}]= L^{\infty}((0, t_{f});U)$$ is the admissible control, in which $$u_{1}, u_{2}$$ represent the immune boosting and viral suppressing drugs, respectively. Here we may use IL-2 immunotherapy as the immune system enhancing drug $$u_{1}$$. The term $$u_{1}(t)x_{1}(t)$$ expresses the assumption that the enhancement of the immune system through IL-2 results in an increase in the CD4+ T cells proportional to the population of these cells at the rate $$u_{1}(t)$$. The set $$U=[c_{1\ell},c_{1r}]\times[c_{2\ell},c_{2r}]$$ is the control constraint set for some constants $$c_{i\ell},c_{ir}$$, i=1,2. In the model equation, the term $$s_{1} - \frac{s_{2} x_{2}(t)}{B_{1} + x_{2}(t)}$$ represents the source/proliferation of uninfected CD4+ T cells, which includes both an external (not plasma) contribution of cells from sources such as the thymus and lymph nodes, and an internal (plasma) contribution from CD4+ T cell differentiation. This T-cell source deteriorates during the progression, with limiting value $$s_{1}-s_{2}$$. Here $$s_{1}, s_{2}$$ are constant source/production rates of CD4+ T cells, μ is the death rate of the uninfected CD4+ T cells, k is the rate at which CD4+ T cells become infected by free virus $$x_{2}$$, g is the input rate of the external viral source, c is the loss rate of virus and $$B_{1}, B_{2}$$ are half saturation constants.

This simple model of HIV progression considers only the uninfected CD4+ T cell population and the free virus population interacting in the plasma. As HIV research develops, newer models have become available that include more state variables, such as the class of infected CD4+ T cells. Some of them reflect recent findings on IL-2 coadministration for HIV therapy, such as The INSIGHT-ESPRIT Study Group and SILCAAT Scientific Committee (2009), which refers to almost all the recent advances in this field. However, even from the biological point of view, the adopted model shows that the dynamics of HIV progression in the plasma can be based upon simple assumptions about the interactions of uninfected CD4+ T cells and free virus. Since there is extensive data for these two populations during the progression, the model simulations can be compared to data. The mathematical model (3.8) of HIV/AIDS progression was first presented in Kirschner and Webb (1998) to characterize the interaction of CD4+ T cells and HIV particles in the immune system, where one can find more details on its biological background. Joshi (2002) utilized the Pontryagin maximum principle to investigate the optimal control problem of the model. In this paper, we aim to develop a completely new algorithm to find numerical solutions of the approximate optimal control based on dynamic programming without computing the value function everywhere. The problem of drug combination in virus treatment is given as an application. The simple HIV/AIDS model adopted in this paper, owing to its mathematical properties, makes it easier to illustrate our purpose. The proposed algorithm has the ability to treat more complex and practical models like those considered in Rong et al. (2007) and Rong and Perelson (2009), where drug resistance, immune response and HIV latency become critical issues leading to HIV persistence in the presence of long-term effective therapy.

Now we propose a new optimal control problem for the system (3.8). That is, to seek an optimal control $$u^{*}(\cdot)\in\mathcal{U}[0, t_{f}]$$ such that
$$J\bigl(u^*(\cdot)\bigr)=\inf_{u(\cdot)\in\mathcal{U}[0, t_f]} J\bigl(u(\cdot)\bigr),$$
(3.9)
where J is the cost functional given by
$$J\bigl(u(\cdot)\bigr)= \int^{t_f}_{0} \bigl[ A_1 u^2_1(\tau) + A_2 u^2_2(\tau) - x_1(\tau) \bigr] d\tau+ A_3 x^2_2(t_f)$$
(3.10)
with positive constants $$A_{1}, A_{2}, A_{3}$$. It should be emphasized that we require $$A_{3}\neq0$$ in (3.10), which is different from Joshi (2002), although our method can certainly be used to treat the case $$A_{3}=0$$ as well. The running cost in the cost functional above includes the benefit of CD4+ T cells and the costs of the drug therapies. The square of the final virus particle count composes the terminal cost term. Because the drugs are toxic to the body when administered in high doses, we use quadratic terms in the running cost. Since the final virus particle count is one of the main targets of suppression in the treatment, the square term is adopted in the same way. Our goal here is to minimize drug use while maximizing the uninfected cell count and obtaining a minimal virus particle count at the final stage of the whole treatment period under investigation. By using the terminal cost, we avoid a typical problem in the existing literature (see for instance Figs. 13, 15 of Adams et al. (2005) and Figs. 3, 5 of Adams et al. (2004)), where the viral load is allowed to rebound simply because the horizon is ending.
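Comparing the cost (3.10) under different controls, as mentioned in the abstract, only requires a forward simulation. The following is a minimal sketch assuming an explicit Euler step and a left-endpoint quadrature rule; the function name, the `rhs` callback and the demo values are hypothetical and are not the discretization or data used by the authors.

```python
def evaluate_cost(rhs, control, z, tf, steps, A1, A2, A3):
    """Approximate J(u) in (3.10) for a control given as a function of time.

    rhs(x, u) returns (x1', x2'); control(t) returns (u1, u2).
    """
    h = tf / steps
    x = list(z)
    J = 0.0
    for n in range(steps):
        u1, u2 = control(n * h)
        J += h * (A1 * u1 ** 2 + A2 * u2 ** 2 - x[0])   # running cost
        dx = rhs(x, (u1, u2))
        x = [xi + h * di for xi, di in zip(x, dx)]      # explicit Euler step
    return J + A3 * x[1] ** 2                           # terminal cost


# Sanity check with frozen dynamics (x stays at z), so the integral is exact:
# J = tf * (A1*u1^2 + A2*u2^2 - x1) + A3 * x2^2 = 2 * 2 + 3 * 4 = 16.
J_demo = evaluate_cost(lambda x, u: (0.0, 0.0), lambda t: (1.0, 1.0),
                       z=(1.0, 2.0), tf=2.0, steps=4, A1=1.0, A2=2.0, A3=3.0)
```

Passing the model right-hand side of (3.8) as `rhs` and two different control functions gives two costs that can be compared directly.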
By (2.3), if the value function of the optimal control problem (3.9) is sufficiently smooth, one can derive the following HJB equation satisfied by the value function
$$\begin{cases} V_{t}(t,x)+\inf_{u \in U} \{V_x(t,x) \cdot F(x,u) + A_1 u^2_1 + A_2 u^2_2 - x_1 \}=0,\\ V(t_f,x)= A_3 x^2_2, \end{cases}$$
where $$t\in[0,t_f)$$, $$x=(x_1,x_2)\in\mathbb{R}^2$$. The function $$F(x,u)=(F_1(x,u),F_2(x,u))^{\top}$$ is defined by
$$F(x,u)=\left ( \begin{array}{c} s_1 - \frac{s_2 x_2}{B_1 + x_2} - \mu x_1 - k x_2 x_1 + u_1 x_1\\[3pt] \frac{g (1 - u_2) x_2}{B_2 + x_2} - c x_2 x_1 \end{array} \right ).$$
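Since F is affine in u and the running cost is quadratic in u with A1, A2 > 0, the infimum in the HJB equation above is a separable convex problem in u1 and u2, so for a given gradient value it can be written in closed form and projected onto the control bounds. The Python sketch below illustrates this observation; it is not the authors' algorithm from Sect. 2. The dictionary `PLACEHOLDER_PARAMS` holds arbitrary illustrative numbers, not the parameter values used in the paper, and the names `hamiltonian` and `argmin_control` are hypothetical.

```python
# Arbitrary illustrative constants (NOT the values used in the paper).
PLACEHOLDER_PARAMS = {
    "s1": 1.0, "s2": 1.0, "B1": 10.0, "B2": 1.0,
    "mu": 0.01, "k": 0.001, "g": 10.0, "c": 0.01,
}


def F(x, u, p):
    """Right-hand side F(x, u) of the model, with constants taken from p."""
    x1, x2 = x
    u1, u2 = u
    f1 = (p["s1"] - p["s2"] * x2 / (p["B1"] + x2)
          - p["mu"] * x1 - p["k"] * x2 * x1 + u1 * x1)
    f2 = p["g"] * (1.0 - u2) * x2 / (p["B2"] + x2) - p["c"] * x2 * x1
    return (f1, f2)


def hamiltonian(x, u, grad_v, p, A1, A2):
    """Expression inside the infimum: V_x . F(x,u) + A1*u1^2 + A2*u2^2 - x1."""
    f1, f2 = F(x, u, p)
    return grad_v[0] * f1 + grad_v[1] * f2 + A1 * u[0] ** 2 + A2 * u[1] ** 2 - x[0]


def _clip(value, upper):
    """Project value onto the interval [0, upper]."""
    return min(max(value, 0.0), upper)


def argmin_control(x, grad_v, p, A1, A2, u1_max, u2_max):
    """Minimizer of the Hamiltonian over [0, u1_max] x [0, u2_max].

    Setting the partial derivatives with respect to u1 and u2 to zero gives
    the unconstrained stationary point; because the problem is convex and
    separable, clipping each component to its interval gives the minimizer.
    """
    x1, x2 = x
    v1, v2 = grad_v
    u1_free = -v1 * x1 / (2.0 * A1)
    u2_free = v2 * p["g"] * x2 / (2.0 * A2 * (p["B2"] + x2))
    return (_clip(u1_free, u1_max), _clip(u2_free, u2_max))
```

A brute-force search over a grid of admissible controls is a simple way to sanity-check the closed-form minimizer for any given state and gradient value.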
In the following, we generate the numerical solutions of the optimal control problem (3.8)–(3.9), following exactly the algorithm presented in Sect. 2. The programme is written in C and the results are plotted with MATLAB. The parameters used in the computation are listed as follows: and A3 takes the two values 100 and 0, respectively, which is only for numerical purposes. Some of the parameters and constants listed above are taken directly from Kirschner and Webb (1998), in which the model (3.8) was first constructed.

It should be pointed out that the difference in magnitude between A1 and A2 serves to balance the size of the terms. The control bound for u1 here is $$[0, 2.0\times10^{-2}]$$, so its square lies in $$[0, 4.0\times10^{-4}]$$, and the control bound for u2 is [0, 0.9], so its square lies in [0, 0.81]. The controls u1, u2 denote the drug administration schedules. The value u1=0.02 means that the drug u1 is administered at full scale per day. Similarly, the drug u2 is administered at full scale per day if u2=0.9.
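The scale gap between the two penalty terms can be checked with a few lines of arithmetic. The sketch below uses only the control bounds stated above; the weight ratio it computes is the one that would equalize the two maximal penalties, which is an assumption made for illustration and not necessarily the ratio the authors chose.

```python
U1_MAX = 2.0e-2  # upper bound of u1 (from the text)
U2_MAX = 0.9     # upper bound of u2 (from the text)

u1_sq_max = U1_MAX ** 2  # about 4.0e-4
u2_sq_max = U2_MAX ** 2  # about 0.81

# Ratio A1/A2 that would make A1*u1_max^2 equal to A2*u2_max^2.
balancing_ratio = u2_sq_max / u1_sq_max  # about 2025

print(f"max u1^2 = {u1_sq_max:.1e}, max u2^2 = {u2_sq_max:.2f}")
print(f"A1/A2 equalizing the maximal penalties: {balancing_ratio:.0f}")
```

With equal weights, the penalty on u2 at full dose would be roughly three orders of magnitude larger than the penalty on u1 at full dose, which is the imbalance the different magnitudes of A1 and A2 are meant to offset.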

We compute the approximate optimal control-trajectory pairs for the cases A3=100 and A3=0, respectively. The results are plotted in two figures. Figure 1 presents the computed numerical solution of the approximate optimal control-trajectory pair when A3=100: it plots the approximate optimal control components $$u_{1}^{*},u_{2}^{*}$$ and the corresponding computed trajectories of CD4+ T cells and HIV particles, respectively. To make the necessity of the terminal cost stand out, the case A3=0 is plotted in Fig. 2 in the same way.

It should be pointed out that although the results we obtained do not appear to be optimal (the optimal control should be continuous by Theorems 6.1, 6.2 of Fleming and Rishel (1975)), they are approximately optimal (or suboptimal) in the sense that they are very effective in suppressing the HIV particles and boosting the function of the immune system. From the point of view of numerical analysis, our results are also satisfactory. To check this, we carry out some numerical experiments: under the same initial condition, we compare the cost of the approximate optimal control-trajectory pair with the costs of arbitrarily selected admissible controls and their corresponding trajectories.

The arbitrarily selected admissible controls are several possible combinations of two controls. Each control is chosen to be one of five different functions:
$$u_1(t)=\left \{ \begin{array}{r} 0.02,\quad \mathrm{Case~a}, \\ -0.02t/40 + 0.02,\quad \mathrm{Case~b}, \\ 0.02t/40,\quad \mathrm{Case~c}, \\ 0, \quad \mathrm{Case~d}, \\ 0.02|\sin30t|,\quad \mathrm{Case~e}, \end{array} \right . \qquad u_2(t)=\left \{ \begin{array}{r} 0.9,\quad \mathrm{Case~A}, \\ -0.9t/40 + 0.9,\quad \mathrm{Case~B}, \\ 0.9t/40, \quad \mathrm{Case~C}, \\ 0, \quad \mathrm{Case~D}, \\ 0.9|\sin50t|, \quad \mathrm{Case~E}. \end{array} \right .$$
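The ten comparison controls above can be written down directly as functions of time. The following Python sketch is one possible encoding; the dictionary names `U1_CASES`, `U2_CASES` and the helper `control_pair` are hypothetical and are not part of the authors' programme.

```python
import math

# Case labels follow the piecewise definitions above; t is time.
U1_CASES = {
    "a": lambda t: 0.02,
    "b": lambda t: -0.02 * t / 40 + 0.02,
    "c": lambda t: 0.02 * t / 40,
    "d": lambda t: 0.0,
    "e": lambda t: 0.02 * abs(math.sin(30 * t)),
}

U2_CASES = {
    "A": lambda t: 0.9,
    "B": lambda t: -0.9 * t / 40 + 0.9,
    "C": lambda t: 0.9 * t / 40,
    "D": lambda t: 0.0,
    "E": lambda t: 0.9 * abs(math.sin(50 * t)),
}


def control_pair(case1, case2):
    """Return a function t -> (u1(t), u2(t)) for the given pair of cases."""
    u1 = U1_CASES[case1]
    u2 = U2_CASES[case2]
    return lambda t: (u1(t), u2(t))


# Example: the pair labelled (b, C), evaluated at a few sample times.
u = control_pair("b", "C")
for t in (0.0, 20.0, 40.0):
    print(t, u(t))
```

Note that the linear ramps in cases b, c, B and C stay within the control bounds only for 0 ≤ t ≤ 40.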
We then compute the corresponding cost functionals (J values) for these controls. The results are listed in Table 1.
Table 1 Costs corresponding to different control-trajectory pairs

| Control (u1,u2) | J values with A3=100 | J values with A3=0 |
|---|---|---|
| Case (a, A) | −19037.364346 | −19037.364346 |
| Case (b, B) | −16091.030761 | −19455.373985 |
| Case (b, C) | −18927.367136 | −18927.595354 |
| Case (a, D) | −18365.038145 | −19997.793922 |
| Case (c, B) | −13315.540793 | −16752.069485 |
| Case (c, C) | −16250.916681 | −16251.276633 |
| Case (d, A) | −14479.391625 | −14479.394837 |
| Case (d, D) | −5464.061856 | −15709.633929 |
| Case (e, E) | −18057.659758 | −18094.224090 |
| Approximate optimal $$(u^{*}_{1}, u^{*}_{2})$$ | −19308.169228 | −20177.695872 |

### Remark 1

It is clearly seen from Table 1 that the cost corresponding to the approximate optimal control-trajectory pair is lower than the cost under any other control (u1,u2), whichever value A3 takes. In particular, when A3=100, the cost corresponding to the control (d,D) is maximal. This corresponds to the case without any treatment. When A3=0, the maximal cost occurs in the case (d,A), which suggests that the viral suppressing drug u2 alone cannot give a good therapeutic effect if the immune boosting drug u1 is absent. When A3=100, the case (a,A) has the second lowest cost; it corresponds to full therapy, in which the patient takes both drugs at full scale over the whole observed period. This tells us that receiving the full therapy is also a good treatment option if no better scheme is available. In both cases (d,A) and (d,D), where the immune boosting drug u1 is absent, the corresponding costs are very high whether or not the viral suppressing drug u2 is present. So the immune boosting drug is very important for obtaining the ideal therapeutic effect.
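The orderings stated in this remark can be re-derived mechanically from the entries of Table 1. The short Python sketch below copies the table values and ranks them per column; the names `TABLE_1` and `ranked` are hypothetical helpers introduced only for this check.

```python
# J values copied from Table 1: (A3 = 100, A3 = 0).
TABLE_1 = {
    "(a, A)": (-19037.364346, -19037.364346),
    "(b, B)": (-16091.030761, -19455.373985),
    "(b, C)": (-18927.367136, -18927.595354),
    "(a, D)": (-18365.038145, -19997.793922),
    "(c, B)": (-13315.540793, -16752.069485),
    "(c, C)": (-16250.916681, -16251.276633),
    "(d, A)": (-14479.391625, -14479.394837),
    "(d, D)": (-5464.061856, -15709.633929),
    "(e, E)": (-18057.659758, -18094.224090),
    "approximate optimal": (-19308.169228, -20177.695872),
}


def ranked(column):
    """Control labels sorted from lowest to highest cost in the given column."""
    return sorted(TABLE_1, key=lambda name: TABLE_1[name][column])


for column, label in enumerate(("A3=100", "A3=0")):
    order = ranked(column)
    print(f"{label}: lowest {order[0]}, second lowest {order[1]}, highest {order[-1]}")
```

Such a check only compares the computed control against the listed alternatives; it does not by itself establish optimality.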

### Remark 2

To show that our approach is effective, we compared our results with the existing references. Our results agree with those in Joshi (2002) and Kirschner et al. (1997); that is to say, the formats of the optimal solutions are similar. The objects investigated in Joshi (2002), Kirschner et al. (1997) and our work are different. In Kirschner et al. (1997), the model differs from ours and only one control is applied. Moreover, the cost functionals of Joshi (2002) and Kirschner et al. (1997) do not include a terminal cost. To make a proper comparison with the existing results, we computed the case without terminal cost (A3=0) and plotted it in Fig. 3. The format of the obtained results agrees with those in the references.

In Joshi (2002), it is also said that “the format of the optimal controls do agree with those in the papers Brandt and Chen (2001), Bryson (1996), Craig et al. (2004) with only one control”. Because we do not know the initial values used in Joshi (2002) (the optimal control depends on the initial value), we are not able to reproduce exactly the same graph as in Joshi (2002). However, we used the initial value (400.0, 3.5) for a 50-day simulation, as in Joshi (2002). The first control is almost the same as that in Joshi (2002), and the format of the CD4+ T cell trajectory agrees with that in Joshi (2002) (see Fig. 3: Comparison with existing results). As for the second control and the HIV particles, our results are closer to Fig. 4 of Kirschner et al. (1997) on p. 789.

## 4 Concluding remarks

In this paper, a new algorithm is presented to find the optimal single closed-loop solutions starting from the initial state (Richardson and Wang 2006) by the dynamic programming approach. The algorithm is based on two observations: (a) the value function of the optimal control problem considered is the viscosity solution of the associated Hamilton-Jacobi-Bellman (HJB) equation; and (b) the gradient of the value function appears in the HJB equation in the form of a directional derivative. The algorithm proposes a discretization method for seeking approximate optimal control-trajectory pairs, based on a finite-difference scheme in time, by solving the HJB equation and the state equation. We apply the algorithm to an HIV/AIDS optimal control problem. The results illustrate that the method is valid and effective. Furthermore, the algorithm can be applied to other optimal control problems in higher dimensions.

From the HIV/AIDS model example, our numerical computations show that the influence of different initial conditions is not remarkable. What makes a big difference to the results is the terminal cost. It seems that adding the terminal cost to the cost functional is necessary, which is the major difference between the results presented in this paper and the existing literature.

Finally, we note that the solution we obtained might be only a local minimum, owing to the choice of the starting control. For a practical problem, this can be ruled out through comparison with “arbitrarily chosen” admissible controls. Further investigation of this problem is needed.

## Acknowledgements

The authors would like to thank the Editor and the anonymous reviewers for their valuable comments and suggestions, which improved the paper substantially.