Abstract
This chapter investigates the optimal control strategy problem for nonzero-sum games of the immune system based on adaptive dynamic programming (ADP). Firstly, the main objective is to approximate a Nash equilibrium between the tumor cells and the immune cell population, which are regulated through chemotherapy drugs and immunoagents guided by the mathematical growth model of the tumor cells. Secondly, a novel intelligent nonzero-sum games-based ADP is put forward to solve the optimal control problem by reducing the growth rate of tumor cells while minimizing the doses of chemotherapy and immunotherapy drugs. Meanwhile, convergence analysis and an iterative ADP algorithm are specified to prove feasibility. Finally, simulation examples are given to demonstrate the availability and effectiveness of the research methodology.
3.1 Introduction
With the rapid increase in the number of tumor patients, immunotherapies integrated with multi-pronged approaches are burgeoning for the treatment of specific forms of cancer, especially poorly immunogenic tumors [1]. The original intention of immunotherapy is to fight cancer cells with the body's own lethal immune cells. AIDS, a typical immunodeficiency syndrome caused by failure of the immune response, tends to be attributed to weakened immune levels. Conversely, once an activated immune system cannot be shut down, the natural killer cell population keeps producing cytokines [2], which is regarded as an overreaction of the immune system, as seen in COVID-19. Thus, the Nash equilibrium between the tumor cells and the immune cell population needs to be solved through optimal regulation based on a specific learning method. An optimal control scheme is first brought into this field with its unique superiority; moreover, nonzero-sum games-based ADP enjoys superiority and practicability.
Decision and estimation under unknown nonlinearity exist extensively in engineering practice, medical treatment and even the social sciences; for instance, literature [3] first proposed the evaluation of a designed S-Box with high nonlinearity on the basis of Chinese I-Ching philosophy. It is of great importance to make a suitable treatment decision in the field of health care, where high nonlinearity remains. To obtain an optimal mixed treatment strategy, growth models of cell population levels were developed based on a combination of immunotherapy and chemotherapy in [4, 5]. When it comes to the reaction of the immune system to tumor growth, a rather complicated nonlinear model of the immune system is requisite to simulate the overall aggressive combination treatment plan of immunotherapy and chemotherapy well. Thus, the nonlinear problem can hardly be solved without exceptionally optimized iterative algorithms such as backstepping techniques [6], self-learning optimal regulation [7], hierarchical lifelong learning [8], broad learning adaptive neural control [9] and adaptive dynamic programming, which benefits from its adaptive capability and strong autonomous iterative learning ability [10, 11]. Both backstepping and adaptive dynamic programming can guarantee that the control objective is achieved, with the unknown nonlinear function approximated through successive searching by neural networks or fuzzy control [12,13,14,15].
\(\mathcal {H}_{\infty }\) control enjoys excellent disturbance suppression while minimizing a performance index, and it is recognized as a typical two-player zero-sum problem, which is equivalent to solving algebraic Riccati equations. It is generally applied to linear dynamic systems; of course, systems with a quadratic performance index can also be solved, as in [16]. Meanwhile, the familiar Hamilton-Jacobi-Isaacs equation is perceived as an effective medium for dealing with systems with inherent nonlinearity, such as unknown mechanical parameters [17], which are difficult to handle using conventional methods in the absence of exact system parameters. The mainstream analysis of ADP seeks an optimal control strategy integrated with the solution of Bellman functions without information on the system dynamics, and it has ascended to the core methodologies of optimization and artificial intelligence. For actual models, control constraints have been explicitly considered, as in [18,19,20]; thus this chapter mainly focuses on a dynamic model of the immune system that limits each single injection of drugs to an intervention level. The optimal control scheme is transformed into constrained control, which needs to take a discount factor into account to effectively avoid the infinite time dimension, leading to the development of an optimal constrained control policy.
Model-free adaptive control was developed to obtain an optimal control strategy without knowledge of exact system parameters [21,22,23], and multiple neural networks were constructed to achieve multi-objective approximation or to optimize the control process. Research on multiple networks has been extended to multitudinous actor-critic constructions. A tremendous number of practical application scenarios need multiple controllers, each of which minimizes its individual performance function, forming a nonzero-sum problem. As elaborated in nonzero-sum game theory, the control objective was to minimize the individual performance functions while maintaining stability so as to yield a Nash equilibrium [24]. In [25], the saddle point of the Nash equilibrium was explored throughout the nonzero-sum games-based optimization iterative process using ADP; even when no feasible saddle point existed, the optimum was realized iteratively through a mixed optimal control scheme, which is of universal significance for conditions that are hard to satisfy in practical applications. The local optimum problem exists extensively and was first effectively avoided through fault-tolerant adaptive multigradient recursive reinforcement learning [9]. Seeking the Nash equilibrium requires solving simultaneous algebraic Riccati or Hamilton-Jacobi-Isaacs equations for nonlinear systems, which leads to the "curse of dimensionality" with a huge amount of computation, especially for multitudinous actor-critic constructions suffering from a many-fold higher computational burden, such as the double-loop policy iteration in [26]. For the reasons described above, this chapter adopts a compromise: actor-critic neural networks with appropriate dimensions, effectively realizing the transformation from value iteration to the cost function.
Value and policy iterations generally constitute the whole family of iterative methods, beginning with a semidefinite function or an admissible control law, respectively. ADP has been applied to solve the optimal control strategy for both continuous-time [27, 28] and discrete-time systems [29, 30]; however, traditional ADP cannot satisfy the physical application in the immune system considering the mixed treatment strategy with chemotherapy drugs and immunotherapy, a situation improved somewhat by nonzero-sum games-based ADP. There are hardly any works on nonzero-sum games-based ADP for solving optimal regulation schemes of the immune system, let alone considering optimal constrained control, policy iterations, tumor regression and a mixed control strategy of chemotherapy and immunotherapy; that is, the cost function simultaneously covers minimization of the tumor cells, the chemotherapy drugs and the immunotherapy drugs.
3.2 Establishment of Mathematical Model
This part mainly introduces the mathematical growth model of tumor cells, which considers the influence of external factors such as chemotherapy drugs and immunotherapy on the tumor cells, as well as the mutual effect between the two types of cells. In the following model, Tu(t) represents the amount of tumor cells, Im(t) denotes the number of immune cells, and Che(t) and \(Im_{py}(t)\) depict the concentrations of chemotherapy drugs and immunotherapy drugs in the bloodstream, respectively.
3.2.1 Growth Model of Tumor Cells
Considering the natural growth law of tumor cells alone, without the interaction with immune cells or any external effect on them, the growth of tumor cells is subject to logistic growth.
When the interaction between immune cells and tumor cells and the direct killing of cells by chemotherapeutic drugs are taken into account, the growth model of tumor cells can be revised to:
where the specifications of the parameters are demonstrated in Table 3.1.
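As a minimal numerical sketch, the logistic law can be simulated by forward Euler integration. The assumed form is \(\dot{Tu} = r\,Tu(1 - Tu/K)\); the rate and carrying capacity below are illustrative placeholders, not entries of Table 3.1.

```python
# Minimal sketch of unperturbed logistic tumor growth,
# dTu/dt = r * Tu * (1 - Tu / cap), integrated by forward Euler.
# The rate r and carrying capacity cap are illustrative placeholders,
# not the parameters listed in Table 3.1.

def logistic_step(tu, r=0.43, cap=1.0, dt=0.01):
    """One forward-Euler step of the logistic growth law."""
    return tu + dt * r * tu * (1.0 - tu / cap)

def simulate(tu0=0.05, steps=2000):
    """Evolve the tumor burden from tu0 for the given number of steps."""
    tu = tu0
    for _ in range(steps):
        tu = logistic_step(tu)
    return tu

# Without treatment the population saturates near the carrying capacity.
```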
3.2.2 Growth Model of Immune Cells
Considering the natural growth law of immune cells alone, we assume that a fixed number of immune cells is produced per unit of time and that these cells have an inevitable life cycle.
The tumor cells in the body can stimulate the growth of immune cells, which shows a positive non-linear change by (3.4).
In immunotherapy, the addition of immune agents can produce an immune response, which leads to the non-linear growth of immune cells.
Simultaneously, in the struggle between immune cells and tumor cells, the immune cells themselves also suffer losses,
and in chemotherapy, chemotherapeutic drugs can also cause damage to immune cells.
Combining (3.3)–(3.7), (3.8) can be obtained.
The parameters of the immune cell model are elucidated in Table 3.2.
3.2.3 Drug Attenuation Model
We assume that after the injection of a chemotherapy drug, the concentration of the drug in the body decreases exponentially. To guarantee the effectiveness of the treatment, chemotherapy drugs are therefore replenished in the body over time.
Similarly, we can obtain the attenuation model of the immunoagents:
where \(Dr_{Che}(t)\) and \(Dr_{Im}(t)\) denote the concentrations of the chemotherapy drugs and immunoagents injected at time t, respectively, and \(\gamma _1\) and \(\gamma _2\) are the decay rates of the chemotherapy drugs and immunoagents.
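The attenuation dynamics can be sketched as a simple simulation, assuming the standard form \(\dot{C} = -\gamma C + u(t)\) with impulsive injections; the decay rate and dosing schedule below are illustrative, not clinical values.

```python
# Sketch of the drug-attenuation model dC/dt = -gamma*C + u(t):
# between injections the concentration decays exponentially, and
# each injection adds its dose. gamma and the dose schedule below
# are illustrative placeholders, not clinical values.
import math

def decay_step(conc, dose, gamma=0.9, dt=0.1):
    """Exact exponential decay over dt followed by an impulsive dose."""
    return conc * math.exp(-gamma * dt) + dose

conc = 0.0
history = []
for k in range(100):
    dose = 0.05 if k % 10 == 0 else 0.0   # inject every 10th step
    conc = decay_step(conc, dose)
    history.append(conc)

# Without further injections the concentration halves every
# ln(2)/gamma units of time.
```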
3.2.4 The Design of the Optimization Problem
Combined with the contents of (A), (B) and (C), we finally obtain the mathematical model affecting the growth of tumor cells:
Given that Tu(t), Im(t) are biomass, and Che(t), \(Im_{py}(t)\) are the drug concentrations in the bloodstream,
And all parameters in the model are non-negative:
We qualitatively analyze the problem of how to minimize the residual tumor cell population in the bloodstream while using as few drugs as possible, including chemotherapy drugs and immunoagents. This process can be described by the quantitative mathematical expression (3.14).
It is emphasized here that the single dose of the two drugs should be limited to avoid drug poisoning, so we use a definition form with input constraints. During the whole treatment process, we have:
where \(0< \lambda < 1\), and \(\bar{U}_1\) and \(\bar{U}_2\) represent the maximum permissible doses of the chemotherapy drug and the immune agents in a single injection, respectively.
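A minimal sketch of how such single-dose constraints can be enforced in code, either by hard clipping or by the smooth tanh saturation common in constrained-input ADP; `U1_MAX` and `U2_MAX` are illustrative stand-ins for \(\bar{U}_1\) and \(\bar{U}_2\).

```python
# Enforcing single-dose bounds on the two control inputs.
# U1_MAX / U2_MAX are illustrative stand-ins for the bounds
# \bar{U}_1, \bar{U}_2 in the text, not clinical values.
import math

U1_MAX = 0.05  # max chemotherapy dose per injection (assumed)
U2_MAX = 0.05  # max immunoagent dose per injection (assumed)

def clip_dose(u, u_max):
    """Hard saturation: dose stays in [0, u_max]."""
    return min(max(u, 0.0), u_max)

def smooth_dose(v, u_max):
    """Smooth saturation mapping an unconstrained v into (0, u_max)."""
    return u_max * 0.5 * (math.tanh(v) + 1.0)
```

The smooth variant is differentiable, which matters when the dose is the output of a trained actor network.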
3.3 The Proposed Nonzero-Sum Games-Based ADP Scheme
To solve the problems given above, we propose an aggressive treatment plan, or control scheme, based on the nonzero-sum games-based ADP algorithm.
3.3.1 Theoretical Introduction
For a discrete-time control system \(x(t+1)=F(x(t),u(t),t)\), x(t) is the state variable, u(t) is the control variable, and F is the transition mapping between states. The cost of each state transition is U(x(t), u(t), t), and the total cost over the whole period is \(\sum _{t=t_0}^{t_f}U(x(t),u(t),t)\).
When solving a finite-time problem, we can equivalently write it as
In applying Bellman's optimality principle to solve (3.1), we first stipulate \(J(x(t_0))=\sum _{t=t_0}^{\infty } \lambda ^t U(x(t),u(t),t)\), and then we can obtain
The corresponding optimal control can be solved for in the following form.
This typical solution approach is a considerable challenge for computing and storage space.
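The Bellman recursion described above can be made concrete on a toy problem. The sketch below runs discounted value iteration on a five-state grid with a placeholder transition F and stage cost U; none of these numbers come from the tumor model.

```python
# Toy discounted value iteration V(x) = min_u [U(x,u) + LAM*V(F(x,u))]
# on a small deterministic grid; dynamics and cost are placeholders.
LAM = 0.95
STATES = range(5)
ACTIONS = (-1, 0, 1)

def F(x, u):
    """Transition: move on the grid, stay inside the bounds."""
    return min(max(x + u, 0), 4)

def U(x, u):
    """Stage cost: distance from the goal state 0 plus control effort."""
    return abs(x) + 0.1 * abs(u)

def value_iteration(eps=1e-6):
    V = {x: 0.0 for x in STATES}
    while True:
        V_new = {x: min(U(x, u) + LAM * V[F(x, u)] for u in ACTIONS)
                 for x in STATES}
        if max(abs(V_new[x] - V[x]) for x in STATES) <= eps:
            return V_new
        V = V_new

V = value_iteration()
# V is zero at the goal and increases with the distance from it.
```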
Remark 3.1
Adaptive dynamic programming, as an optimized learning method, is usually used to approximate the cost function, which in this chapter is designed not only to minimize the tumor cells but also to minimize the doses of chemotherapy drugs and immunoagents.
3.3.2 Iterative ADP Algorithm
To solve (1), we use an iterative adaptive dynamic programming algorithm together with a revised model that facilitates solving the differential equations.
(1) Brief interpretation of ADP algorithm
First, we take a value function K(x) to approximate the cost function J(x). The purpose of the iteration is to ensure that the approximate function approaches the optimal value equation and to obtain the optimal decision law. Namely,
Secondly, in the specific solution process:
Given \(K^{0}(\cdot )=0\), we let
and update the value function as
For \(i=1,2,3,\ldots \), we can get
and
Thus,
and the optimal solution is obtained once the error requirement is satisfied, i.e., \(K^i(x(t))\rightarrow K^*(x(t))\) and \(\left\Vert K^{i+1}(x(t))- K^i(x(t))\right\Vert \le \varepsilon \), where i represents the number of iterations.
\(\mathbf {\mathop {Algorithm: Evolutionary ADP algorithm }}\)

\(\mathbf {\mathop {Initialization:}}\)

1. A certain initial state x(t) is given randomly in the feasible region;

2. Set \(\varLambda ^0(\cdot )=0\);

3. Specify the required parameters: error tolerance \(\epsilon \), discount factor \(\lambda \);

\(\mathbf {\mathop {Iteration and Update:}}\)

4. \(i=0\); substitute x(t) into “(3.26) = 0 ” to yield \(\kappa ^i(t)\);

5. Plug x(t) and \(\kappa ^i(t)\) into (3.25) to get \(x(t+1)\);

6. According to (3.29), calculate \( \varLambda ^{i+1}(x(t))= \frac{\partial U(x(t+1),\kappa ^i(t))}{\partial x(t)}+\varLambda ^{i}(x(t+1))\);

7. From the data set [\(x(t),\varLambda ^{i+1}(x(t))\)], train the neural network of the relationship between x and \(\varLambda \);

8. Using the neural network obtained in Step 7, calculate the value in the same state. When \(\left\Vert \varLambda ^{i+1}(x(t))-\varLambda ^{i}(x(t)) \right\Vert \le \epsilon \), the algorithm ends; otherwise, return to Step 4.
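The iteration above can be sketched in miniature. The sketch below replaces the neural network of Step 7 with a one-parameter least-squares critic on a scalar linear-quadratic example; the dynamics, costs and weights are illustrative assumptions, not the immune model.

```python
# Minimal stand-in for the critic iteration of the algorithm above:
# a scalar linear system with quadratic cost, where the critic
# K(x) ~ p * x^2 is fitted by least squares instead of a neural
# network. All parameters are illustrative placeholders.
import numpy as np

A, B = 0.9, 0.5           # x(t+1) = A*x + B*u
Q, R, LAM = 1.0, 1.0, 0.95

def policy(x, p):
    """u minimizing Q*x^2 + R*u^2 + LAM * p * (A*x + B*u)^2."""
    return -LAM * p * A * B * x / (R + LAM * p * B**2)

p = 0.0                   # critic weight, K^0 = 0
xs = np.linspace(-1.0, 1.0, 21)
for _ in range(200):      # value-iteration sweeps
    u = policy(xs, p)
    targets = Q * xs**2 + R * u**2 + LAM * p * (A * xs + B * u)**2
    # least-squares fit of targets ~ p * x^2 (the "critic update")
    p_new = float(np.sum(targets * xs**2) / np.sum(xs**4))
    if abs(p_new - p) < 1e-10:
        break
    p = p_new
```

At convergence, p satisfies the discounted Riccati fixed point and the closed loop is stable, mirroring the stopping test of Step 8.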
To converge faster to the optimal solution, in each iteration we update the value function and the control law along the current direction of steepest descent, that is,
Setting \(\varLambda ^{i}(x(t+1))= \frac{\partial K^i(x(t+1))}{\partial x(t+1)}\):
(2) Modification of Model (3.11)
Compared with the traditional control strategy, we directly solve the problem proposed in this chapter using ADP, although the model itself is difficult to solve. Here, we propose a fitting idea to modify the model. Analysis of (3.11) shows that the injection of chemotherapy drugs into the body has a direct effect on tumor cells, whereas immunoagents act on immune cells, which in turn affect the tumor cell population. Throughout the whole action process, we consider only the inputs of chemotherapy drugs and immunoagents at every moment as the two control inputs of the system, and the state variables of the system are selected as the intermediate transition variables, namely the tumor cells and the immune cell population.
1. The standard expressions of control variables, state variables, cost functions and so on are given as follows,
2. The modified system model adopts the form of nonlinear affine system, namely:
3. Update the optimal control law and value function:
Let \(\frac{\partial K^{i+1}(x(t))}{\partial u^{i}_1(t)}=0 \) and \(\frac{\partial K^{i+1}(x(t))}{\partial u^{i}_2(t)}=0 \),
From this, we can also get
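Although the explicit update equations are rendered separately, the standard form the stationarity conditions take for an affine-in-control model with quadratic control penalties can be sketched as follows; this is a derivation under assumed notation (the input matrices \(g_j\) and weights \(R_j\) are not defined in this excerpt), not the chapter's exact equations.

```latex
% Stationarity of K^{i+1} with respect to each input for the assumed
% affine model x(t+1) = f(x(t)) + g_1(x(t))u_1(t) + g_2(x(t))u_2(t)
% with quadratic control penalties u_j^T R_j u_j:
\frac{\partial K^{i+1}(x(t))}{\partial u_j^{i}(t)}
  = 2 R_j u_j^{i}(t)
  + \lambda\, g_j^{T}(x(t))\, \varLambda^{i}(x(t+1)) = 0
\;\Longrightarrow\;
u_j^{i}(t) = -\frac{\lambda }{2}\, R_j^{-1} g_j^{T}(x(t))\,
             \varLambda^{i}(x(t+1)), \qquad j = 1,2,
```

with \(\varLambda^{i}(x(t+1)) = \partial K^{i}(x(t+1))/\partial x(t+1)\) as defined earlier.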
Remark 3.2
To approximate the optimal value based on the optimal decision law, the value iteration method makes the value function K(x) tend to the cost function J(x).
Remark 3.3
The fitted curve is constructed from the data obtained from the original model, which is hard to solve directly; the modified model replaces it as the research objective, considering the chemotherapy drugs and immunoagents as control inputs simultaneously.
3.3.3 Convergence Analysis
This section proves the convergence of the algorithm to establish its effectiveness in theory. The proof is mainly derived from formulas (3.1), (3.2) and (3.3), and includes two lemmas and three theorems.
Lemma 3.4
Take a control sequence {\(\vec {Ar}^i(\vec {x}(t))\)}. When it is brought into formula (1), the corresponding value function \(J^{i}_{Ar}(\vec {x})\) is obtained, which is compared with the control sequence {\(\vec {\kappa ^{i}}(\vec {x}(t))\)} corresponding to the minimum cost \(K^{i}(\vec {x}(t))\). If \(J^{0}_{Ar}(\cdot )=K^{0}(\cdot )=0\) and \(J^{i+1}_{Ar}(\vec {x}(t))=U[\vec {x}(t),\vec {A}r^i(\vec {x}(t))]+\lambda J^{i}_{Ar}(\vec {x}(t+1))\), satisfying
then \( J^{i}_{Ar}(\vec {x}(t)) \ge K^{i}(\vec {x}(t)) \) for all i.
Proof
\(K^{i}(\vec {x})\) is obtained by taking the minimum of the value equation, and {\(\vec {\kappa }^i(\vec {x}(t))\)} is the corresponding optimal control sequence. For an arbitrary control sequence {\(\vec {Ar}^i(\vec {x}(t))\)}, the value equation \(J^{i}_{Ar}(\vec {x})\) corresponding to it cannot be less than \(K^{i}(\vec {x})\).
Lemma 3.5
Select a stable admissible control sequence {\(\vec {Sa}^i(\vec {x}(t))\)} with certain restrictions and the corresponding value equation \(J^{i}_{Sa}(\vec {x})\). For a controllable system, if \(J^{0}_{Sa}(\cdot )=K^{0}(\cdot )=0\) and \(J^{i+1}_{Sa}(\vec {x}(t))=U[\vec {x}(t),\vec {Sa}^i(\vec {x}(t))] +\lambda J^{i}_{Sa}(\vec {x}(t+1))\), then \(J^{i}_{Sa}(\vec {x})\) is bounded.
Proof
Thus, \(J^{i+1}_{Sa}(\vec {x}(t))=\sum _{j=0}^{i}\lambda ^jU[\vec {x}(t+j), \vec {Sa}^{i-j}(\vec {x}(t+j))]\) and \(J^{i+1}_{Sa}(\vec {x}(t))\le \lim _{i \rightarrow \infty } \sum _{j=0}^{i}\lambda ^jU[\vec {x}(t+j),\vec {Sa}^{i-j}(\vec {x}(t+j))] \), where {\(Sa^{i}(\vec {x})\)} is the stable admissible control sequence. We conclude that \(0\le J^{i+1}_{Sa}(\vec {x}(t))\le \lim _{i \rightarrow \infty } \sum _{j=0}^{i}\lambda ^jU[\vec {x}(t+j),\vec {Sa}^{i-j}(\vec {x}(t+j))]\le C \) for some constant C. That is, \(J^{i}_{Sa}(\vec {x})\) is bounded.
Theorem 3.6
From formula (1), {\(\vec {\kappa ^{i}}(\vec {x}(t))\)} is the control sequence corresponding to the minimum value function \(K^{i}(\vec {x})\). Assuming the initial condition \(K^{0}(\cdot )=0 \), it can be proved that the value-function sequence is monotonically non-decreasing, i.e., \(K^{i}(\vec {x}(t)) \le K^{i+1}(\vec {x}(t))\).
Proof
Define a value equation \(T^{i}(\vec {x}(t))\) with \(T^{0}(\cdot )=0\) and \(T^{i+1}(\vec {x}(t))=\lambda T^{i}(\vec {x}(t+1)) + U[\vec {x}(t),\vec {\tau }^{i+1}(\vec {x}(t))]\). When \(i=0\), \(T^{1}(\vec {x}(t))=U[\vec {x}(t),\vec {\tau }^{0}(\vec {x}(t))]+\lambda T^{0}(\vec {x}(t+1))\), so \( T^{1}(\vec {x}(t))-T^{0}(\vec {x}(t))=U[\vec {x}(t),\vec {\tau }^{0} (\vec {x}(t))]\ge 0\), and we get \(T^{1}(\vec {x}(t))\ge T^{0}(\vec {x}(t))\).
Assume that \(T^{i}(\vec {x}(t))\ge T^{i-1}(\vec {x}(t))\) holds for \(i-1\). For i, \(T^{i+1}(\vec {x}(t))=U[\vec {x}(t),\vec {\xi }^{i}(\vec {x}(t))]+\lambda T^i(\vec {x}(t+1))\) and \(T^{i+1}(\vec {x}(t))-T^{i}(\vec {x}(t))=\lambda \big (T^{i}(\vec {x}(t+1))-T^{i-1}(\vec {x}(t+1))\big )\ge 0 \) by the induction hypothesis. Then \(T^{i+1}(\vec {x}(t))\ge T^{i}(\vec {x}(t))\), and we can get \(K^{i}(\vec {x}(t)) \le K^{i+1}(\vec {x}(t))\).
Theorem 3.7
It is known that {\(\vec {\kappa ^{i}}(\vec {x}(t))\)} is the control sequence corresponding to the minimum cost function \(K^{i}(\vec {x})\), which can prove \(\lim _{i \rightarrow \infty }K^{i}(\vec {x}(t))= K^{*}(\vec {x}(t))\).
Proof
{\(\kappa ^{i}(\vec {x})\)} and \(K^{i}(\vec {x})\) have been given in Lemma 3.4, and the corresponding value function of {\(\kappa ^{i,l}(\vec {x})\)} is \(K^{i+1,l}(\vec {x}(t))=U[\vec {x}(t),\vec {\kappa }^{i,l}(\vec {x}(t))]+\lambda K^{i,l}(\vec {x}(t+1))\), where l is the length. Obviously, \(K^{i+1,l}(\vec {x}(t))=\sum _{j=0}^{i}\lambda ^jU[\vec {x}(t+j),\vec {\kappa }^ {i-j,l}(\vec {x}(t+j))]\).
After taking the limit, we obtain \(K^{\infty ,l}(\vec {x}(t))=\lim _{i \rightarrow \infty } \sum _{j=0}^{i}\lambda ^jU[\vec {x}(t+j), \vec {\kappa }^{i-j,l}(\vec {x}(t+j))]\), and define \(K^{*}(\vec {x}(t))=\mathop {\inf }\limits _{l}\{K^{\infty ,l} (\vec {x}(t))\}\). From Lemma 3.5, \(K^{\infty ,l}(\vec {x}(t)) \le D^l\) for some bound \(D^l\). On the other hand, from Lemma 3.4 we get \(K^{i+1}(\vec {x}(t)) \le K^{i+1,l}(\vec {x}(t))\). Therefore, \(K^{i+1}(\vec {x}(t)) \le K^{i+1,l}(\vec {x}(t))\le K^{\infty ,l}(\vec {x}(t)) \le D^l\) for any i and l. Since \(K^{*}(\vec {x}(t)) = \mathop {\inf }\limits _{l} K^{\infty ,l}(\vec {x}(t))\) by the definition of the minimum for the optimal value equation, for any \(\epsilon >0\) we can extract a control sequence \(\{\vec {\kappa }^{i,m}\}\) such that \(K^{\infty ,m}(\vec {x}(t)) \le K^{*}(\vec {x}(t))+\epsilon \). Taking the limit in the inequality above with \(l=m\) yields \(\lim _{i \rightarrow \infty }K^{i}(\vec {x}(t)) \le K^{*}(\vec {x}(t))+\epsilon \), and since \(\epsilon \) is arbitrary, \(\lim _{i \rightarrow \infty }K^{i}(\vec {x}(t)) \le K^{*}(\vec {x}(t))\).
For the reverse direction, take the control sequence \(\{\vec {\kappa }^{i,g}\}\) such that \(\lim _{i \rightarrow \infty }K^{i}(\vec {x}(t)) =K^{\infty ,g}(\vec {x}(t))\); then \(\lim _{i \rightarrow \infty }K^{i}(\vec {x}(t)) \ge K^{*}(\vec {x}(t)) \). Combining both aspects above, \(\lim _{i \rightarrow \infty }K^{i}(\vec {x}(t))= K^{*}(\vec {x}(t))\) is obtained.
Theorem 3.8
For any state variable \(\vec {x}(t)\), the optimal value equation \(K^{i}(\vec {x}(t))\) satisfies the characteristics of the HJB equation.
Proof
From the proved lemmas and theorems, a series of characteristics of \(K^{i}(\vec {x}(t))\) has been obtained. It remains to verify that the characteristics of the HJB equation are satisfied. According to (3.23), there exists \(K^{*}(\vec {x}(t))=\mathop {\inf }\limits _{\vec {\kappa }(t)} \{U[\vec {x}(t),\vec {\kappa }]+\lambda K^{i}(\vec {x}(t+1))\}\); meanwhile, according to Theorems 3.6 and 3.7, \(K^{i+1}(\vec {x}(t))=\mathop {\min \limits _{\vec {\kappa }(t)}\{{U[\vec {x}(t),\vec {\kappa }]}+\lambda K^{i}(\vec {x}(t+1))\}}\). Taking the mathematical limit, we get \(K^{*}(\vec {x}(t)) \le \mathop {\inf }\limits _{\vec {\kappa }(t)} \{U[\vec {x}(t),\vec {\kappa }]+\lambda K^{*}(\vec {x}(t+1))\}\) by the arbitrariness of {\(\vec {\kappa }(t)\)}.
On the other side, we have \(K^{i+1}(\vec {x}(t))\ge \mathop {\inf }\limits _{\vec {\kappa }(t)} \{U[\vec {x}(t),\vec {\kappa }]+\lambda K^{i-1}(\vec {x}(t+1))\}\); taking the limit again yields \(K^{*}(\vec {x}(t))\ge \mathop {\inf }\limits _{\vec {\kappa }(t)}\{U[\vec {x}(t),\vec {\kappa }]+\lambda K^{*}(\vec {x}(t+1))\}\). From the analysis above, we reach the final conclusion.
This completes the proof.
Remark 3.9
The control sequence {\(\vec {\kappa ^{i}}(\vec {x}(t))\)} corresponds to the minimum value function \(K^{i}(\vec {x})\), which forms a monotonically non-decreasing sequence tending to \(K^{*}(\vec {x}(t))\) and satisfying the characteristics of the HJB equation as in [31].
3.4 Simulation and Numerical Experiments
In this section, we consider the mechanism model of tumor cell growth combined with the proposed immunotherapy, chemotherapy and combination treatments for experimental validation. Firstly, the affine system model is constructed with chemotherapy drugs and immunoagents as control inputs and the counts of the cells involved, such as tumor cells, as state variables. Secondly, according to the affine model obtained by fitting, we develop the cost function of treatment loss from the clinical treatment requirements. Finally, the optimal treatment plan for a patient with a given baseline condition is obtained after calculation by the algorithm.
3.4.1 An Affine Model of Tumor Cell Growth
According to clinical medical statistics and literature [4], the specific parameters of the mechanism model are given in Table 3.3.
At this point, when we give the initial counts of the tumor cell population and immune cells in a patient and follow a certain chemotherapy and immunotherapy regimen, we can get the four curves of tumor cells and immune cell population shown in Figs. 3.1 and 3.2. It is obvious that the state variable Tu(t), denoting the population of tumor cells, tends to be stable in Fig. 3.2; similarly for Im(t) in Fig. 3.1.
When the affine system is fitted according to the data obtained from the mechanism model, \(Dr_{Che}(t)\) and \(Dr_{Im}(t)\) are selected as the two control inputs and Im(t) as a state variable. Within the allowable error range, the obtained fitting relation is of the following form:
The curves before and after fitting are compared in Fig. 3.3; the fit meets the requirements of fitting precision, which guarantees the accuracy of the data traced back to the original source.
3.4.2 The Treatment Loss Cost Function
The form of the cost function was proposed in the third part as (3.17). Unlike the theoretical mechanism model analysis, and in combination with clinical requirements, it is necessary to limit each single injection of drugs to no more than 0.05. Therefore,
To avoid seeking the optimal solution over an infinite time dimension, we choose the discount factor \(\lambda =0.95\). Finally, the specifically obtained cost function is as follows:
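The role of \(\lambda =0.95\) can be seen numerically: the weights \(\lambda ^t\) shrink the contribution of far-future stage costs, giving an effective horizon of roughly \(1/(1-\lambda )=20\) steps (a standard rule of thumb, not a value from the chapter).

```python
# Effect of the discount factor LAM = 0.95: the tail of the
# discounted cost sum becomes negligible, so the infinite-horizon
# problem behaves like a finite-horizon one of length ~1/(1-LAM).
LAM = 0.95

def discounted_cost(stage_costs, lam=LAM):
    """Sum of lam**t * c_t over the given stage costs."""
    return sum((lam**t) * c for t, c in enumerate(stage_costs))

# For a constant unit stage cost the total is 1/(1-lam) = 20, and
# the first 90 steps already capture about 99% of it.
total = discounted_cost([1.0] * 10000)
head = discounted_cost([1.0] * 90)
```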
3.4.3 The Optimal Solution of the Treatment
According to the previous two subsections, we have completed the transformation from the mathematical mechanism model to a solvable affine model and determined the specific values of the cost function according to the clinical requirements. The optimal treatment strategy is acquired through the proposed algorithm, and a comparison is made to prove its effectiveness and feasibility. The cost function is designed to minimize the tumor cells while keeping the doses of chemotherapy drugs and immunoagents at a minimum.
In the following three figures (Figs. 3.4, 3.5 and 3.6), the blue curve represents the changes of tumor cells and the changes of a single dose in patients under the normal treatment regimen. In contrast, the red curve represents the optimal treatment regimen’s effect calculated by the nonzero-sum game-based ADP algorithm.
As shown in Fig. 3.4, there are initially many cancer cells in the body, and the two curves are close to the upper limit. Under the dual action of the drugs and the immune system, the number of cancer cells is substantially reduced. The amount of drug injected does not change greatly from beginning to end; even in the closing stage, when cancer cells have decreased significantly, specific doses are still administered, and the treatment dose obtained by our solution is substantially less than the former.
Correspondingly, Fig. 3.5 shows that the changing trend of the injection dose of immunoagents on the two curves is close to that of the chemotherapy drugs. The optimized dose is slightly higher than that of the traditional treatment plan in the initial stage, when there are more cancer cells, but this does not last long. When the number of cancer cells is relatively large, the primary or indirect target of both drugs is the cancer cells; then, in the late stage of treatment, the number of cancer cells is significantly reduced. If chemotherapy drugs continued to be administered according to the normal treatment, the normal cells would suffer a lot of erosion, with a more significant impact on the body. Under the optimized regimen, however, the drug dose is dramatically reduced, and the normal cells are less affected.
As shown in Fig. 3.6, the control effects of the two treatment schemes on the number of tumor cells resemble each other in the initial stage. Still, in the final stage, the ADP-optimized algorithm not only significantly reduces the count of the tumor cell population but also, combined with Figs. 3.4 and 3.5, minimizes the injection amounts of the two drugs, which shows the effectiveness of our treatment scheme.
Remark 3.10
The optimal regulation strategy for the immune system enjoys the advantage of decreasing the tumor cells; moreover, clinical treatment benefits from the minimization of chemotherapy drugs and immunoagents.
3.5 Conclusion
Nonzero-sum games-based adaptive dynamic programming has been proposed to acquire the optimum by affecting the growth of tumor and immune cells, providing guidance for clinical practice through adjustment of the administered doses of chemotherapy and immunotherapy drugs. The obtained results have shown that the immune system can decrease the tumor cells while minimizing the chemotherapy drugs and immunoagents through optimal control behavior. Simulation examples have demonstrated the availability and effectiveness of the research methodology. Future research will focus on solving the optimal mixed treatment strategy for a complex immunotherapy system including immune cell subsets and cytokines, and on switched control policies in accordance with hybrid therapy.
References
Wang J, Huang M, Chen S, Luo Y, Shen S, Du X (2021) Nanomedicine-mediated ubiquitination inhibition boosts antitumor immune response via activation of dendritic cells. Nano Res 14:3900–3906
Chen C, Li A, Sun P, Xu J, Du W, Zhang J, ..., Jiang X (2020) Efficiently restoring the tumoricidal immunity against resistant malignancies via an immune nanomodulator. J Control Rel 324(10):574–585
Zhang T, Chen CP, Chen L, Xu X, Hu B (2018) Design of highly nonlinear substitution boxes based on I-Ching operators. IEEE Trans Cybernet 48(12):3349–3358
de Pillis LG, Gu W, Radunskaya AE (2006) Mixed immunotherapy and chemotherapy of tumors: Modeling, applications and biological interpretations. J Theor Biol 238(4):841–862
Ogunmadeji B, Yusuf T (2018) Optimal control strategy for improved cancer biochemotherapy outcome. Int J Sci Eng Res 9(12):583–600
Chen CP, Wen GX, Liu YJ, Liu Z (2016) Observer-based adaptive backstepping consensus tracking control for high-order nonlinear semi-strict-feedback multiagent systems. IEEE Trans Cybernet 46(7):1591–1601
Wang D, Ha M, Qiao J (2020) Self-learning optimal regulation for discrete-time nonlinear systems under event-driven formulation. IEEE Trans Autom Control 65(3):1272–1279
Zhang T, Su G, Qing C, Xu X, Cai B, Xing X (2021) Hierarchical lifelong learning by sharing representations and integrating hypothesis. IEEE Trans Syst Man Cybernet: Syst 51(2):1004–1014
Huang H, Zhang T, Yang C, Chen CP (2020) Motor learning and generalization using broad learning adaptive neural control. IEEE Trans Ind Electron 67(10):8608–8617
Zhang H, Cui L, Luo Y (2013) Near-optimal control for nonzero-sum differential games of continuous-time nonlinear systems using single-network ADP. IEEE Trans Cybernet 43(1):206–216
Li T, Yang D, Xie X, Zhang H (2022) Event-triggered control of nonlinear discrete-time system with unknown dynamics based on HDP(\(\lambda \)). IEEE Trans Cybernet 52(7):6046–6058
Zhao B, Liu D (2020) Event-triggered decentralized tracking control of modular reconfigurable robots through adaptive dynamic programming. IEEE Trans Ind Electron 67(4):3054–3064
Liang H, Liu G, Zhang H, Huang T (2021) Neural-network-based event-triggered adaptive control of nonaffine nonlinear multiagent systems with dynamic uncertainties. IEEE Trans Neural Netw Learn Syst 32(5):2239–2250
Sun J, Zhang H, Wang Y, Sun S (2022) Fault-tolerant control for stochastic switched IT2 fuzzy uncertain time-delayed nonlinear systems. IEEE Trans Cybernet 52(2):1335–1346
Sun J, Zhang H, Wang Y, Shi Z (2022) Dissipativity-based fault-tolerant control for stochastic switched systems with time-varying delay and uncertainties. IEEE Trans Cybernet 52(10):10683–10694
Doyle JC, Glover K, Khargonekar PP, Francis BA (1989) State-space solutions to standard \(H_{2}\) and \(H_{\infty }\) control problems. IEEE Trans Autom Control 34(8):831–847
Davari M, Gao W, Jiang ZP, Lewis FL (2021) An optimal primary frequency control based on adaptive dynamic programming for islanded modernized microgrids. IEEE Trans Autom Sci Eng 18(3):1109–1121
Yang D, Li T, Xie X, Zhang H (2020) Event-triggered integral sliding-mode control for nonlinear constrained-input systems with disturbances via adaptive dynamic programming. IEEE Trans Syst Man Cybernet: Syst 50(11):4086–4096
Zhao B, Liu D, Luo C (2020) Reinforcement learning-based optimal stabilization for unknown nonlinear systems subject to inputs with uncertain constraints. IEEE Trans Neural Netw Learn Syst 31(10):4330–4340
Yang Y, Ding D-W, Xiong H, Yin Y, Wunsch DC (2020) Online barrier-actor-critic learning for \(H_{\infty }\) control with full-state constraints and input saturation. J Frankl Inst 357(6):3316–3344
Zhong X, He H, Wang D, Ni Z (2018) Model-free adaptive control for unknown nonlinear zero-sum differential game. IEEE Trans Cybernet 48(5):1633–1646
Luo B, Yang Y, Liu D, Wu H (2020) Event-triggered optimal control with performance guarantees using adaptive dynamic programming. IEEE Trans Neural Netw Learn Syst 31(1):76–88
Yang Y, Gao W, Modares H, Xu CZ (2022) Robust actor-critic learning for continuous-time nonlinear systems with unmodeled dynamics. IEEE Trans Fuzzy Syst 30(6):2101–2112
Starr AW, Ho YC (1969) Nonzero-sum differential games. J Optim Theory Appl 3(3):184–206
Zhang H, Wei Q, Liu D (2011) An iterative adaptive dynamic programming method for solving a class of nonlinear zero-sum differential games. Automatica 47(1):207–214
Zhu Y, Zhao D (2022) Online minimax Q network learning for two-player zero-sum Markov games. IEEE Trans Neural Netw Learn Syst 33(3):1228–1241
Zhong X, He H (2017) An event-triggered ADP control approach for continuous-time system with unknown internal states. IEEE Trans Cybernet 47(3):683–694
Wei Q, Li H, Yang X, He H (2021) Continuous-time distributed policy iteration for multicontroller nonlinear systems. IEEE Trans Cybernet 51(5):2372–2383
Wei Q, Song R, Liao Z, Li B, Lewis FL (2020) Discrete-time impulsive adaptive dynamic programming. IEEE Trans Cybernet 50(10):4293–4306
Zhang H, Qin C, Jiang B, Luo Y (2014) Online adaptive policy learning algorithm for \(H_{\infty }\) state feedback control of unknown affine nonlinear discrete-time systems. IEEE Trans Cybernet 44(12):2706–2718
Yang Y, Vamvoudakis KG, Modares H, Yin Y, Wunsch DC (2021) Hamiltonian-driven hybrid adaptive dynamic programming. IEEE Trans Syst Man Cybernet: Syst 51(10):6423–6434
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2024 The Author(s)
Cite this chapter
Sun, J., Xu, S., Liu, Y., Zhang, H. (2024). Optimal Regulation Strategy for Nonzero-Sum Games of the Immune System Using Adaptive Dynamic Programming. In: Adaptive Dynamic Programming. Springer, Singapore. https://doi.org/10.1007/978-981-99-5929-7_3
Print ISBN: 978-981-99-5928-0
Online ISBN: 978-981-99-5929-7
eBook Packages: Intelligent Technologies and Robotics (R0)