4.1 Introduction

Interactions among cancer cells, surrounding stromal cells, and immune cells, mediated by cell-autonomous and non-autonomous signaling, shape the competition for survival; a mechanistic understanding of tumor progression therefore hinges on its evolutionary and ecological dynamics [1]. Evolution is assumed to change traits continuously over time even while the ecological dynamics themselves keep changing. More broadly, one can imagine an evolutionarily stable state that is a trajectory of phenotypic states, an evolutionarily stable trait attractor. This notion applies when there is sufficient variation to support rapid evolution, or when the state involves a plastic response to environmental conditions that ultimately constitutes evolutionary stability. Natural killer (NK) cells, one of the players in this game, attack many tumour cell lines and are critical to anti-tumour immunity [2]; nevertheless, the interaction between NK cells and their tumour targets remains poorly understood. To overcome drug resistance, anti-tumor immunotherapy is gradually replacing traditional treatment strategies [3]. The interplay between specialized cancer cell populations and immune cells has become a distinctive evolutionary-dynamics phenomenon in the tumor-immune growth architecture. The optimization goal is to minimize the administered dosage and thereby reduce adverse effects.

The dynamic perception, or learning, process is realized through interactions between cells and the organism: responses are observed and an optimal control strategy for the underlying Markov decision process is learned. What is required is an optimal control scheme such that the desired administration dosage is tracked while the amounts of chemotherapeutic drugs and immunological agents are minimized. Reinforcement learning is therefore well suited to this optimization-oriented tumor-immunity architecture. The classical policy-iteration and value-iteration frameworks remain relevant, while the new min-Hamiltonian formulation [4] and the low-gain-parameter ADP-Bellman equation for global stabilization are thriving [5].

The interactions between cells are highly nonlinear and coupled. When computational conditions allow, both adaptive algorithm designs based on policy iteration and adaptive hierarchical neural-network algorithms [6] can readily solve coupled problems such as fractional-order chaotic synchronization, and both inspire approaches to the optimal solution of the HJB equation. When computing conditions are not available, model-free methods are the natural choice. The iterative ADP algorithm has been developed into an iterative NDP algorithm that does not require an accurate system model [7] but only observable system data, which reduces cost and optimizes the control action through error backpropagation [8]. Q-learning has evolved from three classes of four networks, through interleaved double iteration, to critic-only Q-learning [9] with a single class of one network, markedly improving resource utilization and eliminating the problem of insufficient exploration. Interacting cells resemble multiple agents, and attacks of tumor cells on normal cells may trigger abnormal reactions; the neural-network-based attack detection and estimation scheme designed in [10] can readily capture such anomalies. Cells cannot proliferate without limit: for the optimal solution of the constrained auxiliary subsystem, within the ADP framework and continuing the idea of policy iteration, a strongly convergent synchronous iterative optimization strategy has been given [11].

The hard-to-decouple leader-follower behavior of vehicle-vehicle communication [12] and human-vehicle interaction can be handled with off-policy iteration [13]. Switched systems [14], T-S fuzzy models, Nash equilibria, and zero-sum games [15] let each agent deal with a low-dimensional state and a local pattern, reducing conservatism and readily attaining the minimum local cost [16]. Benefiting from improved exploration, the parallel actor-critic asynchronous gradient-sharing mechanism realizes parallel optimization of diverse agents in a short time [17]. Driven by the temporal-difference error, integral reinforcement learning obtains the estimated control strategy by updating the critic weights [18, 19]. To obtain a better stabilizing adaptive control scheme, an appropriate robust control design for the controlled system is needed [20]. Reference [21] surveys recent progress on continuous nonlinear control systems whose controllers combine adaptivity and robustness, and demonstrates the reliability and effectiveness of these two designs on actual power systems and on large and heavy machinery. The theory integrates ecological and evolutionary dynamics, blending ecological mathematical models with evolutionary game theory [22]; evolutionarily stable strategies can then be investigated to seamlessly integrate both sides [23]. Solvable dynamic equations can be used to pursue optimal control objectives; what follows, however, is the curse of dimensionality.

To overcome it, a dual-heuristic dynamic programming method, descended from ADP and accounting for the actual constraints, is proposed for the nonlinear affine evolutionary dynamics. By introducing a discounted performance index, the infinite-horizon optimal regulation problem is reformulated as a finite one. Unlike previous value iterations, no initially stabilizing policy is required. ADP is adapted to optimal formation control through the construction of a performance index function [24]. An affine mathematical model is first introduced to mirror the real scenario [25]; the optimal control problem is transformed into solving the HJB equation, and convergence is proved. ADP lets learners derive a learning strategy, and here a competitive learning setting with cancer cell populations and immune cells is studied, aiming to minimize the administered dose.

4.2 Preliminaries

Consider a classical discrete-time nonlinear affine system,

$$\begin{aligned} \mathcal {x(t+1)} = \mathcal {f}(x(\mathcal {t}))+ \mathcal {g}(x(\mathcal {t}))u(\mathcal {t}) \end{aligned}$$
(4.1)

where the state variable \(x(\mathcal {t}) \in \mathcal {R}^{\mathcal {n}}\), the control variable \(u(\mathcal {t}) \in \mathcal {R}^{\mathcal {m}}\), and \(\mathcal {f}(\cdot ) \in \mathcal {R}^{\mathcal {n}}, \mathcal {g}(\cdot ) \in \mathcal {R}^{\mathcal {n \times m}}\); the system can be stabilized on a compact set \( \mathbf {\Omega } \subset \mathcal {R}^{\mathcal {n}}\), and \(\mathcal {f}(0)=0,\ \mathcal {g}(0)=0\). In short, the optimal control problem for (4.1) is equivalent to obtaining the optimal control law \(u^*(\mathcal {t})=u(x(\mathcal {t}))\) that minimizes the infinite-horizon performance index:

$$\begin{aligned} \textrm{J}(x(\mathcal {t})) = \sum _{l=\mathcal {t}}^{\infty } \textrm{K}(x(l),u(l)). \end{aligned}$$
(4.2)

\(\textrm{K}(x(\mathcal {t}),u(\mathcal {t}))\) is the stage cost, with \(\textrm{K}(x,u) \ge 0 \ \forall x,u \). Typically, the cost function \(\textrm{K}(\cdot )\) takes the quadratic form

$$\begin{aligned} \textrm{K}(x(\mathcal {t}),u(\mathcal {t}))= x^{\scriptscriptstyle {T}}(\mathcal {t})\textrm{P}x(\mathcal {t}) + u^{\scriptscriptstyle {T}}(\mathcal {t})\textrm{Q}u(\mathcal {t}) \end{aligned}$$
(4.3)

where \(\textrm{P}\) and \(\textrm{Q}\) are positive definite matrices.
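As a concrete illustration, here is a minimal sketch of an affine system (4.1) together with the quadratic stage cost (4.3); the dynamics \(\mathcal{f}, \mathcal{g}\) and the weights \(\textrm{P}, \textrm{Q}\) are hypothetical choices that merely satisfy the stated conditions (\(\mathcal{f}(0)=0\), \(\mathcal{g}(0)=0\), \(\textrm{P},\textrm{Q}>0\)):

```python
import numpy as np

# Hypothetical scalar (n = m = 1) instance of x(t+1) = f(x(t)) + g(x(t)) u(t)
def f(x):
    return 0.9 * x + 0.1 * np.sin(x)      # smooth, f(0) = 0

def g(x):
    return np.atleast_2d(0.5 * x)         # g(0) = 0, shape (n, m)

def step(x, u):
    return f(x) + g(x) @ u                # one transition of (4.1)

# Quadratic stage cost K(x, u) = x^T P x + u^T Q u of (4.3)
P = np.array([[2.0]])                     # state weight, positive definite
Q = np.array([[0.5]])                     # control weight, positive definite

def stage_cost(x, u):
    return float(x @ P @ x + u @ Q @ u)
```

Since \(\mathcal{f}(0)=0\) and \(\mathcal{g}(0)=0\), the origin is an equilibrium for every input, and positive definiteness of \(\textrm{P}, \textrm{Q}\) makes \(\textrm{K}(x,u)\ge 0\) with equality only at the origin.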

The optimal control problem (4.2) can thus be converted into solving the HJB equation. According to the Bellman optimality principle, the optimal value function obeys the following [9]:

$$\begin{aligned} \textrm{J}^{\scriptscriptstyle {*}}\!(x(\mathcal {t})) \!=\! \min _{u(\mathcal {t})} \Big \{ x^{\scriptscriptstyle {T}}(\mathcal {t})\textrm{P}x(\mathcal {t}) \! + u^{{\scriptscriptstyle {T}}}(\mathcal {t})\textrm{Q}u(\mathcal {t}) \!+ \textrm{J}^{\scriptscriptstyle {*}}\!(x(\mathcal {t+1}))\Big \} \end{aligned}$$
(4.4)

Minimizing the right-hand side of (4.4) yields the optimal control law and the optimal value function \(\textrm{J}^*(x(\mathcal {t}))\). As a necessary condition, one can take the partial derivative of the right-hand side of (4.4) with respect to \(u(\mathcal {t})\) to obtain \(u^*\). Hence,

$$\begin{aligned} u^*(\mathcal {t}) = -\frac{\textrm{Q}^{\scriptscriptstyle {-1}}}{2}\ \Big [\mathcal {g}(x(\mathcal {t}))\Big ]^{\scriptscriptstyle {T}}\!\frac{\partial \textrm{J}^*(x(\mathcal {t+1}))}{\partial x(\mathcal {t+1})} \end{aligned}$$
(4.5)

Substituting (4.5) into (4.4) yields

$$\begin{aligned} \textrm{J}^*(x(\mathcal {t})) =\,&x^{\scriptscriptstyle {T}}(\mathcal {t})\textrm{P}x(\mathcal {t})+\frac{1}{4}\Big [\frac{\partial \textrm{J}^*(x(\mathcal {t+1}))}{\partial x(\mathcal {t+1})}\Big ]^{\scriptscriptstyle {T}}\mathcal {g}(x(\mathcal {t}))\textrm{Q}^{\scriptscriptstyle {-1}} \nonumber \\&\ \cdot \mathcal {g}^{\scriptscriptstyle {T}}(x(\mathcal {t}))\Big [\frac{\partial \textrm{J}^*(x(\mathcal {t+1}))}{\partial x(\mathcal {t}+1)}\Big ] + \textrm{J}^*(x(\mathcal {t+1})). \end{aligned}$$
(4.6)

From (4.6), an analytical solution for \(u^*(\mathcal {t})\) is in general impossible to obtain: at the current time \(\mathcal {t}\), the next-step value \(\textrm{J}^*(x(\mathcal {t+1}))\) is unknown. To overcome this dilemma, an approximate optimal solution of the HJB equation can be studied. In the fourth part of this chapter, the I-DHP algorithm is derived to solve this class of optimal control problems [26, 27].
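To make the approximation idea concrete, the Bellman recursion (4.4) can be iterated numerically for a simple plant. The sketch below runs tabular value iteration on a state grid for a hypothetical scalar system \(x(\mathcal{t}+1)=0.8x(\mathcal{t})+u(\mathcal{t})\) with \(\textrm{P}=\textrm{Q}=1\); the grid, control set, and iteration count are illustrative choices:

```python
import numpy as np

# Tabular value iteration for the Bellman equation (4.4) on a 1-D grid
xs = np.linspace(-1.0, 1.0, 101)          # state grid
us = np.linspace(-1.0, 1.0, 41)           # candidate controls
J = np.zeros_like(xs)                      # J^0 = 0

for _ in range(200):
    J_new = np.empty_like(J)
    for i, x in enumerate(xs):
        x_next = 0.8 * x + us                       # all successor states
        J_next = np.interp(x_next, xs, J)           # interpolate J off-grid
        J_new[i] = np.min(x**2 + us**2 + J_next)    # Bellman backup (4.4)
    if np.max(np.abs(J_new - J)) < 1e-9:
        break
    J = J_new
```

For this linear-quadratic instance the converged values can be checked against the discrete-time Riccati solution (\(\textrm{J}^*(x)=px^2\) with \(p\approx 1.37\)), up to the grid and control discretization error.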

4.3 Modeling of Mixed Immunotherapy and Chemotherapy for Tumor Cell

In this part, a mathematical model is constructed from the natural growth of a single type of tumor cell, the interactions between various immune cells and tumor cells in vivo, and the influence of externally applied chemotherapy drugs and immune agents on the tumor cell population [22, 28, 29].

First, define the notation for the various cell populations:

  • \(\mathcal {T_u}(\mathcal {t})\): Tumor cell population in vivo.

  • \(\mathcal {N_K}(\mathcal {t})\): NK cells are derived from bone marrow lymphoid stem cells.

  • \(\mathcal {C_T}(\mathcal {t})\): Cytotoxic T lymphocytes (CTL), a subdivision of leukocytes, are specific T cells that secrete various cytokines and participate in immune function.

  • \(\mathcal {C_L}(\mathcal {t})\): Number of circulating lymphocytes (or leukocytes).

  • \(\mathcal {Ch_{dr}}(\mathcal {t})\): Chemotherapeutic drug concentration in the blood.

  • \(\mathcal {Im_{dr}}(\mathcal {t})\): Immunotherapy drug concentration in the blood.

For brevity, the time argument is omitted in the following subsections and defaults to \(\mathcal {t}\). Lowercase letters “\(\mathfrak {a}\), \(\mathfrak {b}\), \(\mathfrak {c_1}\), \(\mathfrak {c_2}\), \(\mathfrak {e}\), \(\mathfrak {f}\), \(\mathfrak {g}\), \(\mathfrak {h_1}\), \(\mathfrak {h_2}\), \(\mathfrak {i}\), \(\mathfrak {j}\), \(\mathfrak {l}\), \(\mathfrak {m}\), \(\mathfrak {n_1}\), \(\mathfrak {n_2}\), \(\mathfrak {p_1}\), \(\mathfrak {p_2}\), \(\mathfrak {q_1}\), \(\mathfrak {q_2}\), \(\mathfrak {r}\), \(\mathfrak {s}\), \(\mathfrak {u}\)” all denote fixed real numbers; uppercase letters “\( \textrm{G,K,O,R,I} \)” denote different categories of gain terms, which depend on time \(\mathcal {t}\); \(\mathcal {L}_{(\cdot )}\) is a constant depending on the cell type; and \(\textbf{e}^{(\cdot )}\) denotes the exponential function.

4.3.1 The Natural Growth of Cells

According to [2, 22], tumor cells follow a natural growth curve, \( \mathcal {G}_{\mathcal {T_u}}=\mathfrak {a}\mathcal {T_u}(1-\mathfrak {b}\mathcal {T_u})\) (\(\mathcal {G}_{(\cdot )}\) denotes the natural growth term of each cell type). Natural killer cells [22] are assumed to be produced at a constant rate and to be influenced by circulating lymphocytes throughout the production cycle (since circulating lymphocytes reflect the overall level of immune health); thus, \(\mathcal {G}_{\mathcal {N_K}}=\mathfrak {c}_1\mathcal {C_{L}}-\mathfrak {c}_2\mathcal {N_K}\). In the absence of tumor cells, cytotoxic T lymphocytes are assumed to be absent, and the growth of \(\mathcal {C_{T}}(\mathcal {t})\) cells is affected only by natural mortality, \(\mathcal {G}_{\mathcal {C_{T}}}=-\mathfrak {e}\mathcal {C_{T}}\). Circulating lymphocytes are likewise produced at a constant rate during their lifetime, \(\mathcal {G}_{\mathcal {C_{L}}}=\mathfrak {f}-\mathfrak {g}\mathcal {C_{L}}\). Injected chemotherapy drugs and immune agents are assumed to decay exponentially, \(\mathcal {G}_{\mathcal {Ch_{dr}}}=-\textbf{e}^{-\gamma _{\alpha }}\mathcal {Ch_{dr}}\), \(\mathcal {G}_{\mathcal {Im_{dr}}}=-\textbf{e}^{-\gamma _{\beta }}\mathcal {Im_{dr}}\).
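Collected in code, the natural-growth terms read as follows; this is a sketch, and every numerical parameter value is a hypothetical placeholder rather than a fitted constant:

```python
import numpy as np

# Natural-growth terms of Sect. 4.3.1; a, b, c1, c2, e, f_, g_, gamma_a,
# gamma_b stand for the constants of the text (all values hypothetical).
a, b = 0.43, 1.0e-9        # tumour logistic growth rate / inverse capacity
c1, c2 = 1.2e-4, 4.1e-2    # NK production from lymphocytes / NK death
e = 2.0e-2                 # CTL natural mortality
f_, g_ = 7.5e8, 1.2e-2     # circulating-lymphocyte source / death
gamma_a, gamma_b = 0.9, 0.9

def growth_terms(Tu, Nk, Ct, Cl, Ch, Im):
    return {
        "Tu": a * Tu * (1.0 - b * Tu),     # logistic tumour growth
        "Nk": c1 * Cl - c2 * Nk,           # lymphocyte-driven NK production
        "Ct": -e * Ct,                     # CTL decay (absent tumour)
        "Cl": f_ - g_ * Cl,                # constant-rate lymphocyte turnover
        "Ch": -np.exp(-gamma_a) * Ch,      # chemotherapy drug decay
        "Im": -np.exp(-gamma_b) * Im,      # immunotherapy drug decay
    }
```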

4.3.2 Intercellular Conditioning

When these cell populations coexist, negative interactions arise between pairs of populations: partly the indirect effect of competition for growth space and nutrients, and partly the direct resistance of cell populations to each other [22]:

$$\begin{aligned} \mathcal {K}_{\mathcal {T_u}}=-\mathfrak {j}\mathcal {N_K}\mathcal {T_u}\quad \mathcal {K}_{\mathcal {C_{T}}}=\mathfrak {h}_1\cdot \frac{(\mathcal {C_{T}}/\mathcal {T_u})^\mathfrak {i}}{\mathfrak {h}_2+(\mathcal {C_{T}}/\mathcal {T_u})^\mathfrak {i}}\cdot \mathcal {T_u} \end{aligned}$$

To simplify the writing, let \(\mathcal {O}\) denote this particular term; note that \(\mathcal {O}=\mathcal {O}(\mathcal {t})\), since it depends on \(\mathcal {C_{T}}(\mathcal {t})\) and \(\mathcal {T_u}(\mathcal {t})\).

$$\begin{aligned} \mathcal {O}=\mathfrak {h}_1\cdot \frac{(\mathcal {C_{T}}/\mathcal {T_u})^\mathfrak {i}}{\mathfrak {h}_2+(\mathcal {C_{T}}/\mathcal {T_u})^\mathfrak {i}} \quad \mathcal {K}_{\mathcal {C_{T}}}=\mathcal {O}\cdot \mathcal {T_u} \end{aligned}$$
(4.7)

NK cells are subject to recruitment: the presence of tumor cells draws more NK cells into the response, which motivates sequential application of cell-cycle-nonspecific and cell-cycle-specific drugs to recruit more cells at specific stages into the proliferation cycle, thereby increasing the number of tumor cells killed [29,30,31].

$$\begin{aligned} \mathcal {R}_{\mathcal {N_k}}=\frac{\mathfrak {l}\cdot \mathcal {T_u}^2}{\mathfrak {m}+\mathcal {T_u}^2}\mathcal {N_k};\quad \mathcal {R}_{\mathcal {C_{T}}}(\mathcal {T_u},\mathcal {C_{T}} )=\mathfrak {p}_1\frac{\mathcal {O}^2\mathcal {T_u}^2}{\mathfrak {q}_1\!+\mathcal {O}^2\mathcal {T_u}^2}\mathcal {C_{T}} \end{aligned}$$

\(\mathcal {C_{T}}\) cells exhibit a similar recruitment effect [32], proportional to the number of tumor cells killed by NK-cell lysis, \(\mathcal {R}_{\mathcal {C_{T}}}(\mathcal {N_k},\mathcal {T_u} )=\mathfrak {n}_1\mathcal {N_k}\mathcal {T_u}\). In addition, the presence of tumor cells stimulates the immune system to secrete more cells, \(\mathcal {R}_{\mathcal {C_{T}}}(\mathcal {C_{L}},\mathcal {T_u} )=\mathfrak {n}_2\mathcal {C_{L}}\mathcal {T_u}\). During the immune response, NK cells or CTLs may undergo multiple contacts with tumor cells and are then inactivated [29, 33,34,35].

$$\begin{aligned} \mathcal {I}_{\mathcal {ac,N_k}}=-\mathfrak {p}_2\mathcal {T_u}\mathcal {N_k}\quad \mathcal {I}_{\mathcal {ac,\mathcal {C_{T}}}}=-\mathfrak {q}_2\mathcal {C_{T}}\mathcal {T_u}\quad \mathcal {I}_{\mathcal {C_{L}},\mathcal {C_{T}}}=-\mathfrak {r}\mathcal {N_k}(\mathcal {C_{T}})^2 \end{aligned}$$
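The intercellular terms can be sketched as below. Two assumptions are flagged: the denominators are read as \(\mathfrak {h}_2+(\mathcal {C_{T}}/\mathcal {T_u})^{\mathfrak {i}}\) and \(\mathfrak {m}+\mathcal {T_u}^2\) (the forms retained in (4.8e) and (4.8f)), and all parameter values are hypothetical placeholders:

```python
# Intercellular terms of Sect. 4.3.2 (hypothetical parameter values)
j, h1, h2, i_ = 3.2e-8, 1.25, 2.0e-2, 2.0
l_, m_ = 2.5e-2, 2.0e7
n1, n2 = 1.1e-7, 6.5e-11
p1, q1 = 0.125, 2.0e7
p2, q2, r = 1.0e-7, 3.4e-10, 1.0e-14

def O_term(Ct, Tu):
    ratio = (Ct / Tu) ** i_
    return h1 * ratio / (h2 + ratio)            # CTL lysis fraction O of (4.7)

def interaction_terms(Tu, Nk, Ct, Cl):
    O = O_term(Ct, Tu)
    return {
        "K_Tu": -j * Nk * Tu,                   # NK kill of tumour cells
        "K_Ct": O * Tu,                         # CTL kill of tumour cells
        "R_Nk": l_ * Tu**2 / (m_ + Tu**2) * Nk, # NK recruitment by tumour
        "R_Ct": p1 * (O * Tu)**2 / (q1 + (O * Tu)**2) * Ct
                + n1 * Nk * Tu + n2 * Cl * Tu,  # CTL recruitment
        "I_Nk": -p2 * Tu * Nk,                  # NK inactivation after contact
        "I_Ct": -q2 * Ct * Tu - r * Nk * Ct**2, # CTL inactivation / regulation
    }
```

By construction \(0<\mathcal {O}<\mathfrak {h}_1\), so the CTL kill term saturates as the CTL-to-tumour ratio grows.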

4.3.3 Drug Intervention

Every cell population in this model contains an action term for chemotherapy drugs, whose killing effect is not uniformly effective: at low drug concentration the kill rate increases almost linearly, while at high drug concentration it levels off. A saturating form is used in the model [36], \(1-\textbf{e}^{-\mathcal {Ch_{dr}}(\mathcal {t})}\).

$$\begin{aligned} \mathcal {D}^{\mathcal{C}\mathcal{h}}_{r}(\cdot )=\mathcal {L}_{(\cdot )}(1-\textbf{e}^{-\mathcal {Ch_{dr}}(\mathcal {t})})(\cdot )\quad \end{aligned}$$

\((\cdot )=\mathcal {T_u}, \mathcal {C_{T}}, \mathcal {C_{L}}, \mathcal {N_k}\).

\(\mathcal {L}_{(\cdot )}\) represents the drug-interaction coefficient of the corresponding cell population. Immunotherapy is also included; its impact on immune-system efficacy can be described mathematically by the Michaelis-Menten interaction below, where \(\mathfrak {s}\) and \(\mathfrak {u}\) are constants [30].

$$\begin{aligned} \mathcal {D}^{\mathcal{I}\mathcal{m}}_{r}(\mathcal {C_T},\mathcal {Im_{dr}})=\mathfrak {u}\frac{\mathcal {Im_{dr}}\mathcal {C_T}}{\mathfrak {s}+\mathcal {Im_{dr}}} \end{aligned}$$

Chemotherapy and immunotherapy drugs are injected over a certain period of time; denote by \(\mathcal {V}_{Che}(\mathcal {t})\) and \(\mathcal {V}_{Im}(\mathcal {t})\) the injected amounts of chemotherapy drug and immunotherapy drug, respectively.
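In code, the two drug terms behave exactly as described: linear at low concentration and saturating at high concentration. The exponent is taken as \(-\mathcal {Ch_{dr}}\) so that the kill rate actually saturates, and the constants are hypothetical:

```python
import numpy as np

# Saturating chemotherapy kill L_x (1 - e^{-Ch}) x and
# Michaelis-Menten immunotherapy term u_ * Im * Ct / (s_ + Im);
# L_x, u_, s_ are hypothetical constants.
def chemo_kill(L_x, Ch, x):
    return L_x * (1.0 - np.exp(-Ch)) * x   # ~linear for small Ch, -> L_x * x

def immuno_boost(Ct, Im, u_=0.3, s_=1.0e4):
    return u_ * Im * Ct / (s_ + Im)        # saturates at u_ * Ct
```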

4.3.4 Mixed Growth Model of Cell Population

Combining the above contributions, the total cell-population growth model is obtained:

$$\begin{aligned} \mathcal {Im_{dr}}(\mathcal {t}+1)&=(1-\textbf{e}^{-\gamma _{\beta }})\mathcal {Im_{dr}}(\mathcal {t})+\mathcal {V}_{Im}(\mathcal {t}) \end{aligned}$$
(4.8a)
$$\begin{aligned} \mathcal {Ch_{dr}}(\mathcal {t}+1)&=(1-\textbf{e}^{-\gamma _{\alpha }})\mathcal {Ch_{dr}}(\mathcal {t})+\mathcal {V}_{Che}(\mathcal {t}) \end{aligned}$$
(4.8b)
$$\begin{aligned} \mathcal {C_{L}}(\mathcal {t}+1)&=\mathfrak {f}+(1-\mathfrak {g}-\mathcal {L}_{\mathcal {C_{L}}})\mathcal {C_{L}}(\mathcal {t})+\mathcal {L}_{\mathcal {C_{L}}}\textbf{e}^{-\mathcal {Ch_{dr}}(\mathcal {t})}\mathcal {C_{L}}(\mathcal {t}) \end{aligned}$$
(4.8c)
$$\begin{aligned} \mathcal {T_u}(\mathcal {t}+1)&=(1+\mathfrak {a}-\mathcal {L}_{\mathcal {T_u}})\mathcal {T_u}(\mathcal {t})-\mathfrak {a}\mathfrak {b}\mathcal {T_u}^{\scriptscriptstyle {2}}(\mathcal {t})\nonumber \\&\quad +\mathcal {T_u}(\mathcal {t})\Big [\mathcal {L}_{\mathcal {T_u}}\textbf{e}^{-\mathcal {Ch_{dr}}(\mathcal {t})}-\mathfrak {j}\mathcal {N_k}(\mathcal {t})-\mathcal {O}(\mathcal {t})\Big ] \end{aligned}$$
(4.8d)
$$\begin{aligned} \mathcal {C_T}(\mathcal {t}+1)&=(1-\mathfrak {e}-\mathcal {L}_{\mathcal {C_T}})\mathcal {C_T}(\mathcal {t})+\big [\mathfrak {n}_1\mathcal {N_k}(\mathcal {t})-\mathfrak {q}_2\mathcal {C_T}(\mathcal {t})+\mathfrak {n}_2\mathcal {C_{L}}(\mathcal {t})\big ]\mathcal {T_u}(\mathcal {t})\nonumber \\&\quad -\mathfrak {r}\mathcal {N_k}(\mathcal {t})\mathcal {C_T}^{\scriptscriptstyle {2}}(\mathcal {t})+\mathcal {L}_{\mathcal {C_T}}\textbf{e}^{-\mathcal {Ch_{dr}}(\mathcal {t})}\mathcal {C_T}(\mathcal {t})\nonumber \\&\quad +\mathcal {C_T}(\mathcal {t})\Big [\frac{\mathfrak {u}\mathcal {Im_{dr}}(\mathcal {t})}{\mathfrak {s}+\mathcal {Im_{dr}}(\mathcal {t})}+\frac{\mathfrak {p}_1\mathcal {O}^{\scriptscriptstyle {2}}(\mathcal {t})\mathcal {T_u}^{\scriptscriptstyle {2}}(\mathcal {t})}{\mathfrak {q}_1+\mathcal {O}^{\scriptscriptstyle {2}}(\mathcal {t})\mathcal {T_u}^{\scriptscriptstyle {2}}(\mathcal {t})}\Big ] \end{aligned}$$
(4.8e)
$$\begin{aligned} \mathcal {N_k}(\mathcal {t}+1)&=(1-\mathfrak {c}_2-\mathcal {L}_{\mathcal {N_k}})\mathcal {N_k}(\mathcal {t})+\frac{\mathfrak {l}\cdot \mathcal {T_u}^{\scriptscriptstyle {2}}(\mathcal {t})}{\mathfrak {m}+\mathcal {T_u}^{\scriptscriptstyle {2}}(\mathcal {t})}\mathcal {N_k}(\mathcal {t})\nonumber \\&\quad +\Big [\mathcal {L}_{\mathcal {N_k}}\textbf{e}^{-\mathcal {Ch_{dr}}(\mathcal {t})}-\mathfrak {p}_2\mathcal {T_u}(\mathcal {t})\Big ]\mathcal {N_k}(\mathcal {t})+\mathfrak {c}_1\mathcal {C_{L}}(\mathcal {t}) \end{aligned}$$
(4.8f)
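One step of the mixed model (4.8) can be sketched as follows. This is a reading of the model under two assumptions flagged above: the chemotherapy survival factor is taken as \(\textbf{e}^{-\mathcal {Ch_{dr}}}\) so that the kill term \(\mathcal {L}_{(\cdot )}(1-\textbf{e}^{-\mathcal {Ch_{dr}}})(\cdot )\) saturates, and every parameter value is a hypothetical placeholder:

```python
import numpy as np

# One hypothetical-parameter step of the mixed growth model (4.8a)-(4.8f)
par = dict(a=0.43, b=1.0e-9, c1=1.2e-4, c2=4.1e-2, e=2.0e-2,
           f=7.5e8, g=1.2e-2, h1=1.25, h2=2.0e-2, i=2.0, j=3.2e-8,
           l=2.5e-2, m=2.0e7, n1=1.1e-7, n2=6.5e-11, p1=0.125,
           p2=1.0e-7, q1=2.0e7, q2=3.4e-10, r=1.0e-14, s=1.0e4,
           u=0.3, ga=0.9, gb=0.9,
           L_Tu=0.9, L_Ct=0.6, L_Cl=0.6, L_Nk=0.6)

def model_step(Tu, Nk, Ct, Cl, Ch, Im, V_che, V_im, p=par):
    O = p['h1'] * (Ct / Tu)**p['i'] / (p['h2'] + (Ct / Tu)**p['i'])
    surv = np.exp(-Ch)                                  # chemo survival factor
    Im1 = (1 - np.exp(-p['gb'])) * Im + V_im            # (4.8a)
    Ch1 = (1 - np.exp(-p['ga'])) * Ch + V_che           # (4.8b)
    Cl1 = p['f'] + (1 - p['g'] - p['L_Cl']) * Cl + p['L_Cl'] * surv * Cl
    Tu1 = ((1 + p['a'] - p['L_Tu']) * Tu - p['a'] * p['b'] * Tu**2
           + Tu * (p['L_Tu'] * surv - p['j'] * Nk - O))                 # (4.8d)
    Ct1 = ((1 - p['e'] - p['L_Ct']) * Ct
           + (p['n1'] * Nk - p['q2'] * Ct + p['n2'] * Cl) * Tu
           - p['r'] * Nk * Ct**2 + p['L_Ct'] * surv * Ct
           + Ct * (p['u'] * Im / (p['s'] + Im)
                   + p['p1'] * O**2 * Tu**2 / (p['q1'] + O**2 * Tu**2)))
    Nk1 = ((1 - p['c2'] - p['L_Nk']) * Nk
           + p['l'] * Tu**2 / (p['m'] + Tu**2) * Nk
           + (p['L_Nk'] * surv - p['p2'] * Tu) * Nk + p['c1'] * Cl)     # (4.8f)
    return Tu1, Nk1, Ct1, Cl1, Ch1, Im1
```

A useful sanity check: with zero drug concentration the survival factor equals 1, so the chemotherapy terms cancel and each update reduces to the drug-free dynamics of Sects. 4.3.1 and 4.3.2.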

4.4 Iterative-Dual Heuristic Dynamic Programming Algorithm for Mixed Treatment

The optimal control problem has been transformed into solving the HJB equation (4.4). In this part, a constrained iterative dual-heuristic dynamic programming algorithm for the mixed treatment is given, derived from adaptive dynamic programming [26]. Three topics are presented: the working mechanism of the ADP algorithm, the structure of the constrained iterative dual-heuristic dynamic programming algorithm, and the proof of convergence of the I-DHP algorithm.

4.4.1 Working Mechanism of ADP Algorithm

Generally speaking, for unconstrained control problems the performance functional (4.3) is chosen in quadratic form. In this chapter, to respect the actual constraints, the problem is transformed into a bounded control problem and a non-quadratic functional is adopted as follows:

$$\begin{aligned} \textrm{Y}(\mathcal {t})= x^{\scriptscriptstyle {T}}(\mathcal {t})\textrm{P}x(\mathcal {t}) + 2\int _{0}^{u(\mathcal {t})} \text {tanh}^{\scriptscriptstyle {-T}}(\overline{\mathcal {U}}^{\scriptscriptstyle {-1}}s)\overline{\mathcal {U}}\textrm{Q}ds \end{aligned}$$

In a cyclic or infinite-horizon Markov decision process, reward is accrued again and again, so the value function can grow without bound; a discount factor is needed to keep it finite. By introducing a discount factor \(\lambda \), \(0<\lambda \le 1\), the infinite-dimensional problem is transformed into a finite-dimensional one.

$$\begin{aligned} \textrm{J}(\mathcal {t})&=\sum _{l=\mathcal {t}}^{\infty }\lambda ^{\scriptscriptstyle {l-\mathcal {t}}} \textrm{Y}(x(l),u(l)) =\textrm{Y}(\mathcal {t})\nonumber \\&\quad \quad +\lambda \sum _{l=\mathcal {t}+1}^{\infty }\lambda ^{\scriptscriptstyle {{l-(\mathcal {t}+1)}}} \textrm{Y}(x(l),u(l)) \end{aligned}$$
(4.9)
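A quick numerical check of why the discount factor is needed: with a constant stage cost, the discounted sum (4.9) converges to the geometric-series limit, whereas the undiscounted sum grows without bound (the values of \(\lambda \) and the stage cost below are hypothetical):

```python
# Discounted return with constant stage cost Y: sum over k of lam^k * Y
lam, Y = 0.95, 2.0
partial = sum(lam**k * Y for k in range(10_000))   # truncated discounted sum
closed_form = Y / (1 - lam)                        # geometric-series limit
undiscounted = 10_000 * Y                          # grows linearly with horizon
```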

According to the Bellman optimality principle, the optimal value function satisfies:

$$\begin{aligned} \textrm{J}^{\scriptscriptstyle {*}}\!(x(\mathcal {t}))&= \min _{u(\mathcal {t})} \Big \{ x^{\scriptscriptstyle {T}}(\mathcal {t})\textrm{P}x(\mathcal {t}) + 2\int _{0}^{u(\mathcal {t})} \text {tanh}^{\scriptscriptstyle {-T}}(\overline{\mathcal {U}}^{\scriptscriptstyle {-1}}s)\cdot \nonumber \\&\quad \quad \quad \quad \overline{\mathcal {U}}\textrm{Q}ds+\lambda \textrm{J}^{\scriptscriptstyle {*}}\!(x(\mathcal {t+1}))\Big \}. \end{aligned}$$
(4.10)

In the ADP algorithm structure, iteration proceeds in the manner of policy iteration, with \(\textrm{T}^{\iota }(x)\) as the approximate value function and \(\mathrm {\tau }^{\iota }(x)\) as the corresponding control law. The whole iterative process is as follows:

  1.

    Let the initial value function be \(\textrm{T}^{0}(\cdot )=0\) (generally far from optimal) and compute the control law at “\(\iota =0 \)” as follows.

    $$\begin{aligned} \mathrm {\tau }^{\scriptscriptstyle {0}}\!(x(\mathcal {t})) =&\, \underset{u(\mathcal {t})}{\text {arg min}} \Big \{x^{\scriptscriptstyle {T}}(\mathcal {t})\textrm{P}x(\mathcal {t}) + 2\int _{0}^{u(\mathcal {t})} \text {tanh}^{\scriptscriptstyle {-T}}\!(\overline{\mathcal {U}}^{\scriptscriptstyle {-1}}s) \nonumber \\&\quad \quad \cdot \overline{\mathcal {U}}\textrm{Q}ds +\lambda \textrm{T}^{\scriptscriptstyle {0}}\!(x(\mathcal {t+1})) \Big \} \end{aligned}$$
    (4.11)
  2.

    Get \(\textrm{T}^{\scriptscriptstyle {1}}\!(x(\mathcal {t}))\):

    $$\begin{aligned} \textrm{T}^{\scriptscriptstyle {1}}\!(x(\mathcal {t})) =\,&x^{\scriptscriptstyle {T}}(\mathcal {t})\textrm{P}x(\mathcal {t}) + 2\int _{0}^{\mathrm {\tau }^{\scriptscriptstyle {0}}\!(x(\mathcal {t}))}\! \text {tanh}^{\scriptscriptstyle {-T}}\!(\overline{\mathcal {U}}^{\scriptscriptstyle {-1}}s)\overline{\mathcal {U}}\nonumber \\ {}&\quad \cdot \textrm{Q}ds +\lambda \textrm{T}^{\scriptscriptstyle {0}}\!(x(\mathcal {t+1})). \end{aligned}$$
    (4.12)
  3.

    And for \(\iota =1,2,3,\cdots \)

    $$\begin{aligned} \mathrm {\tau }^{\scriptscriptstyle {\iota }}\!(x(\mathcal {t})) =\,&\underset{u(\mathcal {t})}{\text {arg min}} \Big \{x^{\scriptscriptstyle {T}}(\mathcal {t})\textrm{P}x(\mathcal {t}) + 2\int _{0}^{u(\mathcal {t})}\! \text {tanh}^{\scriptscriptstyle {-T}}\!(\overline{\mathcal {U}}^{\scriptscriptstyle {-1}}s) \nonumber \\&\quad \quad \cdot \overline{\mathcal {U}}\textrm{Q}ds +\lambda \textrm{T}^{\scriptscriptstyle {\iota }}\!(x(\mathcal {t+1})) \Big \}. \end{aligned}$$
    (4.13)
  4.

    The iterative value function is obtained as follows:

    $$\begin{aligned} \textrm{T}^{\scriptscriptstyle {\iota +1}}\!(x(\mathcal {t})) =\,&x^{\scriptscriptstyle {T}}(\mathcal {t})\textrm{P}x(\mathcal {t}) + 2\int _{0}^{\mathrm {\tau }^{\scriptscriptstyle {\iota }}\!(x(\mathcal {t}))} \text {tanh}^{\scriptscriptstyle {-T}}\!(\overline{\mathcal {U}}^{\scriptscriptstyle {-1}}s)\nonumber \\ \cdot \overline{\mathcal {U}}\textrm{Q}ds&+\lambda \textrm{T}^{\scriptscriptstyle {\iota }}\!(x(\mathcal {t+1})). \end{aligned}$$
    (4.14)
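The non-quadratic control penalty appearing in (4.11)-(4.14) admits a closed form in the scalar case: \(2\int _{0}^{u} \text {tanh}^{-1}(s/\overline{\mathcal {U}})\,\overline{\mathcal {U}}\textrm{Q}\,ds = 2\overline{\mathcal {U}}\textrm{Q}\big [u\,\text {tanh}^{-1}(u/\overline{\mathcal {U}}) + \tfrac{\overline{\mathcal {U}}}{2}\ln (1-(u/\overline{\mathcal {U}})^2)\big ]\). A sketch comparing this closed form with numerical quadrature, using hypothetical values of \(\overline{\mathcal {U}}\) and \(\textrm{Q}\):

```python
import numpy as np

# Scalar bounded-control penalty H(u) = 2 * int_0^u arctanh(s/Ubar) * Ubar * Q ds
Ubar, Q = 1.0, 0.5

def H_closed(u):
    z = u / Ubar
    return 2 * Ubar * Q * (u * np.arctanh(z) + 0.5 * Ubar * np.log(1 - z**2))

def H_numeric(u, n=200_001):
    s = np.linspace(0.0, u, n)                  # quadrature nodes on [0, u]
    vals = 2 * np.arctanh(s / Ubar) * Ubar * Q  # integrand
    return float(np.sum((vals[:-1] + vals[1:]) * np.diff(s) / 2))  # trapezoid
```

The penalty grows without bound as \(|u|\rightarrow \overline{\mathcal {U}}\), which is precisely what keeps the minimizing control inside the actuator bound.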

4.4.2 Structure of Constrained Iterative Dual-Heuristic Dynamic Programming Algorithm

In dual-heuristic dynamic programming, the value function is assumed to be smooth. Modeled on (4.5), taking the partial derivative of the right-hand side of (4.14) with respect to \(\mathrm {\tau }^{\scriptscriptstyle {\iota }}\!(x(\mathcal {t}))\) and setting it to zero gives [37]:

$$\begin{aligned} \frac{\partial {\textrm{T}^{\scriptscriptstyle {\iota +1}}\!(x(\mathcal {t})) }}{\partial {u(\mathcal {t})}} \!&=\!\frac{\partial {\Big \{x^{\scriptscriptstyle {T}}(\mathcal {t})\textrm{P}x(\mathcal {t}) + 2\int _{0}^{u(\mathcal {t})} \text {tanh}^{\scriptscriptstyle {-T}}\!(\overline{\mathcal {U}}^{\scriptscriptstyle {-1}}s)\overline{\mathcal {U}}\textrm{Q}ds\Big \}}}{\partial {u(\mathcal {t})}}\nonumber \\&+\lambda \frac{\partial {\textrm{T}^{\scriptscriptstyle {\iota }}\!(x(\mathcal {t+1}))}}{\partial {u(\mathcal {t})}}= 0. \end{aligned}$$

And, for \(\iota =0,1,2,\cdots \)

$$\begin{aligned} \mathrm {\tau }^{\scriptscriptstyle {\iota }}\!(x(\mathcal {t})) =\overline{\mathcal {U}}\text {tanh}\Big (\frac{-\lambda }{2\overline{\mathcal {U}}\textrm{Q}}\ \Big [\frac{\partial {x(\mathcal {t+1})}}{\partial {u(\mathcal {t})}}\Big ]^{\scriptscriptstyle {T}}\frac{\partial {\textrm{T}^{\scriptscriptstyle {\iota }}\!(x(\mathcal {t}+1)) }}{\partial {x(\mathcal {t}+1)}}\Big ) \end{aligned}$$
(4.15)

Doing the same to (4.14) with respect to \(x(\mathcal {t})\),

$$\begin{aligned} \frac{\partial {\textrm{T}^{\scriptscriptstyle {\iota +1}}\!(x(\mathcal {t})) }}{\partial {x(\mathcal {t})}}&= 2\textrm{P}x(\mathcal {t})+\lambda \Big [\frac{\partial {x(\mathcal {t+1})}}{\partial {x(\mathcal {t})}}\Big ]^{\scriptscriptstyle {T}}\frac{\partial {\textrm{T}^{\scriptscriptstyle {\iota }}\!(x(\mathcal {t}+1)) }}{\partial {x(\mathcal {t}+1)}}. \end{aligned}$$
(4.16)

As can be seen from (4.15) and (4.16), both contain \(\displaystyle {\frac{\partial {\textrm{T}^{\scriptscriptstyle {\iota }}\!(x(\mathcal {t}+1)) }}{\partial {x(\mathcal {t}+1)}}}\); compared with \(\textrm{T}^{\scriptscriptstyle {\iota }}(x(\mathcal {t}))\) in (4.14), the DHP algorithm evaluates and updates the first partial derivative of the value function rather than the value function itself.

The algorithm iterates on the costate function \(\textrm{C}^{\boldsymbol{\iota }}(x(\mathcal {t}))=\partial {\textrm{T}^{\boldsymbol{\iota }}}\!(x(\mathcal {t}))/\partial {x(\mathcal {t})}\), alternating the control-law update (4.15) with the costate update (4.16).
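A minimal numerical sketch of this costate iteration, for a hypothetical scalar linear plant \(x(\mathcal {t}+1)=Ax+Bu\) with \(|u|\le \overline{\mathcal {U}}\), where the costate is approximated linearly as \(\textrm{C}^{\iota }(x)\approx c_{\iota }x\) (a reasonable small-signal approximation, since tanh is effectively linear there); the implicit relation (4.15) is solved by damped fixed-point iteration at a probe state:

```python
import numpy as np

# I-DHP recursions (4.15)-(4.16), scalar linear plant, linear costate model
A, B, P, Q, lam, Ubar = 0.9, 1.0, 1.0, 1.0, 0.95, 1.0
x = 0.1                      # probe state (small-signal regime)
c = 0.0                      # costate slope, C^0 = 0
u = 0.0

for _ in range(500):
    # (4.15): u = Ubar * tanh(-(lam / (2 Ubar Q)) * B * C(x(t+1))),
    # implicit in u because x(t+1) depends on u; damped fixed-point solve.
    u = 0.0
    for _ in range(300):
        x_next = A * x + B * u
        u = 0.5 * u + 0.5 * Ubar * np.tanh(
            -(lam / (2 * Ubar * Q)) * B * c * x_next)
    x_next = A * x + B * u
    # (4.16): C^{i+1}(x) = 2 P x + lam * (dx(t+1)/dx)^T * C^i(x(t+1))
    c_new = (2 * P * x + lam * A * c * x_next) / x
    if abs(c_new - c) < 1e-12:
        break
    c = c_new
```

In this setting the costate slope converges to a fixed point (roughly \(c\approx 2.9\) for the values above), and the computed control always respects the bound because it passes through the tanh saturation.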

4.4.3 Proof of Convergence on I-DHP Algorithm

The convergence proof shows that, as the number of iterations increases, the alternating evaluation and update between (4.15) and (4.16) eventually satisfy the termination condition and yield the optimal solution.


The necessary lemmas are given before the formal theorem. For brevity, abbreviate “\(2\int _{0}^{u(\mathcal {t})} \text {tanh}^{\scriptscriptstyle {-T}}(\overline{\mathcal {U}}^{\scriptscriptstyle {-1}}s)\overline{\mathcal {U}}\textrm{Q}ds\)” as “\(\mathrm {H(u(\mathcal {t}))}\)”.

Lemma 4.1

Assume that \(\mathrm {\tau }^{\scriptscriptstyle {\iota }}(\mathcal {t})\) is the control sequence calculated by (4.13) and \(\textrm{T}^{\scriptscriptstyle {\iota }}(x)\) the value function calculated by (4.14). Let \(\mathrm {\omega }^{\scriptscriptstyle {\iota }}(\mathcal {t})\) be any admissible control sequence in the domain, and \(\mathrm {\Omega }^{\scriptscriptstyle {\iota }}(x)\) its corresponding value function,

$$\begin{aligned} \mathrm {\Omega }^{\scriptscriptstyle {\iota +1}}\!(x(\mathcal {t})) \!=\! x^{\scriptscriptstyle {T}}(\mathcal {t})\textrm{P}x(\mathcal {t}) + \mathrm {H(\mathrm {\omega }^{\scriptscriptstyle {\iota }}(\mathcal {t}))} +\lambda \mathrm {\Omega }^{\scriptscriptstyle {\iota }}\!(x(\mathcal {t+1})) \end{aligned}$$
(4.17)

and it is easy to obtain:

If \(\mathrm {\Omega }^{\scriptscriptstyle {0}}\!(\cdot )=\textrm{T}^{\scriptscriptstyle {0}}\!(\cdot )=0\), then \(0 \le \textrm{T}^{\scriptscriptstyle {\iota }}(x) \le \mathrm {\Omega }^{\scriptscriptstyle {\iota }}(x)\), \(\forall \iota \).

Proof

The conclusion is immediate: \(\textrm{T}^{\scriptscriptstyle {\iota }}(x)\) is the minimum attainable value of the right-hand side of (4.14), with \(\mathrm {\tau }^{\scriptscriptstyle {\iota }}(\mathcal {t})\) the corresponding control sequence, while \(\mathrm {\Omega }^{\scriptscriptstyle {\iota }}(x)\) corresponds to an arbitrary admissible control sequence, so it can be no less than \(\textrm{T}^{\scriptscriptstyle {\iota }}(x)\).\(\blacksquare \)

Lemma 4.2

Given \(\textrm{T}^{\scriptscriptstyle {\iota }}(x)\) defined by (4.14), if the system is stabilizable, then \(\textrm{T}^{\scriptscriptstyle {\iota }}(x)\) has an upper bound \(\mathfrak {Z}\) (a constant):

$$\begin{aligned} 0 \le \textrm{T}^{\scriptscriptstyle {\iota }}(x) \le \mathfrak {Z}, \forall \iota \end{aligned}$$

Proof

Let \(\mathcal {v}^{\scriptscriptstyle {\iota }}(\mathcal {t})\) be an admissible, stabilizing control sequence and define \(\mathcal {V}^{\scriptscriptstyle {\iota }}(x)\) by:

$$\begin{aligned} \mathcal {V}^{\scriptscriptstyle {\iota +1}}(x)=\!x^{\scriptscriptstyle {T}}(\mathcal {t})\textrm{P}x(\mathcal {t}) + \mathrm {H(\mathcal {v}^{\scriptscriptstyle {\iota }}(\mathcal {t}))} +\lambda \mathcal {V}^{\scriptscriptstyle {\iota }}\!(x(\mathcal {t+1})) \end{aligned}$$

Then, with \(\mathcal {V}^{\scriptscriptstyle {0}}\!(\cdot )= \textrm{T}^{\scriptscriptstyle {0}}(\cdot ) =0\), it can be obtained that

$$\begin{aligned} \mathcal {V}^{\scriptscriptstyle {\iota +1}}(x)&=\!x^{\scriptscriptstyle {T}}(\mathcal {t})\textrm{P}x(\mathcal {t}) + \mathrm {H(\mathcal {v}^{\scriptscriptstyle {\iota }}(\mathcal {t}))} +\lambda \mathcal {V}^{\scriptscriptstyle {\iota }}\!(x(\mathcal {t+1}))\\&=\!x^{\scriptscriptstyle {T}}(\mathcal {t})\textrm{P}x(\mathcal {t}) + \mathrm {H(\mathcal {v}^{\scriptscriptstyle {\iota }}(\mathcal {t}))} +\lambda \Big [ x^{\scriptscriptstyle {T}}(\mathcal {t}+1)\textrm{P} \\&\!\cdot x(\mathcal {t}+1)\!+\!\mathrm {H(\mathcal {v}^{\scriptscriptstyle {\iota -1}}(\mathcal {t}+1))}\Big ]\!+\!\lambda ^{\scriptscriptstyle {2}} \mathcal {V}^{\scriptscriptstyle {\iota -1}}\!(x(\mathcal {t+2}))\\&=\ldots \\&=\!x^{\scriptscriptstyle {T}}(\mathcal {t})\textrm{P}x(\mathcal {t}) + \mathrm {H(\mathcal {v}^{\scriptscriptstyle {\iota }}(\mathcal {t}))}+\lambda \Big [ x^{\scriptscriptstyle {T}}(\mathcal {t}+1)\textrm{P} \\&\!\cdot x(\mathcal {t}+1)\!+\!\mathrm {H(\mathcal {v}^{\scriptscriptstyle {\iota -1}}(\mathcal {t}+1))}\Big ]\!+\cdots \\&+\!\lambda ^{\scriptscriptstyle {\iota }}\Big [x^{\scriptscriptstyle {T}}(\mathcal {t}+\iota )\textrm{P}x(\mathcal {t}+\iota ) + \mathrm {H(\mathcal {v}^{\scriptscriptstyle {0}}(\mathcal {t}+\iota ))}\Big ]\\&+\lambda ^{\scriptscriptstyle {\iota +1}} \mathcal {V}^{\scriptscriptstyle {0}}\!(x(\mathcal {t+\iota +1})) .\\ \end{aligned}$$

\(\mathcal {V}^{\scriptscriptstyle {\iota +1}}(x) = \sum _{l=0}^{\iota }\lambda ^{\scriptscriptstyle {l}} \Big [\!x^{\scriptscriptstyle {T}}(\mathcal {t}+l)\textrm{P}x(\mathcal {t}+l) + \textrm{H}(\mathcal {v}^{\scriptscriptstyle {\iota }-l}(\mathcal {t}+l)) \Big ]\le \lim _{\iota \rightarrow \infty } \Big \{ \sum _{l=0}^{\iota }\lambda ^{\scriptscriptstyle {l}} \Big [ \!x^{\scriptscriptstyle {T}}(\mathcal {t}+l)\textrm{P}x(\mathcal {t}+l) + \textrm{H}(\mathcal {v}^{\scriptscriptstyle {\iota }-l}(\mathcal {t}+l)) \Big ] \Big \} \).

Since \(\mathcal {v}^{\scriptscriptstyle {\iota }}(\mathcal {t})\) is an admissible control sequence, this sum has an upper bound \(\mathfrak {Z}\):

$$\begin{aligned} \mathcal {V}^{\scriptscriptstyle {\iota +1}}(x) \! \le \!\lim _{\iota \rightarrow \infty }\! \Big \{ \sum _{l=0}^{\iota }\lambda ^{\scriptscriptstyle {l}} \Big [ \!x^{\scriptscriptstyle {T}}(\mathcal {t}+l)\textrm{P}x(\mathcal {t}+l)\!+\! \textrm{H}(\mathcal {v}^{\scriptscriptstyle {\iota -l}}(\mathcal {t}+l))\!\Big ] \Big \}\!\le \!\mathfrak {Z}. \end{aligned}$$

Combining this with Lemma 4.1 yields the result.\(\blacksquare \)

Theorem 4.1

For the iterative cost function \(\textrm{T}^{\scriptscriptstyle {\iota }}(x)\) which follows (4.14) and its corresponding control law \(\mathrm {\tau }^{\scriptscriptstyle {\iota }}(\mathcal {t})\) obtained by (4.13), it can be concluded that, as the number of iterations increases, \(\textrm{T}^{\scriptscriptstyle {\iota }}(x)\) converges to the optimal value function and \(\mathrm {\tau }^{\scriptscriptstyle {\iota }}(\mathcal {t})\) converges to the optimal control law, i.e., \(\textrm{T}^{\scriptscriptstyle {\iota }}(x)\rightarrow \textrm{J}^*(x)\), \(\mathrm {\tau }^{\scriptscriptstyle {\iota }}(\mathcal {t}) \rightarrow {u}^*(\mathcal {t})\).

Proof

From Lemma 4.1, \(\mathrm {\Omega }^{\scriptscriptstyle {\iota }}\!(x(\mathcal {t}))\) is the cost function corresponding to an arbitrary admissible control sequence \(\mathrm {\omega }^{\scriptscriptstyle {\iota }}(\mathcal {t})\), with \(\Omega ^0(\cdot ) = 0\).

First, for \(\iota =0\),

$$\begin{aligned} \textrm{T}^{\scriptscriptstyle {1}}\!(x(\mathcal {t}))-\mathrm {\Omega }^{\scriptscriptstyle {0}}\!(x(\mathcal {t}))=x^{\scriptscriptstyle {T}}(\mathcal {t})\textrm{P}x(\mathcal {t}) + \mathrm {H(\mathrm {\omega }^{\scriptscriptstyle {0}}(\mathcal {t}))} \ge 0 \end{aligned}$$

then \(\textrm{T}^{\scriptscriptstyle {1}}\!(x(\mathcal {t}))\ge \mathrm {\Omega }^{\scriptscriptstyle {0}}\!(x(\mathcal {t}))\) holds for \(\iota =0\).

Next, suppose that \(\textrm{T}^{\scriptscriptstyle {\iota }}\!(x(\mathcal {t}))\ge \Omega ^{\scriptscriptstyle {\iota -1}}(x(\mathcal {t}))\) holds for all \(x(\mathcal {t})\). Then, at step \(\iota +1\), it can be concluded that

$$\begin{aligned} \textrm{T}^{\scriptscriptstyle {\iota +1}}\!(x(\mathcal {t}))-&\mathrm {\Omega }^{\scriptscriptstyle {\iota }}\!(x(\mathcal {t}))=x^{\scriptscriptstyle {T}}(\mathcal {t})\textrm{P}x(\mathcal {t}) + \mathrm {H(\mathrm {\omega }^{\scriptscriptstyle {\iota +1}}(\mathcal {t}))} \nonumber \\&+\lambda \big (\textrm{T}^{\scriptscriptstyle {\iota }}\!(x(\mathcal {t}+1))-\Omega ^{\scriptscriptstyle {\iota -1}}(x(\mathcal {t}+1))\big )\nonumber \\&\ge \lambda \big (\textrm{T}^{\scriptscriptstyle {\iota }}\!(x(\mathcal {t}+1))-\Omega ^{\scriptscriptstyle {\iota -1}}(x(\mathcal {t}+1))\big ) \end{aligned}$$
(4.18)

By mathematical induction, \(\textrm{T}^{\scriptscriptstyle {\iota +1}}\!(x(\mathcal {t}))\ge \mathrm {\Omega }^{\scriptscriptstyle {\iota }}\!(x(\mathcal {t}))\) holds for all \(\iota \). Combined with Lemma 4.1, it follows that \(\textrm{T}^{\scriptscriptstyle {\iota +1}}\!(x(\mathcal {t})) \ge \mathrm {\Omega }^{\scriptscriptstyle {\iota }}\!(x(\mathcal {t}))\ge \textrm{T}^{\scriptscriptstyle {\iota }}\!(x(\mathcal {t}))\); that is, \(\Big \{\textrm{T}^{\scriptscriptstyle {\iota }}\!(x(\mathcal {t}))\Big \} \) is a non-decreasing sequence for all \(\iota \).

From Lemma 4.2, the sequence \(\Big \{\textrm{T}^{\scriptscriptstyle {\iota }}\!(x(\mathcal {t}))\Big \}\) is bounded by \(\mathfrak {Z}\), which means the iteration has a limit, expressed as \(\lim _{\iota \rightarrow \infty }\textrm{T}^{\scriptscriptstyle {\iota }}\!(x(\mathcal {t}))=\textrm{T}^{\scriptscriptstyle {\infty }}\!(x(\mathcal {t}))\). Therefore, it is reasonable to conjecture that \( \textrm{T}^{\scriptscriptstyle {\infty }}\!(x(\mathcal {t}))=\underset{\mathrm {\tau }(\mathcal {t})}{\text {min}} \Big \{x^{\scriptscriptstyle {T}}(\mathcal {t})\textrm{P}x(\mathcal {t}) + \mathrm {H(\mathrm {\tau }(\mathcal {t}))}+\lambda \textrm{T}^{\scriptscriptstyle {\infty }}\!(x(\mathcal {t}+1))\Big \} \). This conjecture will be proved below. According to (4.14),

$$\begin{aligned} \textrm{T}^{\scriptscriptstyle {\iota }}\!(x(\mathcal {t})) \le x^{\scriptscriptstyle {T}}(\mathcal {t})\textrm{P}x(\mathcal {t})+ \mathrm {H(\mathrm {\tau }(\mathcal {t}))}+ \lambda \textrm{T}^{\scriptscriptstyle {\iota -1}}\!(x(\mathcal {t}+1)). \end{aligned}$$
(4.19)

From the non-decreasing property of sequence \(\Big \{\textrm{T}^{\scriptscriptstyle {\iota }}\!(x(\mathcal {t}))\Big \} \), it can be known that \(\textrm{T}^{\scriptscriptstyle {\iota }}\!(x(\mathcal {t})) \le \textrm{T}^{\scriptscriptstyle {\infty }}\!(x(\mathcal {t})) \quad \forall \iota \).

Substitute it into (4.19),

$$\begin{aligned} \textrm{T}^{\scriptscriptstyle {\iota }}\!(x(\mathcal {t})) \le x^{\scriptscriptstyle {T}}(\mathcal {t})\textrm{P}x(\mathcal {t})+ \mathrm {H(\mathrm {\tau }(\mathcal {t}))}+ \lambda \textrm{T}^{\scriptscriptstyle {\infty }}\!(x(\mathcal {t}+1)), \quad \forall \iota . \end{aligned}$$
(4.20)

Since (4.20) holds for any \(\iota \), it also holds as \(\iota \rightarrow \infty \):

$$\begin{aligned} \textrm{T}^{\scriptscriptstyle {\infty }}\!(x(\mathcal {t})) \le x^{\scriptscriptstyle {T}}(\mathcal {t})\textrm{P}x(\mathcal {t})+ \mathrm {H(\mathrm {\tau }(\mathcal {t}))}+ \lambda \textrm{T}^{\scriptscriptstyle {\infty }}\!(x(\mathcal {t}+1)). \end{aligned}$$
(4.21)

Since \(\mathrm {\tau }(\mathcal {t})\) is an arbitrary control sequence, (4.21) further yields:

$$\begin{aligned} \textrm{T}^{\scriptscriptstyle {\infty }}\!(x(\mathcal {t})) \le \underset{\mathrm {\tau }(\mathcal {t})}{\text {min}}\Big \{x^{\scriptscriptstyle {T}}(\mathcal {t})\textrm{P}x(\mathcal {t})+ \mathrm {H(\mathrm {\tau }(\mathcal {t}))}+ \lambda \textrm{T}^{\scriptscriptstyle {\infty }}\!(x(\mathcal {t}+1))\Big \}. \end{aligned}$$
(4.22)

With (4.14), \(\textrm{T}^{\scriptscriptstyle {\iota }}\!(x(\mathcal {t}))\!=\!\underset{\mathrm {\tau }(\mathcal {t})}{\text {min}}\Big \{x^{\scriptscriptstyle {T}}(\mathcal {t})\textrm{P}x(\mathcal {t})\!+\! \mathrm {H(\mathrm {\tau }(\mathcal {t}))}\!+\!\lambda \textrm{T}^{\scriptscriptstyle {\iota -1}}\!(x(\mathcal {t}+1))\Big \}\), \(\forall \iota \).

Since the sequence \(\Big \{\textrm{T}^{\scriptscriptstyle {\iota }}\!(x(\mathcal {t}))\Big \} \) on the left-hand side is non-decreasing, it follows that \(\textrm{T}^{\scriptscriptstyle {\infty }}\!(x(\mathcal {t}))\!\ge \!\underset{\mathrm {\tau }(\mathcal {t})}{\text {min}}\Big \{x^{\scriptscriptstyle {T}}(\mathcal {t})\textrm{P}x(\mathcal {t})\!+\! \mathrm {H(\mathrm {\tau }(\mathcal {t}))}\!+\!\lambda \textrm{T}^{\scriptscriptstyle {\iota -1}}\!(x(\mathcal {t}+1))\Big \}\). Similarly, letting \(\iota \rightarrow \infty \),

$$\begin{aligned} \textrm{T}^{\scriptscriptstyle {\infty }}\!(x(\mathcal {t}))\!\ge \!\underset{\mathrm {\tau }(\mathcal {t})}{\text {min}}\Big \{x^{\scriptscriptstyle {T}}(\mathcal {t})\textrm{P}x(\mathcal {t})+ \mathrm {H(\mathrm {\tau }(\mathcal {t}))}+ \lambda \textrm{T}^{\scriptscriptstyle {\infty }}\!(x(\mathcal {t}+1))\Big \}. \end{aligned}$$
(4.23)

Combining (4.22) and (4.23), it follows that,

$$\begin{aligned} \textrm{T}^{\scriptscriptstyle {\infty }}\!(x(\mathcal {t}))\!=\!\underset{\mathrm {\tau }(\mathcal {t})}{\text {min}}\Big \{x^{\scriptscriptstyle {T}}(\mathcal {t})\textrm{P}x(\mathcal {t})+ \mathrm {H(\mathrm {\tau }(\mathcal {t}))}+ \lambda \textrm{T}^{\scriptscriptstyle {\infty }}\!(x(\mathcal {t}+1))\Big \}. \end{aligned}$$
(4.24)

As can be seen from (4.24), the previous conjecture is proved. Hence \(\textrm{T}^{\scriptscriptstyle {\infty }}\!(x(\mathcal {t}))\) is a solution of the discrete-time HJB equation. Considering the uniqueness of the solution of the discrete-time HJB equation, \(\textrm{T}^{\scriptscriptstyle {\infty }}\!(x(\mathcal {t}))\) in (4.24) and \(\textrm{J}^{\scriptscriptstyle {*}}\!(x(\mathcal {t}))\) in (4.10) are the same solution. In other words, \(\lim _{\iota \rightarrow \infty }\textrm{T}^{\scriptscriptstyle {\iota }}\!(x(\mathcal {t}))=\textrm{T}^{\scriptscriptstyle {*}}\!(x(\mathcal {t}))=\textrm{J}^{\scriptscriptstyle {*}}\!(x(\mathcal {t}))\).\(\blacksquare \)

Theorem 4.1 proves that \(\textrm{T}^{\scriptscriptstyle {\infty }}\!(x(\mathcal {t}))\) in (4.24) and \(\textrm{J}^{\scriptscriptstyle {*}}\!(x(\mathcal {t}))\) in (4.10) are the same solution of the HJB equation corresponding to the same cost function, while the termination criterion “\(\left\| \textrm{T}^{\iota +1}(x(\mathcal {t}))-\textrm{T}^{\iota }(x(\mathcal {t})) \right\| \le \epsilon \)” indicates that the optimal control law can be solved in finite time. Theorem 4.2 explains this point.

Theorem 4.2

Suppose the system (4.1) is controllable and the initial state \(x(\mathcal {t})\) can be chosen arbitrarily. Under a finite iteration index \(\iota \), the approximation criterion \(\Vert \textrm{T}^{*}(x(\mathcal {t}))-\textrm{T}^{\iota }(x(\mathcal {t})) \Vert \le \epsilon \) between the iterative cost function and the optimal cost function is equivalent to the termination criterion \(\Vert \textrm{T}^{\iota +1}(x(\mathcal {t}))-\textrm{T}^{\iota }(x(\mathcal {t})) \Vert \le \epsilon \).

Proof

In Theorem 4.1, it is mentioned that \(\Big \{\textrm{T}^{\scriptscriptstyle {\iota }}\!(x(\mathcal {t}))\Big \} \) is a non-decreasing sequence, that is

$$\begin{aligned} \textrm{J}^{\scriptscriptstyle {*}}\!(x(\mathcal {t}))=\textrm{T}^{\scriptscriptstyle {*}}\!(x(\mathcal {t})) \ge \textrm{T}^{\scriptscriptstyle {\iota +1}}\!(x(\mathcal {t})) \ge \textrm{T}^{\scriptscriptstyle {\iota }}\!(x(\mathcal {t})). \end{aligned}$$
(4.25)

If \(\left\| \textrm{T}^{*}(x(\mathcal {t}))-\textrm{T}^{\iota }(x(\mathcal {t})) \right\| \le \epsilon \), it can be concluded that

$$\begin{aligned} \textrm{T}^{*}(x(\mathcal {t}))-\textrm{T}^{\iota }(x(\mathcal {t}))\le \epsilon ,\ \textrm{T}^{*}(x(\mathcal {t}))\le \textrm{T}^{\iota }(x(\mathcal {t}))+\epsilon . \end{aligned}$$
(4.26)

Combining (4.26) with (4.25),

$$\begin{aligned} \textrm{T}^{\scriptscriptstyle {\iota }}\!(x(\mathcal {t}))\le \textrm{T}^{\scriptscriptstyle {\iota +1}}\!(x(\mathcal {t}))\le \textrm{T}^{\scriptscriptstyle {*}}\!(x(\mathcal {t}))\le \textrm{T}^{\iota }(x(\mathcal {t}))+\epsilon . \end{aligned}$$
$$\begin{aligned} \Rightarrow \textrm{T}^{\scriptscriptstyle {\iota }}\!(x(\mathcal {t}))\le \textrm{T}^{\scriptscriptstyle {\iota +1}}\!(x(\mathcal {t}))\le \textrm{T}^{\iota }(x(\mathcal {t}))+\epsilon . \end{aligned}$$
(4.27)

It follows that

$$\begin{aligned} \left\| \textrm{T}^{\iota +1}(x(\mathcal {t}))-\textrm{T}^{\iota }(x(\mathcal {t})) \right\| \le \epsilon \end{aligned}$$
(4.28)

From a different perspective, if (4.28) holds and the sequence \(\Big \{\textrm{T}^{\scriptscriptstyle {\iota }}\!(x(\mathcal {t}))\Big \}\) is non-decreasing,

$$\begin{aligned} -\epsilon +\textrm{T}^{\iota +1}(x(\mathcal {t})) \le \textrm{T}^{\iota }(x(\mathcal {t}))\le \textrm{T}^{*}(x(\mathcal {t}))=\textrm{J}^{*}(x(\mathcal {t})). \end{aligned}$$
(4.29)

It is obvious that \(\textrm{T}^{\iota +1}(x(\mathcal {t}))-\textrm{T}^{*}(x(\mathcal {t}))\le \epsilon \), i.e.,

$$\begin{aligned} \left\| \textrm{T}^{\iota +1}(x(\mathcal {t}))-\textrm{T}^{*}(x(\mathcal {t}))\right\| \le \epsilon . \end{aligned}$$
(4.30)

Based on the analysis of both sides, it can be concluded that \(\Vert \textrm{T}^{*}\!(x(\mathcal {t}))\!-\!\textrm{T}^{\iota }\!(x(\mathcal {t})) \Vert \le \epsilon \Leftrightarrow \left\| \textrm{T}^{\iota +1}\!(x(\mathcal {t})) -\textrm{T}^{\iota }\!(x(\mathcal {t})) \right\| \le \epsilon \).\(\blacksquare \)
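Theorems 4.1 and 4.2 can be illustrated on a simple scalar example. The following sketch runs the value iteration on a scalar linear system with quadratic costs, where every iterate has the form \(T^{\iota}(x)=k_{\iota}x^{2}\); the dynamics coefficients `a`, `b`, the weights `p`, `h`, and the discount `lam` are illustrative values, not parameters of the chapter's tumor model.

```python
# Scalar value-iteration sketch of Theorems 4.1 and 4.2: starting from
# T^0 = 0, the iterates T^i(x) = k_i x^2 form a non-decreasing, bounded
# sequence, and the termination test |k_{i+1} - k_i| <= eps fires after
# finitely many iterations. All numeric values are illustrative.
a, b = 1.2, 1.0          # open-loop unstable scalar dynamics x(t+1) = a x + b u
p, h, lam = 1.0, 1.0, 0.9  # state weight, control weight, discount factor
eps = 1e-6               # termination threshold

k = 0.0                  # T^0(x) = 0
history = [k]
while True:
    # One Bellman update: min over u of p x^2 + h u^2 + lam k (a x + b u)^2
    # stays quadratic in x, with the new coefficient:
    k_next = p + lam * k * a * a * h / (h + lam * k * b * b)
    history.append(k_next)
    if abs(k_next - k) <= eps:   # termination criterion of Theorem 4.2
        break
    k = k_next
```

The `history` list grows monotonically toward the fixed point of the discounted Riccati recursion, mirroring the non-decreasing bounded sequence \(\{\textrm{T}^{\iota}(x(\mathcal{t}))\}\) of the proofs.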

The two theorems deal with the value function \(\textrm{T}(x(\mathcal {t}))\), while Algorithm 1 deals with the costate function \(\boldsymbol{\textrm{C}}(x(\mathcal {t}))\). Theorem 4.3 shows that the two notions of convergence are equivalent.

Theorem 4.3

The sequence of value functions is defined by (4.14), the control law sequence by (4.13), and the costate function update sequence by (4.16). The optimal costate function is defined as the limit \(\textrm{C}^{*}(x(\mathcal {t}))=\lim _{\iota \rightarrow \infty } \!\textrm{C}^{\iota }(x(\mathcal {t}))\); as the value function approaches the optimal value, the sequence of costate functions converges together with the sequence of control laws.

Proof

In Theorems 4.1 and 4.2, it is shown that \(\textrm{T}^{*}(x(\mathcal {t}))\) and \(\textrm{T}^{\infty }(x(\mathcal {t}))\) both satisfy the corresponding HJB equation, i.e., \(\textrm{T}^{\scriptscriptstyle {\infty }}\!(x(\mathcal {t}))\!=\!\textrm{T}^{\scriptscriptstyle {*}}\!(x(\mathcal {t}))\!=\!\underset{\mathrm {\tau }(\mathcal {t})}{\text {min}}\Big \{x^{\scriptscriptstyle {T}}(\mathcal {t})\textrm{P}x(\mathcal {t})+ \mathrm {H(\mathrm {\tau }(\mathcal {t}))}+ \lambda \textrm{T}^{\scriptscriptstyle {*}}\!(x(\mathcal {t}+1))\Big \}.\)

Therefore, it can be concluded that the sequence \(\Big \{\textrm{T}^{\scriptscriptstyle {\iota }}\!(x(\mathcal {t}))\Big \}\) of value functions converges to the optimal value function of the discrete-time HJB equation, i.e., \(\textrm{T}^{\scriptscriptstyle {\iota }} \rightarrow \textrm{T}^{\scriptscriptstyle {*}} \) as \(\iota \rightarrow \infty \).

Given \(\textrm{C}^{\boldsymbol{\iota }}(x(\mathcal {t}))=\partial {\textrm{T}^{\boldsymbol{\iota }}}\!(x(\mathcal {t}))/\partial {x(\mathcal {t})}\), the corresponding sequence \(\Big \{\textrm{C}^{\scriptscriptstyle {\iota }}\!(x(\mathcal {t}))\Big \}\) of costate functions also converges, \(\textrm{C}^{\scriptscriptstyle {\iota }}\! \rightarrow \textrm{C}^{\scriptscriptstyle {*}} \) as \(\iota \rightarrow \infty \). Since the control law is determined by the costate function, the convergence of the costate sequence implies that the control law sequence converges to the optimal control law, \(\mathrm {\tau }^{\scriptscriptstyle {\iota }}\! \rightarrow \mathrm {\tau }^{\scriptscriptstyle {*}} \) as \(\iota \rightarrow \infty \).\(\blacksquare \)
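The costate relation above can be checked directly for a quadratic value function \(T(x)=kx^{2}\), whose costate is \(C(x)=\partial T/\partial x=2kx\); the coefficient `k = 1.7` below is an arbitrary illustrative value, not a quantity from the chapter.

```python
# Sketch of the costate relation C = dT/dx used in Theorem 4.3, for a
# quadratic value function T(x) = k x^2. Convergence of the coefficient k
# (Theorems 4.1-4.2) therefore carries over to the costate C(x) = 2 k x.
def T(k, x):
    """Iterative value function, quadratic case."""
    return k * x * x

def C(k, x):
    """Analytic costate: gradient of T with respect to x."""
    return 2.0 * k * x

# Central finite difference confirming that C is the gradient of T.
k, x, dx = 1.7, 0.5, 1e-6
fd = (T(k, x + dx) - T(k, x - dx)) / (2.0 * dx)
```

The finite-difference value `fd` agrees with `C(k, x)` up to rounding, which is the pointwise content of the costate definition.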

Table 4.1 Estimated parameter values

4.5 Multi-factor Mixed Optimization Experiment Treatment of Tumor Cells

This section explores a novel therapeutic intervention for tumor cell growth inhibition. A discrete-time affine control system has been constructed from the multi-factor tumor cell growth model, and the iterative DHP algorithm has been applied to realize the reduction of drug dosage under the condition of greatly inhibiting the proliferation of tumor cell population.

4.5.1 Discrete Affine Model of Tumor Cell Growth

According to clinical medical statistics and literature [2, 30, 31, 38,39,40,41], the values of each parameter in the tumor cell proliferation model affected by multiple factors are shown in Table 4.1.

Using these parameters, the behavior of the tumor cell proliferation model can be observed under several scenarios.

With reference to [1], the initial conditions “\(\mathcal {T_u}(0)=2\times 10^{7}\), \(\mathcal {N_k}(0)=1\times 10^{3}\), \(\mathcal {C_T}(0)=10\), \(\mathcal {C_L}(0)=6\times 10^{8}\)” were selected, and a chemotherapy dose of \(\mathcal {V}_{Che}(\mathcal {t})=3.5\) was injected every 5 days in (4.8) to observe the changes of the various cell populations in the body.

Fig. 4.1

Ten doses of chemotherapy over 60 days are sufficient to eliminate the tumor. a Curves of the populations of the four cell species. b Distribution of 10 doses of chemotherapy drugs within 60 days and the trend of the chemotherapy drug concentration in vivo

Figure 4.1 shows a pulsed injection scheme for the chemotherapy drug: the drug is injected into the body to study how its addition affects the number of the various cell populations at different times. As can be seen from the tumor cell curve in Fig. 4.1a (the second curve), a dose of chemotherapy drug injected every 5 days for 60 days is sufficient to control the proliferation of tumor cells. The four curves show different forms of oscillation in the early stage, driven mainly by the pulsed injection of the chemotherapy drug; the immunospecific cells \(\mathcal {C_T}\) also settle to a steady level after the tumor cells stabilize in the later stage. Figure 4.1b shows the corresponding mode of administration, with the red curve marking the administration pulses and the green curve the change of the chemotherapy drug concentration in the body.
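The pulsed schedule of Fig. 4.1b can be sketched with a simple pharmacokinetic model: a bolus every 5 days followed by first-order elimination. The elimination rate `gamma` below is an assumed illustrative value, not a parameter taken from Table 4.1.

```python
# Sketch of the pulsed chemotherapy schedule of Fig. 4.1b: a bolus of
# V_Che = 3.5 every 5 days over 60 days, with assumed first-order
# elimination dC/dt = -gamma * C between injections (gamma illustrative).
dose, interval, horizon = 3.5, 5.0, 60.0
gamma, dt = 0.9, 0.01                       # assumed decay rate; Euler step

steps_per_dose = int(interval / dt)         # 500 steps between boluses
conc, peak = 0.0, 0.0
trace = []
for step in range(int(horizon / dt)):
    if step % steps_per_dose == 0:          # pulse injection every 5 days
        conc += dose
    conc -= gamma * conc * dt               # forward-Euler first-order decay
    peak = max(peak, conc)
    trace.append(conc)
```

The resulting `trace` reproduces the sawtooth shape of the green concentration curve in Fig. 4.1b: a jump at each injection followed by exponential washout.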

4.5.2 Construction of Affine Model

Although the discrete model has been obtained in (4.8), it is too complex, and its various coupling terms make it difficult to embed directly in the iterative DHP structure. Therefore, the idea of constructing a simple affine model is introduced. As can be seen from the two preceding subsections, the dynamics can be simplified to the influence of the injected concentrations of the two drugs on the tumor cells in the body. The current tumor cell concentration is then selected as the state variable, and the injected concentrations of the two drugs (chemotherapy drugs and immune agents) as the control variables; starting from a large set of random data, the desired affine discrete model is obtained by fitting.

$$\begin{aligned} x(\mathcal {t}+1) = \mathcal {f}(x(\mathcal {t}))+ \begin{bmatrix} \mathcal {g}_1(x(\mathcal {t})) \\ \mathcal {g}_2(x(\mathcal {t})) \end{bmatrix}^{\scriptscriptstyle {T}} u(\mathcal {t}) \end{aligned}$$
(4.31)
$$\begin{aligned} \mathcal {g}_1\Big (\text {log}_{10}(x)\Big )&= 0.001771\Big (\text {log}_{10}(x)\Big )\!^{\scriptscriptstyle {5}}\!-\!0.02931\Big (\text {log}_{10}(x)\Big )\!^{\scriptscriptstyle {4}} \!+\!0.1793\Big (\text {log}_{10}(x)\Big )\!^{\scriptscriptstyle {3}}\!\nonumber \\ {}&-\!0.5353\Big (\text {log}_{10}(x)\Big )\!^{\scriptscriptstyle {2}}\!+\!1.741\Big (\text {log}_{10}(x)\Big )\!-\!1.133 \end{aligned}$$
(4.32)
$$\begin{aligned} \mathcal {g}_2\Big (\text {log}_{10}(x)\Big )&= 0.007579 \Big (\text {log}_{10}(x)\Big )\!^{\scriptscriptstyle {4}}\!-\!0.1087\Big (\text {log}_{10}(x)\Big )\!^{\scriptscriptstyle {3}}\nonumber \\&\!+\!0.4838\Big (\text {log}_{10}(x)\Big )\!^{\scriptscriptstyle {2}}\!+\!0.1783\Big (\text {log}_{10}(x)\Big )\!-\!0.2304 \end{aligned}$$
(4.33)
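The fitted model can be evaluated directly from the published coefficients. The sketch below reads the repeated quadratic exponent in (4.33) as the linear term, which the polynomial pattern suggests is a typo; the drift term \(\mathcal{f}\) of (4.31) is not reproduced in the text, so it must be supplied by the caller.

```python
import numpy as np

# Input-gain polynomials (4.32)-(4.33), evaluated in log10 of the
# tumor-cell concentration x. Coefficients are listed highest degree first.
G1 = [0.001771, -0.02931, 0.1793, -0.5353, 1.741, -1.133]  # degree-5 fit
G2 = [0.007579, -0.1087, 0.4838, 0.1783, -0.2304]          # degree-4 fit

def g1(x):
    """Gain of the chemotherapy-drug input at tumor concentration x > 0."""
    return np.polyval(G1, np.log10(x))

def g2(x):
    """Gain of the immune-agent input at tumor concentration x > 0."""
    return np.polyval(G2, np.log10(x))

def step(x, u1, u2, f):
    """One step of the affine model (4.31); the drift f must be supplied."""
    return f(x) + g1(x) * u1 + g2(x) * u2
```

For instance, at \(x=10\) (so \(\log_{10}x=1\)), `g1` and `g2` reduce to the sums of their coefficients.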

4.5.3 Optimization of Mixed Treatment Regimen

Following the affine model above, the cost function required by iteration-DHP must be specified before the treatment can be optimized:

$$\begin{aligned} \textrm{J}&(x(\mathcal {t}))= \sum _{\iota =0}^{\infty }\lambda ^{\scriptscriptstyle {\iota }} \Big \{ x^{\scriptscriptstyle {T}}(\mathcal {t})\textrm{P}x(\mathcal {t}) + m_1\int _{0}^{u_1(\mathcal {t})} \text {tanh}^{\scriptscriptstyle {-T}}(\overline{\mathcal {U}}_1^{\scriptscriptstyle {-1}}s)\nonumber \\&\cdot \overline{\mathcal {U}}_1\textrm{Q}_1ds +m_2\int _{0}^{u_2(\mathcal {t})} \text {tanh}^{\scriptscriptstyle {-T}}(\overline{\mathcal {U}}_2^{\scriptscriptstyle {-1}}s)\overline{\mathcal {U}}_2\textrm{Q}_2ds\Big \}. \end{aligned}$$
(4.34)
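The integral penalty in (4.34) keeps the controls bounded and, in the scalar case, has a known closed form, \(\int_{0}^{u}\tanh^{-1}(s/\bar{\mathcal{U}})\,ds = u\tanh^{-1}(u/\bar{\mathcal{U}})+\tfrac{\bar{\mathcal{U}}}{2}\ln(1-u^{2}/\bar{\mathcal{U}}^{2})\). The sketch below checks this numerically; the weights `m`, `q` and bound `u_bar` are illustrative placeholders, not the values of Table 4.2.

```python
import numpy as np
from scipy.integrate import quad

def control_cost(u, u_bar, q=1.0, m=1.0):
    """Scalar case of the penalty in (4.34):
    m * q * u_bar * integral_0^u arctanh(s / u_bar) ds, for |u| < u_bar."""
    val, _ = quad(lambda s: np.arctanh(s / u_bar), 0.0, u)
    return m * q * u_bar * val

def control_cost_closed(u, u_bar, q=1.0, m=1.0):
    """Closed form of the same integral."""
    return m * q * u_bar * (u * np.arctanh(u / u_bar)
                            + 0.5 * u_bar * np.log(1.0 - (u / u_bar) ** 2))
```

The penalty grows without bound as \(u\) approaches \(\bar{\mathcal{U}}\), which is what discourages the optimizer from saturating the drug dosage.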
Table 4.2 Default parameters
Fig. 4.2

The iteration error curve; the termination condition is satisfied at the end of the 67th iteration

According to clinical experience, the default parameters are shown in Table 4.2. The iteration error \(\epsilon \) is set to \(10^{-6}\), and the iteration error variation curve is shown in Fig. 4.2. The error decreases extremely fast in the first twenty iterations of the calculation, and the convergence rate gradually decreases after 20 iterations. At \(\iota =67\), the termination condition has been satisfied.

After the termination criterion is met, the tumor cell population under the optimized regimen evolves as shown in Fig. 4.3: the growth of the tumor cell population is stemmed at an extremely rapid rate. The usage and dosage of the two drugs are shown in Fig. 4.4, where Fig. 4.4a shows the injected concentration curve of the chemotherapy drug and Fig. 4.4b shows that of the immune agent.

Fig. 4.3

Tumor cell population changes in optimized treatment

Fig. 4.4

Usage and dosage of the different drugs under the optimized treatment. a Injection concentration of chemotherapeutic drugs. b Injection concentration of immune agents

4.6 Conclusion

In this chapter, a tumor immune differential game system has been established to solve the problem of optimal clinical tumor treatment oriented to evolutionary dynamics. Firstly, a mathematical model of the game system between tumor cells and immune cells treated by immune agents and chemotherapy drugs has been given. Secondly, the bounded optimal control problem has been solved via the HJB equation with an infinite-horizon performance index subject to practical constraints. Finally, the optimal iterative approximate control strategy has been obtained by the iterative dual heuristic dynamic programming algorithm, and the effectiveness of the proposed algorithm has been verified.