6.1 Introduction

Deaths caused by neoplastic diseases are rising sharply, and the nonlinear dynamics and control of tumor growth have attracted widespread attention [1]. Normal cells and tumor cells compete for the essential nutrients of the human body. Tumor cells keep proliferating, drain the body's limited energy supply, and eventually disrupt somatic function until death. Somatic cells divide constantly, and the new cells differentiate and eventually undergo apoptosis. In this manner, a relative balance is maintained in the human body. Nevertheless, when the differentiation of normal cells gets out of control, the cells may well evolve into tumor cells, whose nature is to consume the body's nutrients voraciously.

The population of tumor cells increases progressively because of the following three characteristics. First, the most obvious characteristic is insensitivity to anti-growth signals. Normal cells are subject to a strict control mechanism, but for tumor cells this mechanism is no longer valid. During continuous division, tumor cells escape the monitoring of the anti-growth signals, which leads to their uncontrolled growth. Second, tumor cells can promote the growth of blood vessels, which are essential for supplying nutrients; this is why blood vessel density is associated with the degree of malignancy of the tumor tissue. Finally, tumor cells are also deceptive: during their constant battle with immune cells they evolve camouflage abilities that mislead the immune system into regarding them as normal cells, which results in tumor immune escape. Thus, obstructing the generative mechanisms that rely on the necessary nutrients is an effective approach to suppressing the growth of tumor cells [2, 3].

In contrast to the mixed tumor treatment approach of immunotherapy and chemotherapy in [4], this chapter explores a more effective adaptive control strategy for the organism using a medicine dosage regulation mechanism (MDRM). An additional population of cells, called endothelial cells, responds to the substances induced by malignant tumor cells and transfers oxygen and nutrients to the primary focus, causing the proliferation of blood cells; this increases the carrying capacity of tumor cells and is known as tumor angiogenesis [5]. As indicated in [6], an anti-angiogenic agent can specifically decrease the growth rate of tumors, reaching saturation to some extent without killing the endothelial cells completely. When a chemotherapy agent is used in combination with an anti-angiogenic agent to reduce the population of tumor cells, the latter can enhance the effect of the former, as described in [7]. Nevertheless, as the key element promoting the growth of the vasculature, the endothelial cells should not be completely destroyed. Otherwise, there may not be enough vasculature to provide access for the chemotherapy agent. On the basis of the pharmaceutical science concerning the chemotherapy agent and the anti-angiogenic agent, the adaptive control strategy for the organism provides guidance for clinical practice under the medicine dosage regulation mechanism, especially in the treatment of lung cancer. Furthermore, since anti-tumor drugs often kill both tumor cells and normal cells, it is important to use fewer drugs to achieve the therapeutic goal during treatment.

ADP, which is derived from dynamic programming and reinforcement learning, is a powerful tool for tackling optimization problems [8,9,10]. In general, the successful implementation of ADP-based methods depends on the cooperation of actor and critic networks [11]. Under this framework, the actor performs the control strategy with current data [12], and the critic provides the actor with feedback obtained by evaluating the cost under that strategy. The distinct merit of this type of algorithm is that the optimal control strategy can be approximated iteratively, so that the "curse of dimensionality" is effectively avoided. Various ADP-based methods have been studied to tackle diverse optimal control problems with the aid of artificial neural networks, whose performance is outstanding [13, 14], such as robust control [15, 16], optimal consensus control [17,18,19] and optimal tracking [20, 21]. Furthermore, for systems with multiple controllers, the optimization problem can be formulated by game theory. As a vital branch of game theory, nonzero-sum games (NZSGs) originate from [22], with the goal of attaining the optimal strategy pair that minimizes the individual performance index of each player while stabilizing the controlled system [23,24,25]. Owing to their excellent ability to approximate optimal solutions, ADP methods have been proposed to solve NZSGs. In [26], an adaptive method with a critic-only structure was developed to solve two-player NZSGs without any initial stabilizing control. In [27], the experience replay technique was integrated into the ADP algorithm to use historical data together with real-time data to approximate the value function, so that the persistence of excitation condition was no longer indispensable. In [28], a data-based integral reinforcement learning algorithm was proposed to solve NZSGs. More specifically, it was a novel iterative learning algorithm operating in both off-line and online manners, which extended the applicability of the data-based control scheme. Furthermore, in [29], discrete-time N-player NZSGs were tackled via an off-policy reinforcement learning method that was independent of the system dynamics.

Although relevant theoretical and applied results have been reported in [30,31,32,33,34,35,36,37,38], to the best of the authors' knowledge there is little literature on this field. The contributions of this chapter are as follows. First, a near-optimal therapy for tumor treatment is acquired for the first time via the ADP approach, which is an efficient adaptive intelligent learning algorithm. Second, the interactive system with a discounted value function is constructed based on a mathematical model that simulates the interactions among cells and drugs. In addition, two kinds of chemotherapy drugs and one kind of anti-angiogenic agent participate in the therapy, so that the combined therapeutic strategy can be derived under the architecture of NZSGs. Third, the idea of cybernetics is extended to a frontier field of medicine, namely tumor therapy. Under the MDRM, the derived therapeutic strategy can achieve the therapeutic goal with the lowest doses of drugs, and the practical indications for medicine are considered for the first time.

Notations: \(\mathbb {N}^{+}\) denotes the set containing all positive integers. \(\parallel \cdot \parallel \), \(diag\{\cdot \}\) and \(\nabla (\cdot )\triangleq \partial (\cdot )/\partial x\) respectively represent the Euclidean norm of a vector/matrix, the operation of constructing a diagonal matrix and the gradient operator. \(\lambda _m(\cdot )\) and \(\lambda _M(\cdot )\) denote the minimum and maximum eigenvalues of a matrix, respectively. \(I_{n\times n}\) is the identity matrix of dimension n.

6.2 Preliminaries

6.2.1 Establishment of Mathematical Model

In this section, a mathematical growth model is established that considers the interactions among the normal cells, tumor cells and endothelial cells. Moreover, the effects of the control inputs, i.e., the chemotherapy and anti-angiogenic drugs, on these cells are embodied in the model. In the following model, which consists of ordinary differential equations, \(P_{NC}(t)\), \(P_{TC}(t)\) and \(P_{EC}(t)\) respectively represent the populations of normal cells, tumor cells and endothelial cells, while \(P_{CD\jmath }(t) (\jmath =1,2)\) and \(P_{AD}(t)\) denote the concentrations of the chemotherapy and anti-angiogenic drugs.

The population of normal cells, which is influenced by tumor cells, endothelial cells and the concentrations of chemotherapy and anti-angiogenic drugs, is modeled by

$$\begin{aligned} \dot{P}_{NC}(t)=&\,\alpha _1 P_{NC}(t)\Big (1-\frac{P_{NC}(t)}{C_1}\Big )-A_1 P_{NC}(t)P_{TC}(t) \nonumber \\&-\varXi _1\big (P_{EC}(t),P_{AD}(t)\big )\frac{P_{NC}(t)P_{CD1}(t)}{B_1+P_{NC}(t)}\nonumber \\&-\varXi _2\big (P_{EC}(t),P_{AD}(t)\big )\frac{P_{NC}(t)P_{CD2}(t)}{B_1+P_{NC}(t)}, \end{aligned}$$
(6.1)

where \(\varXi _\imath \big (P_{EC}(t),P_{AD}(t)\big )=\varXi _{\imath 1}P_{EC}(t)+\varXi _{\imath 2}P_{AD}(t)+\varXi _{\imath 0}, \imath =1,2\). The parameters \(\alpha _1\), \(B_1\), \(C_1\) denote the proliferation rate, Holling type 2 constant and carrying capacity for normal cells, respectively. \(A_1\) is the contention parameter between normal cells and tumor cells.

As the tumor cells contend with normal cells for necessary nutrients, the population of tumor cells is affected by that of normal cells. Besides, there exist mutual effects among tumor cells, endothelial cells and the drugs. Thus the corresponding model can be written as

$$\begin{aligned} \dot{P}_{TC}(t)=&\,\alpha _2 P_{TC}(t)-\frac{\alpha _2 P_{TC}(t)P_{TC}(t)}{C_2+\Phi P_{EC}(t)}-\varPi _1\big (P_{EC}(t),P_{AD}(t)\big )\frac{P_{TC}(t)P_{CD1}(t)}{B_2+P_{TC}(t)} \nonumber \\&-\varPi _2\big (P_{EC}(t)\!,\!P_{AD}(t)\big )\frac{P_{TC}(t)P_{CD2}(t)}{B_2+P_{TC}(t)} -A_2 P_{NC}(t)P_{TC}(t), \end{aligned}$$
(6.2)

where \(\varPi _\jmath \big (P_{EC}(t),P_{AD}(t)\big )=\varPi _{\jmath 1}P_{EC}(t)+\varPi _{\jmath 2}P_{AD}(t)+\varPi _{\jmath 0}\), \(\jmath =1,2\). The parameters \(\alpha _2\), \(B_2\), \(C_2\) are the multiplication rate, Holling type 2 constant and carrying capacity for tumor cells, respectively. \(A_2\) is the contention parameter between normal cells and tumor cells.

The population of endothelial cells is associated with tumor cells and anti-angiogenic drugs. The relations can be given as

$$\begin{aligned} \dot{P}_{EC}(t)=s_1\!+\!K P_{TC}(t)\!+\!\alpha _3 P_{EC}(t)\Big (1\!-\!\frac{P_{EC}(t)}{C_3}\Big ) \!-\!\frac{\varXi _3P_{EC}(t)P_{AD}(t)}{B_3\!+\!P_{EC}(t)} \end{aligned}$$
(6.3)

where K is the multiplication rate caused by tumor cells and \(s_1\) is the inflow rate. Similarly, the parameters \(\alpha _3\), \(B_3\), \(C_3\) are the multiplication rate, Holling type 2 constant and carrying capacity for endothelial cells, respectively. \(\varXi _3\) is the killing rate for endothelial cells.

The concentrations of the drugs decrease during the treatment phases, owing to the washout process. Hence we can model the evolution process of the concentrations of chemotherapy and anti-angiogenic drugs by

$$\begin{aligned} \dot{P}_{CD1}(t)=Dr_{c1}-\Big (\beta _{c1}\!+\!m_1\frac{P_{NC}(t)}{B_1+P_{NC}(t)}\!+\!m_2\frac{P_{TC}(t)}{B_2\!+\!P_{TC}(t)}\Big )P_{CD1}(t) \end{aligned}$$
(6.4)
$$\begin{aligned} \dot{P}_{CD2}(t)=Dr_{c2}\!-\!\Big (\beta _{c2}\!+\!m_3\frac{P_{NC}(t)}{B_1\!+\!P_{NC}(t)}\!+\!m_4\frac{P_{TC}(t)}{B_2+P_{TC}(t)}\Big )P_{CD2}(t) \end{aligned}$$
(6.5)

and

$$\begin{aligned} \dot{P}_{AD}(t)=Dr_a-\Big (\beta _a+\frac{m_5P_{EC}(t)}{B_3+P_{EC}(t)}\Big )P_{AD}(t), \end{aligned}$$
(6.6)

where \(Dr_{c1}\), \(Dr_{c2}\) and \(Dr_a\) are the control inputs. \(\beta _{c1}\), \(\beta _{c2}\) and \(\beta _a\) denote the washout rates for the drugs. \(m_1\), \(m_2\), \(m_3\), \(m_4\) and \(m_5\) are the rates at which the drugs integrate into the cells. Following operations similar to those in [39], we obtain the simplified version of the model as

$$\begin{aligned} \left\{ \begin{aligned} \dot{p}_{NC}(t)=&\,\alpha _1 p_{NC}(t)(1-p_{NC}(t))-a_1 p_{NC}(t)p_{TC}(t) \\&-\xi _1\frac{p_{NC}(t)p_{CD1}(t)}{b_1+p_{NC}(t)}-\xi _2\frac{p_{NC}(t)p_{CD2}(t)}{b_1+p_{NC}(t)}, \\ \dot{p}_{TC}(t)=&\,\alpha _2 p_{TC}(t)\Big (1-\frac{p_{TC}(t)}{1+\phi p_{EC}(t)}\Big )-a_2 p_{NC}(t)p_{TC}(t) \\&-\pi _1\frac{p_{TC}(t)p_{CD1}(t)}{b_2+p_{TC}(t)}-\pi _2\frac{p_{TC}(t)p_{CD2}(t)}{b_2+p_{TC}(t)}, \\ \dot{p}_{EC}(t)=&\,s_1+k p_{TC}(t)+\alpha _3 p_{EC}(t)(1-p_{EC}(t)) -\xi _3\frac{p_{EC}(t)p_{AD}(t)}{b_3+p_{EC}(t)}, \\ \dot{p}_{CD1}(t)=&\,u_{c1}-\Big (\beta _{c1}+m_1\frac{p_{NC}(t)}{b_1+p_{NC}(t)} +m_2\frac{p_{TC}(t)}{b_2+p_{TC}(t)}\Big )p_{CD1}(t), \\ \dot{p}_{CD2}(t)=&\,u_{c2}-\Big (\beta _{c2}+m_3\frac{p_{NC}(t)}{b_1+p_{NC}(t)} +m_4\frac{p_{TC}(t)}{b_2+p_{TC}(t)}\Big )p_{CD2}(t), \\ \dot{p}_{AD}(t)=&\,u_a-\Big (\beta _a+\frac{m_5 p_{EC}(t)}{b_3+p_{EC}(t)}\Big )p_{AD}(t), \end{aligned} \right. \end{aligned}$$
(6.7)

where \(\xi _\imath \big (p_{EC}(t),p_{AD}(t)\big )=\xi _{\imath 1}p_{EC}(t)+\xi _{\imath 2}p_{AD}(t)+\xi _{\imath 0}\) and \(\pi _\jmath \big (p_{EC}(t),p_{AD}(t)\big )=\pi _{\jmath 1}p_{EC}(t)+\pi _{\jmath 2}p_{AD}(t)+\pi _{\jmath 0}\) with \(\imath , \jmath =1,2\). The states \(p_{NC}(t)\), \(p_{TC}(t)\), \(p_{EC}(t)\), \(p_{CD1}(t)\), \(p_{CD2}(t)\) and \(p_{AD}(t)\) are nonnegative.

Remark 6.1

The differential equation (6.7) is the simplified model describing the interactions among the cells and drugs. The model shows that normal cells and tumor cells compete with each other. The tumor cells require more nutrients, so they facilitate the proliferation of endothelial cells, which supply the nutrients indispensable to tumor growth. The chemotherapy drugs effectively damage the tumor cells but also have side effects on normal cells to some extent, while the anti-angiogenic drug inhibits the proliferation of the endothelial cells.
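The right-hand side of (6.7) can be sketched in code as follows. This is only a minimal illustration: every numerical value in `P` is a hypothetical placeholder (the source does not list parameter values at this point), and the helper names `affine`, `rhs` and `euler_step`, as well as the clipping at zero, are assumptions made for the sketch.

```python
import numpy as np

# Hypothetical parameter values (not from the source), for illustration only.
P = dict(alpha1=1.0, alpha2=1.5, alpha3=0.5, a1=1.0, a2=0.5,
         b1=1.0, b2=1.0, b3=1.0, phi=0.5, k=0.1, s1=0.05, xi3=0.3,
         beta_c1=0.2, beta_c2=0.2, beta_a=0.2,
         m=(0.01, 0.01, 0.01, 0.01, 0.01),
         # (coefficient on p_EC, coefficient on p_AD, constant term)
         xi1=(0.01, 0.01, 0.1), xi2=(0.01, 0.01, 0.1),
         pi1=(0.05, 0.05, 0.5), pi2=(0.05, 0.05, 0.5))


def affine(coef, p_ec, p_ad):
    """Drug-efficacy term, e.g. xi_i = xi_i1 * p_EC + xi_i2 * p_AD + xi_i0."""
    c_ec, c_ad, c_0 = coef
    return c_ec * p_ec + c_ad * p_ad + c_0


def rhs(x, u, p=P):
    """Right-hand side of (6.7).

    x = [p_NC, p_TC, p_EC, p_CD1, p_CD2, p_AD], u = [u_c1, u_c2, u_a].
    """
    nc, tc, ec, cd1, cd2, ad = x
    u_c1, u_c2, u_a = u
    m1, m2, m3, m4, m5 = p["m"]
    # Holling type 2 saturation factors.
    s_nc = nc / (p["b1"] + nc)
    s_tc = tc / (p["b2"] + tc)
    s_ec = ec / (p["b3"] + ec)
    xi1, xi2 = affine(p["xi1"], ec, ad), affine(p["xi2"], ec, ad)
    pi1, pi2 = affine(p["pi1"], ec, ad), affine(p["pi2"], ec, ad)
    return np.array([
        p["alpha1"] * nc * (1 - nc) - p["a1"] * nc * tc
        - (xi1 * cd1 + xi2 * cd2) * s_nc,
        p["alpha2"] * tc * (1 - tc / (1 + p["phi"] * ec)) - p["a2"] * nc * tc
        - (pi1 * cd1 + pi2 * cd2) * s_tc,
        p["s1"] + p["k"] * tc + p["alpha3"] * ec * (1 - ec)
        - p["xi3"] * s_ec * ad,
        u_c1 - (p["beta_c1"] + m1 * s_nc + m2 * s_tc) * cd1,
        u_c2 - (p["beta_c2"] + m3 * s_nc + m4 * s_tc) * cd2,
        u_a - (p["beta_a"] + m5 * s_ec) * ad,
    ])


def euler_step(x, u, dt):
    """One forward-Euler step; clipping at zero is a choice of this sketch."""
    return np.maximum(x + dt * rhs(x, u), 0.0)
```

Calling `euler_step` repeatedly with a fixed dose vector gives a rough open-loop trajectory; a stiff or long simulation would call for a proper ODE solver instead.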

6.2.2 Nonzero-Sum Games Formulation

Consider the interaction model (6.7) rewritten as

$$\begin{aligned} \dot{x}&=f(x)+\left[ \begin{array}{cc} 0_{3\times 1}&{}0_{3\times 1} \\ 0.06&{}0 \\ 0&{}0 \\ 0&{}0.12 \end{array} \right] \mathcal {u}_1+\left[ \begin{array}{cc} 0_{3\times 1}&{}0_{3\times 1} \\ 0&{}0 \\ 0.1&{}0 \\ 0&{}0 \end{array} \right] \mathcal {u}_2 \\ \nonumber&=f(x)+\mathcal {g}_1\mathcal {u}_1+\mathcal {g}_2\mathcal {u}_2, \end{aligned}$$
(6.8)

where \(\mathcal {u}_1=[u_{c1},u_a]^{T}\), \(\mathcal {u}_2=[u_{c2},0]^{T}\) and f(x) is constructed by the right-hand side parts of (6.7) excluding the terms \(u_{c1}\), \(u_{c2}\) and \(u_a\).

Define the value function for player \(\imath (\imath =1,2)\) as

$$\begin{aligned} {V}_\imath (x(t))=\int _t^{\infty }e^{-\varrho _\imath (\varsigma -t)}\delta _\imath (x,\mathcal {u}_1,\mathcal {u}_2)d\varsigma , \end{aligned}$$
(6.9)

where the utility function \(\delta _\imath (x,\mathcal {u}_1,\mathcal {u}_2)=x^{T}\varUpsilon _\imath x+\mathcal {u}_1^{T}\mathcal {R}_{\imath 1}\mathcal {u}_1+\mathcal {u}_2^{T}\mathcal {R}_{\imath 2}\mathcal {u}_2\). The matrices \(\mathcal {R}_{\imath \jmath }(\imath ,\jmath =1,2)\) and \(\varUpsilon _\imath \) are positive definite, and \(\varrho _\imath >0\) is the discount factor. According to the value function (6.9), we can define the corresponding Hamiltonian function as

$$\begin{aligned} H_\imath (x,\mathcal {u}_1,\mathcal {u}_2)=(\nabla V_\imath )^{T}(f+\mathcal {g}_1\mathcal {u}_1+\mathcal {g}_2\mathcal {u}_2) \nonumber \\ +\delta _\imath (x,\mathcal {u}_1,\mathcal {u}_2)-\varrho _\imath V_\imath , \imath =1,2. \end{aligned}$$
(6.10)
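As a rough numerical illustration of the discounted value function (6.9), the integral can be approximated along a sampled trajectory. The function names, the finite horizon and the left Riemann sum are assumptions of this sketch, not part of the source.

```python
import numpy as np


def utility(x, u1, u2, Upsilon, R1, R2):
    """delta_i = x^T Upsilon_i x + u1^T R_i1 u1 + u2^T R_i2 u2."""
    return float(x @ Upsilon @ x + u1 @ R1 @ u1 + u2 @ R2 @ u2)


def discounted_cost(xs, u1s, u2s, dt, rho, Upsilon, R1, R2):
    """Left Riemann sum of (6.9) from t = 0 over the sampled horizon.

    xs, u1s, u2s hold one sample per row, taken every dt time units.
    """
    t = dt * np.arange(len(xs))
    deltas = np.array([utility(x, u1, u2, Upsilon, R1, R2)
                       for x, u1, u2 in zip(xs, u1s, u2s)])
    return float(np.sum(np.exp(-rho * t) * deltas) * dt)
```

The horizon has to be long enough for the factor `exp(-rho * t)` to become negligible, otherwise the truncated tail biases the estimate.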

The optimal value function is defined as

$$\begin{aligned} V_\imath ^{*}=\min _{\mathcal {u}_\imath }\int _t^{\infty }e^{-\varrho _\imath (\varsigma -t)}(x^{T}\varUpsilon _\imath x+\sum _{\jmath =1}^{{N}=2}\mathcal {u}_\jmath ^{T}\mathcal {R}_{\imath \jmath }\mathcal {u}_\jmath )d\varsigma . \end{aligned}$$
(6.11)

The target of NZSGs is to attain the admissible strategy pair \(\{\mathcal {u}_1^*,\mathcal {u}_2^*\}\) with the definition given in [23, 40]. According to the stationarity condition, the optimal strategy for player \(\imath \) could be obtained by

$$\begin{aligned} \mathcal {u}_\imath ^{*}=-\frac{1}{2}\mathcal {R}_{\imath \imath }^{-1}\mathcal {g}_\imath ^{T}\nabla V_\imath ^{*}. \end{aligned}$$
(6.12)

Thus the HJEs can be obtained as

$$\begin{aligned} H_\imath (x,\mathcal {u}_1^{*},\mathcal {u}_2^{*},\nabla V_\imath ^{*})=(\nabla V_\imath ^{*})^{T}(f+\mathcal {g}_1\mathcal {u}_1^{*}+\mathcal {g}_2\mathcal {u}_2^{*}) \nonumber \\ +x^{T}\varUpsilon _\imath x+\mathcal {u}_1^{*T}\mathcal {R}_{\imath 1}\mathcal {u}_1^{*}+\mathcal {u}_2^{*T}\mathcal {R}_{\imath 2}\mathcal {u}_2^{*}-\varrho _\imath V_\imath ^{*}=0. \end{aligned}$$
(6.13)
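The stationarity condition (6.12) can be sketched with the constant input matrices of (6.8). The gradient passed in below is an arbitrary placeholder, since \(\nabla V_\imath ^{*}\) is unknown in practice, and the function name `optimal_strategy` is hypothetical.

```python
import numpy as np

# Input matrices of (6.8): six states, a two-dimensional input per player.
G1 = np.zeros((6, 2))
G1[3, 0] = 0.06   # u_c1 enters the p_CD1 equation
G1[5, 1] = 0.12   # u_a enters the p_AD equation
G2 = np.zeros((6, 2))
G2[4, 0] = 0.1    # u_c2 enters the p_CD2 equation


def optimal_strategy(g, R_ii, grad_V):
    """u_i* = -1/2 R_ii^{-1} g_i^T grad V_i*, as in (6.12)."""
    # Solving the linear system avoids forming R_ii^{-1} explicitly.
    return -0.5 * np.linalg.solve(R_ii, g.T @ grad_V)


# Placeholder gradient, for illustration only.
grad_V = np.ones(6)
u1 = optimal_strategy(G1, np.eye(2), grad_V)
u2 = optimal_strategy(G2, np.eye(2), grad_V)
```

Because the second column of `G2` is zero, the second component of `u2` vanishes when \(\mathcal {R}_{22}\) is diagonal, as in this example, which matches the form \(\mathcal {u}_2=[u_{c2},0]^{T}\).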

Remark 6.2

It is noteworthy that there exists no zero equilibrium for system (6.8), which may well result in the divergence of \(V_\imath (x(t))\). To resolve this issue, the discount factor \(\varrho _\imath \) is introduced to form a decay term such that \(V_\imath (x(t))\) converges.
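The effect of the discount factor can be seen from a simple bound. Assume, purely for illustration, that the utility stays below some constant \(\bar{\delta }_\imath \) along the trajectory (this constant is not defined in the source). Then

$$\begin{aligned} V_\imath (x(t))\le \int _t^{\infty }e^{-\varrho _\imath (\varsigma -t)}\bar{\delta }_\imath d\varsigma =\frac{\bar{\delta }_\imath }{\varrho _\imath }. \end{aligned}$$

With \(\varrho _\imath =0\), by contrast, the integral of a utility that is bounded away from zero grows without bound.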

In general, solving the NZSGs amounts to solving the coupled equations (6.13). Nevertheless, for nonlinear systems, these coupled equations are intractable to solve analytically. To resolve this difficulty, an ADP method utilizing the dosage regulation mechanism is proposed in the following sections.

6.3 MDRM-Based Adaptive Critic Learning Method for NZSGs

Firstly, we introduce the indications for medicine to judge when the medicine dosage should be regulated. Then under the MDRM, the ADP method of single-critic architecture is proposed to approximately seek the optimal strategy for the NZSGs of model (6.7).

6.3.1 MDRM-Based Optimal Strategy Derivation

To realize a conditional therapy strategy, the MDRM processes the clinical data so that the strategy is changed only when necessary and in a timely manner. The time sequence \(\{\hbar _\ell \}\) records the regulating instants, where \(\hbar _\ell \) denotes the \(\ell \)th regulating instant. Then the recorded state is denoted as

$$\begin{aligned} \breve{x}_\ell (t)=x(\hbar _\ell ), t\in [\hbar _\ell ,\hbar _{\ell +1}). \end{aligned}$$
(6.14)

To evaluate the difference between the real-time data and the latest recorded data, it is necessary to define the error function \(z_\ell =\breve{x}_\ell -x(t),t\in [\hbar _\ell ,\hbar _{\ell +1})\). The operation of the MDRM depends on the regulating condition, which compares the error \(z_\ell \) with a threshold associated with the real-time data. The strategy is adjusted only when the norm of \(z_\ell \) exceeds the threshold; between regulating instants it is held at \(\breve{\mathcal {u}}_\imath =\mathcal {u}_\imath (\breve{x}_\ell ),\imath =1,2\), with \(\ell \in \mathbb {N}^{+}\). Thus the MDRM-based strategy is obtained as

$$\begin{aligned} \breve{\mathcal {u}}_\imath ^{*}=-\frac{1}{2}\mathcal {R}_{\imath \imath }^{-1}\mathcal {g}_\imath ^{T}(\breve{x}_\ell )\nabla \breve{V}_\imath ^{*}, \imath =1,2, \end{aligned}$$
(6.15)

where \(\nabla \breve{V}_\imath ^{*}=\partial V_\imath ^{*}/\partial x\) evaluated at \(t=\hbar _\ell \). The MDRM-based version of the Hamiltonian in the HJEs is derived as

$$\begin{aligned}&H_\imath (x,\breve{\mathcal {u}}_1^{*},\breve{\mathcal {u}}_2^{*},V_\imath ^{*}) =\frac{1}{4}\sum _{\jmath =1}^{N=2}(\nabla \breve{V}_\jmath ^{*})^{T}\mathcal {g}_\jmath (\breve{x}_\ell )\mathcal {R}_{\jmath \jmath }^{-1}\mathcal {R}_{\imath \jmath }\mathcal {R}_{\jmath \jmath }^{-1}\mathcal {g}_\jmath ^{T}(\breve{x}_\ell )\nabla \breve{V}_\jmath ^{*}\nonumber \\&+(\nabla V_\imath ^{*})^{T}\left( f-\frac{1}{2}\sum _{\jmath =1}^{N=2}\mathcal {g}_\jmath \mathcal {R}_{\jmath \jmath }^{-1}\mathcal {g}_\jmath ^{T}(\breve{x}_\ell )\nabla \breve{V}_\jmath ^{*}\right) +x^{T}\varUpsilon _\imath x-\varrho _\imath V_\imath ^{*}. \end{aligned}$$
(6.16)

Unlike the HJEs (6.13), (6.16) is not equal to zero because of the error \(z_\ell \). Before proceeding with the discussion, the following assumption is required [41].

Assumption 6.1

The optimal strategy \(\mathcal {u}_\imath ^{*}\) is locally Lipschitz. That is, for \(\imath =1,2\), there exists a constant \(\theta _\imath >0\) such that \(\Vert \mathcal {u}_\imath ^{*}-\breve{\mathcal {u}}_\imath ^{*}\Vert ^2\le \theta _\imath \Vert x-\breve{x}_\ell \Vert ^2\).

Theorem 6.3

Consider the system (6.8), and suppose that Assumption 6.1 holds and \(V_\imath ^{*}\) is the solution of (6.13). Then \(\breve{\mathcal {u}}_\imath ^{*}\) formulated as (6.15) can stabilize system (6.8) when the following medicine indication is applied

$$\begin{aligned} \Vert z_\ell \Vert ^2\le \frac{(1-2\zeta )\lambda _m(\varUpsilon )}{\theta \lambda _M(Y)}\Vert x\Vert ^2, \end{aligned}$$
(6.17)

where \(\zeta \in (0,1/2)\) is an adjustable parameter. The terms \(\theta \), \(\varUpsilon \) and Y are given in (6.21) and (6.22).

Proof

Selecting the Lyapunov function \(L_{ya}=V_1^{*}+V_2^{*}\), we can obtain the corresponding derivative as

$$\begin{aligned} \dot{L}_{ya}=\sum _{\imath =1}^{N=2}(\nabla V_\imath ^{*})^{T}(f+\mathcal {g}_1\breve{\mathcal {u}}_1^{*} +\mathcal {g}_2\breve{\mathcal {u}}_2^{*}). \end{aligned}$$
(6.18)

According to (6.13), we have

$$\begin{aligned} (\nabla V_\imath ^{*})^{T}f=&-(\nabla V_\imath ^{*})^{T}(\mathcal {g}_1\mathcal {u}_1^{*}+\mathcal {g}_2\mathcal {u}_2^{*})-x^{T}\varUpsilon _\imath x \nonumber \\&-\mathcal {u}_1^{*T}\mathcal {R}_{\imath 1}\mathcal {u}_1^{*}-\mathcal {u}_2^{*T}\mathcal {R}_{\imath 2}\mathcal {u}_2^{*}+\varrho _\imath V_\imath ^{*}, \end{aligned}$$
(6.19)

and

$$\begin{aligned} (\nabla V_\imath ^{*})^{T}\sum _{\jmath =1}^{N=2}\mathcal {g}_\jmath (\mathcal {u}_\jmath ^{*}-\breve{\mathcal {u}}_\jmath ^{*}) =-2\mathcal {u}_\imath ^{*T}\mathcal {R}_{\imath \imath }\mathcal {g}_\imath ^{-1}\sum _{\jmath =1}^{N=2}\mathcal {g}_\jmath (\mathcal {u}_\jmath ^{*}-\breve{\mathcal {u}}_\jmath ^{*}). \end{aligned}$$
(6.20)

Let \(\mathcal {u}^{*}=[\mathcal {u}_1^{*T},\mathcal {u}_2^{*T}]^{T}\) and \(\breve{\mathcal {u}}^{*}=[(\breve{\mathcal {u}}_1^{*}-\mathcal {u}_1^{*})^{T},(\breve{\mathcal {u}}_2^{*}-\mathcal {u}_2^{*})^{T}]^{T}\). Then we can derive that

$$\begin{aligned} \dot{L}_{ya}=&-x^{T}\varUpsilon _1 x-x^{T}\varUpsilon _2 x-\sum _{\imath =1}^{N=2}\sum _{\jmath =1}^{N=2}\mathcal {u}_\jmath ^{*T}\mathcal {R}_{\imath \jmath }\mathcal {u}_\jmath ^{*} \nonumber \\&+2\sum _{\imath =1}^{N=2}\mathcal {u}_\imath ^{*T}\mathcal {R}_{\imath \imath }\mathcal {g}_\imath ^{-1}\sum _{\jmath =1}^{N=2}\mathcal {g}_\jmath (\mathcal {u}_\jmath ^{*}-\breve{\mathcal {u}}_\jmath ^{*})+\varrho _1 V_1^{*}+\varrho _2 V_2^{*} \nonumber \\ =&-x^{T}\varUpsilon x-\mathcal {u}^{*T}\mathcal {R}\mathcal {u}^{*}-2\mathcal {u}^{*T}Z\breve{\mathcal {u}}^{*} \nonumber \\&+\varrho _1 V_1^{*}+\varrho _2 V_2^{*}, \end{aligned}$$
(6.21)

where \(\varUpsilon =\varUpsilon _1+\varUpsilon _2\), \(\mathcal {R}=diag\{\mathcal {R}_{11}+\mathcal {R}_{21},\mathcal {R}_{12}+\mathcal {R}_{22}\}\), and \(Z=[Z_1,Z_2]\) with \(Z_\imath =[\mathcal {R}_{11}\mathcal {g}_1^{-1}\mathcal {g}_\imath ,\mathcal {R}_{22}\mathcal {g}_2^{-1}\mathcal {g}_\imath ]^{T},\imath =1,2\). Applying Young's inequality, we have

$$\begin{aligned} \dot{L}_{ya}\le&-x^{T}\varUpsilon x-\mathcal {u}^{*T}\mathcal {R}\mathcal {u}^{*}+\mathcal {u}^{*T}\mathcal {R}\mathcal {u}^{*} \nonumber \\&+\breve{\mathcal {u}}^{*T}Z^{T}\mathcal {R}^{-1}Z\breve{\mathcal {u}}^{*}+\varrho _V \nonumber \\ =&-x^{T}\varUpsilon x+\breve{\mathcal {u}}^{*T}Y\breve{\mathcal {u}}^{*}+\varrho _V, \end{aligned}$$
(6.22)

where \(Y=Z^{T}\mathcal {R}^{-1}Z\). Since \(\mathcal {u}_\imath ^{*}\) is an admissible strategy, \(V_\imath ^{*}\) is bounded. Hence \(\varrho _V\) denotes the bound of the term \(\varrho _1 V_1^{*}+\varrho _2 V_2^{*}\). According to the definitions of \(\varUpsilon \) and Y, we have that \(\lambda _m(\varUpsilon )>0\) and \(\lambda _M(Y)>0\). Furthermore, we can obtain

$$\begin{aligned} \dot{L}_{ya}\le&-2\zeta \lambda _m(\varUpsilon )\Vert x\Vert ^2-(1-2\zeta )\lambda _m(\varUpsilon )\Vert x\Vert ^2 \nonumber \\&+\lambda _M(Y)\theta \Vert z_\ell \Vert ^2+\varrho _V, \end{aligned}$$
(6.23)

where \(\theta =\theta _1+\theta _2\). When the indication (6.17) is satisfied, we derive that \(\dot{L}_{ya}\le -2\zeta \lambda _m(\varUpsilon )\Vert x\Vert ^2+\varrho _V\). Hence \(\dot{L}_{ya}<0\) holds whenever \(\Vert x\Vert >\sqrt{\frac{\varrho _V}{2\zeta \lambda _m(\varUpsilon )}}\). By the Lyapunov theorem, the strategy (6.15) can stabilize system (6.8). This completes the proof.\(\blacksquare \)
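The medicine indication (6.17) lends itself to a short sketch of how the regulating instants might be detected on sampled data. The function names and the sample-and-hold loop are assumptions of this illustration; \(\varUpsilon \) and Y are taken to be symmetric, as their definitions imply.

```python
import numpy as np


def indication_coefficient(Upsilon, Y, theta, zeta):
    """Coefficient (1 - 2*zeta) * lambda_m(Upsilon) / (theta * lambda_M(Y))."""
    if not 0.0 < zeta < 0.5:
        raise ValueError("zeta must lie in (0, 1/2)")
    lam_min = np.linalg.eigvalsh(Upsilon)[0]    # eigenvalues sorted ascending
    lam_max = np.linalg.eigvalsh(Y)[-1]
    return (1.0 - 2.0 * zeta) * lam_min / (theta * lam_max)


def needs_regulation(x, x_rec, coeff):
    """True when ||z||^2 violates (6.17), with z = x_rec - x."""
    z = x_rec - x
    return float(z @ z) > coeff * float(x @ x)


def regulating_instants(xs, coeff):
    """Indices at which the recorded state would be refreshed (sample and hold)."""
    x_rec, instants = xs[0], [0]
    for k in range(1, len(xs)):
        if needs_regulation(xs[k], x_rec, coeff):
            x_rec = xs[k]
            instants.append(k)
    return instants
```

A larger \(\zeta \) shrinks the coefficient, so the dosage is regulated more often; a smaller \(\zeta \) tolerates a larger gap between the recorded and the real-time state.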

6.3.2 Implementation of Adaptive Critic Learning Method

In this section, the approximate optimal strategy under the MDRM is derived by the ADP method with a single-critic architecture. In light of the universal approximation property of neural networks (NNs), \(V_\imath ^{*}\) can be represented as

$$\begin{aligned} V_\imath ^{*}=\omega _\imath ^{*T}\nu _\imath (x)+\sigma _\imath , \imath =1,2, \end{aligned}$$
(6.24)

where \(\omega _\imath ^{*}\) is the ideal weight vector, \(\nu _\imath \) is the activation function and \(\sigma _\imath \) is the approximation error. To estimate the unknown vector \(\omega _\imath ^{*}\), the critic NN is constructed as

$$\begin{aligned} \hat{V}_\imath =\hat{\omega }_\imath ^{T}\nu _\imath (x),\imath =1,2, \end{aligned}$$
(6.25)

with \(\hat{\omega }_\imath \) being the estimate of \(\omega _\imath ^{*}\). Using the NN representation (6.24), we can express the optimal strategy as

$$\begin{aligned} \mathcal {u}_\imath ^{*}=-\frac{1}{2}\mathcal {R}_{\imath \imath }^{-1}\mathcal {g}_\imath ^{T}\big ((\nabla \nu _\imath )^{T}\omega _\imath ^{*}+\nabla \sigma _\imath \big ),\imath =1,2. \end{aligned}$$
(6.26)

Accordingly, we can obtain the optimal and approximate optimal strategies under MDRM as

$$\begin{aligned} \breve{\mathcal {u}}_\imath ^{*}=-\frac{1}{2}\mathcal {R}_{\imath \imath }^{-1}\mathcal {g}_\imath ^{T}(\breve{x}_\ell )\big ((\nabla \nu _\imath (\breve{x}_\ell ))^{T}\omega _\imath ^{*}+\nabla \sigma _\imath (\breve{x}_\ell )\big ), \end{aligned}$$
(6.27)

and

$$\begin{aligned} \breve{\mathcal {u}}_\imath =-\frac{1}{2}\mathcal {R}_{\imath \imath }^{-1}\mathcal {g}_\imath ^{T}(\breve{x}_\ell )(\nabla \nu _\imath (\breve{x}_\ell ))^{T}\hat{\omega }_\imath . \end{aligned}$$
(6.28)

Then the approximate Hamiltonian can be presented as

$$\begin{aligned} H_\imath (x,\breve{\mathcal {u}}_1,\breve{\mathcal {u}}_2,\hat{\omega }_\imath ) =\hat{\omega }_\imath ^{T}\psi _\imath +\delta _\imath (x,\breve{\mathcal {u}}_1,\breve{\mathcal {u}}_2)\triangleq \epsilon _\imath , \end{aligned}$$
(6.29)

where \(\psi _\imath =\nabla \nu _\imath \big (f+\mathcal {g}_1\mathcal {u}_1(\breve{x}_\ell )+\mathcal {g}_2\mathcal {u}_2(\breve{x}_\ell )\big )-\varrho _\imath \nu _\imath \).
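The critic-based strategy (6.28) and the residual (6.29) can be sketched as follows. The activation \(\nu (x)=[x_1^2,\ldots ,x_6^2]^{T}\) is a hypothetical choice made only so that the example runs; the source does not specify the activation here. The drift value `f_x` is assumed to be supplied by the caller.

```python
import numpy as np

# Constant input matrices from (6.8).
G1 = np.zeros((6, 2))
G1[3, 0], G1[5, 1] = 0.06, 0.12
G2 = np.zeros((6, 2))
G2[4, 0] = 0.1


def nu(x):
    """Hypothetical activation vector: elementwise squares of the state."""
    return x ** 2


def grad_nu(x):
    """Jacobian of nu, of shape (N_c, n); diagonal for this activation."""
    return np.diag(2.0 * x)


def strategy(x_rec, w_hat, g, R_ii):
    """Approximate MDRM strategy (6.28), evaluated at the recorded state."""
    grad_V_hat = grad_nu(x_rec).T @ w_hat
    return -0.5 * np.linalg.solve(R_ii, g.T @ grad_V_hat)


def residual(x, f_x, u1, u2, w_hat, rho, Upsilon, R_i1, R_i2):
    """Residual eps_i of (6.29) and the regressor psi_i for player i."""
    psi = grad_nu(x) @ (f_x + G1 @ u1 + G2 @ u2) - rho * nu(x)
    delta = x @ Upsilon @ x + u1 @ R_i1 @ u1 + u2 @ R_i2 @ u2
    return float(w_hat @ psi + delta), psi
```

Note that `strategy` takes the recorded state while `residual` takes the real-time state, mirroring the distinction between \(\breve{x}_\ell \) and x in (6.28) and (6.29).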

In order to minimize \(\epsilon _\imath \) in (6.29), we set the objective to be minimized as \(E=E_1+E_2=\frac{1}{2}\epsilon _1^2+\frac{1}{2}\epsilon _2^2\). Applying the gradient descent approach, we obtain

$$\begin{aligned} \dot{\hat{\omega }}_\imath =-\gamma _\imath \frac{1}{(\psi _\imath ^{T}\psi _\imath +1)^2}\frac{\partial E}{\partial \hat{\omega }_\imath } =-\gamma _\imath \frac{\psi _\imath }{(\psi _\imath ^{T}\psi _\imath +1)^2}\epsilon _\imath =-\gamma _\imath \breve{\psi }_\imath \epsilon _\imath , \end{aligned}$$
(6.30)

where \(\gamma _\imath \) is the adjustable parameter and \(\breve{\psi }_\imath =\psi _\imath /(\psi _\imath ^{T}\psi _\imath +1)^2\). Define \(\tilde{\omega }_\imath =\omega _\imath ^{*}-\hat{\omega }_\imath \). From (6.30), we derive that

$$\begin{aligned} \dot{\tilde{\omega }}_\imath =-\gamma _\imath \bar{\psi }_\imath \bar{\psi }_\imath ^{T} \tilde{\omega }_\imath +\gamma _\imath \breve{\psi }_\imath e_\imath , \end{aligned}$$
(6.31)

where \(\bar{\psi }_\imath =\psi _\imath /(\psi _\imath ^{T}\psi _\imath +1)\) and the approximated residual error \(e_\imath =-\nabla \sigma _\imath ^{T}(f+\mathcal {g}_1\breve{\mathcal {u}}_1+\mathcal {g}_2\breve{\mathcal {u}}_2)+\varrho _\imath \sigma _\imath \). For proceeding further, the following assumptions are required [11, 26, 27].

Assumption 6.2

For any \(\imath \in \{1,2\}\), the signal \(\bar{\psi }_\imath \) is persistently excited on the time interval \([t,t+T]\). That is, there exists the positive constant \(b_{\psi \imath }\) such that

$$\begin{aligned} b_{\psi \imath } I_{N_{c\imath }\times N_{c\imath }}\le \int _t^{t+T}\bar{\psi }_\imath \bar{\psi }_\imath ^{T}d\varsigma , \end{aligned}$$
(6.32)

with \(N_{c\imath }\) being the neuron number of the \(\imath \)th critic network.

Assumption 6.3

For \(\imath \in \{1,2\}\), there exist positive constants such that \(\Vert \omega _\imath ^{*}\Vert \le b_{\omega \imath }\), \(\Vert \nabla \nu _\imath \Vert \le b_{\nu \imath }\), \(\Vert \nabla \sigma _\imath \Vert \le b_{\sigma \imath }\) and \(\Vert e_\imath \Vert \le b_{e\imath }\).
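The tuning law (6.30) and the excitation condition (6.32) can be illustrated with a short sketch. The forward-Euler discretization, the Riemann-sum approximation of the integral and the function names are assumptions of this example.

```python
import numpy as np


def critic_step(w_hat, psi, eps, gamma, dt):
    """One forward-Euler step of the normalized gradient law (6.30)."""
    psi_breve = psi / (psi @ psi + 1.0) ** 2
    return w_hat - dt * gamma * psi_breve * eps


def pe_level(psis, dt):
    """Smallest eigenvalue of the sampled integral in (6.32).

    psis is a sequence of regressor samples psi_i taken every dt time units;
    a strictly positive return value plays the role of b_psi_i.
    """
    bars = [p / (p @ p + 1.0) for p in psis]
    gram = dt * sum(np.outer(b, b) for b in bars)
    return float(np.linalg.eigvalsh(gram)[0])
```

If `pe_level` returns a value near zero over a window, the regressor does not excite every weight direction in that window, and the corresponding weights are not guaranteed to converge.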

6.4 Stability Analysis

In this section, the stability of the controlled system is analyzed by applying Lyapunov theory. Before presenting the main results, the boundedness of the critic weight estimation error is discussed in the following lemma.

Lemma 6.4

For any \(\imath \in \{1,2\}\), suppose that Assumptions 6.2–6.3 hold and the initial weight is finite. If the critic tuning law (6.30) is applied, then \(\tilde{\omega }_\imath \) is locally ultimately bounded.

Proof

Consider the Lyapunov function \(L_{y\omega }\). Note that \(\tilde{\omega }_\imath \) evolves according to the flow dynamics (6.31), so its value does not jump. More specifically, \(\tilde{\omega }_\imath \) is continuous at the regulating instants. Thus we only need to consider the time interval between two adjacent regulating instants.

According to Assumptions 6.2–6.3, it can be derived that

$$\begin{aligned} \dot{L}_{y\omega }=&\,2\tilde{\omega }_1^T\dot{\tilde{\omega }}_1+2\tilde{\omega }_2^T\dot{\tilde{\omega }}_2 \nonumber \\ =&\,2\gamma _1(-\tilde{\omega }_1^{T} \bar{\psi }_1 \bar{\psi }_1^{T} \tilde{\omega }_1+\tilde{\omega }_1^{T} \breve{\psi }_1 e_1) \nonumber \\&+\,2\gamma _2(-\tilde{\omega }_2^{T} \bar{\psi }_2 \bar{\psi }_2^{T} \tilde{\omega }_2+\tilde{\omega }_2^{T} \breve{\psi }_2 e_2). \end{aligned}$$
(6.33)

By applying Young's inequality, we can get

$$\begin{aligned} \dot{L}_{y\omega }\le&-\gamma _1(\tilde{\omega }_1 \bar{\psi }_1 \bar{\psi }_1^{T} \tilde{\omega }_1-e_1^{T} e_1) \nonumber \\&-\gamma _2(\tilde{\omega }_2 \bar{\psi }_2 \bar{\psi }_2^{T} \tilde{\omega }_2-e_2^{T} e_2) \nonumber \\ \le&-\gamma _1 b_{\psi 1}\Vert \tilde{\omega }_1\Vert ^2-\gamma _2 b_{\psi 2}\Vert \tilde{\omega }_2\Vert ^2+\varGamma _1, \end{aligned}$$
(6.34)

where \(\varGamma _1=\gamma _1 b_{e1}^2+\gamma _2 b_{e2}^2\). Furthermore, when \(\Vert \tilde{\omega }_1\Vert >\sqrt{\frac{\varGamma _1}{\gamma _1 b_{\psi 1}}}\triangleq b_{\tilde{\omega }_1}\) or \(\Vert \tilde{\omega }_2\Vert >\sqrt{\frac{\varGamma _1}{\gamma _2 b_{\psi 2}}}\triangleq b_{\tilde{\omega }_2}\), it yields that \(\dot{L}_{y\omega }<0\). The lemma is proved. \(\blacksquare \)
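The step from (6.33) to (6.34) can be spelled out as follows, using \(\breve{\psi }_\imath =\bar{\psi }_\imath /(\psi _\imath ^{T}\psi _\imath +1)\) and the elementary inequality \(2ab\le a^2+b^2\). This is a reconstruction of the omitted algebra, not a quotation of the source.

$$\begin{aligned} 2\tilde{\omega }_\imath ^{T}\breve{\psi }_\imath e_\imath =&\,2\big (\tilde{\omega }_\imath ^{T}\bar{\psi }_\imath \big )\frac{e_\imath }{\psi _\imath ^{T}\psi _\imath +1} \le \tilde{\omega }_\imath ^{T}\bar{\psi }_\imath \bar{\psi }_\imath ^{T}\tilde{\omega }_\imath +\frac{e_\imath ^{T}e_\imath }{(\psi _\imath ^{T}\psi _\imath +1)^2} \nonumber \\ \le&\,\tilde{\omega }_\imath ^{T}\bar{\psi }_\imath \bar{\psi }_\imath ^{T}\tilde{\omega }_\imath +e_\imath ^{T}e_\imath . \end{aligned}$$

Multiplying by \(\gamma _\imath \) and adding the term \(-2\gamma _\imath \tilde{\omega }_\imath ^{T}\bar{\psi }_\imath \bar{\psi }_\imath ^{T}\tilde{\omega }_\imath \) leaves \(-\gamma _\imath (\tilde{\omega }_\imath ^{T}\bar{\psi }_\imath \bar{\psi }_\imath ^{T}\tilde{\omega }_\imath -e_\imath ^{T}e_\imath )\), which is the first bound in (6.34).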

Theorem 6.4

Consider the system (6.8) with the strategy formulated as (6.28). Suppose that Assumptions 6.1–6.3 hold and that the tuning law for the critic network is given by (6.30). Then the state x and the weight estimation error \(\tilde{\omega }_\imath \) are uniformly ultimately bounded (UUB) provided that the following indication is applied

$$\begin{aligned} \Vert z_\ell \Vert ^2\le \frac{(1-\varpi _1^2)\lambda _m(\varUpsilon )}{(1+\varpi _2)\theta \lambda _M(Y)}\Vert x\Vert ^2\triangleq \Vert z_e\Vert ^2, \end{aligned}$$
(6.35)

with \(\varpi _1\) and \(\varpi _2\) being the adjustable parameters.

Proof

Select the Lyapunov function candidate as

$$\begin{aligned} L_Y=&\sum _{\imath =1}^{N=2}V_\imath ^{*}(\breve{x}_\ell )+\sum _{\imath =1}^{N=2}V_\imath ^{*}(x) +\frac{1}{2}\sum _{\imath =1}^{N=2}\tilde{\omega }_\imath ^{T}\tilde{\omega }_\imath \nonumber \\ =&L_{Ya}+L_{Yb}+L_{Yc}. \end{aligned}$$
(6.36)

Due to the utilization of MDRM, we present the proof process in two cases.

Case I: No regulation occurs, i.e., \(t\in [\hbar _\ell ,\hbar _{\ell +1})\). Then we obtain \(\dot{L}_{Ya}=0\). The derivative of \(L_{Yb}\) can be obtained as

$$\begin{aligned} \dot{L}_{Yb}=\sum _{\imath =1}^{N=2}(\nabla V_\imath ^{*})^{T}(f+\mathcal {g}_1\breve{\mathcal {u}}_1 +\mathcal {g}_2\breve{\mathcal {u}}_2). \end{aligned}$$
(6.37)

Let \(\breve{\mathcal {u}}=[(\breve{\mathcal {u}}_1-\mathcal {u}_1^{*})^{T},(\breve{\mathcal {u}}_2-\mathcal {u}_2^{*})^{T}]^{T}\). Applying operations similar to those in Theorem 6.3, we have

$$\begin{aligned} \dot{L}_{Yb}\le&-x^{T}\varUpsilon x+\breve{\mathcal {u}}^{T}Y\breve{\mathcal {u}}+\varrho _V \nonumber \\ \le&-x^{T}\varUpsilon x+\varrho _V+\lambda _M(Y)\Vert \mathcal {u}_1^{*}-\breve{\mathcal {u}}_1^{*} +\breve{\mathcal {u}}_1^{*}-\breve{\mathcal {u}}_1\Vert ^2 \nonumber \\&+\lambda _M(Y)\Vert \mathcal {u}_2^{*}-\breve{\mathcal {u}}_2^{*}+\breve{\mathcal {u}}_2^{*}-\breve{\mathcal {u}}_2\Vert ^2 \nonumber \\ \le&-x^{T}\varUpsilon x+\varrho _V+\lambda _M(Y)(1+1/\varpi _2)\Vert \breve{\mathcal {u}}_1^{*}-\breve{\mathcal {u}}_1\Vert ^2 \nonumber \\&+\lambda _M(Y)(1+\varpi _2)\Vert \mathcal {u}_1^{*}-\breve{\mathcal {u}}_1^{*}\Vert ^2 \nonumber \\&+\lambda _M(Y)(1+1/\varpi _2)\Vert \breve{\mathcal {u}}_2^{*}-\breve{\mathcal {u}}_2\Vert ^2 \nonumber \\&+\lambda _M(Y)(1+\varpi _2)\Vert \mathcal {u}_2^{*}-\breve{\mathcal {u}}_2^{*}\Vert ^2. \end{aligned}$$
(6.38)
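
As a reminder of the step used in the last inequality of (6.38), for any vectors a and b of equal dimension and any scalar \(\varpi _2>0\), Young's inequality gives \(2a^{T}b\le \varpi _2\Vert a\Vert ^2+\frac{1}{\varpi _2}\Vert b\Vert ^2\), so that

$$\begin{aligned} \Vert a+b\Vert ^2=&\,\Vert a\Vert ^2+2a^{T}b+\Vert b\Vert ^2 \nonumber \\ \le&\,(1+\varpi _2)\Vert a\Vert ^2+(1+1/\varpi _2)\Vert b\Vert ^2. \end{aligned}$$

Here a and b are placeholder symbols introduced only for this note; in (6.38) they correspond to \(a=\mathcal {u}_\imath ^{*}-\breve{\mathcal {u}}_\imath ^{*}\) and \(b=\breve{\mathcal {u}}_\imath ^{*}-\breve{\mathcal {u}}_\imath \), \(\imath =1,2\).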

Recall that \(\theta =\theta _1+\theta _2\), and substitute (6.27) and (6.28) into (6.38). Then we can derive

$$\begin{aligned} \dot{L}_{Yb}\le&-x^{T}\varUpsilon x+(1+\varpi _2)\theta \lambda _M(Y)\Vert x-\breve{x}_\ell \Vert ^2 \nonumber \\&+\varrho _V+\varGamma _2, \end{aligned}$$
(6.39)

where \(\varGamma _2=\frac{1}{4}\lambda _M(Y)(1+1/\varpi _2)^2\big (\Vert \mathcal {R}_{11}^{-1}\Vert ^2 b_{g1}^2 b_{\nu 1}^2 b_{\tilde{\omega }1}^2+\Vert \mathcal {R}_{22}^{-1}\Vert ^2 b_{g2}^2 b_{\nu 2}^2 b_{\tilde{\omega }2}^2\big )+\frac{1}{4\varpi _2}\lambda _M(Y)(1+\varpi _2)^2\big (\Vert \mathcal {R}_{11}^{-1}\Vert ^2 b_{g1}^2 b_{\sigma 1}^2+\Vert \mathcal {R}_{22}^{-1}\Vert ^2 b_{g2}^2 b_{\sigma 2}^2\big )\) with \(b_{g1}\) and \(b_{g2}\) denoting the bounds of known \(\mathcal {g}_1\) and \(\mathcal {g}_2\).

According to Assumption 6.2 and Assumption 6.3, we derive that

$$\begin{aligned} \dot{L}_{Yc}\le -\gamma _1 b_{\psi 1}\Vert \tilde{\omega }_1\Vert ^2-\gamma _2 b_{\psi 2}\Vert \tilde{\omega }_2\Vert ^2+\varGamma _1. \end{aligned}$$
(6.40)

Based on (6.39) and (6.40), we can obtain

$$\begin{aligned} \dot{L}_Y\le&-(1-\varpi _1^2)\lambda _m(\varUpsilon )\Vert x\Vert ^2-\varpi _1^2\lambda _m(\varUpsilon )\Vert x\Vert ^2 \nonumber \\&+(1+\varpi _2)\lambda _M(Y)\theta \Vert x-\breve{x}_\ell \Vert ^2-\gamma _1 b_{\psi 1}\Vert \tilde{\omega }_1\Vert ^2 \nonumber \\&-\gamma _2 b_{\psi 2}\Vert \tilde{\omega }_2\Vert ^2+\pounds , \end{aligned}$$
(6.41)

where \(\pounds =\varGamma _1+\varGamma _2+\varrho _V\). Applying the indication (6.35), we conclude that \(\dot{L}_Y<0\) when one of the following conditions holds:

$$\begin{aligned} \Vert x\Vert >\frac{1}{\varpi _1}\sqrt{\frac{\pounds }{\lambda _m(\varUpsilon )}}\triangleq \beta _x, \end{aligned}$$
(6.42)
$$\begin{aligned} \Vert \tilde{\omega }_\imath \Vert >\sqrt{\frac{\pounds }{\gamma _\imath b_{\psi \imath }}}\triangleq \beta _{\tilde{\omega }\imath },\imath =1,2. \end{aligned}$$
(6.43)

Thus x and \(\tilde{\omega }_\imath \) can be guaranteed to be UUB.
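
To give a feeling for the size of the ultimate bounds in (6.42) and (6.43), the following minimal sketch evaluates \(\beta _x\) and \(\beta _{\tilde{\omega }\imath }\) numerically. The function names and the numerical values of \(\pounds \), \(\lambda _m(\varUpsilon )\) and \(b_{\psi \imath }\) are hypothetical placeholders chosen for illustration; they are not taken from the analysis above.

```python
import math


def state_bound(pound, varpi1, lam_min_upsilon):
    """Ultimate bound beta_x from (6.42): (1/varpi1) * sqrt(pound / lambda_m(Upsilon))."""
    return math.sqrt(pound / lam_min_upsilon) / varpi1


def weight_bound(pound, gamma, b_psi):
    """Ultimate bound beta_omega from (6.43): sqrt(pound / (gamma * b_psi))."""
    return math.sqrt(pound / (gamma * b_psi))


# Hypothetical constants, for illustration only.
pound = 1.0            # lumped residual term Gamma_1 + Gamma_2 + rho_V
lam_min_upsilon = 0.25  # assumed minimum eigenvalue of Upsilon
b_psi = 0.5            # assumed lower bound related to the regressor

beta_x = state_bound(pound, varpi1=0.8, lam_min_upsilon=lam_min_upsilon)
beta_w = weight_bound(pound, gamma=2.0, b_psi=b_psi)
print(beta_x, beta_w)
```

Under these assumed values, a larger \(\varpi _1\) shrinks \(\beta _x\), while a larger \(\gamma _\imath \) shrinks \(\beta _{\tilde{\omega }\imath }\).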

Case II: A regulation occurs, that is, \(t=\hbar _{\ell +1}\). The difference of \(L_Y\) can be given by

$$\begin{aligned} \triangle L_Y=\triangle L_{Ya}+\triangle L_{Yb}+\triangle L_{Yc}, \end{aligned}$$
(6.44)

where the terms are defined by \(\triangle L_{Ya}=V_1^{*}(\breve{x}_{\ell +1})-V_1^{*}(\breve{x}_{\ell })+V_2^{*}(\breve{x}_{\ell +1}) -V_2^{*}(\breve{x}_{\ell })\), \(\triangle L_{Yb}=V_1^{*}(x(\hbar _{\ell +1}))-V_1^{*}(x(\hbar _{\ell +1}^-))+V_2^{*}(x(\hbar _{\ell +1}))-V_2^{*}(x(\hbar _{\ell +1}^-))\), \(\triangle L_{Yc}=1/2\tilde{\omega }_1^{T}(\hbar _{\ell +1})\tilde{\omega }_1(\hbar _{\ell +1}) -1/2\tilde{\omega }_1^{T}(\hbar _{\ell +1}^-)\tilde{\omega }_1(\hbar _{\ell +1}^-)+1/2\tilde{\omega }_2^{T}(\hbar _{\ell +1})\tilde{\omega }_2(\hbar _{\ell +1}) -1/2\tilde{\omega }_2^{T}(\hbar _{\ell +1}^-)\tilde{\omega }_2(\hbar _{\ell +1}^-)\). Recalling the analysis in Case I, we obtain that \(\dot{L}_Y<0\) when x or \(\tilde{\omega }_\imath \) is out of the corresponding bound. Furthermore, we can derive that \(L_{Yb}+L_{Yc}\) is monotonically decreasing when \(t\in [\hbar _\ell ,\hbar _{\ell +1})\). In light of the properties of limits, we have

$$\begin{aligned} 0\le&V_\imath ^{*}(x(\hbar _{\ell +1}^-))+\frac{1}{2}\tilde{\omega }_\imath ^{T}(\hbar _{\ell +1}^-)\tilde{\omega }_\imath (\hbar _{\ell +1}^-) \nonumber \\&-V_\imath ^{*}(x(\hbar _{\ell +1}))-\frac{1}{2}\tilde{\omega }_\imath ^{T}(\hbar _{\ell +1})\tilde{\omega }_\imath (\hbar _{\ell +1}). \end{aligned}$$
(6.45)

As x is proved to be UUB, we can obtain

$$\begin{aligned} V_\imath ^{*}(\breve{x}_{\ell +1})\le V_\imath ^{*}(\breve{x}_\ell ). \end{aligned}$$
(6.46)

According to (6.45) and (6.46), we can derive \(\triangle L_Y<0\), which indicates that the selected Lyapunov function (6.36) also decreases at the regulation instant \(t=\hbar _{\ell +1}\). This completes the proof. \(\blacksquare \)

Remark 6.5

\(\varpi _1\) and \(\varpi _2\) in (6.35) are the adjustable parameters which determine the frequency of medicine dosage regulation. A larger \(\varpi _1\) or \(\varpi _2\) leads to a higher regulation frequency, and a smaller parameter implies a lower adjustment frequency. Thus we can determine these parameters according to the clinical data.
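
The dependence described in this remark can be checked numerically. The sketch below evaluates the coefficient multiplying \(\Vert x\Vert ^2\) on the right-hand side of (6.35); a smaller coefficient means a tighter threshold and hence more frequent regulation. The function name and the eigenvalue arguments (set to 1 here) are assumptions made for illustration only.

```python
def threshold_coefficient(varpi1, varpi2, theta, lam_min_upsilon, lam_max_y):
    """Coefficient of ||x||^2 in (6.35):
    (1 - varpi1^2) * lambda_m(Upsilon) / ((1 + varpi2) * theta * lambda_M(Y))."""
    if not 0.0 < varpi1 < 1.0:
        raise ValueError("varpi1 must lie in (0, 1) for a positive threshold")
    if varpi2 <= 0.0:
        raise ValueError("varpi2 must be positive")
    return (1.0 - varpi1 ** 2) * lam_min_upsilon / ((1.0 + varpi2) * theta * lam_max_y)


# Eigenvalues are hypothetical placeholders; theta = 8 as in the simulation section.
for varpi1 in (0.2, 0.5, 0.8):
    for varpi2 in (1.0, 8.0):
        c = threshold_coefficient(varpi1, varpi2, theta=8.0,
                                  lam_min_upsilon=1.0, lam_max_y=1.0)
        print(f"varpi1={varpi1}, varpi2={varpi2}: coefficient={c:.5f}")
```

The printed coefficients decrease as either parameter grows, which is consistent with the higher regulation frequency stated above.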

Remark 6.6

In this chapter, the approximately optimal combination therapeutic strategy is derived via the ADP method to inhibit the proliferation of tumor cells under the medicine dosage regulation mechanism (MDRM). The MDRM is built on the medicine indication (6.35). The data at the dosage-regulating instants are recorded and serve as reference data afterwards. When the difference between the current clinical data and the latest reference data exceeds the threshold, the medicine dosage is regulated. Therefore, this mechanism ensures that the derived therapeutic strategy is regulated in a timely manner and only when necessary.
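
The regulation logic described in this remark can be sketched as a simple loop over sampled data. This is an illustrative sketch, not the chapter's implementation: it assumes that the gap in (6.35) is the difference between the latest reference data and the current data, and that the coefficient `coeff` multiplying \(\Vert x\Vert ^2\) has already been computed. The function names and the sample trajectory are hypothetical.

```python
def squared_norm(v):
    """Squared Euclidean norm of a list of floats."""
    return sum(c * c for c in v)


def regulation_instants(trajectory, coeff):
    """Return the sample indices at which the dosage would be regulated.

    trajectory: list of state samples (each a list of floats).
    coeff: assumed coefficient of ||x||^2 in the indication (6.35).
    """
    reference = trajectory[0]  # data recorded at the latest regulation instant
    instants = [0]
    for k, x in enumerate(trajectory[1:], start=1):
        gap = [r - c for r, c in zip(reference, x)]
        # Regulate only when the indication is violated.
        if squared_norm(gap) > coeff * squared_norm(x):
            reference = x      # record new reference data
            instants.append(k)
    return instants


# Hypothetical scalar trajectory, for illustration only.
samples = [[1.0], [0.9], [0.5], [0.45]]
print(regulation_instants(samples, coeff=0.1))
```

Between two regulation instants the reference data, and therefore the applied dosage, is held fixed, which is why fewer updates are triggered once the state changes slowly.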

6.5 Simulation Study

In this section, we consider the mathematical model (6.7), which describes the relations between cells and drugs. For simplicity, we have constructed the rephrased system (6.8), whose control problem can be regarded as NZSGs.

In light of the clinical medical statistics and literature [38], the parameters on cells and drugs for model (6.7) are given in Table 6.1 and Table 6.2, respectively. For the discounted value function (6.9) of system (6.8), the corresponding parameters are set as \(\mathcal {R}_{11}=0.8I_{2\times 2}\), \(\mathcal {R}_{12}=15I_{2\times 2}\), \(\mathcal {R}_{21}=5I_{2\times 2}\), \(\mathcal {R}_{22}=I_{2\times 2}\), \(\varUpsilon _1=0.02I_{6\times 6}\) and \(\varUpsilon _2=0.06I_{6\times 6}\). In addition, the discount factors are set as \(\varrho _1=\varrho _2=0.2\).

Table 6.1 Parameter specifications of the cells
Table 6.2 Parameter specifications of the drugs
Fig. 6.1 The evolutions of model states
Fig. 6.2 The therapy strategy curves of chemotherapy drug 1
Fig. 6.3 The therapy strategy curves of chemotherapy drug 2
Fig. 6.4 The therapy strategy curves of anti-angiogenic drug
Fig. 6.5 The population of normal cells under different therapies
Fig. 6.6 The population of tumor cells under different therapies
Fig. 6.7 The population of endothelial cells under different therapies

For the critic NNs, the activation functions are both set as \([x_1^2\), \(x_1 x_2\), \(x_1 x_3\), \(x_1 x_4\), \(x_1 x_5\), \(x_1 x_6\), \(x_2^2\), \(x_2 x_3\), \(x_2 x_4\), \(x_2 x_5\), \(x_2 x_6\), \(x_3^2\), \(x_3 x_4\), \(x_3 x_5\), \(x_3 x_6\), \(x_4^2\), \(x_4 x_5\), \(x_4 x_6\), \(x_5^2\), \(x_5 x_6\), \(x_6^2]^{T}\), and the learning rates are set as \(\gamma _1=1.5\) and \(\gamma _2=2\). Besides, the remaining parameters are chosen as \(\theta =8\), \(\varpi _1=0.8\) and \(\varpi _2=8\).
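
The activation vector listed above consists of all 21 quadratic monomials \(x_i x_j\) with \(i\le j\) of the six states. A minimal sketch of how such a vector could be generated in the same order is given below. The function names are hypothetical, and the linear-in-weights critic output \(\hat{\omega }^{T}\) times the activation vector is the usual single-critic form, assumed here for illustration.

```python
from itertools import combinations_with_replacement


def quadratic_basis(x):
    """Return [x1^2, x1*x2, ..., x1*x6, x2^2, ..., x6^2] for a state list x."""
    return [x[i] * x[j]
            for i, j in combinations_with_replacement(range(len(x)), 2)]


def critic_value(weights, x):
    """Assumed critic output: inner product of the weights and the basis."""
    basis = quadratic_basis(x)
    if len(weights) != len(basis):
        raise ValueError("weight vector and basis must have the same length")
    return sum(w * s for w, s in zip(weights, basis))


# Hypothetical state and weights, for illustration only.
state = [1, 2, 3, 4, 5, 6]
print(len(quadratic_basis(state)))      # number of basis functions
print(critic_value([1] * 21, state))    # value with all weights equal to 1
```

For a six-dimensional state this yields 21 basis functions, so each critic weight vector has 21 entries.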

The evolution curves of the model (6.7) are depicted in Fig. 6.1. From Fig. 6.1 we can observe that when \(t=200d\), the population of tumor cells reduces to zero, and when \(t=600d\), the population of normal cells almost returns to 1 and that of endothelial cells drops to a small steady value. This indicates that the proliferation of tumor cells can be suppressed after 600 days under the optimal therapy strategy. In Figs. 6.2, 6.3 and 6.4, we compare the medicine dosages of the derived therapy strategy with those of the initial therapy strategy. The comparison indicates that the medicine dosages of our near-optimal therapy strategy are significantly lower than those of the initial strategy. This is of great practical significance, since superfluous drugs may well affect the health of patients and impose additional financial burdens on them. Besides, one can find that as the clinical data improve, the regulation frequency of the derived therapy strategy becomes lower. This implies that the therapy strategy based on the medicine dosage regulation mechanism is regulated according to the medicine indication in a timely manner and only when necessary. Figures 6.5, 6.6 and 6.7 present the curves of the cells under different therapy strategies, that is, chemotherapy drug 1, chemotherapy drug 2, the anti-angiogenic drug and the therapy comprising these three drugs. We can conclude from Figs. 6.5, 6.6 and 6.7 that the therapeutic effect of the derived therapy is the best. Thus the simulation results validate the effectiveness of our therapy strategy.

6.6 Conclusion

In this chapter, an ADP-based method using a medicine dosage regulation mechanism has been proposed to obtain the optimal combination therapy for cancer treatment. A mathematical model is employed to describe the interactions among the normal cells, tumor cells, endothelial cells, chemotherapy drugs and anti-angiogenic drug. This model provides the foundation for solving the optimization problem under the architecture of NZSGs. An ADP method with a single-critic framework is proposed to approximately seek the optimal strategy. In addition, the introduction of the medicine dosage regulation mechanism ensures that the therapy strategy is adjusted in a timely manner and only when necessary. Finally, the theoretical analysis and simulation results both indicate that the designed strategy can effectively decrease the populations of tumor cells and endothelial cells with low medicine dosages, which verifies the effectiveness of the proposed method. Our future research direction is to seek the optimal strategy for decreasing tumor cells or other harmful cells with the latest therapies, for example, therapy applying oncolytic viruses.