Abstract
In this chapter, the optimal control strategy for the organism is investigated by using the adaptive dynamic programming (ADP) method under the architecture of nonzero-sum games (NZSGs). First, a tumor model is established to formulate the interaction relationships among normal cells, tumor cells, endothelial cells and the concentrations of drugs. Then, an ADP-based method with a single-critic network architecture is proposed to approximate the coupled Hamilton-Jacobi equations (HJEs) under the medicine dosage regulation mechanism (MDRM). According to game theory, the approximate MDRM-based optimal strategy can be derived, which is of great practical significance. Owing to the proposed mechanism, the dosages of the chemotherapy and anti-angiogenic drugs can be regulated in a timely manner and only when necessary. Furthermore, the stability of the closed-loop system under the obtained strategy is analyzed via Lyapunov theory. Finally, a simulation experiment is conducted to verify the effectiveness of the proposed method.
6.1 Introduction
The death toll caused by neoplastic diseases is soaring, and issues concerning the nonlinear dynamics and control of tumor growth have attracted widespread concern [1]. Normal cells and tumor cells compete for the essential nutrients in the human body. Tumor cells keep proliferating, robbing the body of its limited energy supply and eventually disintegrating somatic function until death. Somatic cells constantly divide, and new cells differentiate and end with apoptosis; in this manner, a relative balance is maintained in the human body. Nevertheless, when the differentiation process of normal cells gets out of control, the cells may well evolve into tumor cells. It is in the nature of tumor cells to consume the body's nutrients voraciously.
The population of tumor cells increases progressively owing to the following three characteristics. First, the most obvious characteristic is insensitivity to anti-growth signals. A strict control mechanism exists for normal cells, but for tumor cells this mechanism is no longer valid. During the continuous process of division, tumor cells can escape the monitoring of the anti-growth signals, which leads to their unchecked growth. Second, tumor cells have the ability to promote the growth of blood vessels, which are essential for providing nutrients; this is why blood vessel density is associated with the malignant degree of the tumor tissue. Finally, tumor cells are also duplicitous, evolving camouflage abilities during their constant battle with immune cells to mislead the immune system into regarding them as normal cells, which results in tumor immune escape. Thus, to suppress the growth of tumor cells, obstructing the generative mechanisms that rely on the necessary nutrients is an effective approach [2, 3].
In contrast to the mixed tumor treatment approach of immunotherapy and chemotherapy in [4], this chapter explores a more effective adaptive control strategy for the organism using a medicine dosage regulation mechanism. An additional population of cells, called endothelial cells, feeds on the substances induced by malignant tumor cells; they transfer oxygen and nutrients to the primary focus, causing proliferation of blood vessels, which increases the carrying capacity of tumor cells, a process known as tumor angiogenesis [5]. As indicated in [6], an anti-angiogenic agent can markedly decrease the growth rate of tumors, reaching saturation to some extent without killing the endothelial cells completely. When the chemotherapy agent is used in combination with the anti-angiogenic agent to reduce the population of tumor cells, the latter can increase the effect of the former, as described in [7]. Nevertheless, as the key element promoting the growth of the vasculature, the endothelial cells must not be completely destroyed; otherwise, the vasculature required to construct access for the chemotherapy agent may no longer exist. On the basis of the pharmaceutical science concerning the chemotherapy and anti-angiogenic agents, the adaptive control strategy for the organism provides guidance for clinical practice under the medicine dosage regulation mechanism, especially in the treatment of lung cancer. Furthermore, since anti-tumor drugs often kill both tumor cells and normal cells, it is of significance to use as few drugs as possible to achieve the therapeutic goal during the treatment process.
ADP, derived from dynamic programming and reinforcement learning, is a powerful tool to tackle optimization issues [8,9,10]. In general, the successful implementation of ADP-based methods depends on the cooperative work of actor and critic networks [11]. Under this framework, the actor is responsible for performing the control strategy with current data [12]. The goal of the critic is to provide the actor with feedback information derived from evaluating the cost under the strategy. The distinct merit of this type of algorithm lies in that the optimal control strategy can be approximately acquired in an iterative manner, and the "curse of dimensionality" can be effectively obviated. Different ADP-based methods have been researched to tackle multifarious optimal control problems with the aid of artificial neural networks, whose approximation performance is outstanding [13, 14], such as robust control [15, 16], optimal consensus control [17,18,19] and optimal tracking [20, 21]. Furthermore, for systems with multiple controllers, the optimization issues can be formulated by game theory. As a vital branch of game theory, NZSGs originate from [22], with the goal of attaining the optimal strategy pair that minimizes the personal performance index for each player while stabilizing the controlled system [23,24,25]. Owing to their excellent approximation ability, ADP methods have been proposed to solve NZSGs. In [26], an adaptive method with a critic-only structure was developed to solve two-player NZSGs without any initial stabilizing control. The experience replay technique was integrated into the ADP algorithm in [27] to concurrently utilize historical data together with real-time data to approximate the value function, such that the persistence of excitation condition was not indispensable. In [28], a data-based integral reinforcement learning algorithm was proposed to solve NZSGs.
More specifically, it was a novel iterative learning algorithm operating in both off-line and online manners, which could extend the applicability of the data-based control scheme. Furthermore, in [29], discrete-time N-player NZSGs were tackled via an off-policy reinforcement learning method that was independent of the system dynamics.
Although relevant academic achievements have been presented in theory and applications [30,31,32,33,34,35,36,37,38], there is seldom any literature on this field according to the authors' literature survey. The contributions can be summarized as follows. First, the near-optimal therapy for the treatment of tumors is acquired for the first time via the ADP approach, which is an efficient adaptive intelligent learning algorithm. Second, the interactive system with a discounted value function is constructed based on the mathematical model simulating the interaction relationships among cells and drugs. Besides, two kinds of chemotherapy drugs and one kind of anti-angiogenic agent participate in the therapy, such that a combination therapeutic strategy can be derived under the architecture of NZSGs. Third, the idea of cybernetics is extended to a frontier field of medicine, more precisely, the therapy of tumors. Under the MDRM, the derived therapeutic strategy can achieve the therapeutic goal with the lowest doses of drugs, and the practical indications for medicine are considered for the first time.
Notations: \(\mathbb {N}^{+}\) denotes the set containing all positive integers. \(\parallel \cdot \parallel \), \(diag\{\cdot \}\) and \(\bigtriangledown (\cdot )\triangleq \partial (\cdot )/\partial x\) respectively represent the Euclidean norm of a vector/matrix, the operation of constructing a diagonal matrix and the gradient operator. \(\lambda _m(\cdot )\) and \(\lambda _M(\cdot )\) separately denote the minimum and maximum eigenvalues of a matrix. \(I_{n\times n}\) is the identity matrix of dimension n.
6.2 Preliminaries
6.2.1 Establishment of Mathematical Model
In this section, the growth mathematical model is established which considers the interaction relationships among the normal cells, tumor cells and endothelial cells. Moreover, the effects of control inputs, i.e., the chemotherapy and anti-angiogenic drugs, on these cells are embodied in the model. Thus, in the model formed from ordinary differential equations as follows, \(P_{NC}(t)\), \(P_{TC}(t)\) and \(P_{EC}(t)\) respectively represent the populations of normal cells, tumor cells and endothelial cells, \(P_{CD\jmath }(t) (\jmath =1,2)\) and \(P_{AD}(t)\) denote the concentrations of chemotherapy and anti-angiogenic drugs.
The population of normal cells, which is influenced by tumor cells, endothelial cells and the concentrations of chemotherapy and anti-angiogenic drugs, is modeled by
where \(\varXi _\imath \big (P_{EC}(t),P_{AD}(t)\big )=\varXi _{\imath 1}P_{EC}(t)+\varXi _{\imath 2}P_{AD}(t)+\varXi _{\imath 0}, \imath =1,2\). The parameters \(\alpha _1\), \(B_1\), \(C_1\) denote the proliferation rate, Holling type 2 constant and carrying capacity for normal cells, respectively. \(A_1\) is the contention parameter between normal cells and tumor cells.
As the tumor cells contend with normal cells for necessary nutrients, the population of tumor cells is affected by that of normal cells. Besides, there exist mutual effects among tumor cells, endothelial cells and the drugs. Thus the corresponding model can be written as
where \(\varPi _\jmath \big (P_{EC}(t),P_{AD}(t)\big )=\varPi _{\jmath 1}P_{EC}(t)+\varPi _{\jmath 2}P_{AD}(t)+\varPi _{\jmath 0}\), \(\jmath =1,2\). The parameters \(\alpha _2\), \(B_2\), \(C_2\) are the multiplication rate, Holling type 2 constant and carrying capacity for tumor cells, respectively. \(A_2\) is the contention parameter between normal cells and tumor cells.
The population of endothelial cells is associated with tumor cells and anti-angiogenic drugs. The relations can be given as
where K is the multiplication rate induced by tumor cells and \(s_1\) is the inflow rate. Similarly, the parameters \(\alpha _3\), \(B_3\), \(C_3\) are the multiplication rate, Holling type 2 constant and carrying capacity for endothelial cells, respectively. \(\varXi _3\) is the killing rate for endothelial cells.
The concentrations of the drugs decrease during the treatment phases, owing to the washout process. Hence we can model the evolution process of the concentrations of chemotherapy and anti-angiogenic drugs by
and
where \(Dr_{c1}\), \(Dr_{c2}\) and \(Dr_a\) are the control inputs. \(\beta _{c1}\), \(\beta _{c2}\) and \(\beta _a\) denote the washout rates for the drugs. \(m_1\), \(m_2\), \(m_3\), \(m_4\) and \(m_5\) are the rates at which the drugs integrate into the cells. Based on the operations similar to that in [39], we obtain the simplified version of the model as
where \(\xi _\imath \big (p_{EC}(t),p_{AD}(t)\big )=\) \(\xi _{\imath 1}p_{EC}(t)+\xi _{\imath 2}p_{AD}(t)+\xi _{\imath 0}\) and \(\pi _\jmath \big (p_{EC}(t),p_{AD}(t)\big )\) \(=\pi _{\jmath 1}p_{EC}(t)+\pi _{\jmath 2}p_{AD}(t)+\pi _{\jmath 0}\) with \(\imath , \jmath =1,2\). The states \(p_{NC}(t)\), \(p_{TC}(t)\), \(p_{EC}(t)\), \(p_{CD1}(t)\), \(p_{CD2}(t)\) and \(p_{AD}(t)\) are nonnegative.
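As a side illustration, the washout process described above implies first-order dynamics of the form \(\dot{p} = Dr - \beta p\) for each drug concentration. The following minimal sketch integrates these dynamics; the washout rates, dosages and horizon are illustrative placeholders, not the chapter's clinical parameters:

```python
import numpy as np

# Illustrative washout dynamics dp/dt = Dr(t) - beta * p(t) for the two
# chemotherapy drugs and the anti-angiogenic drug.  All numbers below are
# placeholder values, not the chapter's clinical parameters.
beta = np.array([0.3, 0.25, 0.2])   # washout rates beta_c1, beta_c2, beta_a
Dr = np.array([1.0, 0.5, 0.8])      # constant dosage inputs Dr_c1, Dr_c2, Dr_a
dt, T = 0.01, 50.0                  # Euler step size and horizon (days)
p = np.zeros(3)                     # initial drug concentrations

for _ in range(int(T / dt)):
    p += dt * (Dr - beta * p)       # forward-Euler integration step

# Each concentration approaches the steady state Dr / beta.
print(np.round(p, 3))
```

With constant dosing, each concentration settles at the ratio of dosage to washout rate, which is why a sustained infusion is needed to hold a therapeutic level.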
Remark 6.1
The differential equation (6.7) is the simplified model describing the interaction relationships among cells and drugs. Observing the model, one can discover that there exists competition between normal cells and tumor cells. The tumor cells require more nutrients, so they facilitate the proliferation of endothelial cells, which in turn provide the indispensable nutrients to promote the growth of the tumor. The tumor cells can be effectively damaged by the chemotherapy drugs, which also have side effects on normal cells to some extent, and the anti-angiogenic drug contributes to inhibiting the proliferation of the endothelial cells.
6.2.2 Nonzero-Sum Games Formulation
Consider the interaction model (6.7) rewritten as
where \(\mathcal {u}_1=[u_{c1},u_a]^{T}\), \(\mathcal {u}_2=[u_{c2},0]^{T}\) and f(x) is constructed by the right-hand side parts of (6.7) excluding the terms \(u_{c1}\), \(u_{c2}\) and \(u_a\).
Define the value function for player \(\imath (\imath =1,2)\) as
where the utility function \(\delta _\imath (x,\mathcal {u}_1,\mathcal {u}_2)=x^{T}\varUpsilon _\imath x+\mathcal {u}_1^{T}\mathcal {R}_{\imath 1}\mathcal {u}_1+\mathcal {u}_2^{T}\mathcal {R}_{\imath 2}\mathcal {u}_2\). The matrices \(\mathcal {R}_{\imath \jmath }(\imath ,\jmath =1,2)\) and \(\varUpsilon _\imath \) are positive definite, and \(\varrho _\imath >0\) is the discount factor. According to the value function (6.9), we can define the corresponding Hamiltonian function as
The optimal value function is defined as
The target of NZSGs is to attain the admissible strategy pair \(\{\mathcal {u}_1^*,\mathcal {u}_2^*\}\) with the definition given in [23, 40]. According to the stationarity condition, the optimal strategy for player \(\imath \) could be obtained by
Thus the HJEs can be obtained as
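For input-affine dynamics of the form (6.8) with the quadratic utility defined above, the stationarity condition \(\partial H_\imath /\partial \mathcal {u}_\imath =0\) yields the standard feedback form (a sketch using this section's symbols, with \(\mathcal {g}_\imath \) denoting the input matrix of player \(\imath \)):

```latex
\mathcal{u}_\imath^{*}(x) = -\frac{1}{2}\,\mathcal{R}_{\imath\imath}^{-1}\,
\mathcal{g}_\imath^{T}(x)\,\nabla V_\imath^{*}(x), \qquad \imath = 1,2.
```

Substituting these expressions back into the Hamiltonian functions couples the two value functions, which is why (6.13) must be solved as a pair.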
Remark 6.2
It is noteworthy that there exists no zero equilibrium for system (6.8), which may well result in the divergence of \(V_\imath (x(t))\). To resolve this issue, the discount factor \(\varrho _\imath \) is introduced to form a decay term such that \(V_\imath (x(t))\) can be convergent.
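The effect of the decay term can be illustrated numerically. In the sketch below, the discount factor and the stand-in utility signal are placeholders chosen so the integral has a closed form; a constant utility mimics a state that settles at a nonzero equilibrium:

```python
import numpy as np

# Discounted accumulation int_0^inf exp(-rho*tau) * delta(tau) d(tau).
# The constant stand-in utility delta = 1 mimics a state resting at a
# nonzero equilibrium; the closed-form value is then 1 / rho.
rho = 0.2                             # placeholder discount factor
dt = 1e-3
tau = np.arange(0.0, 100.0, dt)
V = np.sum(np.exp(-rho * tau)) * dt   # Riemann approximation of the integral
print(round(V, 2))                    # close to 1 / rho = 5.0
```

Without the decay factor, the same integral grows without bound, which is exactly the divergence issue the discount factor in (6.9) is introduced to avoid.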
In general, solving NZSGs is synonymous with solving the equations (6.13). Nevertheless, for nonlinear systems, it is intractable to tackle the coupled equations directly. To resolve this difficulty, an ADP method utilizing a dosage regulation mechanism is proposed in the following sections.
6.3 MDRM-Based Adaptive Critic Learning Method for NZSGs
First, we introduce the indications for medicine to judge when the medicine dosage should be regulated. Then, under the MDRM, the ADP method with a single-critic architecture is proposed to approximately seek the optimal strategy for the NZSGs of model (6.7).
6.3.1 MDRM-Based Optimal Strategy Derivation
For the sake of realizing a conditioned therapy strategy, the MDRM is required to handle the clinical data such that the strategy can be changed in a timely manner and only when necessary. The time sequence \(\{\hbar _\ell \}\) is constructed to record the regulating instants, where \(\hbar _\ell \) denotes the \(\ell \)th regulating instant. Then the state can be denoted as
To evaluate the difference between the real-time data and the latest recorded data, it is necessary to define the error function \(z_\ell =\breve{x}_\ell -x(t),t\in [\hbar _\ell ,\hbar _{\ell +1})\). The operation of the MDRM depends on the regulating condition, which compares the error \(z_\ell \) with a threshold associated with the real-time data. The strategy is adjusted only when \(z_\ell \) exceeds the threshold. That is, \(\breve{\mathcal {u}}_\imath =\mathcal {u}_\imath (\breve{x}_\ell ),\imath =1,2\), and \(\ell \in \mathbb {N}^{+}\). Thus the MDRM-based strategy can be obtained as
where \(\nabla \breve{V}_\imath ^{*}=\partial V_\imath ^{*}/\partial x\) evaluated at \(t=\hbar _\ell \). The version of the HJEs based on the regulation mechanism is derived as
Differing from the HJEs (6.13), owing to the existence of the error \(z_\ell \), (6.16) does not equal zero. Before proceeding with the discussion, the following assumption is required [41].
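The regulating logic built on the error \(z_\ell \) can be sketched as follows. The threshold form and the constants `zeta` and `c` are illustrative placeholders and do not reproduce the exact indication derived below:

```python
import numpy as np

# Sketch of the MDRM trigger: the dosage is recomputed only when the
# squared error between the latest recorded state and the real-time
# state exceeds a state-dependent threshold.  zeta and c are placeholder
# constants, and this threshold form is illustrative only.
def should_regulate(x_held, x_now, zeta=0.4, c=0.5):
    z = x_held - x_now                            # error z_ell
    threshold = c * zeta * np.dot(x_now, x_now)   # state-dependent bound
    return bool(np.dot(z, z) > threshold)

x_held = np.array([1.0, 0.2, 0.1])
print(should_regulate(x_held, np.array([0.98, 0.21, 0.10])))  # small drift: False
print(should_regulate(x_held, np.array([0.40, 0.60, 0.30])))  # large drift: True
```

Between regulating instants, the dosage computed at \(\breve{x}_\ell \) is simply held, so the controller recomputes only when the state has drifted appreciably.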
Assumption 6.1
The optimal strategy \(\mathcal {u}_\imath ^{*}\) is locally Lipschitz. That is, for \(\imath =1,2\), there exists a constant \(\theta _\imath >0\) such that \(\Vert \mathcal {u}_\imath ^{*}-\breve{\mathcal {u}}_\imath ^{*}\Vert ^2\le \theta _\imath \Vert x-\breve{x}_\ell \Vert ^2\).
Theorem 6.3
Consider the system (6.8), and suppose that Assumption 6.1 holds and \(V_\imath ^{*}\) is the solution of (6.13). Then \(\breve{u}_\imath ^{*}\) formulated as (6.15) can stabilize system (6.8) when the following medicine indication is applied
where \(\zeta \in (0,1/2)\) is an adjustable parameter. The terms \(\theta \), \(\varUpsilon \) and Y are given in (6.21) and (6.22).
Proof
Selecting the Lyapunov function \(L_{ya}=V_1^{*}+V_2^{*}\), we can obtain the corresponding derivative as
According to (6.13), we have
and
Let \(\mathcal {u}^{*}=[\mathcal {u}_1^{*T},\mathcal {u}_2^{*T}]^{T}\) and \(\breve{\mathcal {u}}^{*}=[(\breve{\mathcal {u}}_1^{*}-\mathcal {u}_1^{*})^{T},(\breve{\mathcal {u}}_2^{*}-\mathcal {u}_2^{*})^{T}]^{T}\). Then we can derive that
where \(\varUpsilon =\varUpsilon _1+\varUpsilon _2\), \(\mathcal {R}=diag\{\mathcal {R}_{11}+\mathcal {R}_{21},\mathcal {R}_{12}+\mathcal {R}_{22}\}\), and \(Z=[Z_1,Z_2]\) with \(Z_\imath =[\mathcal {R}_{11}\mathcal {g}_1^{-1}\mathcal {g}_\imath ,\mathcal {R}_{22}\mathcal {g}_2^{-1}\mathcal {g}_\imath ]^{T},\imath =1,2\). Applying Young's inequality, we have
where \(Y=Z^{T}\mathcal {R}^{-1}Z\). Noting that \(\mathcal {u}_\imath ^{*}\) is an admissible strategy, we can derive that \(V_\imath ^{*}\) is bounded. Hence \(\varrho _V\) denotes the bound of the term \(\varrho _1 V_1^{*}+\varrho _2 V_2^{*}\). According to the definitions of \(\varUpsilon \) and Y, we have \(\lambda _m(\varUpsilon )>0\) and \(\lambda _M(Y)>0\). Furthermore, we can obtain
where \(\theta =\theta _1+\theta _2\). When the indication (6.17) is satisfied, we derive that \(\dot{L}_{ya}\le -2\zeta \lambda _m(\varUpsilon )\Vert x\Vert ^2+\varrho _V\). Then we can find that \(\dot{L}_{ya}<0\) holds when \(\Vert x\Vert >\sqrt{\frac{\varrho _V}{2\zeta \lambda _m(\varUpsilon )}}\). In light of Lyapunov theorem, the strategy (6.15) can stabilize system (6.8). This completes the proof.\(\blacksquare \)
6.3.2 Implementation of Adaptive Critic Learning Method
In this section, the approximate optimal strategy under the MDRM is derived by the ADP method with a single-critic architecture. In light of the universal approximation property of neural networks (NNs), \(V_\imath ^{*}\) can be expressed as
where \(\omega _\imath ^{*}\) is the ideal weight vector, \(\nu _\imath \) is the activation function and \(\sigma _\imath \) is the approximation error. To acquire an approximate version of the unknown vector \(\omega _\imath ^{*}\), the critic NN is constructed as
with \(\hat{\omega }_\imath \) being the approximate weight vector. With the aid of the critic NN, we can present the optimal strategy as
Accordingly, we can obtain the optimal and approximate optimal strategies under MDRM as
and
Then the approximate Hamiltonian can be presented as
where \(\psi _\imath =\nabla \nu _\imath \big (f+\mathcal {g}_1\mathcal {u}_1(\breve{x}_\ell )+\mathcal {g}_2\mathcal {u}_2(\breve{x}_\ell )\big )-\varrho _\imath \nu _\imath \).
To minimize \(\epsilon _\imath \) in (6.29), we set the minimization target as \(E=E_1+E_2=1/2\epsilon _1^2+1/2\epsilon _2^2\). By applying the gradient descent approach, we obtain
where \(\gamma _\imath \) is the adjustable parameter and \(\breve{\psi }_\imath =\psi _\imath /(\psi _\imath ^{T}\psi _\imath +1)^2\). Define \(\tilde{\omega }_\imath =\omega _\imath ^{*}-\hat{\omega }_\imath \). From (6.30), we derive that
where \(\bar{\psi }_\imath =\psi _\imath /(\psi _\imath ^{T}\psi _\imath +1)\) and the approximation residual error \(e_\imath =-\nabla \sigma _\imath ^{T}(f+\mathcal {g}_1\breve{\mathcal {u}}_1+\mathcal {g}_2\breve{\mathcal {u}}_2)+\varrho _\imath \sigma _\imath \). To proceed further, the following assumptions are required [11, 26, 27].
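A discrete-time sketch of the tuning law (6.30) is given below. The regressor `psi`, the utility value `delta` and all constants are placeholders; the point is the normalized gradient step that drives the residual \(\epsilon _\imath \) toward zero:

```python
import numpy as np

# One Euler step of the normalized gradient-descent tuning law: the
# critic weight moves against the gradient of E = eps^2 / 2, with the
# regressor normalized by (psi^T psi + 1)^2.  All values are placeholders.
def critic_step(w_hat, psi, delta, gamma=1.5, dt=0.01):
    eps = delta + psi @ w_hat        # approximate Hamiltonian residual
    return w_hat - dt * gamma * psi * eps / (psi @ psi + 1.0) ** 2

w = np.zeros(4)
psi = np.array([0.5, -0.2, 0.1, 0.3])
for _ in range(2000):                # repeated steps shrink the residual
    w = critic_step(w, psi, delta=0.8)
print(abs(0.8 + psi @ w) < 0.1)      # residual nearly eliminated: True
```

In the chapter the regressor \(\psi _\imath \) varies with the state trajectory, which is why the persistence of excitation condition of Assumption 6.2 is needed for the weights to converge.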
Assumption 6.2
For any \(\imath \in \{1,2\}\), the signal \(\bar{\psi }_\imath \) is persistently excited on the time interval \([t,t+T]\). That is, there exists the positive constant \(b_{\psi \imath }\) such that
with \(N_{c\imath }\) being the neuron number of the \(\imath \)th critic network.
Assumption 6.3
For \(\imath \in \{1,2\}\), there exist positive constants such that \(\Vert \omega _\imath ^{*}\Vert \le b_{\omega \imath }\), \(\Vert \nabla \nu _\imath \Vert \le b_{\nu \imath }\), \(\Vert \nabla \sigma _\imath \Vert \le b_{\sigma \imath }\) and \(\Vert e_\imath \Vert \le b_{e\imath }\).
6.4 Stability Analysis
In this section, the stability of the controlled system is analyzed by applying Lyapunov theory. Before presenting the main results, the boundedness of the critic weight is discussed in the following lemma.
Lemma 6.4
For any \(\imath \in \{1,2\}\), suppose that Assumptions 6.2ā6.3 hold and the initial weight is finite. If the critic tuning law (6.30) is applied, then it holds that \(\tilde{\omega }_\imath \) is locally ultimately bounded.
Proof
Consider the Lyapunov function \(L_{y\omega }\). It is noted that \(\tilde{\omega }_\imath \) evolves according to flow dynamics, which indicates that there do not exist any jumps in the values of \(\tilde{\omega }_\imath \). More specifically, \(\tilde{\omega }_\imath \) is continuous at the regulating instants. Thus we only need to consider the time interval between two adjoining regulating instants.
According to Assumptions 6.2ā6.3, it can be derived that
By applying Young's inequality, we can get
where \(\varGamma _1=\gamma _1 b_{e1}^2+\gamma _2 b_{e2}^2\). Furthermore, when \(\Vert \tilde{\omega }_1\Vert >\sqrt{\frac{\varGamma _1}{\gamma _1 b_{\psi 1}}}\triangleq b_{\tilde{\omega }_1}\) or \(\Vert \tilde{\omega }_2\Vert >\sqrt{\frac{\varGamma _1}{\gamma _2 b_{\psi 2}}}\triangleq b_{\tilde{\omega }_2}\), it yields that \(\dot{L}_{y\omega }<0\). The lemma is proved. \(\blacksquare \)
Theorem 6.4
Consider the system (6.8) with strategy formulated as (6.28). Suppose that Assumptions 6.1ā6.3 hold. The tuning law for critic network is given by (6.30). Then the state x and weight estimation error \(\tilde{\omega }_\imath \) are UUB provided that the indication is applied
with \(\varpi _1\) and \(\varpi _2\) being the adjustable parameters.
Proof
Select the Lyapunov function candidate as
Due to the utilization of MDRM, we present the proof process in two cases.
Case I: No regulation occurs, i.e., \(t\in [\hbar _\ell ,\hbar _{\ell +1})\). Then we obtain \(\dot{L}_{Ya}=0\). The derivative of \(L_{Yb}\) can be obtained as
Let \(\breve{\mathcal {u}}=[(\breve{\mathcal {u}}_1-\mathcal {u}_1^{*})^{T},(\breve{\mathcal {u}}_2-\mathcal {u}_2^{*})^{T}]^{T}\). Applying the operations similar to that in Theorem 6.3, we have
Recall that \(\theta =\theta _1+\theta _2\), and substitute (6.27) and (6.28) into (6.38). Then we can derive
where \(\varGamma _2=\frac{1}{4}\lambda _M(Y)(1+1/\varpi _2)^2\big (\Vert \mathcal {R}_{11}^{-1}\Vert ^2 b_{g1}^2 b_{\nu 1}^2 b_{\tilde{\omega }1}^2+\Vert \mathcal {R}_{22}^{-1}\Vert ^2 b_{g2}^2 b_{\nu 2}^2 b_{\tilde{\omega }2}^2\big )+\frac{1}{4\varpi _2}\lambda _M(Y)(1+\varpi _2)^2\big (\Vert \mathcal {R}_{11}^{-1}\Vert ^2 b_{g1}^2 b_{\sigma 1}^2+\Vert \mathcal {R}_{22}^{-1}\Vert ^2 b_{g2}^2 b_{\sigma 2}^2\big )\) with \(b_{g1}\) and \(b_{g2}\) denoting the bounds of known \(\mathcal {g}_1\) and \(\mathcal {g}_2\).
According to Assumption 6.2 and Assumption 6.3, we derive that
Based on (6.39) and (6.40), we can obtain
where \(\pounds =\varGamma _1+\varGamma _2+\varrho _V\). Applying the indication (6.35), then we conclude that \(\dot{L}_Y<0\) when one of the conditions hold that
Thus x and \(\tilde{\omega }_\imath \) can be guaranteed to be UUB.
Case II: A regulation occurs, that is, \(t=\hbar _{\ell +1}\). The difference of \(L_Y\) can be given by
where the terms are defined by \(\triangle L_{Ya}=V_1^{*}(\breve{x}_{\ell +1})-V_1^{*}(\breve{x}_{\ell })+V_2^{*}(\breve{x}_{\ell +1}) -V_2^{*}(\breve{x}_{\ell })\), \(\triangle L_{Yb}=V_1^{*}(x(\hbar _{\ell +1}))-V_1^{*}(x(\hbar _{\ell +1}^-))+V_2^{*}(x(\hbar _{\ell +1}))-V_2^{*}(x(\hbar _{\ell +1}^-))\), \(\triangle L_{Yc}=1/2\tilde{\omega }_1^{T}(\hbar _{\ell +1})\tilde{\omega }_1(\hbar _{\ell +1}) -1/2\tilde{\omega }_1^{T}(\hbar _{\ell +1}^-)\tilde{\omega }_1(\hbar _{\ell +1}^-)+1/2\tilde{\omega }_2^{T}(\hbar _{\ell +1})\tilde{\omega }_2(\hbar _{\ell +1}) -1/2\tilde{\omega }_2^{T}(\hbar _{\ell +1}^-)\tilde{\omega }_2(\hbar _{\ell +1}^-)\). Recalling the analysis in Case I, we obtain that \(\dot{L}_Y<0\) when x or \(\tilde{\omega }_\imath \) is out of the corresponding bound. Furthermore, we can derive that \(L_{Yb}+L_{Yc}\) is monotonically decreasing when \(t\in [\hbar _\ell ,\hbar _{\ell +1})\). In light of the properties of limits, we have
As x is proved to be UUB, we can obtain
According to (6.45) and (6.46), we can derive \(\triangle L_Y<0\), which indicates that the selected Lyapunov function (6.36) is decreasing at \(t=\hbar _{\ell +1}\). This completes the proof. \(\blacksquare \)
Remark 6.5
\(\varpi _1\) and \(\varpi _2\) in (6.35) are the adjustable parameters which determine the frequency of medicine dosage regulation. A larger \(\varpi _1\) or \(\varpi _2\) leads to a higher regulation frequency, and a smaller parameter implies a lower adjustment frequency. Thus we can determine these parameters according to the clinical data.
Remark 6.6
In this chapter, the approximate optimal combination therapeutic strategy is derived via the ADP method to inhibit the proliferation of tumor cells under the medicine dosage regulation mechanism. The MDRM is constructed on the foundation of the above-mentioned medicine indication (6.35). The data at the dosage-regulating instants are recorded and utilized as reference data in the future. When the difference between the current clinical data and the latest reference data is larger than the threshold, the medicine dosage is regulated. Therefore, this mechanism guarantees that the derived therapeutic strategy is regulated in a timely manner and only when necessary.
6.5 Simulation Study
In this section, the mathematical model (6.7), which presents the relations between cells and drugs, is considered. For simplicity, we have constructed the rephrased system (6.8), the control problem of which can be deemed an NZSG.
In light of the clinical medical statistics and literature [38], the parameters on cells and drugs for model (6.7) are given in Table 6.1 and Table 6.2, respectively. For the discounted value function (6.9) of system (6.8), the corresponding parameters are set as \(\mathcal {R}_{11}=0.8I_{2\times 2}\), \(\mathcal {R}_{12}=15I_{2\times 2}\), \(\mathcal {R}_{21}=5I_{2\times 2}\), \(\mathcal {R}_{22}=I_{2\times 2}\), \(\varUpsilon _1=0.02I_{6\times 6}\) and \(\varUpsilon _2=0.06I_{6\times 6}\). In addition, the discount factors are \(\varrho _1=\varrho _2=0.2\).
For the critic NNs, the activation functions are both set as \([x_1^2\), \(x_1 x_2\), \(x_1 x_3\), \(x_1 x_4\), \(x_1 x_5\), \(x_1 x_6\), \(x_2^2\), \(x_2 x_3\), \(x_2 x_4\), \(x_2 x_5\), \(x_2 x_6\), \(x_3^2\), \(x_3 x_4\), \(x_3 x_5\), \(x_3 x_6\), \(x_4^2\), \(x_4 x_5\), \(x_4 x_6\), \(x_5^2\), \(x_5 x_6\), \(x_6^2]^{T}\), and the learning rates are set as \(\gamma _1=1.5\) and \(\gamma _2=2\). Besides, the parameters are set as \(\theta =8\), \(\varpi _1=0.8\) and \(\varpi _2=8\).
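For reference, the 21-dimensional activation vector listed above contains every quadratic monomial \(x_i x_j\) with \(i\le j\) of the six state variables; a small sketch of its construction (the helper name is ours):

```python
import numpy as np
from itertools import combinations_with_replacement

# Build the quadratic activation vector nu(x): all monomials x_i * x_j
# with i <= j for a 6-dimensional state, i.e. 6 * 7 / 2 = 21 features,
# in the same order as the list given in the text.
def quad_features(x):
    return np.array([x[i] * x[j]
                     for i, j in combinations_with_replacement(range(len(x)), 2)])

x = np.array([1.0, 2.0, 0.5, 0.0, -1.0, 3.0])
nu = quad_features(x)
print(nu.shape)      # (21,)
```

The critic output is then the inner product \(\hat{\omega }_\imath ^{T}\nu (x)\), and the gradient \(\nabla \nu \) needed by the strategy follows from differentiating each monomial.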
The evolution curves of the model (6.7) are depicted in Fig. 6.1. From Fig. 6.1 we can observe that when \(t=200d\), the population of tumor cells reduces to zero, and when \(t=600d\), the population of normal cells almost returns to 1 and that of endothelial cells drops to a small steady value. This indicates that the proliferation of tumor cells can be suppressed after 600 days under the optimal therapy strategy. In Figs. 6.1, 6.2, 6.3, 6.4, 6.5, 6.6 and 6.7, we compare the medicine dosages of the derived therapy strategy with those of the initial therapy strategy. The comparison indicates that the medicine dosages of our near-optimal therapy strategy are significantly lower than those of the initial strategy. This is of great practical significance, since superfluous drugs may well affect the health of patients and impose additional financial burdens on them. Besides, one can find that as the clinical data become better, the regulation frequency of the derived therapy strategy becomes lower. This implies that the therapy strategy based on the medicine dosage regulation mechanism can be regulated with the indications for medicine in a timely manner and only when necessary. Figures 6.5, 6.6 and 6.7 present the curves of the cells under different therapy strategies, that is, chemotherapy drug 1, chemotherapy drug 2, the anti-angiogenic drug and the therapy comprising these three drugs. We can conclude from Figs. 6.5, 6.6 and 6.7 that the therapeutic effect of the derived combination therapy is the best. Thus the simulation results validate the effectiveness of our therapy strategy.
6.6 Conclusion
In this chapter, an ADP-based method using a medicine dosage regulation mechanism has been proposed to obtain the optimal combination therapy for curing cancer. A mathematical model is employed to describe the interactions among the normal cells, tumor cells, endothelial cells, chemotherapy drugs and anti-angiogenic drug. The mathematical model provides the foundation for solving the optimization issue under the architecture of NZSGs. The ADP method with a single-critic framework is proposed to approximately seek the optimal strategy. In addition, the introduction of the medicine dosage regulation mechanism guarantees that the therapy strategy is adjusted in a timely manner and only when necessary. Finally, the theoretical analysis and simulation results both indicate that the designed strategy can effectively decrease the populations of tumor cells and endothelial cells with very low medicine dosages, which verifies the effectiveness of the proposed method. Our future research direction is to seek the optimal strategy for decreasing tumor cells or other harmful cells with the latest therapies, for example, therapy applying oncolytic viruses.
References
Sharma S, Samanta GP (2016) Analysis of the dynamics of a tumor-immune system with chemotherapy and immunotherapy and quadratic optimal control. Differ Equ Dyn Syst 24(2):149–171
Evans CM (1991) The metastatic cell, behaviour and biochemistry. Chapman and Hall, London
Sherbet GV (1982) The biology of tumour malignancy. Academic, London
de Pillis LG, Gu W, Radunskaya AE (2006) Mixed immunotherapy and chemotherapy of tumors: modeling, applications and biological interpretations. J Theor Biol 238(4):841–862
Bikfalvi A (1995) Significance of angiogenesis in tumour progression and metastasis. Eur J Cancer 31(7–8):1101–1104
Beecken WC, Fernandes A, Joussen AM, ..., Shing Y (2001) Effect of antiangiogenic therapy on slowly growing, poorly vascularized tumors in mice. J Natl Cancer Inst 93(5):382–387
Kerbel RS, Bertolini F, Man S, Hicklin DA, Emmenegger U, Shaked Y (2006) Antiangiogenic drugs as broadly effective chemosensitizing agents. Angiogenesis, pp 195–212
Harmon ME, Baird LC, Klopf AH (1995) Reinforcement learning applied to a differential game. Adapt Behav 4(1):3–28
Littman ML (2015) Reinforcement learning improves behaviour from evaluative feedback. Nature 521(7553):445–451
Li T, Yang D, Xie X, Zhang H (2022) Event-triggered control of nonlinear discrete-time system with unknown dynamics based on HDP(\(\lambda \)). IEEE Trans Cybernet 52(7):6046–6058
Vamvoudakis KG, Lewis FL (2010) Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem. Automatica 46(5):878–888
Yang X, He H (2021) Decentralized event-triggered control for a class of nonlinear-interconnected systems using reinforcement learning. IEEE Trans Cybernet 51(2):635–648
Liu Y, Yao D, Li H, Lu R (2022) Distributed cooperative compound tracking control for a platoon of vehicles with adaptive NN. IEEE Trans Cybernet 52(7):7039–7048
Tan G, Wang Z, Shi Z (2023) Proportional-integral state estimator for quaternion-valued neural networks with time-varying delays. IEEE Trans Neural Netw Learn Syst 34(2):1074–1079
Liu D, Yang X, Wang D, Wei Q (2015) Reinforcement-learning-based robust controller design for continuous-time uncertain nonlinear systems subject to input constraints. IEEE Trans Cybernet 45(7):1372–1385
Wang D, Liu D, Li H, Luo B, Ma H (2016) An approximate optimal control approach for robust stabilization of a class of discrete-time nonlinear systems with uncertainties. IEEE Trans Syst Man Cybernet: Syst 6(5):713–717
Zhang H, Cai Y, Wang Y, Su H (2020) Adaptive bipartite event-triggered output consensus of heterogeneous linear multiagent systems under fixed and switching topologies. IEEE Trans Neural Netw Learn Syst 31(1):4816–4830
Wei Q, Liu D, Lewis FL (2015) Optimal distributed synchronization control for continuous-time heterogeneous multi-agent differential graphical games. Inf Sci 317:96–113
Zhang J, Zhang H, Feng T (2018) Distributed optimal consensus control for nonlinear multiagent system with unknown dynamic. IEEE Trans Neural Netw Learn Syst 29(8):3339–3348
Kamalapurkar R, Dinh H, Bhasin S, Dixon WE (2015) Approximate optimal trajectory tracking for continuous-time nonlinear systems. Automatica 51:40–48
Gao W, Jiang Z (2018) Learning-based adaptive optimal tracking control of strict-feedback nonlinear systems. IEEE Trans Neural Netw Learn Syst 29(6):2614–2624
Starr AW, Ho YC (1969) Nonzero-sum differential games. J Optim Theory Appl 3(3):184–206
Vamvoudakis KG, Lewis FL (2011) Online adaptive learning solution of coupled Hamilton-Jacobi equations for multi-player non-zero-sum games. Automatica 47(8):1556–1569
Zhu Y, Zhao D, Li X (2017) Iterative adaptive dynamic programming for solving unknown nonlinear zero-sum game based on online data. IEEE Trans Neural Netw Learn Syst 28(3):714–725
Song R, Li J, Lewis FL (2020) Robust optimal control for disturbed nonlinear zero-sum differential games based on single NN and least squares. IEEE Trans Syst Man Cybernet: Syst 50(11):4009–4019
Zhang H, Cui L, Luo Y (2013) Near-optimal control for nonzero-sum differential games of continuous-time nonlinear systems using single-network ADP. IEEE Trans Cybernet 43(1):206–216
Zhao D, Zhang Q, Wang D, Zhu Y (2016) Experience replay for optimal control of nonzero-sum game systems with unknown dynamics. IEEE Trans Cybernet 46(3):854ā865
Zhang Q, Zhao D (2019) Data-based reinforcement learning for nonzero-sum games with unknown drift dynamics. IEEE Trans Cybernet 49(8):2874ā2885
Song R, Wei Q, Zhang H, Lewis FL (2021) Discrete-time non-zero-sum games with completely unknown dynamics. IEEE Trans Cybernet 51(6):2929ā2943
Zhang H, Qin C, Jiang B, Luo Y (2014) Online adaptive policy learning algorithm for \(H_{\infty }\) state feedback control of unknown affine nonlinear discrete-time systems. IEEE Trans Cybernet 44(12):2706ā2718
Chen L, Zhu Y, Ahn CK (2023) Adaptive neural network-based observer design for switched systems with quantized measurements. IEEE Trans Neural Netw Learn Syst. https://doi.org/10.1109/TNNLS.2021.3131412
Zhang H, Su H, Zhang K, Luo Y (2019) Event-triggered adaptive dynamic programming for non-zero-sum games of unknown nonlinear systems via generalized fuzzy hyperbolic models. IEEE Trans Fuzzy Syst 27(11):2202ā2214
Liu X, Ge SS, Zhao F, Mei X (2021) Optimized impedance adaptation of robot manipulator interacting with unknown environment. IEEE Trans Control Syst Technol 29(1):411ā419
Massenio PR, Naso D, Lewis FL, Davoudi A (2020) Assistive power buffer control via adaptive dynamic programming. IEEE Trans Energy Convers 35(3):1534ā1546
Ghasempour T, Nicholson GL, Kirkwood D, Fujiyama T, Heydecker B (2020) Distributed approximate dynamic control for traffic management of busy railway networks. IEEE Trans Intell Transp Syst 21(9):3788ā3798
Wei Q, Liao Z, Shi G (2021) Generalized actor-critic learning optimal control in smart home energy management. IEEE Trans Ind Inf 17(10):6614ā6623
Zhao J, Wang T, Pedrycz W, Wang W (2021) Granular prediction and dynamic scheduling based on adaptive dynamic programming for the blast furnace gas system. IEEE Trans Cybernet 51(4):2201ā2214
Davari M, Gao W, Jiang ZP, Lewis FL (2021) An optimal primary frequency control based on adaptive dynamic programming for islanded modernized microgrids. IEEE Trans Autom Sci Eng 18(3):1109ā1121
Pinho STRD, Bacelar FS, Andrade RFS, Freedman HI (2013) A mathematical model for the effect of anti-angiogenic therapy in the treatment of cancer tumours by chemotherapy. Nonlinear Anal Real World Appl 14(1):815ā828
Liu D, Li H, Wang D (2014) Online synchronous approximate optimal learning algorithm for multiplayer nonzero-sum games with unknown dynamics. IEEE Trans Syst Man Cybernet: Syst 44(8):1015ā1027
Yang X, Wei Q (2021) Adaptive critic learning for constrained optimal event-triggered control with discounted cost. IEEE Trans Neural Netw Learn Syst 32(1):91ā104
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2024 The Author(s)
Cite this chapter
Sun, J., Xu, S., Liu, Y., Zhang, H. (2024). Combination Therapy-Based Adaptive Control for Organism Using Medicine Dosage Regulation Mechanism. In: Adaptive Dynamic Programming. Springer, Singapore. https://doi.org/10.1007/978-981-99-5929-7_6
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-5928-0
Online ISBN: 978-981-99-5929-7
eBook Packages: Intelligent Technologies and Robotics (R0)