Abstract
In this chapter, the optimal control strategy for the organism is investigated by using the adaptive dynamic programming (ADP) method under the architecture of nonzero-sum games (NZSGs). First, a tumor model is established to formulate the interaction relationships among normal cells, tumor cells, endothelial cells and the concentrations of drugs. Then, an ADP-based method with a single-critic network architecture is proposed to approximate the coupled Hamilton-Jacobi equations (HJEs) under the medicine dosage regulation mechanism (MDRM). According to game theory, the approximate MDRM-based optimal strategy can be derived, which is of great practical significance. Owing to the proposed mechanism, the dosages of the chemotherapy and anti-angiogenic drugs can be regulated in a timely manner and only when necessary. Furthermore, the stability of the closed-loop system under the obtained strategy is analyzed via Lyapunov theory. Finally, a simulation experiment is conducted to verify the effectiveness of the proposed method.
6.1 Introduction
The death toll caused by neoplastic diseases is soaring, and issues concerning the nonlinear dynamics and control of tumor growth have attracted widespread concern [1]. Normal cells and tumor cells compete for the essential nutrients in the human body. Tumor cells keep proliferating, robbing the body of its limited energy supply and eventually disintegrating somatic function until death. Somatic cells constantly divide, and new cells differentiate and end with apoptosis; in this manner, a relative balance is maintained in the human body. Nevertheless, when the differentiation process of normal cells gets out of control, the cells may well evolve into tumor cells. It is in the nature of tumor cells to consume the body's nutrients voraciously.
The population of tumor cells increases progressively owing to the following three characteristics. First, the most obvious characteristic is insensitivity to anti-growth signals. A strict control mechanism exists for normal cells, but for tumor cells this mechanism is no longer valid. During the continuous process of division, tumor cells can escape the monitoring of the anti-growth signals, which leads to their unchecked growth. Second, tumor cells have the ability to promote the growth of blood vessels, which are essential for providing nutrients; this is why blood vessel density is associated with the malignant degree of the tumor tissue. Finally, tumor cells are also duplicitous, evolving camouflage abilities during their constant battle with immune cells to mislead the immune system into regarding them as normal cells, which results in tumor immune escape. Thus, to suppress the growth of tumor cells, obstructing the generative mechanisms that rely on the necessary nutrients is an effective approach [2, 3].
In contrast to the mixed tumor treatment approach of immunotherapy and chemotherapy in [4], this chapter explores a more effective adaptive control strategy for the organism using a medicine dosage regulation mechanism. An additional population of cells, called endothelial cells, feeds on the substances induced by malignant tumor cells; they transfer oxygen and nutrients to the primary focus, causing proliferation of blood vessels, which increases the carrying capacity of tumor cells, a process known as tumor angiogenesis [5]. As indicated in [6], an anti-angiogenic agent can markedly decrease the growth rate of tumors, reaching saturation to some extent without killing the endothelial cells completely. When the chemotherapy agent is used in combination with the anti-angiogenic agent to reduce the population of tumor cells, the latter can increase the effect of the former, as described in [7]. Nevertheless, as the key element promoting the growth of the vasculature, the endothelial cells must not be completely destroyed; otherwise, the vasculature required to construct access for the chemotherapy agent may no longer exist. On the basis of the pharmaceutical science concerning the chemotherapy and anti-angiogenic agents, the adaptive control strategy for the organism provides guidance for clinical practice under the medicine dosage regulation mechanism, especially in the treatment of lung cancer. Furthermore, since anti-tumor drugs often kill both tumor cells and normal cells, it is of significance to use as few drugs as possible to achieve the therapeutic goal during the treatment process.
ADP, derived from dynamic programming and reinforcement learning, is a powerful tool to tackle optimization issues [8,9,10]. In general, the successful implementation of ADP-based methods depends on the cooperative work of actor and critic networks [11]. Under this framework, the actor is responsible for performing the control strategy with current data [12]. The goal of the critic is to provide the actor with feedback information derived from evaluating the cost under the strategy. The distinct merit of this type of algorithm lies in that the optimal control strategy can be approximately acquired in an iterative manner, and the "curse of dimensionality" can be effectively obviated. Different ADP-based methods have been researched to tackle multifarious optimal control problems with the aid of artificial neural networks, whose approximation performance is outstanding [13, 14], such as robust control [15, 16], optimal consensus control [17,18,19] and optimal tracking [20, 21]. Furthermore, for systems with multiple controllers, the optimization issues can be formulated by game theory. As a vital branch of game theory, NZSGs originate from [22], with the goal of attaining the optimal strategy pair that minimizes the personal performance index for each player while stabilizing the controlled system [23,24,25]. Owing to their excellent approximation ability, ADP methods have been proposed to solve NZSGs. In [26], an adaptive method with a critic-only structure was developed to solve two-player NZSGs without any initial stabilizing control. The experience replay technique was integrated into the ADP algorithm in [27] to concurrently utilize historical data together with real-time data to approximate the value function, such that the persistence of excitation condition was not indispensable. In [28], a data-based integral reinforcement learning algorithm was proposed to solve NZSGs.
More specifically, it was a novel iterative learning algorithm operating in both off-line and online manners, which could extend the applicability of the data-based control scheme. Furthermore, in [29], discrete-time N-player NZSGs were tackled via an off-policy reinforcement learning method that was independent of the system dynamics.
Although relevant academic achievements have been presented in theory and applications [30,31,32,33,34,35,36,37,38], there is seldom any literature on this field according to the authors' literature survey. The contributions can be summarized as follows. First, the near-optimal therapy for the treatment of tumors is acquired for the first time via the ADP approach, which is an efficient adaptive intelligent learning algorithm. Second, the interactive system with a discounted value function is constructed based on the mathematical model simulating the interaction relationships among cells and drugs. Besides, two kinds of chemotherapy drugs and one kind of anti-angiogenic agent participate in the therapy, such that a combination therapeutic strategy can be derived under the architecture of NZSGs. Third, the idea of cybernetics is extended to a frontier field of medicine, more precisely, the therapy of tumors. Under the MDRM, the derived therapeutic strategy can achieve the therapeutic goal with the lowest doses of drugs, and the practical indications for medicine are considered for the first time.
Notations: \(\mathbb {N}^{+}\) denotes the set containing all positive integers. \(\parallel \cdot \parallel \), \(diag\{\cdot \}\) and \(\bigtriangledown (\cdot )\triangleq \partial (\cdot )/\partial x\) respectively represent the Euclidean norm of a vector/matrix, the operation of constructing a diagonal matrix and the gradient operator. \(\lambda _m(\cdot )\) and \(\lambda _M(\cdot )\) separately denote the minimum and maximum eigenvalues of a matrix. \(I_{n\times n}\) is the identity matrix of dimension n.
6.2 Preliminaries
6.2.1 Establishment of Mathematical Model
In this section, the growth mathematical model is established which considers the interaction relationships among the normal cells, tumor cells and endothelial cells. Moreover, the effects of control inputs, i.e., the chemotherapy and anti-angiogenic drugs, on these cells are embodied in the model. Thus, in the model formed from ordinary differential equations as follows, \(P_{NC}(t)\), \(P_{TC}(t)\) and \(P_{EC}(t)\) respectively represent the populations of normal cells, tumor cells and endothelial cells, \(P_{CD\jmath }(t) (\jmath =1,2)\) and \(P_{AD}(t)\) denote the concentrations of chemotherapy and anti-angiogenic drugs.
The population of normal cells, which is influenced by tumor cells, endothelial cells and the concentrations of chemotherapy and anti-angiogenic drugs, is modeled by
where \(\varXi _\imath \big (P_{EC}(t),P_{AD}(t)\big )=\varXi _{\imath 1}P_{EC}(t)+\varXi _{\imath 2}P_{AD}(t)+\varXi _{\imath 0}, \imath =1,2\). The parameters \(\alpha _1\), \(B_1\), \(C_1\) denote the proliferation rate, Holling type 2 constant and carrying capacity for normal cells, respectively. \(A_1\) is the contention parameter between normal cells and tumor cells.
As the tumor cells contend with normal cells for necessary nutrients, the population of tumor cells is affected by that of normal cells. Besides, there exist mutual effects among tumor cells, endothelial cells and the drugs. Thus the corresponding model can be written as
where \(\varPi _\jmath \big (P_{EC}(t),P_{AD}(t)\big )=\varPi _{\jmath 1}P_{EC}(t)+\varPi _{\jmath 2}P_{AD}(t)+\varPi _{\jmath 0}\), \(\jmath =1,2\). The parameters \(\alpha _2\), \(B_2\), \(C_2\) are the multiplication rate, Holling type 2 constant and carrying capacity for tumor cells, respectively. \(A_2\) is the contention parameter between normal cells and tumor cells.
The population of endothelial cells is associated with tumor cells and anti-angiogenic drugs. The relations can be given as
where K is the multiplication rate induced by tumor cells and \(s_1\) is the inflow rate. Similarly, the parameters \(\alpha _3\), \(B_3\), \(C_3\) are the multiplication rate, Holling type 2 constant and carrying capacity for endothelial cells, respectively. \(\varXi _3\) is the killing rate for endothelial cells.
The concentrations of the drugs decrease during the treatment phases, owing to the washout process. Hence we can model the evolution process of the concentrations of chemotherapy and anti-angiogenic drugs by
and
where \(Dr_{c1}\), \(Dr_{c2}\) and \(Dr_a\) are the control inputs. \(\beta _{c1}\), \(\beta _{c2}\) and \(\beta _a\) denote the washout rates for the drugs. \(m_1\), \(m_2\), \(m_3\), \(m_4\) and \(m_5\) are the rates at which the drugs integrate into the cells. Based on the operations similar to that in [39], we obtain the simplified version of the model as
where \(\xi _\imath \big (p_{EC}(t),p_{AD}(t)\big )=\) \(\xi _{\imath 1}p_{EC}(t)+\xi _{\imath 2}p_{AD}(t)+\xi _{\imath 0}\) and \(\pi _\jmath \big (p_{EC}(t),p_{AD}(t)\big )\) \(=\pi _{\jmath 1}p_{EC}(t)+\pi _{\jmath 2}p_{AD}(t)+\pi _{\jmath 0}\) with \(\imath , \jmath =1,2\). The states \(p_{NC}(t)\), \(p_{TC}(t)\), \(p_{EC}(t)\), \(p_{CD1}(t)\), \(p_{CD2}(t)\) and \(p_{AD}(t)\) are nonnegative.
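As a side illustration, the washout process described above implies first-order dynamics of the form \(\dot{p} = Dr - \beta p\) for each drug concentration. The following minimal sketch integrates these dynamics; the washout rates, dosages and horizon are illustrative placeholders, not the chapter's clinical parameters:

```python
import numpy as np

# Illustrative washout dynamics dp/dt = Dr(t) - beta * p(t) for the two
# chemotherapy drugs and the anti-angiogenic drug.  All numbers below are
# placeholder values, not the chapter's clinical parameters.
beta = np.array([0.3, 0.25, 0.2])   # washout rates beta_c1, beta_c2, beta_a
Dr = np.array([1.0, 0.5, 0.8])      # constant dosage inputs Dr_c1, Dr_c2, Dr_a
dt, T = 0.01, 50.0                  # Euler step size and horizon (days)
p = np.zeros(3)                     # initial drug concentrations

for _ in range(int(T / dt)):
    p += dt * (Dr - beta * p)       # forward-Euler integration step

# Each concentration approaches the steady state Dr / beta.
print(np.round(p, 3))
```

With constant dosing, each concentration settles at the ratio of dosage to washout rate, which is why a sustained infusion is needed to hold a therapeutic level.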
Remark 6.1
The differential equation (6.7) is the simplified model describing the interaction relationships among cells and drugs. Observing the model, one can discover that there exists competition between normal cells and tumor cells. The tumor cells require more nutrients, so they facilitate the proliferation of endothelial cells, which in turn provide the indispensable nutrients to promote the growth of the tumor. The tumor cells can be effectively damaged by the chemotherapy drugs, which also have side effects on normal cells to some extent, and the anti-angiogenic drug contributes to inhibiting the proliferation of the endothelial cells.
6.2.2 Nonzero-Sum Games Formulation
Consider the interaction model (6.7) rewritten as
where \(\mathcal {u}_1=[u_{c1},u_a]^{T}\), \(\mathcal {u}_2=[u_{c2},0]^{T}\) and f(x) is constructed by the right-hand side parts of (6.7) excluding the terms \(u_{c1}\), \(u_{c2}\) and \(u_a\).
Define the value function for player \(\imath (\imath =1,2)\) as
where the utility function \(\delta _\imath (x,\mathcal {u}_1,\mathcal {u}_2)=x^{T}\varUpsilon _\imath x+\mathcal {u}_1^{T}\mathcal {R}_{\imath 1}\mathcal {u}_1+\mathcal {u}_2^{T}\mathcal {R}_{\imath 2}\mathcal {u}_2\). The matrices \(\mathcal {R}_{\imath \jmath }(\imath ,\jmath =1,2)\) and \(\varUpsilon _\imath \) are positive definite, and \(\varrho _\imath >0\) is the discount factor. According to the value function (6.9), we can define the corresponding Hamiltonian function as
The optimal value function is defined as
The target of NZSGs is to attain the admissible strategy pair \(\{\mathcal {u}_1^*,\mathcal {u}_2^*\}\) with the definition given in [23, 40]. According to the stationarity condition, the optimal strategy for player \(\imath \) could be obtained by
Thus the HJEs can be obtained as
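For input-affine dynamics of the form (6.8) with the quadratic utility defined above, the stationarity condition \(\partial H_\imath /\partial \mathcal {u}_\imath =0\) yields the standard feedback form (a sketch using this section's symbols, with \(\mathcal {g}_\imath \) denoting the input matrix of player \(\imath \)):

```latex
\mathcal{u}_\imath^{*}(x) = -\frac{1}{2}\,\mathcal{R}_{\imath\imath}^{-1}\,
\mathcal{g}_\imath^{T}(x)\,\nabla V_\imath^{*}(x), \qquad \imath = 1,2.
```

Substituting these expressions back into the Hamiltonian functions couples the two value functions, which is why (6.13) must be solved as a pair.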
Remark 6.2
It is noteworthy that there exists no zero equilibrium for system (6.8), which may well result in the divergence of \(V_\imath (x(t))\). To resolve this issue, the discount factor \(\varrho _\imath \) is introduced to form a decay term such that \(V_\imath (x(t))\) can be convergent.
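The effect of the decay term can be illustrated numerically. In the sketch below, the discount factor and the stand-in utility signal are placeholders chosen so the integral has a closed form; a constant utility mimics a state that settles at a nonzero equilibrium:

```python
import numpy as np

# Discounted accumulation int_0^inf exp(-rho*tau) * delta(tau) d(tau).
# The constant stand-in utility delta = 1 mimics a state resting at a
# nonzero equilibrium; the closed-form value is then 1 / rho.
rho = 0.2                             # placeholder discount factor
dt = 1e-3
tau = np.arange(0.0, 100.0, dt)
V = np.sum(np.exp(-rho * tau)) * dt   # Riemann approximation of the integral
print(round(V, 2))                    # close to 1 / rho = 5.0
```

Without the decay factor, the same integral grows without bound, which is exactly the divergence issue the discount factor in (6.9) is introduced to avoid.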
In general, solving NZSGs is synonymous with solving the equations (6.13). Nevertheless, for nonlinear systems, it is intractable to tackle the coupled equations directly. To resolve this difficulty, an ADP method utilizing a dosage regulation mechanism is proposed in the following sections.
6.3 MDRM-Based Adaptive Critic Learning Method for NZSGs
First, we introduce the indications for medicine to judge when the medicine dosage should be regulated. Then, under the MDRM, the ADP method with a single-critic architecture is proposed to approximately seek the optimal strategy for the NZSGs of model (6.7).
6.3.1 MDRM-Based Optimal Strategy Derivation
For the sake of realizing a conditioned therapy strategy, the MDRM is required to handle the clinical data such that the strategy can be changed in a timely manner and only when necessary. The time sequence \(\{\hbar _\ell \}\) is constructed to record the regulating instants, where \(\hbar _\ell \) denotes the \(\ell \)th regulating instant. Then the state can be denoted as
To evaluate the difference between the real-time data and the latest recorded data, it is necessary to define the error function \(z_\ell =\breve{x}_\ell -x(t),t\in [\hbar _\ell ,\hbar _{\ell +1})\). The operation of the MDRM depends on the regulating condition, which compares the error \(z_\ell \) with a threshold associated with the real-time data. The strategy is adjusted only when \(z_\ell \) exceeds the threshold. That is, \(\breve{\mathcal {u}}_\imath =\mathcal {u}_\imath (\breve{x}_\ell ),\imath =1,2\), and \(\ell \in \mathbb {N}^{+}\). Thus the MDRM-based strategy can be obtained as
where \(\nabla \breve{V}_\imath ^{*}=\partial V_\imath ^{*}/\partial x\) evaluated at \(t=\hbar _\ell \). The version of the HJEs based on the regulation mechanism is derived as
Differing from the HJEs (6.13), owing to the existence of the error \(z_\ell \), (6.16) does not equal zero. Before proceeding with the discussion, the following assumption is required [41].
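The regulating logic built on the error \(z_\ell \) can be sketched as follows. The threshold form and the constants `zeta` and `c` are illustrative placeholders and do not reproduce the exact indication derived below:

```python
import numpy as np

# Sketch of the MDRM trigger: the dosage is recomputed only when the
# squared error between the latest recorded state and the real-time
# state exceeds a state-dependent threshold.  zeta and c are placeholder
# constants, and this threshold form is illustrative only.
def should_regulate(x_held, x_now, zeta=0.4, c=0.5):
    z = x_held - x_now                            # error z_ell
    threshold = c * zeta * np.dot(x_now, x_now)   # state-dependent bound
    return bool(np.dot(z, z) > threshold)

x_held = np.array([1.0, 0.2, 0.1])
print(should_regulate(x_held, np.array([0.98, 0.21, 0.10])))  # small drift: False
print(should_regulate(x_held, np.array([0.40, 0.60, 0.30])))  # large drift: True
```

Between regulating instants, the dosage computed at \(\breve{x}_\ell \) is simply held, so the controller recomputes only when the state has drifted appreciably.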
Assumption 6.1
The optimal strategy \(\mathcal {u}_\imath ^{*}\) is locally Lipschitz. That is, for \(\imath =1,2\), there exists a constant \(\theta _\imath >0\) such that \(\Vert \mathcal {u}_\imath ^{*}-\breve{\mathcal {u}}_\imath ^{*}\Vert ^2\le \theta _\imath \Vert x-\breve{x}_\ell \Vert ^2\).
Theorem 6.3
Consider the system (6.8), and suppose that Assumption 6.1 holds and \(V_\imath ^{*}\) is the solution of (6.13). Then \(\breve{u}_\imath ^{*}\) formulated as (6.15) can stabilize system (6.8) when the following medicine indication is applied
where \(\zeta \in (0,1/2)\) is an adjustable parameter. The terms \(\theta \), \(\varUpsilon \) and Y are given in (6.21) and (6.22).
Proof
Selecting the Lyapunov function \(L_{ya}=V_1^{*}+V_2^{*}\), we can obtain the corresponding derivative as
According to (6.13), we have
and
Let \(\mathcal {u}^{*}=[\mathcal {u}_1^{*T},\mathcal {u}_2^{*T}]^{T}\) and \(\breve{\mathcal {u}}^{*}=[(\breve{\mathcal {u}}_1^{*}-\mathcal {u}_1^{*})^{T},(\breve{\mathcal {u}}_2^{*}-\mathcal {u}_2^{*})^{T}]^{T}\). Then we can derive that
where \(\varUpsilon =\varUpsilon _1+\varUpsilon _2\), \(\mathcal {R}=diag\{\mathcal {R}_{11}+\mathcal {R}_{21},\mathcal {R}_{12}+\mathcal {R}_{22}\}\), and \(Z=[Z_1,Z_2]\) with \(Z_\imath =[\mathcal {R}_{11}\mathcal {g}_1^{-1}\mathcal {g}_\imath ,\mathcal {R}_{22}\mathcal {g}_2^{-1}\mathcal {g}_\imath ]^{T},\imath =1,2\). Applying Young's inequality, we have
where \(Y=Z^{T}\mathcal {R}^{-1}Z\). Noting that \(\mathcal {u}_\imath ^{*}\) is an admissible strategy, we can derive that \(V_\imath ^{*}\) is bounded. Hence \(\varrho _V\) denotes the bound of the term \(\varrho _1 V_1^{*}+\varrho _2 V_2^{*}\). According to the definitions of \(\varUpsilon \) and Y, we have \(\lambda _m(\varUpsilon )>0\) and \(\lambda _M(Y)>0\). Furthermore, we can obtain
where \(\theta =\theta _1+\theta _2\). When the indication (6.17) is satisfied, we derive that \(\dot{L}_{ya}\le -2\zeta \lambda _m(\varUpsilon )\Vert x\Vert ^2+\varrho _V\). Then we can find that \(\dot{L}_{ya}<0\) holds when \(\Vert x\Vert >\sqrt{\frac{\varrho _V}{2\zeta \lambda _m(\varUpsilon )}}\). In light of Lyapunov theorem, the strategy (6.15) can stabilize system (6.8). This completes the proof.\(\blacksquare \)
6.3.2 Implementation of Adaptive Critic Learning Method
In this section, the approximate optimal strategy under the MDRM is derived by the ADP method with a single-critic architecture. In light of the universal approximation property of neural networks (NNs), \(V_\imath ^{*}\) can be expressed as
where \(\omega _\imath ^{*}\) is the ideal weight vector, \(\nu _\imath \) is the activation function and \(\sigma _\imath \) is the approximation error. To acquire an approximate version of the unknown vector \(\omega _\imath ^{*}\), the critic NN is constructed as
with \(\hat{\omega }_\imath \) being the approximate weight vector. With the aid of the critic NN, we can present the optimal strategy as
Accordingly, we can obtain the optimal and approximate optimal strategies under MDRM as
and
Then the approximate Hamiltonian can be presented as
where \(\psi _\imath =\nabla \nu _\imath \big (f+\mathcal {g}_1\mathcal {u}_1(\breve{x}_\ell )+\mathcal {g}_2\mathcal {u}_2(\breve{x}_\ell )\big )-\varrho _\imath \nu _\imath \).
To minimize \(\epsilon _\imath \) in (6.29), we set the minimization target as \(E=E_1+E_2=1/2\epsilon _1^2+1/2\epsilon _2^2\). By applying the gradient descent approach, we obtain
where \(\gamma _\imath \) is the adjustable parameter and \(\breve{\psi }_\imath =\psi _\imath /(\psi _\imath ^{T}\psi _\imath +1)^2\). Define \(\tilde{\omega }_\imath =\omega _\imath ^{*}-\hat{\omega }_\imath \). From (6.30), we derive that
where \(\bar{\psi }_\imath =\psi _\imath /(\psi _\imath ^{T}\psi _\imath +1)\) and the approximation residual error \(e_\imath =-\nabla \sigma _\imath ^{T}(f+\mathcal {g}_1\breve{\mathcal {u}}_1+\mathcal {g}_2\breve{\mathcal {u}}_2)+\varrho _\imath \sigma _\imath \). To proceed further, the following assumptions are required [11, 26, 27].
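A discrete-time sketch of the tuning law (6.30) is given below. The regressor `psi`, the utility value `delta` and all constants are placeholders; the point is the normalized gradient step that drives the residual \(\epsilon _\imath \) toward zero:

```python
import numpy as np

# One Euler step of the normalized gradient-descent tuning law: the
# critic weight moves against the gradient of E = eps^2 / 2, with the
# regressor normalized by (psi^T psi + 1)^2.  All values are placeholders.
def critic_step(w_hat, psi, delta, gamma=1.5, dt=0.01):
    eps = delta + psi @ w_hat        # approximate Hamiltonian residual
    return w_hat - dt * gamma * psi * eps / (psi @ psi + 1.0) ** 2

w = np.zeros(4)
psi = np.array([0.5, -0.2, 0.1, 0.3])
for _ in range(2000):                # repeated steps shrink the residual
    w = critic_step(w, psi, delta=0.8)
print(abs(0.8 + psi @ w) < 0.1)      # residual nearly eliminated: True
```

In the chapter the regressor \(\psi _\imath \) varies with the state trajectory, which is why the persistence of excitation condition of Assumption 6.2 is needed for the weights to converge.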
Assumption 6.2
For any \(\imath \in \{1,2\}\), the signal \(\bar{\psi }_\imath \) is persistently excited on the time interval \([t,t+T]\). That is, there exists the positive constant \(b_{\psi \imath }\) such that
with \(N_{c\imath }\) being the neuron number of the \(\imath \)th critic network.
Assumption 6.3
For \(\imath \in \{1,2\}\), there exist positive constants such that \(\Vert \omega _\imath ^{*}\Vert \le b_{\omega \imath }\), \(\Vert \nabla \nu _\imath \Vert \le b_{\nu \imath }\), \(\Vert \nabla \sigma _\imath \Vert \le b_{\sigma \imath }\) and \(\Vert e_\imath \Vert \le b_{e\imath }\).
6.4 Stability Analysis
In this section, the stability of the controlled system is analyzed by applying Lyapunov theory. Before presenting the main results, the boundedness of the critic weight is discussed in the following lemma.
Lemma 6.4
For any \(\imath \in \{1,2\}\), suppose that Assumptions 6.2ā6.3 hold and the initial weight is finite. If the critic tuning law (6.30) is applied, then it holds that \(\tilde{\omega }_\imath \) is locally ultimately bounded.
Proof
Consider the Lyapunov function \(L_{y\omega }\). It is noted that \(\tilde{\omega }_\imath \) evolves according to flow dynamics, which indicates that there do not exist any jumps in the values of \(\tilde{\omega }_\imath \). More specifically, \(\tilde{\omega }_\imath \) is continuous at the regulating instants. Thus we only need to consider the time interval between two adjoining regulating instants.
According to Assumptions 6.2ā6.3, it can be derived that
By applying Young's inequality, we can get
where \(\varGamma _1=\gamma _1 b_{e1}^2+\gamma _2 b_{e2}^2\). Furthermore, when \(\Vert \tilde{\omega }_1\Vert >\sqrt{\frac{\varGamma _1}{\gamma _1 b_{\psi 1}}}\triangleq b_{\tilde{\omega }_1}\) or \(\Vert \tilde{\omega }_2\Vert >\sqrt{\frac{\varGamma _1}{\gamma _2 b_{\psi 2}}}\triangleq b_{\tilde{\omega }_2}\), it yields that \(\dot{L}_{y\omega }<0\). The lemma is proved. \(\blacksquare \)
Theorem 6.4
Consider the system (6.8) with strategy formulated as (6.28). Suppose that Assumptions 6.1ā6.3 hold. The tuning law for critic network is given by (6.30). Then the state x and weight estimation error \(\tilde{\omega }_\imath \) are UUB provided that the indication is applied
with \(\varpi _1\) and \(\varpi _2\) being the adjustable parameters.
Proof
Select the Lyapunov function candidate as
Due to the utilization of MDRM, we present the proof process in two cases.
Case I: No regulation occurs, i.e., \(t\in [\hbar _\ell ,\hbar _{\ell +1})\). Then we obtain \(\dot{L}_{Ya}=0\). The derivative of \(L_{Yb}\) can be obtained as
Let \(\breve{\mathcal {u}}=[(\breve{\mathcal {u}}_1-\mathcal {u}_1^{*})^{T},(\breve{\mathcal {u}}_2-\mathcal {u}_2^{*})^{T}]^{T}\). Applying the operations similar to that in Theorem 6.3, we have
Recall that \(\theta =\theta _1+\theta _2\), and substitute (6.27) and (6.28) into (6.38). Then we can derive
where \(\varGamma _2=\frac{1}{4}\lambda _M(Y)(1+1/\varpi _2)^2\big (\Vert \mathcal {R}_{11}^{-1}\Vert ^2 b_{g1}^2 b_{\nu 1}^2 b_{\tilde{\omega }1}^2+\Vert \mathcal {R}_{22}^{-1}\Vert ^2 b_{g2}^2 b_{\nu 2}^2 b_{\tilde{\omega }2}^2\big )+\frac{1}{4\varpi _2}\lambda _M(Y)(1+\varpi _2)^2\big (\Vert \mathcal {R}_{11}^{-1}\Vert ^2 b_{g1}^2 b_{\sigma 1}^2+\Vert \mathcal {R}_{22}^{-1}\Vert ^2 b_{g2}^2 b_{\sigma 2}^2\big )\) with \(b_{g1}\) and \(b_{g2}\) denoting the bounds of known \(\mathcal {g}_1\) and \(\mathcal {g}_2\).
According to Assumption 6.2 and Assumption 6.3, we derive that
Based on (6.39) and (6.40), we can obtain
where \(\pounds =\varGamma _1+\varGamma _2+\varrho _V\). Applying the indication (6.35), then we conclude that \(\dot{L}_Y<0\) when one of the conditions hold that
Thus x and \(\tilde{\omega }_\imath \) can be guaranteed to be UUB.
Case II: A regulation occurs, that is, \(t=\hbar _{\ell +1}\). The difference of \(L_Y\) can be given by
where the terms are defined by \(\triangle L_{Ya}=V_1^{*}(\breve{x}_{\ell +1})-V_1^{*}(\breve{x}_{\ell })+V_2^{*}(\breve{x}_{\ell +1}) -V_2^{*}(\breve{x}_{\ell })\), \(\triangle L_{Yb}=V_1^{*}(x(\hbar _{\ell +1}))-V_1^{*}(x(\hbar _{\ell +1}^-))+V_2^{*}(x(\hbar _{\ell +1}))-V_2^{*}(x(\hbar _{\ell +1}^-))\), \(\triangle L_{Yc}=1/2\tilde{\omega }_1^{T}(\hbar _{\ell +1})\tilde{\omega }_1(\hbar _{\ell +1}) -1/2\tilde{\omega }_1^{T}(\hbar _{\ell +1}^-)\tilde{\omega }_1(\hbar _{\ell +1}^-)+1/2\tilde{\omega }_2^{T}(\hbar _{\ell +1})\tilde{\omega }_2(\hbar _{\ell +1}) -1/2\tilde{\omega }_2^{T}(\hbar _{\ell +1}^-)\tilde{\omega }_2(\hbar _{\ell +1}^-)\). Recalling the analysis in Case I, we obtain that \(\dot{L}_Y<0\) when x or \(\tilde{\omega }_\imath \) is out of the corresponding bound. Furthermore, we can derive that \(L_{Yb}+L_{Yc}\) is monotonically decreasing when \(t\in [\hbar _\ell ,\hbar _{\ell +1})\). In light of the properties of limits, we have
As x is proved to be UUB, we can obtain
According to (6.45) and (6.46), we can derive \(\triangle L_Y<0\), which indicates that the selected Lyapunov function (6.36) is decreasing at \(t=\hbar _{\ell +1}\). This completes the proof. \(\blacksquare \)
Remark 6.5
\(\varpi _1\) and \(\varpi _2\) in (6.35) are the adjustable parameters which determine the frequency of medicine dosage regulation. A larger \(\varpi _1\) or \(\varpi _2\) leads to a higher regulation frequency, and a smaller parameter implies a lower adjustment frequency. Thus we can determine these parameters according to the clinical data.
Remark 6.6
In this chapter, the approximate optimal combination therapeutic strategy is derived via the ADP method to inhibit the proliferation of tumor cells under the medicine dosage regulation mechanism. The MDRM is constructed on the foundation of the above-mentioned medicine indication (6.35). The data at the dosage-regulating instants are recorded and utilized as reference data in the future. When the difference between the current clinical data and the latest reference data is larger than the threshold, the medicine dosage is regulated. Therefore, this mechanism guarantees that the derived therapeutic strategy is regulated in a timely manner and only when necessary.
6.5 Simulation Study
In this section, the mathematical model (6.7), which presents the relations between cells and drugs, is considered. For simplicity, we have constructed the rephrased system (6.8), the control problem of which can be deemed an NZSG.
In light of the clinical medical statistics and literature [38], the parameters on cells and drugs for model (6.7) are given in Table 6.1 and Table 6.2, respectively. For the discounted value function (6.9) of system (6.8), the corresponding parameters are set as \(\mathcal {R}_{11}=0.8I_{2\times 2}\), \(\mathcal {R}_{12}=15I_{2\times 2}\), \(\mathcal {R}_{21}=5I_{2\times 2}\), \(\mathcal {R}_{22}=I_{2\times 2}\), \(\varUpsilon _1=0.02I_{6\times 6}\) and \(\varUpsilon _2=0.06I_{6\times 6}\). In addition, the discount factors are \(\varrho _1=\varrho _2=0.2\).
For the critic NNs, the activation functions are both set as \([x_1^2\), \(x_1 x_2\), \(x_1 x_3\), \(x_1 x_4\), \(x_1 x_5\), \(x_1 x_6\), \(x_2^2\), \(x_2 x_3\), \(x_2 x_4\), \(x_2 x_5\), \(x_2 x_6\), \(x_3^2\), \(x_3 x_4\), \(x_3 x_5\), \(x_3 x_6\), \(x_4^2\), \(x_4 x_5\), \(x_4 x_6\), \(x_5^2\), \(x_5 x_6\), \(x_6^2]^{T}\), and the learning rates are set as \(\gamma _1=1.5\) and \(\gamma _2=2\). Besides, the parameters are set as \(\theta =8\), \(\varpi _1=0.8\) and \(\varpi _2=8\).
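For reference, the 21-dimensional activation vector listed above contains every quadratic monomial \(x_i x_j\) with \(i\le j\) of the six state variables; a small sketch of its construction (the helper name is ours):

```python
import numpy as np
from itertools import combinations_with_replacement

# Build the quadratic activation vector nu(x): all monomials x_i * x_j
# with i <= j for a 6-dimensional state, i.e. 6 * 7 / 2 = 21 features,
# in the same order as the list given in the text.
def quad_features(x):
    return np.array([x[i] * x[j]
                     for i, j in combinations_with_replacement(range(len(x)), 2)])

x = np.array([1.0, 2.0, 0.5, 0.0, -1.0, 3.0])
nu = quad_features(x)
print(nu.shape)      # (21,)
```

The critic output is then the inner product \(\hat{\omega }_\imath ^{T}\nu (x)\), and the gradient \(\nabla \nu \) needed by the strategy follows from differentiating each monomial.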
The evolution curves of the model (6.7) are depicted in Fig. 6.1. From Fig. 6.1 we can observe that when \(t=200d\), the population of tumor cells reduces to zero, and when \(t=600d\), the population of normal cells almost returns to 1 and that of endothelial cells drops to a small steady value. This indicates that the proliferation of tumor cells can be suppressed after 600 days under the optimal therapy strategy. In Figs. 6.1, 6.2, 6.3, 6.4, 6.5, 6.6 and 6.7, we compare the medicine dosages of the derived therapy strategy with those of the initial therapy strategy. The comparison indicates that the medicine dosages of our near-optimal therapy strategy are significantly lower than those of the initial strategy. This is of great practical significance, since superfluous drugs may well affect the health of patients and impose additional financial burdens on them. Besides, one can find that as the clinical data become better, the regulation frequency of the derived therapy strategy becomes lower. This implies that the therapy strategy based on the medicine dosage regulation mechanism can be regulated with the indications for medicine in a timely manner and only when necessary. Figures 6.5, 6.6 and 6.7 present the curves of the cells under different therapy strategies, that is, chemotherapy drug 1, chemotherapy drug 2, the anti-angiogenic drug and the therapy comprising these three drugs. We can conclude from Figs. 6.5, 6.6 and 6.7 that the therapeutic effect of the derived combination therapy is the best. Thus the simulation results validate the effectiveness of our therapy strategy.
6.6 Conclusion
In this chapter, an ADP-based method using a medicine dosage regulation mechanism has been proposed to obtain the optimal combination therapy for curing cancer. A mathematical model is employed to describe the interactions among the normal cells, tumor cells, endothelial cells, chemotherapy drugs and anti-angiogenic drug. The mathematical model provides the foundation for solving the optimization issue under the architecture of NZSGs. The ADP method with a single-critic framework is proposed to approximately seek the optimal strategy. In addition, the introduction of the medicine dosage regulation mechanism guarantees that the therapy strategy is adjusted in a timely manner and only when necessary. Finally, the theoretical analysis and simulation results both indicate that the designed strategy can effectively decrease the populations of tumor cells and endothelial cells with very low medicine dosages, which verifies the effectiveness of the proposed method. Our future research direction is to seek the optimal strategy for decreasing tumor cells or other harmful cells with the latest therapies, for example, therapy applying oncolytic viruses.
References
Sharma S, Samanta GP (2016) Analysis of the dynamics of a tumor-immune system with chemotherapy and immunotherapy and quadratic optimal control. Differ Equ Dyn Syst 24(2):149–171
Evans CM (1991) The metastatic cell, behaviour and biochemistry. Chapman and Hall, London
Sherbet GV (1982) The biology of tumour malignancy. Academic, London
de Pillis LG, Gu W, Radunskaya AE (2006) Mixed immunotherapy and chemotherapy of tumors: modeling, applications and biological interpretations. J Theor Biol 238(4):841–862
Bikfalvi A (1995) Significance of angiogenesis in tumour progression and metastasis. Eur J Cancer 31(7–8):1101–1104
Beecken WC, Fernandes A, Joussen AM, ..., Shing Y (2001) Effect of antiangiogenic therapy on slowly growing, poorly vascularized tumors in mice. J Natl Cancer Inst 93(5):382–387
Kerbel RS, Bertolini F, Man S, Hicklin DA, Emmenegger U, Shaked Y (2006) Antiangiogenic drugs as broadly effective chemosensitizing agents. Angiogenesis, pp 195–212
Harmon ME, Baird LC, Klopf AH (1995) Reinforcement learning applied to a differential game. Adapt Behav 4(1):3–28
Littman ML (2015) Reinforcement learning improves behaviour from evaluative feedback. Nature 521(7553):445–451
Li T, Yang D, Xie X, Zhang H (2022) Event-triggered control of nonlinear discrete-time system with unknown dynamics based on HDP(\(\lambda \)). IEEE Trans Cybernet 52(7):6046–6058
Vamvoudakis KG, Lewis FL (2010) Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem. Automatica 46(5):878–888
Yang X, He H (2021) Decentralized event-triggered control for a class of nonlinear-interconnected systems using reinforcement learning. IEEE Trans Cybernet 51(2):635–648
Liu Y, Yao D, Li H, Lu R (2022) Distributed cooperative compound tracking control for a platoon of vehicles with adaptive NN. IEEE Trans Cybernet 52(7):7039–7048
Tan G, Wang Z, Shi Z (2023) Proportional-integral state estimator for quaternion-valued neural networks with time-varying delays. IEEE Trans Neural Netw Learn Syst 34(2):1074–1079
Liu D, Yang X, Wang D, Wei Q (2015) Reinforcement-learning-based robust controller design for continuous-time uncertain nonlinear systems subject to input constraints. IEEE Trans Cybernet 45(7):1372–1385
Wang D, Liu D, Li H, Luo B, Ma H (2016) An approximate optimal control approach for robust stabilization of a class of discrete-time nonlinear systems with uncertainties. IEEE Trans Syst Man Cybernet: Syst 6(5):713–717
Zhang H, Cai Y, Wang Y, Su H (2020) Adaptive bipartite event-triggered output consensus of heterogeneous linear multiagent systems under fixed and switching topologies. IEEE Trans Neural Netw Learn Syst 31(1):4816–4830
Wei Q, Liu D, Lewis FL (2015) Optimal distributed synchronization control for continuous-time heterogeneous multi-agent differential graphical games. Inf Sci 317:96–113
Zhang J, Zhang H, Feng T (2018) Distributed optimal consensus control for nonlinear multiagent system with unknown dynamic. IEEE Trans Neural Netw Learn Syst 29(8):3339–3348
Kamalapurkar R, Dinh H, Bhasin S, Dixon WE (2015) Approximate optimal trajectory tracking for continuous-time nonlinear systems. Automatica 51:40–48
Gao W, Jiang Z (2018) Learning-based adaptive optimal tracking control of strict-feedback nonlinear systems. IEEE Trans Neural Netw Learn Syst 29(6):2614–2624
Starr AW, Ho YC (1969) Nonzero-sum differential games. J Optim Theory Appl 3(3):184–206
Vamvoudakis KG, Lewis FL (2011) Online adaptive learning solution of coupled Hamilton-Jacobi equations for multi-player non-zero-sum games. Automatica 47(8):1556–1569
Zhu Y, Zhao D, Li X (2017) Iterative adaptive dynamic programming for solving unknown nonlinear zero-sum game based on online data. IEEE Trans Neural Netw Learn Syst 28(3):714–725
Song R, Li J, Lewis FL (2020) Robust optimal control for disturbed nonlinear zero-sum differential games based on single NN and least squares. IEEE Trans Syst Man Cybernet: Syst 50(11):4009–4019
Zhang H, Cui L, Luo Y (2013) Near-optimal control for nonzero-sum differential games of continuous-time nonlinear systems using single-network ADP. IEEE Trans Cybernet 43(1):206–216
Zhao D, Zhang Q, Wang D, Zhu Y (2016) Experience replay for optimal control of nonzero-sum game systems with unknown dynamics. IEEE Trans Cybernet 46(3):854ā865
Zhang Q, Zhao D (2019) Data-based reinforcement learning for nonzero-sum games with unknown drift dynamics. IEEE Trans Cybernet 49(8):2874ā2885
Song R, Wei Q, Zhang H, Lewis FL (2021) Discrete-time non-zero-sum games with completely unknown dynamics. IEEE Trans Cybernet 51(6):2929ā2943
Zhang H, Qin C, Jiang B, Luo Y (2014) Online adaptive policy learning algorithm for \(H_{\infty }\) state feedback control of unknown affine nonlinear discrete-time systems. IEEE Trans Cybernet 44(12):2706ā2718
Chen L, Zhu Y, Ahn CK (2023) Adaptive neural network-based observer design for switched systems with quantized measurements. IEEE Trans Neural Netw Learn Syst. https://doi.org/10.1109/TNNLS.2021.3131412
Zhang H, Su H, Zhang K, Luo Y (2019) Event-triggered adaptive dynamic programming for non-zero-sum games of unknown nonlinear systems via generalized fuzzy hyperbolic models. IEEE Trans Fuzzy Syst 27(11):2202ā2214
Liu X, Ge SS, Zhao F, Mei X (2021) Optimized impedance adaptation of robot manipulator interacting with unknown environment. IEEE Trans Control Syst Technol 29(1):411ā419
Massenio PR, Naso D, Lewis FL, Davoudi A (2020) Assistive power buffer control via adaptive dynamic programming. IEEE Trans Energy Convers 35(3):1534ā1546
Ghasempour T, Nicholson GL, Kirkwood D, Fujiyama T, Heydecker B (2020) Distributed approximate dynamic control for traffic management of busy railway networks. IEEE Trans Intell Transp Syst 21(9):3788ā3798
Wei Q, Liao Z, Shi G (2021) Generalized actor-critic learning optimal control in smart home energy management. IEEE Trans Ind Inf 17(10):6614ā6623
Zhao J, Wang T, Pedrycz W, Wang W (2021) Granular prediction and dynamic scheduling based on adaptive dynamic programming for the blast furnace gas system. IEEE Trans Cybernet 51(4):2201ā2214
Davari M, Gao W, Jiang ZP, Lewis FL (2021) An optimal primary frequency control based on adaptive dynamic programming for islanded modernized microgrids. IEEE Trans Autom Sci Eng 18(3):1109ā1121
Pinho STRD, Bacelar FS, Andrade RFS, Freedman HI (2013) A mathematical model for the effect of anti-angiogenic therapy in the treatment of cancer tumours by chemotherapy. Nonlinear Anal Real World Appl 14(1):815ā828
Liu D, Li H, Wang D (2014) Online synchronous approximate optimal learning algorithm for multiplayer nonzero-sum games with unknown dynamics. IEEE Trans Syst Man Cybernet: Syst 44(8):1015ā1027
Yang X, Wei Q (2021) Adaptive critic learning for constrained optimal event-triggered control with discounted cost. IEEE Trans Neural Netw Learn Syst 32(1):91ā104
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2024 The Author(s)
Cite this chapter
Sun, J., Xu, S., Liu, Y., Zhang, H. (2024). Combination Therapy-Based Adaptive Control for Organism Using Medicine Dosage Regulation Mechanism. In: Adaptive Dynamic Programming. Springer, Singapore. https://doi.org/10.1007/978-981-99-5929-7_6
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-5928-0
Online ISBN: 978-981-99-5929-7
eBook Packages: Intelligent Technologies and Robotics (R0)