Abstract
In this chapter, the constrained adaptive control strategy based on virotherapy is investigated for the organism using the medicine dosage regulation mechanism (MDRM). Firstly, the tumor-virus-immune interaction dynamics is established to model the relations among the tumor cells (TCs), virus particles and the immune response. The adaptive dynamic programming (ADP) method is extended to approximately obtain the optimal strategy for the interaction system so as to reduce the population of TCs. In consideration of the asymmetric control constraints, non-quadratic functions are proposed to formulate the value function, from which the corresponding Hamilton-Jacobi-Bellman equation (HJBE), the cornerstone of ADP algorithms, is derived. Then, an ADP method with a single-critic network architecture integrating the MDRM is proposed to obtain the approximate solution of the HJBE and eventually derive the optimal strategy. The design of the MDRM allows the dosage of the agentia containing oncolytic virus particles to be regulated timely and necessarily. Furthermore, the uniform ultimate boundedness of the system states and critic weight estimation errors is validated by Lyapunov stability analysis. Finally, simulation results are given to show the effectiveness of the derived therapeutic strategy.
7.1 Introduction
Traditional therapies such as surgery, chemotherapy, and radiation are characterized by low efficacy and high toxicity for patients. Oncolytic virotherapy, a promising tumor treatment strategy, relies on viruses with relatively weak pathogenicity and appropriate gene modification, while its therapeutic effect benefits from their strong replication capability. Similar to the principle of targeted therapy, gene-modified viruses selectively infect tumor cells, replicate rapidly within them, ultimately destroy the TCs, and concurrently activate the body’s immune response. Oncolytic virotherapy can not only kill TCs but also attract more immune cells to eliminate residual cancer cells, and it does not deplete normal cells in the body. Compared with traditional treatment strategies, oncolytic viruses (OVs) enjoy the advantages of minimal side effects and favorable therapeutic effects [1]. The development of OVs benefits from the virus-specific lytic CTL response, which elicits immunostimulatory signals and contributes to the killing of infected tumor cells (ITCs) [2]; thus, determining viral doses, the number of doses, and their timing with reliable mathematical models is an important direction for future research.
To investigate cancer virotherapy, mathematical models describing the mechanisms of TCs, OVs and immune cells have been proposed and refined [3, 4]. Reference [5] expounded the inner mechanism involving uninfected tumor cells (UTCs), ITCs and free viruses. Subsequently, infected and uninfected cells were distinguished through the logistic growth of TCs and the elimination of free recombinant measles viruses in [6]. What matters most is the immune response, which inhibits viral therapy by mistaking the genetically modified viruses for ordinary pathogens.
Therapy efficiency depends on whether the immune response is hyperactive; in other words, infected cancer cells and viruses may both be engulfed because the immune system cannot distinguish between them. The literature has demonstrated this side effect of immune cells, and the immunosuppressive agent cyclophosphamide has been chosen to reduce the immune response [7]. Reference [8] added a virus-free population to the previous three variables, reflecting the interaction of the innate immune response with infected cancer cells and virus particles and evolving into an effective mechanism-analysis model, but a more effective control strategy is still urgently needed. Cytokines from natural killer cells contribute most to the destruction of both tumor and virus-infected cells. The proposed model explicates the interplay among TCs, OVs and the immune response, which guides optimal therapeutic strategies and dosage regimens in oncotherapy. Although related research on regulating the immune system and TCs via ADP has been reported [9], selective oncolysis with gene-modified viruses promises better therapeutic effects than wild-type OVs under the ADP method.
As a vital branch of machine learning that obtains information from an interactive environment [10,11,12], reinforcement learning (RL) has been demonstrated to perform well in solving optimal control issues of nonlinear systems [13]. The ADP method, which was derived from RL and dynamic programming, generally attempts to obtain the optimal strategies with the aid of the classic critic-actor algorithm framework [14]. Under this architecture, the critic evaluates the cost when the current strategy is applied, and the actor updates the control strategy in accordance with the feedback information provided by the critic. Thus the approximate optimal strategy can be derived and the “curse of dimensionality” can be obviated. Recently, ADP-based methods have been widely researched to tackle various optimal control issues, for instance, tracking control [15,16,17], optimal consensus control [18,19,20], and zero-sum and nonzero-sum games [21,22,23]. Different from the fuzzy approximation in [24], a robust dynamic NN was established to asymptotically identify an uncertain system with additive disturbances, and the critic and actor worked together to find the equilibrium solution of nonzero-sum games for nonlinear systems. An identifier was developed to reconstruct the unknown dynamics, and the critic was tuned by a concurrent learning strategy which could effectively use real-time and recorded data such that the persistence of excitation (PE) condition could be removed. By utilizing both online and off-line data, a data-based policy gradient ADP method was developed to seek the optimal scheme in [25]. To address the global optimal control issue and avoid falling into local optimality as in [26], an ADP method combined with predesigned extra compensators was proposed in [27]. The introduction of these compensators contributed to deriving the augmented neighborhood error systems, thus avoiding the system dynamics requirement for ADP.
In [28], integrating the neural network learning ability and the spirits of ADP, a general architecture of intelligent critic control was proposed to solve the robustness issues of disturbed nonlinear systems.
As saturation phenomena, which exist widely in many practical systems, can degrade system performance, multifarious ADP-based methods have been proposed to achieve optimal control with input constraints [29,30,31]. For the tumor-virus-immune system in this chapter, the control input is the medicine containing the virus particles. Redundant or insufficient medicine dosages may well impair the therapeutic effect or the patients’ health. Thus we consider asymmetric input constraints and construct the corresponding non-quadratic value functions associated with the tumor-virus-immune system.
Recently, ADP-based methods have been proposed to develop approximate optimal strategies in various practical applications [32,33,34,35]. However, few studies have addressed optimal virotherapy strategies derived from ADP-based methods. Enlightened by the literature mentioned above, we design the virotherapy-based optimal strategy via the ADP method with MDRM. The contributions can be stated as follows. Firstly, a mathematical model is introduced to simulate the relationships among TCs, OVs and immune cells. Due to the asymmetric dosage constraints on the medicine, a non-quadratic utility function is constructed to form the discounted value function. Then, on the basis of the tumor-virus-immune model, an ADP method of single-critic architecture is proposed to solve the HJBE such that the approximate optimal strategy can be achieved, which means that the TCs can be largely eliminated with the constrained optimal virotherapy-based strategy. Furthermore, the medicine dosage regulation mechanism is introduced into this algorithmic framework, and medicine indications are considered, both for the first time. Finally, theoretical analysis and simulation experiments both validate the effectiveness of the designed therapeutic strategy.
7.2 Problem Formulation and Preliminaries
7.2.1 Establishment of Interaction Model
In this section, the tumor-virus-immune interaction model is introduced to describe the relations among TCs, viruses and immune cells. Due to the behavior of OVs, we can divide TCs into UTCs and ITCs. In the model composed of four ordinary differential equations as follows, \(P_{TU}(t)\), \(P_{TI}(t)\), \(P_{VI}(t)\) and \(P_{IM}(t)\) respectively denote the populations of UTCs, ITCs, free OVs and immune cells.
The population of UTCs can be affected by multiple factors, that is, the multiplication and apoptosis of TCs, the infection by OVs and the reduction caused by immune cells. Moreover, the growth dynamics of UTCs is presented as
where \(A_1\) is the tumor proliferation rate, \(A_2\) is the infection rate of virus, \(B_1\) denotes the killing-efficiency of immune cells, and \(C_1\) is the apoptosis rate of UTCs.
Similarly, the population of ITCs can be modeled by
where \(B_2\) denotes the immune killing-efficiency of ITCs and \(\varphi \) is apoptosis rate of ITCs.
The lysis of ITCs which contain multiple replicated virion particles and the input of virus agentia can both contribute to the rise of the free virus population. Thus the evolution dynamics of virus population can be presented as
where \(\mathcal {U}\) denotes the input of agentia, \(\kappa \) the burst size of free viruses, \(B_3\) the immune killing-efficiency rate of OVs, and \(C_2\) the clearance rate of OVs.
The immune response dynamics can be formulated as
where \(D_1\) and \(D_2\) are the immune response rates stimulated by infected and uninfected cells, respectively, and \(C_3\) is the apoptosis rate of immune cells. To simplify the interaction model, we utilize the nondimensionalization technique [36, 37] to derive the simplified version as
Herein the nonnegative states of nondimensionalization version are represented as \(p_{TU}(t)\), \(p_{TI}(t)\), \(p_{VI}(t)\) and \(p_{IM}(t)\).
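Since Eqs. (7.1)–(7.5) themselves are not reproduced in this text, the Python sketch below encodes one plausible reading of the verbal descriptions above, using the nondimensionalized parameter names from Sect. 7.5. The linear proliferation term, the ITC lysis rate `phi`, and the values chosen for `kappa` and `phi` are illustrative assumptions, not the chapter's exact model.

```python
# Hypothetical nondimensionalized tumor-virus-immune dynamics (a sketch; the
# chapter's exact Eq. (7.5) is not reproduced in this text).
def dynamics(x, u, p):
    p_tu, p_ti, p_vi, p_im = x  # UTCs, ITCs, free OVs, immune cells
    d_tu = p["a1"]*p_tu - p["a2"]*p_tu*p_vi - p["b1"]*p_tu*p_im - p["c1"]*p_tu
    d_ti = p["a2"]*p_tu*p_vi - p["b2"]*p_ti*p_im - p["phi"]*p_ti
    d_vi = p["kappa"]*p["phi"]*p_ti + u - p["b3"]*p_vi*p_im - p["c2"]*p_vi
    d_im = p["d1"]*p_ti*p_im + p["d2"]*p_tu*p_im - p["c3"]*p_im
    return [d_tu, d_ti, d_vi, d_im]

def euler_step(x, u, p, dt=0.01):
    # forward-Euler step; clipping keeps the populations nonnegative
    return [max(xi + dt*dxi, 0.0) for xi, dxi in zip(x, dynamics(x, u, p))]

params = dict(a1=0.36, a2=0.1, b1=0.36, b2=0.48, b3=0.16, c1=0.1278,
              c2=0.2, c3=0.036, d1=0.6, d2=0.29,
              kappa=2.0, phi=0.2)  # kappa from Sect. 7.5; phi assumed
x = [0.8, 0.0, 0.2, 0.05]         # initial state from Sect. 7.5
for _ in range(300):              # integrate with a constant dose u = 0.01
    x = euler_step(x, 0.01, params)
```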
Remark 7.1
In virotherapy, the viruses achieved their reproductive objective by infecting tumor cells and replicating themselves. After the lysis of infected cells, new reproductions burst out and infect other tumor cells. Under this mechanism, the tumor cells can be effectively eliminated. Furthermore, comparing with uninfected tumor cells, the infected cells can activate immune cells more effectually to kill tumor cells.
7.2.2 Problem Formulation
Consider the system (7.5) as
where \(\mathcal {g}=[0,0,1,0]^T\), and f(x) is constructed by the right-hand side parts of (7.5) excluding the control input \(\mathcal {u}\). \(\mathcal {u}\in [\mathcal {u}_m,\mathcal {u}_M]\) where \(\mathcal {u}_m\) and \(\mathcal {u}_M\) denote the minimum and maximum thresholds for medicine input dosage.
For system (7.6), the corresponding discounted value function is defined as
with the discounted factor \(\theta >0\). The utility function is given by
where the matrix \(\varUpsilon \) is positive definite and \(\chi (\mathcal {u})\) is a non-negative function. Note that for system (7.6) the input constraints are not symmetric. To cope with this issue, the function \(\chi (\mathcal {u})\) is defined as
where \(\alpha =(\mathcal {u}_m+\mathcal {u}_M)/2\) and \(\hbar =(\mathcal {u}_M-\mathcal {u}_m)/2\). \(\psi (\cdot )\) is a monotonic odd function which is continuously differentiable with \(\psi (0)=0\). Without loss of generality, we select the hyperbolic tangent function as \(\psi (\cdot )\), that is, \(\psi (\cdot )=\tanh (\cdot )\).
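The explicit form of the utility term (7.8) is not displayed above; a common choice in the constrained-ADP literature, consistent with the \(\alpha \), \(\hbar \) and \(\psi (\cdot )=\tanh (\cdot )\) just defined, is \(\chi (\mathcal {u})=2\hbar \int _{\alpha }^{\mathcal {u}}\psi ^{-1}((s-\alpha )/\hbar )\,ds\). The sketch below evaluates this assumed form numerically:

```python
import math

def chi(u, u_m=0.0, u_M=0.02, n=2000):
    """Assumed non-quadratic cost: 2*hbar * integral from alpha to u of
    atanh((s - alpha)/hbar) ds, evaluated by midpoint-rule quadrature."""
    alpha = (u_m + u_M) / 2.0
    hbar = (u_M - u_m) / 2.0
    ds = (u - alpha) / n
    total = 0.0
    for k in range(n):
        s = alpha + (k + 0.5) * ds
        total += math.atanh((s - alpha) / hbar) * ds
    return 2.0 * hbar * total
```

Under this assumed form, \(\chi \) vanishes at the midpoint dose \(\alpha \) and grows steeply as \(\mathcal {u}\) approaches either threshold, which is exactly what penalizes doses near the constraint boundaries.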
Differentiating the value function (7.7) along system (7.6), we obtain that
Then the Hamiltonian function can be expressed as
The optimal value function is defined as
which satisfies HJBE
Applying the stationary condition, we can derive the optimal strategy as
On the basis of (7.13) and (7.14), we rewrite the HJBE as
Remark 7.2
In the conventional optimal control issue with control constraints, it’s often required that the input constraints should be symmetric. Nevertheless, the proposed method in this chapter takes the asymmetric input constraints into account. Thus the symmetric constrained condition is relaxed by constructing the unconventional utility function (7.8).
Due to the nonlinear nature of (7.15), it’s often intractable to derive the analytical solution, which is requisite for designing the optimal strategy. To overcome this issue, in the following sections, ADP method of single-critic network using dosage regulation mechanism is designed to approximately solve (7.15).
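Although (7.14) is not displayed above, the term \(\tanh ^{-1}((\mathcal {u}^{*}-\alpha )/\hbar )\) appearing later in the proof of Theorem 7.1 suggests the familiar saturated form \(\mathcal {u}^{*}=\alpha -\hbar \tanh \big (\frac{1}{2\hbar }\mathcal {g}^T\nabla V^{*}\big )\). Under that assumption, the sketch below shows how any value of the costate term maps to a dose inside \([\mathcal {u}_m,\mathcal {u}_M]\):

```python
import math

def optimal_dose(grad_term, u_m=0.0, u_M=0.02):
    """Assumed closed-form constrained strategy, where grad_term stands for
    g(x)^T * dV*/dx (a scalar here, since g = [0, 0, 1, 0]^T)."""
    alpha = (u_m + u_M) / 2.0
    hbar = (u_M - u_m) / 2.0
    return alpha - hbar * math.tanh(grad_term / (2.0 * hbar))
```

For a zero costate the dose equals the midpoint \(\alpha \); large positive gradients (dosing is costly) push the dose toward \(\mathcal {u}_m\) and large negative ones toward \(\mathcal {u}_M\), so the asymmetric constraints are never violated.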
7.3 Optimal Strategy Based on MDRM
In order to achieve the goal of regulating therapeutic strategy timely and necessarily, MDRM is introduced to provide indications for medicine to determine the time when it’s necessary to make some regulation. Therefore, the time sequence \(\{z_\imath \}\) is required to record the regulating instants. The parameter \(\imath \in \mathbb {N}^{+}\) represents the \(\imath \)th updating instant and \(\mathbb {N}^{+}\) is the set including all positive integers. Then we can define the state as
In general, the clinical data after the latest regulation is different from the current comparable data. Hence the error is given by
Based on \(\nu _\imath \) and the threshold associated with state x, the medicine regulation mechanism is established. When a regulation occurs, \(\nu _\imath =0\), which means the medicine dosage is regulated to be equal to the current medicine indication. The comparable data is updated by the clinical data at regulation instant, and the medicine dosage remains unchanged until the occurrence of the next regulation. That is, \(\breve{\mathcal {u}}=\mathcal {u}(x_\imath )\). Thus we derive the MDRM-based strategy as
where \(\nabla \breve{V}^{*}=\partial V^{*}/\partial x\) when \(t=z_\imath \). Then the medicine regulation mechanism-based HJBE can be denoted as
The existence of the error \(\nu _\imath \) means that (7.19) does not equal zero, in contrast to HJBE (7.15). Before proceeding, an assumption is necessary [31].
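The mechanism above amounts to an event-triggered loop: the dose computed at the latest regulation instant \(z_\imath \) is held until the gap \(\nu _\imath \) exceeds the medicine indication. Since the threshold (7.20) is not reproduced here, the `indication` function, the toy `decay` dynamics and the `dose` policy in the sketch below are placeholders:

```python
def mdrm_loop(x0, policy, indication, step, n_steps, dt):
    """Hold the dose between regulation instants; regulate only when the
    gap nu between current and recorded states crosses the indication."""
    x = list(x0)
    x_breve = list(x0)        # clinical data recorded at the latest regulation
    u = policy(x_breve)       # dose held constant between regulations
    events = []
    for k in range(n_steps):
        nu = sum((a - b) ** 2 for a, b in zip(x, x_breve)) ** 0.5
        if nu > indication(x):            # medicine indication fires
            x_breve, u = list(x), policy(x)
            events.append(k)
        x = step(x, u, dt)
    return x, events

# Toy closed loop for illustration (placeholder dynamics, policy, threshold).
decay = lambda x, u, dt: [xi + dt * (-0.5 * xi + u) for xi in x]
dose = lambda xb: 0.01 * sum(xb)
final_x, events = mdrm_loop([0.8, 0.0, 0.2, 0.05], dose, lambda x: 0.05,
                            decay, n_steps=400, dt=0.05)
```

The regulation instants in `events` thin out as the state settles, mirroring the chapter's point that dosage regulation becomes rarer as the clinical data improves.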
Assumption 7.1
The optimal strategy \(\mathcal {u}^{*}\) is locally Lipschitz with respect to error \(\nu _\imath \), i.e., \(\Vert \mathcal {u}^{*}-\breve{\mathcal {u}}^{*}\Vert ^2\le K_\mathcal {u}\Vert x-\breve{x}_\imath \Vert ^2=K_\mathcal {u}\Vert \nu _\imath \Vert ^2\) where \(K_\mathcal {u}\) is a positive constant.
Theorem 7.1
Consider the nonlinear system (7.6). Suppose that Assumption 7.1 is tenable and there exists function \(V^{*}\) satisfying (7.15). If the optimal strategy is formulated as (7.18) with the medicine indication
where \(\zeta \in (0,1)\) is the designed parameter, then the controlled system is guaranteed to be stable in the sense of UUB.
Proof
Select the Lyapunov function \(\bar{Y}=V^{*}(x)\). Then we can obtain the derivative of \(V^{*}\)
According to (7.14) and (7.15), we derive that
and
Then (7.21) can be rewritten as
where \(\varpi =-2\hbar (\tanh ^{-1}((\mathcal {u}^{*}-\alpha )/\hbar ))^T(\breve{\mathcal {u}}^{*}-\mathcal {u}^{*})\). Due to Young’s inequality, from (7.24) we derive
Via variable substitution approach, we have
The function (7.26) can be further expressed as
Based on (7.24), (7.25) and (7.27), we can obtain
where \(\varXi _1(x)=2\hbar ^2\int _0^{\tanh ^{-1}((\mathcal {u}^{*}-\alpha )/\hbar )}\varsigma \tanh ^2(\varsigma )d\varsigma \). Via utilizing integral mean-value theorem, we derive that
where \(\rho \in (0,\tanh ^{-1}((\mathcal {u}^{*}-\alpha )/\hbar ))\). As \(\mathcal {u}^{*}\) is admissible, it can be deduced that \(V^{*}\) and \(\nabla V^{*}\) are bounded. Let \(\Vert V^{*}\Vert \le b_V\) and \(\Vert \nabla V^{*}\Vert \le b_{\nabla V}\) with \(b_V\) and \(b_{\nabla V}\) being positive constants. Then (7.29) becomes that
where the positive constant \(b_{\mathcal {g}}\) denotes the bound of \(\mathcal {g}(x)\). According to (7.28) and (7.30), it can be obtained that
When the indication (7.20) is satisfied, it yields that \(\dot{\bar{Y}}\le -\zeta ^2\lambda _m(\varUpsilon )\Vert x\Vert ^2+\theta b_V+b_{\varXi _1}\). Then we can conclude that \(\dot{\bar{Y}}<0\) when \(\Vert x\Vert >\sqrt{\frac{\theta b_V+b_{\varXi _1}}{\zeta ^2\lambda _m(\varUpsilon )}}\).\(\blacksquare \)
Theorem 7.1 indicates that with the utilization of the medicine regulation mechanism, the MDRM-based optimal strategy can stabilize the controlled system in the sense of UUB.
7.4 MDRM-Based Approximate Optimal Control Design
The approximate optimal control strategy is designed based on the ADP algorithm which integrates the medicine regulation mechanism. Furthermore, stability of the closed-loop controlled system in the sense of UUB is guaranteed when the proposed medicine indication is applied.
7.4.1 Implementation of the Adaptive Dynamic Programming Method
In this section, the approximate optimal strategy is designed by the ADP method of single-critic framework which integrates the medicine regulation mechanism.
Based on the universal approximation properties of NN, \(V^{*}\) can be represented as
where \(\omega ^{*}\) is the ideal weight vector, \(\vartheta (\cdot )\) the activation function and \(\tau \) the approximate error. Let \(\varGamma _1(\breve{x}_\imath )=\frac{1}{2\hbar }\mathcal {g}^T(\breve{x}_\imath )\nabla \vartheta ^T(\breve{x}_\imath )\omega \), then we have
where \(\bar{\tau }(\breve{x}_\imath )=-(1/2)(1-\tanh ^2(\Phi (\breve{x}_\imath )))\mathcal {g}^T(\breve{x}_\imath )\nabla \tau (\breve{x}_\imath )\). Herein, \(\Phi (\breve{x}_\imath )\) is selected between \(1/(2\hbar )\mathcal {g}^T(\breve{x}_\imath )\nabla V^{*}(\breve{x}_\imath )\) and \(\varGamma _1(\breve{x}_\imath )\). As the ideal weight \(\omega ^{*}\) is unknown, the approximate version of \(V^{*}\) is derived by the critic NN, which is presented as
where \(\hat{\omega }\) is the approximate vector. Then the MDRM-based approximate strategy can be obtained
where \(\varGamma _2(\breve{x}_\imath )=1/(2\hbar )\mathcal {g}^T(\breve{x}_\imath )\nabla \vartheta ^T(\breve{x}_\imath )\hat{\omega }\). Then the approximate Hamiltonian could be restated as
where \(\xi =\nabla \vartheta (f+\mathcal {g}\breve{\mathcal {u}})-\theta \vartheta \).
The goal of tuning \(\hat{\omega }\) is to minimize the term \(\varepsilon _H\). Thus we set the target function as \(E=\frac{1}{2}\varepsilon _H^T\varepsilon _H\). Using the gradient descent approach, we obtain
where \(\ell \) is the learning parameter and \(\breve{\xi }=\xi /(\xi ^T \xi +1)^2\). Define \(\tilde{\omega }=\omega ^{*}-\hat{\omega }\). From (7.37) we derive that
where \(\bar{\xi }=\xi /(\xi ^T\xi +1)\) and the approximate residual error \(e_H=-\nabla \tau ^{T}(f+\mathcal {g}\breve{\mathcal {u}})+\theta \tau \). Before presenting the main results, the following assumptions are requisite [38, 39].
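The tuning law (7.37), \(\dot{\hat{\omega }}=-\ell \,\breve{\xi }\,\varepsilon _H\) with \(\breve{\xi }=\xi /(\xi ^T\xi +1)^2\), can be sketched as a normalized gradient step on \(E=\frac{1}{2}\varepsilon _H^T\varepsilon _H\); the forward-Euler discretization with step `dt` below is an implementation assumption:

```python
def critic_update(w_hat, xi, eps_H, ell=1.6, dt=0.01):
    """One Euler step of w_hat_dot = -ell * xi/(xi^T xi + 1)^2 * eps_H.
    The (xi^T xi + 1)^2 factor normalizes the regressor, bounding the step."""
    norm = sum(v * v for v in xi) + 1.0
    return [w - dt * ell * (v / norm**2) * eps_H for w, v in zip(w_hat, xi)]
```

When the Hamiltonian residual \(\varepsilon _H\) is zero the weights are left unchanged; otherwise each weight moves opposite to its component of the normalized regressor, driving \(E\) downward.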
Assumption 7.2
The signal \(\bar{\xi }\) is persistently excited over the time interval \([t,t+T]\). In other words, there exist positive constants \(\phi \) and T such that
with \(N_{c}\) being the neuron number of the critic network.
Assumption 7.3
The terms \(\bar{\tau }\) and \(e_H\) are both bounded. That is, \(\Vert \bar{\tau }\Vert \le b_{\bar{\tau }}\) and \(\Vert e_H\Vert \le b_{eH}\) where \(b_{\bar{\tau }}\) and \(b_{eH}\) are positive constants.
7.4.2 Stability Analysis
This section discusses the stability, in the sense of UUB, of the controlled system with the designed MDRM-based strategy.
Theorem 7.2
Consider system (7.6) and let Assumptions 7.1–7.3 hold. The strategy is given by (7.35) and the weight tuning law for the critic is set as (7.37). Then the state of the closed-loop system (7.6) and the weight estimation error \(\tilde{\omega }\) are uniformly ultimately bounded (UUB) provided that the medicine indication is applied
with \(\eta \in (0,1)\) being the regulation parameter.
Proof
Select the Lyapunov function as
Note that when medicine indication is applied, the system can be described by the impulsive model comprising two components. One is flow dynamics for \(t\in [z_\imath ,z_{\imath +1})\) and the other is jump dynamics for \(t=z_\imath \). Hence we present the discussions over the two cases.
Case I: No regulation occurs, i.e., \(t\in [z_\imath ,z_{\imath +1})\). Then we can obtain \(\dot{Y}_a=0\). In light of (7.22) and (7.23), we could derive that
where \(\varXi _2=-2\hbar (\tanh ^{-1}((\mathcal {u}^{*}-\alpha )/\hbar ))^T(\breve{\mathcal {u}}-\mathcal {u}^{*})\). According to Young’s inequation, we have
Recalling (7.27), we obtain
As \(\varXi _1(x)\) and \(V^{*}(x)\) are bounded, (7.42) becomes
Applying the Young’s inequation, we derive that
As \(|\tanh (\cdot )|\le 1\), it could be obtained that
where \(\sigma =16\hbar ^2+4b_{\bar{\tau }}^2+\theta b_V+b_{\varXi _1}\).
Taking the derivative of \(Y_c\), we derive that
In light of Young’s inequation, it yields that
Then (7.48) can be further expressed as
where \(\delta =\bar{\xi }\bar{\xi }^T\).
According to (7.47) and (7.50), when the medicine indication (7.40) is satisfied, we can derive that
Then it can be concluded that \(\dot{Y}<0\) when one of the conditions holds that
and
Thus x and \(\tilde{\omega }\) are demonstrated to be UUB.
Case II: A regulation occurs, i.e., \(t=z_\imath \). The difference of \(Y\) is presented as
From the analysis in Case I, it can be derived that \(\dot{Y}<0\) when (7.52) or (7.53) is satisfied. It can be further deduced that \(Y_b+Y_c\) is monotonically decreasing when \(t\in [z_\imath ,z_{\imath +1})\), that is,
where \(\epsilon \in (0,z_{\imath +1}-z_\imath )\). According to the properties of limits, we can obtain
with \(x(z_\imath ^+)=\lim _{\epsilon \rightarrow 0}x(z_\imath +\epsilon )\). More specially, it yields that
As x is proved to be UUB, it can be obtained that
From (7.57) and (7.58), it’s derived that \(\triangle Y<0\), which indicates that the constructed Lyapunov function (7.41) is monotonically decreasing when \(t=z_\imath \). \(\blacksquare \)
Remark 7.3
\(\eta \) in (7.40) is the regulation parameter determining the frequency of medicine dosage regulation. A large \(\eta \) means that the medicine dosage is regulated frequently, while a small \(\eta \) implies that regulation occurs rarely. It can be set to an appropriate value according to the clinical data.
Remark 7.4
Theorem 7.2 indicates that the designed MDRM-based approximate optimal strategy (7.35) can stabilize system (7.6) in the sense of UUB. The medicine indication (7.40), the cornerstone of the MDRM, provides a reasonable reference threshold for the therapeutic strategy. When the difference between the current clinical data and the latest reference data is larger than the threshold, the medicine dosage is regulated, and the current indication data is recorded and utilized as the new reference data in the future. Thus the designed therapeutic strategy can be regulated timely and necessarily according to the medicine indication.
Remark 7.5
The discount factor is introduced to prevent the accumulated reward in the value function from growing unboundedly or vanishing, and an immediate return earns more than a delayed return of equal interest. Human decision-making exhibits a similar preference for immediate returns, with a discounting curve close to exponential; the discount factor is used to simulate such a cognitive model and biological process in decision-making.
7.5 Simulation Study
In this section, we consider the system (7.6) which is the simplified version of the growth dynamics of cells and viruses described by (7.1)–(7.4). Based on system (7.6), the simulation experiment is conducted to show the effectiveness of the proposed ADP method with medicine regulation mechanism.
According to the clinical medical statistics and the literature [36, 37, 40], the parameters associated with the dynamics (7.1)–(7.4) are presented in Table 7.1. After the nondimensionalization, the corresponding parameters are set as \(a_1=0.36\), \(a_2=0.1\), \(b_1=0.36\), \(b_2=0.48\), \(b_3=0.16\), \(c_1=0.1278\), \(c_2=0.2\), \(c_3=0.036\), \(d_1=0.6\), and \(d_2=0.29\). The initial state vector is \([0.8,0,0.2,0.05]^T\). The minimum and maximum thresholds are given by \(\mathcal {u}_m=0\) and \(\mathcal {u}_M=0.02\). For the discounted value function (7.7) of system (7.6), the parameters are set as \(\varUpsilon =0.2I_{4\times 4}\) and \(\theta =0.5\).
For the critic network, we select the activation function as \([x_1^2\), \(x_1 x_2\), \(x_1 x_3\), \(x_1 x_4\), \(x_2^2\), \(x_2 x_3\), \(x_2 x_4\), \(x_3^2\), \(x_3 x_4\), \(x_4^2]^{T}\). The other parameters are respectively set as \(K_{\mathcal {u}}=20\), \(\zeta =0.9\) and \(\ell =1.6\).
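The critic also needs the gradient \(\nabla \vartheta (x)\) of the quadratic activation vector listed above; both can be written out directly (a sketch consistent with that basis):

```python
def theta(x):
    """Activation vector [x1^2, x1*x2, x1*x3, x1*x4, x2^2, x2*x3, x2*x4,
    x3^2, x3*x4, x4^2] from Sect. 7.5."""
    x1, x2, x3, x4 = x
    return [x1*x1, x1*x2, x1*x3, x1*x4, x2*x2,
            x2*x3, x2*x4, x3*x3, x3*x4, x4*x4]

def grad_theta(x):
    """Jacobian d(theta)/dx, a 10 x 4 matrix written out by hand."""
    x1, x2, x3, x4 = x
    return [[2*x1, 0, 0, 0], [x2, x1, 0, 0], [x3, 0, x1, 0], [x4, 0, 0, x1],
            [0, 2*x2, 0, 0], [0, x3, x2, 0], [0, x4, 0, x2],
            [0, 0, 2*x3, 0], [0, 0, x4, x3], [0, 0, 0, 2*x4]]
```

The hand-written Jacobian can be cross-checked against a finite-difference approximation of `theta`, a cheap guard against transcription errors in the basis.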
Simulation results are demonstrated in Figs. 7.1, 7.2, 7.3, 7.4, 7.5, 7.6 and 7.7. For model (7.5), the evolution trajectories of the states are respectively depicted in Figs. 7.1, 7.2, 7.3 and 7.4. From Fig. 7.1, we can observe that under the attacks from oncolytic viruses and immune cells, the population of uninfected tumor cells rapidly declines and stabilizes at a very low value after \(t=150\) d. Figures 7.2 and 7.3 reveal that the population of infected tumor cells and that of virus particles are largely proportional. The immune cells are activated by the uninfected and infected tumor cells to kill tumor cells, as can be observed from Fig. 7.4. The medicine dosage of the derived approximate optimal therapeutic strategy and that of the initial strategy are compared in Fig. 7.5, from which one can see that the dosage of the obtained strategy is obviously less than that of the initial strategy. On the other hand, the input dosages of the two strategies are both constrained by the pre-designed thresholds. This is of great practical significance since excess medicine may well threaten the health of patients and cause a huge overhead. Furthermore, it can be observed that the medicine dosage regulation frequency steps down as the clinical data improves, which means that with the aid of the medicine regulation mechanism, the medicine dosage can be regulated timely and necessarily. Figures 7.6 and 7.7 present the population curves of the cells and viruses under the derived strategy with different burst sizes of viruses, that is, \(\kappa =2,5\). This verifies that the obtained therapeutic strategy can effectively kill tumor cells with oncolytic viruses of different burst sizes. However, when the parameter \(\kappa \) is large enough, it may cause an oscillation. When the innate immune response is considered, the tumor-virus-immune system becomes very complicated.
Though viruses with a large \(\kappa \) produce more replicas and infect more tumor cells, the reduction of tumor cells deactivates the immune response in the meanwhile. The viruses then dominate the dynamics, and the warfare between tumor cells and viruses can last a long time such that the oscillation occurs repeatedly. The oncolytic virus can effectively kill tumor cells, while the immune response can reduce the killing-efficiency of the viruses and block their infections. Furthermore, the activated immune response can eliminate tumor cells as well. Thus there exists a subtle balance between the viruses and the immune cells, which demands further investigation.
7.6 Conclusion
A medicine regulation mechanism has been designed such that the constrained therapeutic strategy based on virotherapy can be obtained to eliminate tumor cells, guaranteeing that the medicine dosage can be regulated timely and necessarily. Firstly, a mathematical model is utilized to describe the relations among the uninfected tumor cells, infected tumor cells, oncolytic viruses and immune cells. Meanwhile, based on the simplified version of the tumor-virus-immune model, a non-quadratic function is proposed to formulate the value function and acquire the HJBE. Secondly, to derive the optimal therapeutic strategy, a single-critic architecture has been designed to seek the approximate solution of the HJBE through ADP. Finally, the simulation results have verified the effectiveness of the proposed method. Furthermore, nonzero-sum optimal control based on differential games will be a new frontier in the therapy of tumors, cardiovascular diseases, orthodontic treatment, osteoporosis and cerebrovascular diseases.
References
Andtbacka RH, Kaufman HL, ..., Coffin RS (2015) Talimogene laherparepvec improves durable response rate in patients with advanced melanoma. J Clin Oncol 33(25):2780–2788
Wodarz D (2001) Viruses as antitumor weapons: defining conditions for tumor remission. Can Res 61(8):3501–3507
Wu JT, Byrne HM, Kirn DH, Wein LM (2001) Modeling and analysis of a virus that replicates selectively in tumor cells. Bull Math Biol 63(4):731–768
Wein LM, Wu JT, Kirn DH (2003) Validation and analysis of a mathematical model of a replication-competent oncolytic virus for cancer treatment: Implications for virus design and delivery. Can Res 63(6):1317–1324
Wodarz D (2003) Gene therapy for killing p53-negative cancer cells: use of replicating versus nonreplicating agents. Hum Gene Ther 14(2):153–159
Bajzer Z, Carr T, Josić K, Russell SJ, Dingli D (2008) Modeling of cancer virotherapy with recombinant measles viruses. J Theor Biol 252(1):109–12
Friedman A, Tian JP, Fulci G, Chiocca EA, Wang J (2006) Glioma virotherapy: effects of innate immune suppression and increased viral replication capacity. Can Res 66(4):2314–2319
Phan TA, Tian JP (2017) The role of the innate immune system in oncolytic virotherapy. Comput Math Methods Med 2017:6587258
Sun J, Zhang H, Yan Y, Xu S, Fan X (2023) Optimal regulation strategy for nonzero-sum games of the immune system using adaptive dynamic programming. IEEE Trans Cybernet 53(3):1475–1484
Zhao D, Wen G, Wu ZG, Lv Y, Zhou J (2023) Resilient consensus of multi-agent systems under collusive attacks on communication links. IEEE Trans Cybernet. https://doi.org/10.1109/TCYB.2022.3201909.
Zou B, Jiang H, Xu C, Xu J, You X, Tang YY (2023) Learning performance of weighted distributed learning with support vector machines. IEEE Trans Cybernet. https://doi.org/10.1109/TCYB.2021.3131424.
Zhang K, Jiang B, Ding SX, Zhou D (2022) Robust asymptotic fault estimation of discrete-time interconnected systems with sensor faults. IEEE Trans Cybernet 52(3):1691–1700
Littman ML (2015) Reinforcement learning improves behaviour from evaluative feedback. Nature 521(7553):445–451
Al-Dabooni S, Wunsch DC (2020) An improved n-step value gradient learning adaptive dynamic programming algorithm for online learning. IEEE Trans Neural Netw Learn Syst 31(4):1155–1169
Kamalapurkar R, Dinh H, Bhasin S, Dixon WE (2015) Approximate optimal trajectory tracking for continuous-time nonlinear systems. Automatica 51:40–48
Gao W, Jiang Z (2018) Learning-based adaptive optimal tracking control of strict-feedback nonlinear systems. IEEE Trans Neural Netw Learn Syst 29(6):2614–2624
Cui L, Xie X, Wang X, Luo Y, Liu J (2019) Event-triggered single-network ADP method for constrained optimal tracking control of continuous-time non-linear systems. Appl Math Comput 352:220–234
Zhong X, He H (2020) GrHDP solution for optimal consensus control of multiagent discrete-time systems. IEEE Trans Syst Man Cybernet: Syst 50(7):2362–2374
Zhang H, Zhang J, Yang G-H, Luo Y (2015) Leader-based optimal coordination control for the consensus problem of multiagent differential games via fuzzy adaptive dynamic programming. IEEE Trans Fuzzy Syst 23(1):152–163
Zheng X, Li H, Ahn C, Yao D (2023) NN-based fixed-time attitude tracking control for multiple unmanned aerial vehicles with nonlinear faults. IEEE Trans Aerosp Electron Syst. https://doi.org/10.1109/TAES.2022.3205566
Song R, Wei Q, Zhang H, Lewis FL (2021) Discrete-time non-zero-sum games with completely unknown dynamics. IEEE Trans Cybernet 51(6):2929–2943
Liu P, Sun J, Zhang H, Xu S, Liu Y (2023) Combination therapy-based adaptive control for organism using medicine dosage regulation mechanism. IEEE Trans Cybernet. https://doi.org/10.1109/TCYB.2022.3196003
Zhang Z, Xu J, Fu M (2022) Q-learning for feedback Nash strategy of finite-horizon nonzero-sum difference games. IEEE Trans Cybernet 52(9):9170–9178
Sun J, Zhang H, Wang Y, Sun S (2022) Fault-tolerant control for stochastic switched IT2 fuzzy uncertain time-delayed nonlinear systems. IEEE Trans Cybernet 52(2):1335–1346
Luo B, Liu D, Wu HN, Wang D, Lewis FL (2017) Policy gradient adaptive dynamic programming for data-based optimal control. IEEE Trans Cybernet 47(10):3341–3354
Zhang K, Jiang B, Chen M, Yan XG (2021) Distributed fault estimation and fault-tolerant control of interconnected systems. IEEE Trans Cybernet 51(3):1230–1240
Zhang J, Zhang H, Feng T (2018) Distributed optimal consensus control for nonlinear multiagent system with unknown dynamic. IEEE Trans Neural Netw Learn Syst 29(8):3339–3348
Wang D (2020) Intelligent critic control with robustness guarantee of disturbed nonlinear plants. IEEE Trans Cybernet 50(6):2740–2748
Vamvoudakis KG, Miranda MF, Hespanha JP (2016) Asymptotically stable adaptive-optimal control algorithm with saturating actuators and relaxed persistence of excitation. IEEE Trans Neural Netw Learn Syst 27(11):2386–2398
Yang D, Li T, Xie X, Zhang H (2020) Event-triggered integral sliding-mode control for nonlinear constrained-input systems with disturbances via adaptive dynamic programming. IEEE Trans Syst Man Cybernet: Syst 50(11):4086–4096
Dong L, Zhong X, Sun C, He H (2017) Adaptive event-triggered control based on heuristic dynamic programming for nonlinear discrete-time systems. IEEE Trans Neural Netw Learn Syst 28(7):1594–1605
Ghasempour T, Nicholson GL, Kirkwood D, Fujiyama T, Heydecker B (2020) Distributed approximate dynamic control for traffic management of busy railway networks. IEEE Trans Intell Transp Syst 21(9):3788–3798
Chen N, Li B, Luo B, Gui W, Yang C (2023) Event-triggered optimal control for temperature field of roller kiln based on adaptive dynamic programming. IEEE Trans Cybernet 53(5):2805–2817
Dhebar Y, Deb K, Nageshrao S, Zhu L, Filev D (2023) Toward interpretable-AI policies using evolutionary nonlinear decision trees for discrete-action systems. IEEE Trans Cybernet. https://doi.org/10.1109/TCYB.2022.3180664
Zhao J, Wang T, Pedrycz W, Wang W (2021) Granular prediction and dynamic scheduling based on adaptive dynamic programming for the blast furnace gas system. IEEE Trans Cybernet 51(4):2201–2214
Tian JP (2011) The replicability of oncolytic virus: defining conditions in tumor virotherapy. Math Biosci Eng 8(3):841–860
Al-Tuwairqi SM, Al-Johani NO, Simbawa EA (2020) Modeling dynamics of cancer virotherapy with immune response. Adv Differ Equ 2020:438
Yang X, He H (2020) Event-triggered robust stabilization of nonlinear input-constrained systems using single network adaptive critic designs. IEEE Trans Syst Man Cybernet: Syst 50(9):3145–3157
Vamvoudakis KG, Lewis FL (2010) Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem. Automatica 46(5):878–888
Kuznetsov V, Makalkin I, Taylor M, Perelson A (1994) Nonlinear dynamics of immunogenic tumors: parameter estimation and global bifurcation analysis. Bull Math Biol 56(2):295–321
Kerbel RS, Bertolini F, Man S, Hicklin DA, Emmenegger U, Shaked Y (2006) Antiangiogenic drugs as broadly effective chemosensitizing agents. Angiogenesis, pp 195–212
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
© 2024 The Author(s)
Sun, J., Xu, S., Liu, Y., Zhang, H. (2024). Adaptive Virotherapy Strategy for Organism with Constrained Input Using Medicine Dosage Regulation Mechanism. In: Adaptive Dynamic Programming. Springer, Singapore. https://doi.org/10.1007/978-981-99-5929-7_7
Print ISBN: 978-981-99-5928-0
Online ISBN: 978-981-99-5929-7
eBook Packages: Intelligent Technologies and Robotics (R0)