1 Introduction

HIV is a virus that reproduces in the human body and weakens the immune system so that the infected individual has a hard time fighting off other viral diseases. HIV infection spreads mainly from person to person via unprotected sexual contact and sharing needles. In the absence of effective treatment, the HIV infection gets worse over time and eventually causes AIDS. The symptoms of HIV are classified into three stages: (1) acute HIV infection symptoms, (2) clinical latency symptoms (chronic HIV infection), and (3) AIDS symptoms. T-cells are a subdivision of white blood cells that have a significant functional role in the coordination of the body’s immune system. CD4+ T cells are known as “helper” cells that trigger the body’s response to infections. In the acute phase of the disease and without effective treatment, HIV can kill CD4+ T cells and overcome the immune system.

In epidemiology, mathematical models play a substantial role in predicting transmission dynamics and evaluating control approaches for HIV, Ebola, SARS, cancerous cells, influenza epidemics, novel coronavirus disease (COVID-19), etc. Following the spread of the human immunodeficiency virus, numerous models have been suggested to investigate the dynamic behavior of HIV/AIDS infection and to curb it (Jacques et al. 2016; Adams et al. 2005; Perelson et al. 1993; Wodarz and Nowak 2002). The immune system response is an effective tool in suppressing HIV disease in infected individuals. Hence, the effects of this nonpharmaceutical intervention are incorporated (Adams et al. 2005; Perelson et al. 1993). A novel mathematical model for HIV/AIDS with compartments for aware and unaware individuals, diagnosed and undiagnosed patients, and tracked infected people has been developed by Ayele et al. (2021). An optimal control strategy using the forward-backward sweep (FBS) approach has also been implemented on that model in order to investigate the cost-effectiveness of the combination of the control efforts.

Since HIV/AIDS is a highly contagious and deadly viral disease, optimal control approaches have been proposed for the optimal management of drug injections. For instance, the Reverse Transcriptase Inhibitor (RTI) intervention, an antiretroviral class of drugs for the management and treatment of HIV, is investigated (Wodarz and Nowak 2002; Moysis et al. 2018). Furthermore, the Protease Inhibitor (PI) method as a pharmaceutical intervention is considered (Kirschner et al. 1997; Barry 2018). Zurakowski et al. (2004) investigated the effects of target cells acting as hosts of HIV on the rapid convergence and reduction in compromise to the immune system of infected individuals. The effects of antiretroviral therapy (ART) and highly active antiretroviral therapy (HAART) are considered and illustrated as control strategies by Habibah and Sari (2018). Multi-drug dynamic approaches based on the combination of RTI and PI protocols are discussed (Wein et al. 1997; Adams et al. 2004). In this strategy, the doses of the two drugs can be changed independently of each other and can be either continuous or on-off. The impact of continuous drug-based measures, using an external feedback control term as a therapeutic drug regimen, on bolstering the immune system components has been studied by Brandt and Chen (2001). Recently, silent cure treatment, known as structured cessation therapy, has received much attention in the medical literature (Thomas et al. 2020). One advantage of this method is that it can reduce the risk of HIV resistance to the current drug regimen. However, it is not a safe method for the immune system and, in some cases, may cause damage to CD4+ T cells.

Various control strategies have been examined in order to diminish the prevalence of infectious diseases and reduce the disease-death rate. To pick the best drug-dosing schedule, nonlinear receding horizon control was used to synthesize feedback into HIV drug treatment (Hyungbo et al. 2003). Lemos and Barao (2011) presented an adaptive nonlinear control method that combines LQ control, exact linearization, and a joint control Lyapunov function in order to design the estimation law for viral load in an HIV infection. A nonlinear robust adaptive control approach in the form of antiviral treatment and vaccination, intended to decrease the number of susceptible individuals and increase the population of recovered humans against the flu, is examined by Sharifi and Moradi (2017). Moreover, an analysis of the global stability of the HIV dynamic model using the geometric approach and the Lyapunov direct method based on the higher-order generalization of Bendixson’s criterion is introduced by Buonomo and Vargas-De-León (2012). Sweilam and Al-Mekhlafi (2017) extended the HIV infection model to fractional order in the sense of Caputo and characterized the optimal control for the HIV disease via Pontryagin’s maximum principle (PMP). Predictive control approaches to ensure the stability and suppression of the HIV/AIDS infectious system are studied (Zurakowski and Teel 2006; Pinheiro et al. 2011; Alazabi et al. 2012). Based on gradient descent laws, a robust adaptive sliding mode controller is presented by Mahmoodabadi and Lakmesari (2021) for the antiretroviral therapy of HIV-1 infection. The proposed control approach is implemented in two steps: designing a sliding mode controller for the HIV model and then adjusting the control parameters via the adaptive gradient descent laws.
Taking into account the uncertain parameters in the HIV model, Fatemi and Mahmoodian (2021) applied the error dynamic shaping (EDS) approach to make the system state variables track the desired trajectories. They also used a genetic algorithm (GA) to tune the free control parameters so as to reduce the dose of the drug prescribed to manage HIV. Izadbakhsh et al. (2021) investigated a novel adaptive observer-based robust controller that drives the number of infected CD4+ T cells to zero asymptotically, using Baskakov operators as a universal approximator for modeling the uncertainties of the HIV model. The results of this approach have been compared with those of a controller/observer based on the radial basis function neural network (RBFNN) method.

The Linear Quadratic Regulator (LQR) procedure is a well-known and widely accepted method for controller synthesis of linear systems. However, most mathematical models of biological systems, including the dynamics of HIV and the immune response addressed in this study, are nonlinear. In this regard, one of the emerging and effective approaches to synthesizing nonlinear sub-optimal controllers is the state-dependent Riccati equation (SDRE) framework. Basically, the SDRE is a systematic procedure for designing a nonlinear feedback controller in which the nonlinearities of the system are captured in a linear-like form. The pseudo-linearization is not unique and can be carried out in several ways, which creates flexibility in the design of the controller. The state-dependent Riccati equation methodology has previously been applied to several fields of engineering. Banks et al. (2007) were the first to investigate the asymptotic convergence of the estimator and the compensated system. In addition, an interpolation method has been examined for the numerical approximation of the solution to the SDRE for a wide range of nonlinear problems.

A comprehensive survey of applications of the state-dependent Riccati equation technique in various fields, including robotics, aircraft, surface vessels, aerospace, UAVs, and other nonlinear systems, is provided by Nekoo (2019). In order to minimize the energy consumption of the robot/prosthesis system, a nonlinear robust optimal controller for an active transfemoral prosthesis using sliding mode control and state-dependent Riccati equation control is considered by Bavarsad et al. (2020). The combination of the SDRE technique and sliding mode control, in the form of an estimator-based nonlinear robust optimal controller for an active prosthetic leg for transfemoral amputees, is investigated in order to optimize the energy and reduce the effects of parametric model uncertainties and of ground reaction forces as nonparametric uncertainties (Bavarsad et al. 2021b, a). Nasiri et al. (2021) applied a novel combination of the SDRE approach and the Function Approximation Technique (FAT) to the regulation and tracking of flexible-joint manipulators in the presence of time-varying uncertainties and disturbances. A novel robust state-dependent coefficient (SDC) control considering mismatched time-varying disturbances for an uncertain Electrical Flexible-Joint Robot (EFJR) is presented by Nasiri et al. (2020). Optimal control schemes are implemented to minimize the average population of cancer cells, reduce the side effects of chemotherapy, enlarge the system’s domain of attraction, and reduce treatment time for a nonlinear immune oncology system by Monfared et al. (2020, 2021). Shadi et al. (2021) investigated the SDRE approach for curbing the prevalence of Ebola disease while maintaining the nonlinear properties of the system, in comparison with the Terminal Synergistic Control (TSC) method.
The application of the SDRE strategy in the Air-Handling Unit (AHU) to regulate and circulate air in Heating, Ventilating, and Air Conditioning (HVAC) systems while preserving the nonlinear properties of the model is examined (Liavoli and Fakharian 2019, 2017), and the simulation results are compared with the LQR method. Based on the SDRE approach and the pseudo-linearization technique, the tracking of reference paths via optimal control while preserving the nonlinear nature of the DC microgrid system is investigated by Shahradfar and Fakharian (2021). The optimality of the sliding mode control approach has been inspected through its combination with the SDRE technique by Korayem et al. (2019). For this aim, the state-dependent differential Riccati equation (SDDRE) is applied in the integral form of the sliding surface. The resulting approach is implemented, practically and theoretically, on the Scout robot via the LabVIEW software. Since continuous measurement of CD4+ T cell numbers as the output of the HIV dynamic model can be considered impossible, a state-dependent impulsive observer that accounts for the intracellular delay is used for state estimation of the HIV model (Kalamian et al. 2021). The presented impulsive observer is designed via an extended pseudo-linearization method.

The idea of using this method to control HIV was first proposed by Kwon (2005), where the pseudo-linearization is performed so that the state coefficient matrix is equivalent to the matrix obtained by local linearization. Output feedback controllers appear to be necessary for controlling HIV, given that only the CD4+ T cell count and the viral load are measured. A predictive controller with output feedback is one of the methods used (Kwon 2005). Also, Zhang et al. (2012) used a state-dependent Riccati equation approach to design a system observer. Given that both the controller and the observer can be based on this method, the two designs are highly compatible.

In this study, a nonlinear state feedback controller for managing HIV infection is designed based on the state-dependent Riccati equation method. One of the greatest benefits of the SDRE is the ability to account for a patient’s specific conditions by choosing appropriate weighting matrices in the cost function and by restricting the amount of drug administered to reduce side effects. Another advantage of this technique is that the state-dependent matrices can be formed in infinitely many ways. To examine the dynamic behavior of the HIV model, a deterministic one-compartment ODE system with three state variables, namely uninfected CD4+ T cells (T), infected CD4+ T cells \(({T}^{*})\), and free virus particles (v), is considered. Professional equipment is required to measure the number of free virion particles. In this regard, the extended Kalman filter can be used as a state observer for the nonlinear system to estimate the unmeasured states. An SDRE-based optimal control has been implemented in order to restrict the concentration of virions and infected cells. We will show that the proposed approach solves the drug-intervention management problem. The effects of several weighting matrices on the SDRE performance have been examined. The numerical simulations and the comparison with the Linear-Quadratic Regulator (LQR) method confirm that the proposed optimal control strategies provide flexibility for drug administration.

The framework of this article is structured as follows. The deterministic compartmental model for HIV is investigated in Sect. 2. Section 3 includes the HIV dynamic model by considering treatment protocols as optimal control inputs. Section 4 contains the process of designing the control effort signal and obtaining it through the proposed method. The extended Kalman filter as a nonlinear estimator for some unknown variables is described in Sect. 5. Section 6 examines the behavior of the system around equilibrium points and then evaluates the numerical simulation results of the HIV model resulting from different cases of selecting tuning coefficients of cost function. Finally, the conclusion is presented in Sect. 7.

2 Dynamic Model of HIV

In this section, the deterministic model of HIV described by Barão and Lemos (2007) is considered. This model can well analyze the effects of this viral disease on the immune systems of infected individuals by interpreting the interaction between compartments of healthy CD4+ T cells, CD4+ T-infected cells, and the virus.

The CD4+ T cells are produced at a constant rate s and die at a constant per-cell rate d. In addition, uninfected CD4+ T cells are lost to infection at a rate proportional to the product of the numbers of uninfected cells and free virus particles, with rate constant \(\beta \). The rate of change of infected CD4+ T cells is determined by the infection of healthy CD4+ T cells and by the natural death of infected cells at a constant rate \(m_2\). Free virus particles are produced by infected CD4+ T cells at a constant rate k and are cleared at a constant death rate \(m_1\) (Landi et al. 2008; Craig et al. 2004; Barão and Lemos 2007).

Based on the aforementioned statements, the transmission dynamics of the HIV model are given as follows:

$$\begin{aligned} \begin{aligned} {\dot{T}}\left( t \right)&=s-dT\left( t \right) -\beta T\left( t \right) v\left( t \right) \\ {{{{\dot{T}}}}^{*}}\left( t \right)&=\beta T\left( t \right) v\left( t \right) -{{m}_{2}}{{T}^{*}}\left( t \right) \\ {\dot{v}}\left( t \right)&=k{{T}^{*}}\left( t \right) -{{m}_{1}}v\left( t \right) . \end{aligned} \end{aligned}$$
(1)

where the numerical values and explanation of the aforementioned parameters are mentioned in Table 1.

Table 1 Interpretations and values of HIV model parameters Barão and Lemos (2007)
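As a minimal sketch of how model (1) can be integrated numerically, the following Python fragment implements the right-hand side of (1) together with a classical fixed-step Runge-Kutta scheme. The parameter values are assumptions: they are not copied from Table 1 but are chosen to be consistent with the equilibrium values quoted in Sect. 6. The names `hiv_rhs`, `rk4_step`, and `simulate` are illustrative only.

```python
# Assumed parameter values (hypothetical stand-ins for Table 1), chosen so that
# s/d = 500 and the endemic equilibrium matches the values quoted in Sect. 6.
S, D, BETA = 10.0, 0.02, 2.4e-5   # production, death, and infection rates
M1, M2, K = 2.4, 0.24, 100.0      # virus death, infected-cell death, virion production


def hiv_rhs(state):
    """Right-hand side of model (1) for state = (T, T*, v)."""
    T, T_inf, v = state
    dT = S - D * T - BETA * T * v
    dT_inf = BETA * T * v - M2 * T_inf
    dv = K * T_inf - M1 * v
    return (dT, dT_inf, dv)


def rk4_step(state, dt):
    """One classical fourth-order Runge-Kutta step of size dt (days)."""
    def shifted(base, slope, h):
        return tuple(b + h * k for b, k in zip(base, slope))

    k1 = hiv_rhs(state)
    k2 = hiv_rhs(shifted(state, k1, dt / 2))
    k3 = hiv_rhs(shifted(state, k2, dt / 2))
    k4 = hiv_rhs(shifted(state, k3, dt))
    return tuple(
        x + dt / 6 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(state, days, dt=0.01):
    """Integrate model (1) forward in time and return the final state."""
    for _ in range(int(round(days / dt))):
        state = rk4_step(state, dt)
    return state


if __name__ == "__main__":
    # A healthy individual exposed to a small amount of virus (illustrative choice).
    print(simulate((500.0, 0.0, 1.0), days=50))
```

A fixed-step scheme is used here only to keep the sketch dependency-free; an adaptive solver would be a natural alternative.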

3 Dynamic Model of HIV with Treatment

In this section, the pharmaceutical measures used to suppress HIV replication are introduced. The pair of time-varying control variables \((u_1(t), u_2(t))\) represents RTIs and PIs, respectively. The HAART treatment involves the use of several antiviral drugs that keep the immune system functioning optimally and suppress the virus in vivo. Antiviral drugs are divided into two major categories: RTIs and PIs, which target the HIV replication cycle extracellularly. There are other drugs, such as non-nucleoside reverse transcriptase inhibitors (NNRTIs), nucleoside reverse transcriptase inhibitors (NRTIs), and fusion inhibitors (FIs), but since RTIs and PIs are more common, they are considered in this study. RTIs prevent CD4+ T cell infection, while PIs inhibit the production of new viruses. To this end, the proposed model, including the pharmaceutical measures, is rewritten as follows:

$$\begin{aligned} \begin{aligned}&{\dot{T}}\left( t \right) =s-dT\left( t \right) -\left( 1-{{u}_{1}}\left( t \right) \right) \beta T\left( t \right) v\left( t \right) \\&{{{{\dot{T}}}}^{*}}\left( t \right) =\left( 1-{{u}_{1}}\left( t \right) \right) \beta T\left( t \right) v\left( t \right) -{{m}_{2}}{{T}^{*}}\left( t \right) \\&{\dot{v}}\left( t \right) =\left( 1-{{u}_{2}}\left( t \right) \right) k{{T}^{*}}\left( t \right) -{{m}_{1}}v\left( t \right) , \end{aligned} \end{aligned}$$
(2)

where the factors \(\left( 1-{{u}_{1}}\left( t \right) \right) \) and \(\left( 1-{{u}_{2}}\left( t \right) \right) \) reflect the efficacy of the RTI and PI therapies: \(u_i=0\) corresponds to no treatment, and \(u_i=1\) removes the corresponding infection or virus-production term entirely. The following definition is considered to simplify the control design framework:

$$\begin{aligned}x=\left( \begin{matrix} T \\ {{T}^{*}} \\ v \\ \end{matrix} \right) ,\,\,\,\,\,u=\left( \begin{matrix} {{u}_{1}} \\ {{u}_{2}} \\ \end{matrix} \right) .\end{aligned}$$

According to the aforementioned notation, the HIV model in (2) can be illustrated in the following affine form:

$$\begin{aligned} {\dot{x}}=f\left( x \right) +g\left( x \right) u, \end{aligned}$$
(3)

where the value of \(f\left( x \right) \) and \(g\left( x \right) \) will be determined in Sect. 4.
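Expanding (2) and collecting the terms multiplied by \(u_1\) and \(u_2\) gives one way to read off the drift and input terms of (3). The sketch below, which is an illustration rather than the authors' code, cross-checks this decomposition numerically. The parameter values are assumptions consistent with the equilibria reported in Sect. 6, and the test input is arbitrary.

```python
import numpy as np

# Assumed parameter values (hypothetical; consistent with the equilibria in Sect. 6).
s, d, beta = 10.0, 0.02, 2.4e-5
m1, m2, k = 2.4, 0.24, 100.0


def f(x):
    """Drift term: model (2) with u1 = u2 = 0, i.e. model (1)."""
    T, T_inf, v = x
    return np.array([
        s - d * T - beta * T * v,
        beta * T * v - m2 * T_inf,
        k * T_inf - m1 * v,
    ])


def g(x):
    """Input matrix: column j collects the terms multiplied by u_j in (2)."""
    T, T_inf, v = x
    return np.array([
        [beta * T * v, 0.0],
        [-beta * T * v, 0.0],
        [0.0, -k * T_inf],
    ])


def model2_rhs(x, u):
    """Model (2) written out directly, used only as a cross-check."""
    T, T_inf, v = x
    u1, u2 = u
    return np.array([
        s - d * T - (1 - u1) * beta * T * v,
        (1 - u1) * beta * T * v - m2 * T_inf,
        (1 - u2) * k * T_inf - m1 * v,
    ])


x = np.array([240.0, 23.0, 935.0])   # initial condition (21)
u = np.array([0.3, 0.7])             # arbitrary efficacies in [0, 1]
assert np.allclose(f(x) + g(x) @ u, model2_rhs(x, u))
```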

4 Optimal SDRE Control

In general, implementing optimal control for nonlinear systems is not as straightforward as for linear systems, because the Hamilton–Jacobi–Bellman (HJB) equation generally does not have a closed-form solution. Although numerical methods can be used to solve this equation, they are only approximate and may differ greatly from the actual solution when the system is strongly nonlinear (Monfared et al. 2020). Moreover, approximate solutions result in a loss of optimality and may yield a sub-optimal response.

Linear control methods are not efficient for the treatment of viral disease models, because linearizing the nonlinear dynamic model introduces considerable modeling error; as a result, the controller may perform poorly outside the region in which the linearization is valid, which may lead to exacerbation of the disease. One of the optimal controller design methods for nonlinear systems is the state-dependent Riccati equation technique. In addition to satisfying stability and performance requirements, the state-dependent Riccati equation method also guarantees robustness properties for a wide class of nonlinear systems. The theory of the SDRE approach is discussed in detail by Mracek and Cloutier (1998) and Çimen (2008). The SDRE method operates on a pseudo-linear structure of a nonlinear system, and the related response is obtained by solving the state-dependent algebraic Riccati equation at each sampling time. In this regard, the nonlinear system is considered as (3), where \(x\in {{\Re }^{n}}\) and \(u\in {{\Re }^{m}}\) are the state and input variables of the system, respectively. Note that, in general, the relation \(f(0)=0\) does not hold. In particular, the dynamic model considered in this paper does not meet this requirement. Therefore, f(x) can be separated into two terms:

$$\begin{aligned}a \quad {\text {and}} \quad f(x)-a.\end{aligned}$$

We can select a so that \(a=f(0)\); then, the system (3) can be rewritten in the following linear-like form:

$$\begin{aligned} {\dot{x}}=A\left( x \right) x+a+B\left( x \right) u, \end{aligned}$$
(4)

where \(A(x)x+a=f(x)\) and \(B(x)=g(x)\). In (4), \(A\left( x \right) \in {{\Re }^{n\times n}}\) and \(B\left( x \right) \in {{\Re }^{n\times m}}\) are the matrices of state-dependent coefficients in pseudo-linear form. The linear-like representation of a nonlinear system obtained by pseudo-linearization is not unique, and the matrices A(x) and B(x) are functions of the states of the system (Itik et al. 2010).
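As a brief illustration of this non-uniqueness (a worked example added for clarity, not taken from the cited references), consider the bilinear term \(-\beta Tv\) in the first equation of (2), written with \(x={{\left( {{x}_{1}},{{x}_{2}},{{x}_{3}} \right) }^{T}}={{\left( T,{{T}^{*}},v \right) }^{T}}\). It can be attributed to either state or split between them:

$$\begin{aligned} -\beta {{x}_{1}}{{x}_{3}}=\left( -\beta {{x}_{3}} \right) {{x}_{1}}=\left( -\beta {{x}_{1}} \right) {{x}_{3}}=\left( -\alpha \beta {{x}_{3}} \right) {{x}_{1}}+\left( -\left( 1-\alpha \right) \beta {{x}_{1}} \right) {{x}_{3}},\qquad \alpha \in \left[ 0,1 \right] . \end{aligned}$$

Hence the first row of A(x) may be taken as \(\left( -d-\alpha \beta {{x}_{3}},\,0,\,-\left( 1-\alpha \right) \beta {{x}_{1}} \right) \) for any such \(\alpha \). Every choice reproduces the same f(x), but each generally leads to a different Riccati solution and therefore to a different sub-optimal controller.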

The ultimate goal is to obtain the sub-optimal controller such that it minimizes the following cost function for the HIV model:

$$\begin{aligned} J=\frac{1}{2}\int _{0}^{\infty }{\left( {{\left( x-y \right) }^{T}}Q(x)\left( x-y \right) +{{u}^{T}}R(x)u \right) {\text {d}}t}, \end{aligned}$$
(5)

where \(Q (x) \ge 0\) and \(R (x) > 0\) are symmetric state-dependent weighting matrices. In the cost function (5), the variable y denotes the disease-free state of the infected individual. In this study, constraints on the states and inputs of the HIV model are considered and must be met during the controller design process. Proper selection of the weighting coefficients in Q(x) and R(x) plays a fundamental role in satisfying these constraints and in formulating the control problem. Alternatively, the constraints can be incorporated into the Hamiltonian (Itik et al. 2010; Shadi et al. 2021). For the proposed optimal control problem, the Hamiltonian is as follows:

$$\begin{aligned} \begin{aligned} H\left( x,u,\lambda \right)&=\frac{1}{2}\left( {{\left( x-y \right) }^{T}}Q\left( x-y \right) +{{u}^{T}}R(x)u \right) \\&\quad +{{\lambda }^{T}} \left( A(x)x + a(x) + B(x)u \right) \\&\quad -{{{\hat{w}}}^{T}}\left( {{u}_{\max }}-u \right) -{{\bar{w}}^{T}}\left( u-{{u}_{\min }} \right) , \end{aligned} \end{aligned}$$
(6)

where \(\lambda \) is the adjoint (costate) variable. In (6), \(\bar{w}\) and \({\hat{w}}\) are non-negative m-dimensional penalty (multiplier) vectors that introduce the bounds on the control input into the problem and must satisfy the following condition:

$$\begin{aligned} {{{\hat{w}}}^{T}}\left( {{u}_{\max }-u}\right) ={{\bar{w}}^{T}}\left( u-{{u}_{\min }} \right) =0. \end{aligned}$$

The corresponding necessary conditions of Pontryagin’s maximum principle used in the solution process are as follows:

$$\begin{aligned}&{\dot{x}}=\frac{\partial H}{\partial \lambda }=A\left( x \right) x+a+B\left( x \right) u, \end{aligned}$$
(7)
$$\begin{aligned}&{\dot{\lambda }}=-\frac{\partial H}{\partial x}=-Q\left( x-y \right) -{{\left( \frac{d\left( A\left( x \right) x \right) }{dx} \right) }^{T}}\lambda -{{\left( \frac{d\left( B\left( x \right) u \right) }{dx} \right) }^{T}}\lambda \nonumber \\&\qquad \qquad \quad -{{\left( \frac{d\left( a\left( x \right) \right) }{dx} \right) }^{T}}\lambda , \end{aligned}$$
(8)
$$\begin{aligned}&0=\frac{\partial H}{\partial u}=B{{\left( x \right) }^{T}}\lambda -\bar{w}+{\hat{w}}+R(x)u. \end{aligned}$$
(9)

Using (9) yields the optimal control in the following form:

$$\begin{aligned} u(t)=-{{R}^{-1}}(x)\left( B^{T}{{\left( x \right) }}\lambda -\bar{w}+{\hat{w}} \right) . \end{aligned}$$
(10)

By implementing the LQR control theory, the adjoint variable \(\lambda \) can be obtained as follows:

$$\begin{aligned} \lambda =b\left( x \right) +P\left( x \right) x. \end{aligned}$$
(11)

In (11), the symmetric matrix \(P\left( x \right) \in {{\Re }^{n\times n}}\) is state dependent. Considering the constrained control input, the sub-optimal control can be obtained as follows:

$$\begin{aligned} u\left( t \right) =\min \left( \max \left( {\tilde{u}}(t),{{u}_{\min }} \right) ,{{u}_{\max }} \right) , \end{aligned}$$
(12)

where the constants \(u_{min}\) and \(u_{max}\) denote the minimum and maximum boundaries of the control effort, respectively, and

$$\begin{aligned} {\tilde{u}}\left( t \right) =-{{R}^{-1}(x)}{{B}^{T}}\left( x \right) \left( P\left( x \right) x+b\left( x \right) \right) . \end{aligned}$$
(13)

The function \(b\left( x \right) \) is given by:

$$\begin{aligned}&b\left( x \right) ={{\left( {{A}^{T}}\left( x \right) -P\left( x \right) B\left( x \right) {{R}^{-1}}{{B}^{T}}\left( x \right) \right) }^{-1}}\nonumber \\&\quad \left( Qy-P\left( x \right) a \right) , \end{aligned}$$
(14)

where the symmetric and positive definite matrix P(x) is calculated by solving the algebraic Riccati equation (Çimen 2008):

$$\begin{aligned} {{A}^{T}}\left( x \right) P\left( x \right) +P\left( x \right) A\left( x \right) -P\left( x \right) B\left( x \right) {{R}^{-1}}{{B}^{T}}\left( x \right) P\left( x \right) +Q\left( x \right) =0. \end{aligned}$$
(15)
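The computation described by (12) to (15) can be sketched for a single frozen state as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the parameter values are assumed (chosen to be consistent with the equilibria in Sect. 6), `A_of` is one admissible factorization satisfying \(A(x)x+a=f(x)\), and the Riccati equation is solved with SciPy's `solve_continuous_are`.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Assumed parameter values (hypothetical; consistent with the equilibria in Sect. 6).
s, d, beta = 10.0, 0.02, 2.4e-5
m1, m2, k = 2.4, 0.24, 100.0
a = np.array([s, 0.0, 0.0])          # a = f(0)


def A_of(x):
    """One admissible state-dependent coefficient matrix with A(x) x + a = f(x)."""
    return np.array([
        [-d, 0.0, -beta * x[0]],
        [beta * x[2], -m2, 0.0],
        [0.0, k, -m1],
    ])


def B_of(x):
    """Input matrix B(x) = g(x) for model (2)."""
    return np.array([
        [beta * x[0] * x[2], 0.0],
        [-beta * x[0] * x[2], 0.0],
        [0.0, -k * x[1]],
    ])


def sdre_control(x, y, Q, R, u_min=0.0, u_max=1.0):
    """Evaluate (15), (14), (13), and (12) at the frozen state x."""
    A, B = A_of(x), B_of(x)
    P = solve_continuous_are(A, B, Q, R)               # Riccati equation (15)
    BRB = B @ np.linalg.solve(R, B.T)                  # B R^{-1} B^T
    b = np.linalg.solve(A.T - P @ BRB, Q @ y - P @ a)  # feedforward term (14)
    u_tilde = -np.linalg.solve(R, B.T @ (P @ x + b))   # unconstrained input (13)
    return np.clip(u_tilde, u_min, u_max), P           # saturation (12)


x0 = np.array([240.0, 23.0, 935.0])   # initial condition (21)
y = np.array([500.0, 0.0, 0.0])       # disease-free target (20)
u, P = sdre_control(x0, y, np.eye(3), np.eye(2))
print("u =", u)
```

In a closed-loop simulation, `sdre_control` would be called again at every sampling instant with the current (or estimated) state, so that A(x), B(x), and P(x) are updated along the trajectory.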

The asymptotic local stability of the above system can be illustrated in the form of the following theorem.

Theorem 1

(Itik et al. 2010) Consider the following system

$$\begin{aligned}{\dot{x}}=f\left( x \right) +g\left( x \right) u,\end{aligned}$$

where f(x) and \({\partial f\left( x \right) }/{\partial {{x}_{i}}}\) \((i=1,\dots ,n)\) are continuous for all x with \(\left\| x \right\| <r\), \(r>0\), and f(x) can be written in the linear-like form \(f(x)=A(x)x\). In addition, suppose that A(x) and B(x) are continuous and that the pair (A(x), B(x)) is stabilizable and the pair \((A(x), {{Q}^{1/2}})\) is detectable in a neighborhood \(\Omega \subseteq {{B}_{r}}\left( 0 \right) \) of the origin. Then, the closed-loop system is locally asymptotically stable under the control (13).

In the SDRE approach, an LQR problem is solved at each time instant with the coefficient matrices frozen at the current state. The cost function is therefore minimized pointwise at each time instant.

5 Kalman Filter

Kalman filtering is an algorithm for estimating unknown variables based on measurements taken over time. In general, the Kalman filter has two steps, predict and update. First, the estimated state from the previous time instant is used to generate an estimate of the state at the present time; then, the current prediction is combined with the current observation to correct the state estimate. The Kalman filter has demonstrated its efficiency in various applications.

5.1 Extended Kalman Filter

The Extended Kalman Filter (EKF) is an approximate filter for nonlinear systems based on first-order linearization. The application of the Kalman filter to state estimation problems for linear systems with unknown parameters is well known and widely developed. The extended Kalman filter consists of linearizing the nonlinear system along a nominal state trajectory and approximating the probability density function (PDF) as Gaussian in the posterior (Kwon 2005; Kalamian et al. 2021).

Consider the nonlinear system as follows:

$$\begin{aligned}&{\dot{x}}(t)=f(x(t),u(t))+W(t) \\&z(t)=h(x(t))+V(t), \\ \end{aligned}$$

where f and h are two differentiable functions. Besides, W(t) represents the vector of process noise that is supposed to be zero-mean white Gaussian, with the covariance \({\tilde{Q}}(t)\), i.e., \(W(t)\sim {\mathcal {N}} (0,{\tilde{Q}}(t))\), and V(t) is the observation noise vector that is supposed to be zero-mean white Gaussian, with the covariance \({\tilde{R}}(t)\), i.e., \(V(t)\sim {\mathcal {N}} (0,{\tilde{R}}(t))\).

Define the Jacobian matrices as follows:

$$\begin{aligned}&F(t)={ \frac{\partial f}{\partial x}|_{{\hat{x}}(t),u(t)}}, \\&H(t)={{ \frac{\partial h}{\partial x} }|_{{\hat{x}}(t)}}. \\ \end{aligned}$$

The EKF consists of the following equations:

$$\begin{aligned}&\frac{{\text {d}}{\hat{x}}(t)}{{\text {d}}t}=f({\hat{x}}(t),u(t))+{\tilde{K}}(t)\left[ z(t)-h({\hat{x}}(t)) \right] ,\\&\quad {\hat{x}}(0)={{x}_{0}} \\&\frac{{\text {d}}{\tilde{P}}(t)}{{\text {d}}t}=F(t){\tilde{P}}(t)+{\tilde{P}}(t){{F}^{T}}(t)\\&\quad +{\tilde{Q}}(t)-{\tilde{K}}(t){\tilde{R}}(t){{{{\tilde{K}}}}^{T}}(t),\,\,\,\,\,{\tilde{P}}(0)=\psi , \\ \end{aligned}$$

where \({\hat{x}}(t)\) and \({\tilde{P}}(t)\) are the state estimate and the estimate covariance, respectively. For the EKF, the gain \({\tilde{K}}(t)\) is obtained as follows:

$$\begin{aligned}{\tilde{K}}(t)={\tilde{P}}(t){{H}^{T}}(t){{{\tilde{R}}}^{-1}}(t).\end{aligned}$$
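The continuous-time EKF equations above can be sketched for model (2) as follows. This is a minimal illustration, not the authors' implementation: it assumes a linear output map in which only the viral load is measured, uses assumed parameter values consistent with the equilibria in Sect. 6, and advances the filter with a simple explicit Euler step. The function names are hypothetical.

```python
import numpy as np

# Assumed parameter values (hypothetical; consistent with the equilibria in Sect. 6).
s, d, beta = 10.0, 0.02, 2.4e-5
m1, m2, k = 2.4, 0.24, 100.0
H = np.array([[0.0, 0.0, 1.0]])      # assumed output map: only v is measured


def f(x, u):
    """Model (2) with x = (T, T*, v) and u = (u1, u2)."""
    T, T_inf, v = x
    return np.array([
        s - d * T - (1 - u[0]) * beta * T * v,
        (1 - u[0]) * beta * T * v - m2 * T_inf,
        (1 - u[1]) * k * T_inf - m1 * v,
    ])


def jacobian_F(x, u):
    """F = df/dx evaluated at the current estimate."""
    T, _, v = x
    c = (1 - u[0]) * beta
    return np.array([
        [-d - c * v, 0.0, -c * T],
        [c * v, -m2, c * T],
        [0.0, (1 - u[1]) * k, -m1],
    ])


def ekf_euler_step(x_hat, P, z, u, Q_noise, R_noise, dt):
    """Advance the continuous-time EKF equations by one explicit Euler step."""
    F = jacobian_F(x_hat, u)
    K = P @ H.T @ np.linalg.inv(R_noise)               # filter gain
    innovation = z - H @ x_hat                         # z - h(x_hat), h linear here
    x_dot = f(x_hat, u) + K @ innovation
    P_dot = F @ P + P @ F.T + Q_noise - K @ R_noise @ K.T
    return x_hat + dt * x_dot, P + dt * P_dot


# Illustrative call: all numbers below are arbitrary test values.
x_hat, P = ekf_euler_step(
    np.array([250.0, 20.0, 900.0]), 100.0 * np.eye(3),
    z=np.array([935.0]), u=np.zeros(2),
    Q_noise=np.zeros((3, 3)), R_noise=np.array([[10.0]]), dt=1e-3,
)
print(x_hat)
```

The explicit Euler step is chosen only for brevity; a higher-order integrator could be substituted without changing the structure of the filter.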

Figure 1 depicts the suggested control system strategy, which includes the EKF estimator.

Fig. 1 Proposed control system approach including the EKF estimator

6 Simulation

In this section, the dynamic behavior of the HIV infection near equilibrium points is investigated. The equilibrium points of the system (1) without considering the inputs are obtained as follows:

$$\begin{aligned}&{\dot{T}}\left( t \right) =0\Rightarrow s-dT-\beta Tv=0 \nonumber \\&{{\dot{T}}^{*}}\left( t \right) =0\Rightarrow \beta Tv-{{m}_{2}}{{T}^{*}}=0 \nonumber \\&{\dot{v}}\left( t \right) =0\Rightarrow k{{T}^{*}}-{{m}_{1}}v=0. \end{aligned}$$
(16)

In the absence of infection in the population, i.e., \({T}^{*}=v=0\), the disease-free equilibrium point of the nonlinear dynamic model of HIV is given as follows:

$$\begin{aligned} \left( \begin{array}{l} {}{\frac{s}{d}} \\ {} 0 \\ {} 0 \\ \end{array} \right) , \end{aligned}$$
(17)

and while \(v\ne 0\), the endemic equilibrium point is

$$\begin{aligned}&{{\varepsilon }_{E}}=\left( \frac{{{m}_{1}}{{m}_{2}}}{k\beta },\frac{s}{{{m}_{2}}}-\frac{{{m}_{1}}d}{k\beta },\frac{ks}{{{m}_{1}}{{m}_{2}}} -\frac{d}{\beta } \right) \nonumber \\&\quad =\left( 240,21.6667,902.778 \right) . \end{aligned}$$
(18)
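The closed-form equilibria can be checked numerically with a few lines of Python. The parameter values below are assumptions (they are not copied from Table 1); they are chosen because they reproduce the numerical values in (18) and the disease-free value \(s/d=500\) used later in (20).

```python
# Assumed parameter values (hypothetical stand-ins for Table 1).
s, d, beta = 10.0, 0.02, 2.4e-5
m1, m2, k = 2.4, 0.24, 100.0

# Disease-free equilibrium (17): T = s/d, T* = v = 0.
disease_free = (s / d, 0.0, 0.0)

# Endemic equilibrium (18), obtained by solving (16) with v != 0.
T_e = m1 * m2 / (k * beta)
T_inf_e = s / m2 - m1 * d / (k * beta)
v_e = k * s / (m1 * m2) - d / beta
endemic = (T_e, T_inf_e, v_e)

# Residuals of the steady-state equations (16) at the endemic point.
residuals = (
    s - d * T_e - beta * T_e * v_e,
    beta * T_e * v_e - m2 * T_inf_e,
    k * T_inf_e - m1 * v_e,
)

print(disease_free)
print(tuple(round(val, 4) for val in endemic))
print(residuals)
```

With these assumed values, the endemic point agrees with (18) to the quoted precision, and the residuals vanish up to floating-point rounding.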

The first equilibrium point corresponds to the disease-free state, whereas the second one corresponds to the state in which the patient enters the latent period of illness. In the first phase of the illness, when a person becomes infected with HIV, the concentration of viruses and infected CD4+ T cells is extremely small (close to zero), and the number of healthy CD4+ T cells is in the normal range. Figure 2 depicts the transient response of the model (2) when \(u_1(t)=u_2(t)=0\). As can be observed, the concentrations of virions and infected cells increase dramatically, together with a significant decrease in the number of healthy CD4+ T cells.

Fig. 2 Time response of HIV-infected system without drug therapy

In order to carry out the feedback control design, the current values of the state variables are required. In this study, it is assumed that the first two state variables of the system are not directly available. In other words, only the third state variable, which expresses the number of viruses, is measured at any given moment, so the output of the system is determined as follows:

$$\begin{aligned} z=\left( \begin{matrix} 0 &{} 0 &{} 1 \\ \end{matrix} \right) x+\nu \left( t \right) =Cx\left( t \right) +\nu \left( t \right) , \end{aligned}$$
(19)

where z represents the number of viruses. It is assumed that the process noise W(t) is equal to zero, and the number of free virus particles is measured, in the presence of the white Gaussian noise \(\nu (t)\) with the covariance \({\tilde{R}}(t)\).

6.1 Evaluation of the Sub-optimal Control Approach for Injection of Drugs

In this subsection, the impact of pharmaceutical interventions using the sub-optimal control mechanism is investigated. Based on (2) and according to the notation provided in Sect. 3, the matrices A(x) and B(x) can be chosen as follows:

$$\begin{aligned}A(x)=\left[ \begin{matrix} -d &{} 0 &{} -\beta {{x}_{1}} \\ \beta {{x}_{3}} &{} -{{m}_{2}} &{} 0 \\ 0 &{} k &{} -{{m}_{1}} \\ \end{matrix} \right] \quad {\text {and}}\quad B(x)=\left[ \begin{array}{*{35}{l}} \beta {{x}_{1}}{{x}_{3}} &{} 0 \\ -\beta {{x}_{1}}{{x}_{3}} &{} 0 \\ 0 &{} -k{{x}_{2}} \\ \end{array} \right] .\end{aligned}$$

The main aim is to seek control efforts \(u_1(t)\) \((0 \le u_1 \le 1, \text {RTIs})\) and \(u_2(t)\) \((0 \le u_2 \le 1, \text {PIs})\) that minimize the cost functional (5) while steering the system toward the following disease-free equilibrium point:

$$\begin{aligned} \varepsilon _{0}=\begin{pmatrix} T_0,T_{0}^{*},v_0 \end{pmatrix} = \begin{pmatrix} 500, 0 , 0 \end{pmatrix}. \end{aligned}$$
(20)

In order to design the optimal control process, it is assumed that the infected individual is currently at the onset of disease progression. In this regard, the initial conditions of the HIV model for numerical simulation are given as follows:

$$\begin{aligned}&T\left( 0 \right) =240, \nonumber \\&{{T}^{*}}\left( 0 \right) =23, \nonumber \\&v\left( 0 \right) =935 . \end{aligned}$$
(21)

To carry out the optimal control design, the weighting matrices of the cost functional (5) are chosen as follows:

$$\begin{aligned} Q=\left[ \begin{matrix} 1 &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad 1 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 1 \\ \end{matrix} \right] ,\,\,\,\,\,\,\,R=\left[ \begin{matrix} 1 &{}\quad 0 \\ 0 &{}\quad 1 \\ \end{matrix} \right] . \end{aligned}$$
(22)
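The pointwise step behind the SDRE design can be sketched as follows: freeze the state, solve the algebraic Riccati equation for the pair (A(x), B(x)) with the weights (22), and form the gain \(K(x)=R^{-1}B^{T}(x)P(x)\) used in the standard SDRE law \(u=-K(x)x\). This is only an illustration under stated assumptions. The parameter values are read off the numerical entries of (27) rather than from Table 1, the (1,3) entry of A(x) is taken as \(-\beta x_1\) (the sign implied by the linearization (26)), and the handling of the target (20) and of the bounds \(0\le u\le 1\) follows Sect. 3 and is not reproduced here.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Assumed parameter values, inferred from the numerical matrices in (27).
d, beta, m1, m2, k = 0.02, 2.4e-5, 2.4, 0.24, 100.0

def A_of(x):
    """State-dependent matrix A(x); (1,3) sign chosen to agree with (26)."""
    x1, _, x3 = x
    return np.array([[-d, 0.0, -beta * x1],
                     [beta * x3, -m2, 0.0],
                     [0.0, k, -m1]])

def B_of(x):
    """State-dependent input matrix B(x) for (u1, u2) = (RTI, PI)."""
    x1, x2, x3 = x
    return np.array([[beta * x1 * x3, 0.0],
                     [-beta * x1 * x3, 0.0],
                     [0.0, -k * x2]])

def sdre_gain(x, Q, R):
    """Solve the Riccati equation at the frozen state x and return (K, P)."""
    A, B = A_of(x), B_of(x)
    P = solve_continuous_are(A, B, Q, R)
    K = np.linalg.solve(R, B.T @ P)   # K = R^{-1} B^T P
    return K, P

Q, R = np.eye(3), np.eye(2)           # weights from (22)
x0 = np.array([240.0, 23.0, 935.0])   # initial state from (21)
K, P = sdre_gain(x0, Q, R)

# The frozen closed-loop matrix A(x0) - B(x0) K should be Hurwitz.
closed_loop_eigs = np.linalg.eigvals(A_of(x0) - B_of(x0) @ K)
print(closed_loop_eigs.real.max())
```

In a simulation this step would be repeated at every integration step with the current state, which is what makes the resulting controller sub-optimal rather than optimal.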

In the presence of pharmaceutical intervention, Fig. 3 depicts the behavior of the dynamic model of HIV infection. In this case, with the constant weighting matrices Q and R selected as in (22), the number of healthy CD4+ T cells increases from its initial value of 240 cells/mm\(^3\) and, as expected, tends to the target value of 500 cells/mm\(^3\) over time. Furthermore, the concentrations of infected CD4+ T cells and free virus particles start at 23 cells/mm\(^3\) and 935 cells/mm\(^3\) and eventually reach \(2.65\times {{10}^{-6}}\) cells/mm\(^3\) and \(9.22\times {{10}^{-5}}\) cells/mm\(^3\), respectively. Figure 4 demonstrates how the combination of control efforts \((u_1(t), u_2(t))\) affects the suppression of HIV replication. In order to minimize the objective functional (5), the control \(u_2(t)\) is maintained at the maximum level of \(100\%\) for about 55 days before relaxing to the minimum at the final time. The populations of infected CD4+ T cells \(({T}^{*})\) and free virus particles (v) diminish when the sub-optimal strategies are in place. It can be inferred that the combination of both control efforts is significantly effective in interrupting the HIV life cycle.

Fig. 3 Time response of dynamic HIV states under optimal control when Q and R are both identity matrices (vertical axis plotted logarithmically)

Fig. 4 Evolution of drug injection dose (control efforts) over time

Here, the influence of different choices of the Q and R matrices on the behavior of the HIV model under sub-optimal control is investigated. First, suppose that the Q matrix is larger than the R matrix, for instance, \(Q=100\times {{I}_{(3\times 3)}}\) and \(R={{I}_{(2\times 2)}}\). In this case, Figs. 5 and 6 illustrate the dynamic response of the HIV model under pharmaceutical interventions and the dose of drug injected over time, respectively. Figures 7 and 8 depict the behavior of the optimality system in the case where the R matrix is much larger than the Q matrix.

Comparing Figs. 4, 5, 6 and 7, it can be seen that convergence occurs later when the matrix Q is much larger than the matrix R, and the convergence error is very low. This is because the input cost is then much lower than the cost of the system states.

Fig. 5 Time response of the HIV disease with the optimal control input applied when the Q matrix is much larger than R

Fig. 6 Evolution of the drug injection dose (control inputs) over time when the Q matrix is much larger than the R matrix

Fig. 7 Time response of the HIV disease with the optimal control input applied when the R matrix is much larger than Q

Fig. 8 Evolution of the drug injection dose (control inputs) over time when the R matrix is much larger than the Q matrix

6.2 Implementation of the Optimal Observer to Estimate the Dynamic States of HIV Disease

In this subsection, the performance of the developed Kalman filter in estimating the unknown variables of the HIV model is evaluated. It is assumed that, at any given time, only some states of the system are measured and that the measured data contain additive white Gaussian noise. Three noise power levels are investigated: weak, medium, and strong. For zero-mean white Gaussian noise, the covariance \({\tilde{R}}\) is an appropriate criterion for the noise magnitude. Therefore, the three cases \({\tilde{R}}=1\), \({\tilde{R}}=10\), and \({\tilde{R}}=50\) are considered. In this approach, both the sub-optimal controller and the optimal observer are applied simultaneously to the system. The HIV model and the optimal observer are initialized as follows:

$$\begin{aligned}&T\left( 0 \right) =240,\,\,\,\,\,{\hat{T}}\left( 0 \right) =800 \nonumber \\&{{T}^{*}}\left( 0 \right) =23,\,\,\,\,\,{{{{\hat{T}}}}^{*}}\left( 0 \right) =400 \nonumber \\&v\left( 0 \right) =935,\,\,\,\,\,{\hat{v}}\left( 0 \right) =800, \end{aligned}$$
(23)

where the initial conditions of the HIV model are the same as in (21), and the initial conditions of the estimator are chosen as arbitrary guesses.
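Whether all three states can be reconstructed from the viral load alone can be checked with a frozen-state observability test on the pair (A(x), C). The sketch below is illustrative only: the parameter values are assumptions read off the numerical entries of (27), the (1,3) entry of A(x) uses the sign implied by (26) (it does not affect this test), and the helper `observability_rank` is a hypothetical name.

```python
import numpy as np

# Assumed parameter values, inferred from the numerical matrices in (27).
d, beta, m1, m2, k = 0.02, 2.4e-5, 2.4, 0.24, 100.0
C = np.array([[0.0, 0.0, 1.0]])   # output matrix from (19)

def A_of(x):
    """State-dependent matrix A(x) evaluated at a frozen state x."""
    x1, _, x3 = x
    return np.array([[-d, 0.0, -beta * x1],
                     [beta * x3, -m2, 0.0],
                     [0.0, k, -m1]])

def observability_rank(A, C):
    """Rank of the observability matrix [C; CA; CA^2; ...]."""
    n = A.shape[0]
    blocks = [C @ np.linalg.matrix_power(A, i) for i in range(n)]
    return np.linalg.matrix_rank(np.vstack(blocks))

rank_initial = observability_rank(A_of([240.0, 23.0, 935.0]), C)  # state (21)
rank_dfe = observability_rank(A_of([500.0, 0.0, 0.0]), C)         # state (20)
print(rank_initial, rank_dfe)
```

Under these assumptions the frozen pair has full rank while virus is present and loses rank exactly at the disease-free state, where the healthy-cell state no longer influences the measured output. The mode that is lost there has eigenvalue \(-d<0\), so the pair remains detectable.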

Fig. 9 States of the HIV model and corresponding estimations for the identity weighting matrices (22) based on the state matrix A(x) and \({\tilde{R}}=1\)

In order to evaluate the performance of the proposed optimal observer, Fig. 9 depicts the response of the optimality system and the estimates of the state variables of the HIV model when the system states are unavailable and the output measurement is corrupted by noise with covariance \({\tilde{R}}=1\). As shown, the dynamic states converge to the desired values over the simulation period. Furthermore, Fig. 10 shows the error between the system states and their estimates for \({\tilde{R}}=1\). In this case, the estimation error becomes negligible after a short period of time (approximately twenty-two days), although a small residual fluctuation remains because of the measurement noise.

Fig. 10 Time response of the estimation error of the HIV dynamic states in the presence of measurement noise \({\tilde{R}}=1\)

In Figs. 11, 12, 13 and 14, the transient responses of the system states, their estimates, and the corresponding estimation errors for medium and strong noise are presented. As expected, the estimation error grows as the noise intensity, i.e., the output noise covariance \({\tilde{R}}\), increases.

Fig. 11 States of the HIV model and corresponding estimations for the identity weighting matrices (22) based on the state matrix A(x) and \({\tilde{R}}=10\)

Fig. 12 Time response of the estimation error of the HIV dynamic states in the presence of measurement noise \({\tilde{R}}=10\)

Fig. 13 States of the HIV model and corresponding estimations for the identity weighting matrices (22) based on the state matrix A(x) and \({\tilde{R}}=50\)

Fig. 14 Time response of the estimation error of the HIV dynamic states in the presence of measurement noise \({\tilde{R}}=50\)

To evaluate the performance of the proposed control approach, two well-known performance indicators are considered: the root mean square error (RMSE) between each state and its estimate, and the root mean square of each control effort signal (RMSU), defined as follows:

$$\begin{aligned}&\text {RMS}{{\text {E}}_{i}}=\sqrt{\frac{1}{T}\int \limits _{0}^{T}{{{({{x}_{i}}(t)-{{{{\hat{x}}}}_{i}}(t))}^{2}}dt}},\,\,\,\,\,\,\,\,\,i=1,2,3 \end{aligned}$$
(24)
$$\begin{aligned}&\text {RMS}{{\text {U}}_{j}}=\sqrt{\frac{1}{T}\int \limits _{0}^{T}{{{({{u}_{j}}(t))}^{2}}dt}},\,\,\,\,\,\,\,\,\,j=1,2. \end{aligned}$$
(25)
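The two indicators (24) and (25) can be evaluated from sampled trajectories with a simple quadrature. The sketch below uses the trapezoidal rule on a time grid starting at zero; the helper names and the synthetic signals are illustrative and do not reproduce the simulation data behind Table 2.

```python
import numpy as np

def _time_average(y, t):
    """Approximate (1/T) * integral_0^T y(t) dt with the trapezoidal rule."""
    integral = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t))
    return integral / (t[-1] - t[0])

def rmse(x, x_hat, t):
    """RMSE_i from (24) for one state trajectory and its estimate."""
    return np.sqrt(_time_average((x - x_hat) ** 2, t))

def rmsu(u, t):
    """RMSU_j from (25) for one control signal."""
    return np.sqrt(_time_average(u ** 2, t))

# Synthetic example: an estimate that is off by a constant 2 cells/mm^3,
# and a control held at its upper bound for the whole horizon.
t = np.linspace(0.0, 100.0, 1001)
x = 500.0 - 260.0 * np.exp(-0.1 * t)
x_hat = x + 2.0
u = np.ones_like(t)

print(rmse(x, x_hat, t), rmsu(u, t))
```

For these synthetic signals the indicators reduce to the constant offset and the constant dose, which gives a quick sanity check before applying the functions to simulation output.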

The results of the aforementioned numerical indicators for different cases are tabulated in Table 2.

Table 2 Numerical indicators of \(\text {RMS}{{\text {E}}_{i}}\) and \(\text {RMS}{{\text {U}}_{j}}\)

6.3 Comparison of SDRE and LQR Approaches in HIV Suppression

In this subsection, the proposed technique is compared with the well-known Linear Quadratic Regulator (LQR) approach by examining the system behavior under both optimal control methods. To this end, with the notation \(x_1=T\), \(x_2=T^{*}\), and \(x_3=v\), the HIV model (2) is linearized around an equilibrium point \((x_1^{*}, x_2^{*}, x_3^{*})\) as follows:

$$\begin{aligned} A=\left[ \begin{matrix} -\beta x_{3}^{*}-d &{} 0 &{} -\beta x_{1}^{*} \\ \beta x_{3}^{*} &{} -{{m}_{2}} &{} \beta x_{1}^{*} \\ 0 &{} k &{} -{{m}_{1}} \\ \end{matrix} \right] , \qquad B=\left[ \begin{matrix} \beta x_{1}^{*}x_{3}^{*} &{} 0 \\ -\beta x_{1}^{*}x_{3}^{*} &{} 0 \\ 0 &{} -kx_{2}^{*} \\ \end{matrix} \right] , \end{aligned}$$
(26)

where the matrices A and B are the state matrix and the input matrix of the linearized system, respectively. According to the numerical values of the model parameters presented in Table 1, at the disease-free equilibrium point (20), we have:

$$\begin{aligned} {{A}|_{{{\varepsilon }_{0}}}}=\left[ \begin{matrix} -0.02 &{} 0 &{} -0.012 \\ 0 &{} -0.24 &{} 0.012 \\ 0 &{} 100 &{} -2.4 \\ \end{matrix} \right] , \qquad {{B}|_{{{\varepsilon }_{0}}}}=\left[ \begin{matrix} 0 &{}\quad 0 \\ 0 &{}\quad 0 \\ 0 &{}\quad 0 \\ \end{matrix} \right] . \end{aligned}$$
(27)

The eigenvalues of \({{A}|_{{{\varepsilon }_{0}}}}\) are \((-0.020, 0.218, -2.858)\). Since one eigenvalue is positive, the system is unstable at this equilibrium point. Similarly, for the endemic equilibrium point (18), we obtain:

$$\begin{aligned}&{{A}|_{{\varepsilon }_{E}}}=\left[ \begin{matrix} -0.04167 &{} 0 &{} -0.00576 \\ 0.02167 &{} -0.24 &{} 0.00576 \\ 0 &{} 100 &{} -2.4 \\ \end{matrix} \right] , \nonumber \\&{{B}|_{{\varepsilon }_{E}}}=\left[ \begin{matrix} 5.2 &{}\quad 0 \\ -5.2 &{}\quad 0 \\ 0 &{}\quad -2167 \\ \end{matrix} \right] , \end{aligned}$$
(28)

where the eigenvalues of \({{A}|_{{{\varepsilon }_{E}}}}\) are \((-0.0199+0.0658i, -0.0199-0.0658i, -2.6418)\). Since all eigenvalues have negative real parts, the system is locally asymptotically stable at the endemic equilibrium point. In fact, the equilibrium point \({{{\varepsilon }_{E}}}\) corresponds to the phase in which the patient is asymptomatic. The main goal at this step is to design an optimal controller that raises healthy CD4+ T cells to a normal level and reduces infected CD4+ T cells and viral load as much as possible.
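The stability statements for the two equilibria can be reproduced numerically. In the sketch below, the disease-free matrix is built from the general form (26) using parameter values that are assumptions read off the entries of (27), and the endemic matrix is copied directly from (28).

```python
import numpy as np

# Assumed parameter values, inferred from the numerical matrices in (27).
d, beta, m1, m2, k = 0.02, 2.4e-5, 2.4, 0.24, 100.0

def linearized_A(x_eq):
    """State matrix from (26) evaluated at an equilibrium (x1*, x2*, x3*)."""
    x1, _, x3 = x_eq
    return np.array([[-beta * x3 - d, 0.0, -beta * x1],
                     [beta * x3, -m2, beta * x1],
                     [0.0, k, -m1]])

A_0 = linearized_A([500.0, 0.0, 0.0])   # disease-free equilibrium (20)

# Endemic-equilibrium matrix, entries copied from (28).
A_E = np.array([[-0.04167, 0.0, -0.00576],
                [0.02167, -0.24, 0.00576],
                [0.0, 100.0, -2.4]])

eig_0 = np.linalg.eigvals(A_0)
eig_E = np.linalg.eigvals(A_E)

# A positive real part signals instability; all negative signals local stability.
print("disease-free:", np.sort(eig_0.real))
print("endemic:     ", eig_E)
```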

The linear quadratic regulator was chosen for the comparison because it is able to reject disturbances and to track optimally without concern about rank reduction of the controllability matrix. The weighting matrices of the optimal approach can be chosen either as functions of the state variables or as constants. Here, constant weighting coefficients are used:

$$\begin{aligned} R={\text {diag}}\,\left( 10,10 \right) , Q={\text {diag}}\, \left( {10}^{2},{10}^{2},{10}^{2} \right) . \end{aligned}$$
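A constant LQR gain for these weights can be sketched as follows. The section does not state the operating point used for the LQR design; this example assumes the endemic linearization (28), because the input matrix in (27) is zero and the pair at the disease-free equilibrium is therefore uncontrollable. The helper `controllability_rank` is a hypothetical name, and the feedback form \(u=-K(x-x_E)\) is the textbook LQR law rather than a detail confirmed by the text.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Linearized matrices copied from (27) and (28).
A_0 = np.array([[-0.02, 0.0, -0.012],
                [0.0, -0.24, 0.012],
                [0.0, 100.0, -2.4]])
B_0 = np.zeros((3, 2))
A_E = np.array([[-0.04167, 0.0, -0.00576],
                [0.02167, -0.24, 0.00576],
                [0.0, 100.0, -2.4]])
B_E = np.array([[5.2, 0.0],
                [-5.2, 0.0],
                [0.0, -2167.0]])

def controllability_rank(A, B):
    """Rank of the controllability matrix [B, AB, A^2 B, ...]."""
    n = A.shape[0]
    blocks = [np.linalg.matrix_power(A, i) @ B for i in range(n)]
    return np.linalg.matrix_rank(np.hstack(blocks))

rank_0 = controllability_rank(A_0, B_0)   # no control authority at (27)
rank_E = controllability_rank(A_E, B_E)   # full rank at (28)

# LQR weights from the text: R = diag(10, 10), Q = diag(100, 100, 100).
Q = 100.0 * np.eye(3)
R = 10.0 * np.eye(2)
P = solve_continuous_are(A_E, B_E, Q, R)
K_lqr = np.linalg.solve(R, B_E.T @ P)     # constant gain K = R^{-1} B^T P

lqr_eigs = np.linalg.eigvals(A_E - B_E @ K_lqr)
print(rank_0, rank_E, lqr_eigs.real.max())
```

Unlike the SDRE gain, this gain is computed once and stays fixed, so its performance can degrade as the state moves away from the point of linearization.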

Figure 15 shows that, after about 45 days of drug administration, the HIV model under the proposed SDRE method converges to the desired number of healthy CD4+ T cells for the normal situation. In addition, at the onset of treatment, the concentrations of virions and infected CD4+ T cells are reduced dramatically.

Fig. 15 Time response of the HIV model during drug therapy under the SDRE control approach

The results of applying the LQR controller in a similar situation are presented in Fig. 16.

Fig. 16 Time response of the HIV model during drug therapy under the LQR control method

A qualitative comparison of Figs. 15 and 16 shows that the system states converge to the disease-free equilibrium more slowly under the LQR method than under the proposed SDRE approach.

7 Conclusion

Recent years have seen clear progress in the state-dependent Riccati equation (SDRE) technique as a systematic approach to optimal control design for nonlinear systems. In this paper, a nonlinear sub-optimal control method based on the SDRE is presented to suppress the viral evolution in vivo and break the cycle of HIV replication through the medication regimen. To examine HIV remission, the effects of multiple pharmaceutical interventions, namely Reverse Transcriptase Inhibitor (RTI) and Protease Inhibitor (PI) treatments, are investigated as the control efforts. The significant advantage of this control approach for drug regimen management is that once the concentrations of infected CD4+ T cells \((T^*)\) and free virus particles (v) have decreased to their lowest levels, the drug dosage can be reduced as far as possible. Accordingly, the side effects of the medications for the patient are reduced in the long run. In the proposed control design process, the nonlinearity of the system is preserved. The effect of the weighting matrices is also examined: if the Q matrix is larger than the R matrix, convergence occurs later but the convergence error is smaller, and vice versa. In practice, some system states are not available for controller design. For this reason, an extended Kalman filter-based nonlinear state observer has been developed to estimate the unavailable state variables. The numerical simulations and the comparison with the Linear Quadratic Regulator (LQR) method indicate the faster convergence, efficiency, and flexibility of the proposed control method in suppressing the disease.