OptiDose: Computing the Individualized Optimal Drug Dosing Regimen Using Optimal Control

Providing the optimal dosing strategy of a drug for an individual patient is an important task in pharmaceutical sciences and daily clinical application. We developed and validated an optimal dosing algorithm (OptiDose) that computes the optimal individualized dosing regimen for pharmacokinetic–pharmacodynamic models in substantially different scenarios with various routes of administration by solving an optimal control problem. The aim is to compute a control that brings the underlying system as closely as possible to a desired reference function by minimizing a cost functional. In pharmacokinetic–pharmacodynamic modeling, the controls are the administered doses and the reference function can be the disease progression. Drug administration at certain time points provides a finite number of discrete controls, the drug doses, determining the drug concentration and its effect on the disease progression. Consequently, rewriting the cost functional gives a finite-dimensional optimal control problem depending only on the doses. Adjoint techniques allow to compute the gradient of the cost functional efficiently. This admits to solve the optimal control problem with robust algorithms such as quasi-Newton methods from finite-dimensional optimization. OptiDose is applied to three relevant but substantially different pharmacokinetic–pharmacodynamic examples.


Introduction
An optimal drug dosing regimen is a prerequisite to provide the best possible care for every individual patient. However, a diversity of individual factors need to be considered including current disease state, patient characteristics and the clinical goal for this patient. In pediatric patients, developmental changes have to be incorporated additionally. In (preterm) neonates, we face even more difficulties because fast maturation processes start immediately after birth which impact the drug effect, see, e.g., [1][2][3]. Therefore, it is essential to support clinicians with sophisticated mathematical methods to compute the optimal dosing regimen for every individual patient.
Mathematical modeling has become an essential tool in drug developing industries and clinical pharmacology departments in hospitals. The US Food and Drug Administration recognized such computational modeling and simulation tools as an improvement in the efficiency for developing safe and effective drugs [4], especially the so-called pharmacokinetics (PK) and pharmacodynamics (PD) models [5][6][7], a combination of mathematical and statistical methods, incorporate biological, physiological and pharmacological behavior. Roughly speaking, the PK deals with the distribution and elimination of a drug and the PD characterizes the effect of the drug on a specific target. PKPD models are formulated with a system of nonlinear differential equations [7][8][9][10].
Up to now, the computation of a "good" dosing regimen for an individual patient is a laborious and biased task in clinical pharmacology. Often, a large number of simulations for varying dosing regimens are performed with the developed PKPD model and the "best" dosing regimen is selected "by hand." In contrast, e.g., in the development process of oncology drugs [11], optimal control approaches are already used to increase the probability of successful phase 2 clinical trials [12]. However, the research questions in such clinical trials during drug development may differ from the goals in clinical application, and therefore, often the drug concentration of potential drug candidates is optimized and not the doses itself that cause the drug concentration.
In clinics, the goal for the individual patient is usually clearly defined by the physician. For example, in hormonal diseases the aim is not always to return the hormone levels as quickly as possible to the normal range but to follow a moderate disease reduction over several weeks. Another situation is to achieve a specific concentration of a target (e.g., a drug-receptor complex) having the most beneficial impact on the disease [13].
Although control theory is widely applied in many different engineering fields such as aerospace (the field where it was originally developed by Pontryagin [14]) or economics, its application in daily clinical practice is still quite rare. Various reasons are possible, e.g., many drugs are promoted as "one size fits all," and therefore, individual patient characteristics are completely ignored. Another reason might be much simpler; currently, to our knowledge no software for solving optimal control problems (OCP) is available that is custom-built for PKPD models and addresses the needs in daily clinical application.
In this paper, we develop a mathematical OCP that is especially designed for PKPD models and name the software OptiDose. In contrast with many applications [12,15,16], we are not using the time course of the drug as a continuous control. We directly optimize the doses for a given time schedule which is the clinical situation for patients treated with oral, subcutaneous or intravenous bolus administration. This allows to construct a finite-dimensional reduced OCP which can be solved using robust algorithms from finite-dimensional optimization such as quasi-Newton methods.
Moreover, we will apply nonlinear model predictive control (NMPC), in engineering also called closed-loop strategy, see [17] for theoretical details and [18] for an application to hemodialysis. This meets clinical needs as it allows to adapt patient parameters during the optimization if the covariates change over time, e.g., due to maturation processes or sudden changes in the disease characteristics, or if more clinical measurements are available. In addition, this strategy enables to react to unforeseen events such as missed or wrong doses.
Finally, we present three examples of different complexity, a biomarker indirect response model, a tumor growth inhibition model and a model characterizing various binding dynamics of a bispecific monoclonal antibody from immuno-oncology.

The Pharmacokinetic-Pharmacodynamic Model
In this section, we present the pharmacokinetic-pharmacodynamic (PKPD) model, usually a system of nonlinear ordinary differential equations describing the dynamics of a certain disease and the action of a drug, and discuss its unique solvability.
Let us consider the time interval [0, T ] with (possibly large) final time T > 0. We assume that for i = 1, . . . , m with m ∈ N the drug doses u i are administered n i ∈ N times at specific time points t i,l , l = 1, . . . , n i satisfying 0 ≤ t 1,1 < . . . < t 1,n 1 < . . . < t m,1 < . . . < t m,n m < T orally or subcutaneously (SC), as intravenous (IV) bolus or as IV infusion with a constant infusion rate for a certain duration Δt i , i = 1, . . . , m. In accordance with the typical notation in optimal control, we introduce u = (u 1 , . . . , u m ) and the associated finite-dimensional control space U = R m . As only nonnegative doses or rates with a given upper bound u max ∈ R m , u max > 0 can be administered, we define the convex and compact admissible subset where '≤' denotes the componentwise comparison. Let u ∈ U ad be arbitrarily chosen. Then, the PKPD model, in the context of optimal control also called the state equation, describes the model state t ∈ ]0, T ] almost everywhere Here g : [0, T ] × Θ × U × R n → R n is the PKPD mechanism, θ the model parameter in the set Θ of admissible model parameters, y 0 (θ ) ∈ R n the initial value and u = (u 1 , . . . , u m ) ∈ U ad denotes the administered doses. A reliable PKPD model (1) for a population with the same disease has to be validated thoroughly from measured data, and the model parameter θ has to be estimated reasonably before addressing the optimal dosing problem. In the optimal dosing step, the parameter θ is fixed, i.e., θ = θ ind for a certain individual in a population or θ = θ av characterizing the average value of a population. Without loss of generality, we thus use t ∈ ]0, T ] almost everywhere with g : [0, T ]×U × R n → R n as PKPD model and initial value y 0 . Moreover, PKPD models inherit an additive structure between state y and control u from their design principles. In (3),g : [0, T ] × R n → R n describes the pharmacological mechanism and r : [0, T ] × U → R n the dosing regimen. For the dosing, we encounter oral or SC administration into an absorption compartment, or IV bolus injection or IV infusion for a certain duration Δt i into the central compartment characterizing the blood in the body. This leads to where I n(t, u) denotes the input function depending on the dosing regimen u, δ is the Dirac distribution, j 0 is the component of y to which the dose will be added to and e j 0 denotes the j 0 -th unit vector. The parameter V describes the volume of distribution of the body for the specific drug.
Equation (4) implies that solutions of (2) have jumps in the j 0 -th component with jump conditions for l = 1, . . . , n i , i = 1, . . . , m where t + i,l and t − i,l denote the limits at the dosing time point t i,l from above and below, respectively. As a result, these routes of administration lead to impulse ordinary differential equations (ODEs), see [19]. In PKPD modeling, these impulse ODEs are handled by stopping the integration process at every dosing time point, adding the dose to the corresponding state (e.g., an absorption or central compartment) and continuing with the integration to the next dosing time point. In contrast with that, the IV infusion administration yields an OCP with continuous piecewise constant controls where no impulse control is present.

Unique Solution of the State Equation
A minimal assumption in PKPD analysis is that for every dosing regimen there is exactly one solution of the state equation which, in addition, has to be sufficiently smooth. To do so, we formally modify our PKPD model (2) by replacing the Dirac distribution δ(t − t i,l ) in r by the scaled indicator function 1 1 [t i,l ,t i,l + ] (t). Hence, instead of oral / SC and IV bolus administration we apply a short infusion of length . This leads to g(t, u, y) =g(t, y) +r (t, u) with For any u ∈ U ad the functionr (·, u) is a step function andr (t, ·) is linear in u for t ∈ [0, T ]. We assume the pharmacological mechanismg to be continuously differentiable and globally Lipschitz continuous with respect to y. Then iterative application of the Picard-Lindelöf theorem implies that for any u there is a unique solution y ∈ C([0, T ], R n ) which is continuously differentiable when restricted to any time interval I ⊂ [0, T ] on whichr (t, u), t ∈ I is constant. Due to Example 6.2.5 in [20], the weak derivative of y exists and y ∈ Y := H 1 (0, T ; R n ) ∩ C([0, T ], R n ) follows for the modified state equation (2), (6), (7).
Although mathematically different, from a clinical perspective the two dosing formulas (4) and (7) are practically equivalent.
Our goal is to achieve a certain outcome of the disease, e.g., a normal level of a hormone or tumor eradication. This desired disease progression is specified by a reference function h ref : [0, T ] → R. Then, we characterize optimality by minimizing the cost functional J : with α = (α 1 , . . . , α m ) ≥ 0 and a C 1 -functional h : R n → R describing the actual state of the patient resulting from a particular dosing regimen u. In oncology, h could be the sum of proliferating and different stages of apoptotic tumor cells. The standard recommendation is α = 0, but for some PKPD models it can be useful to add a small α > 0 in favor of a linear dependency [12] to lower drug doses.
To formulate the optimal control problem, we follow the approach of [21], since it provides direct access to the adjoint and an efficient method to compute the gradient of the cost functional. The regularized PKPD model (2), (6), (7) is rewritten as equality constraint, i.e., In the sequel, we work with the following set of assumptions: The assumptions follow the framework introduced in [21, Ch. 1.7.2], and the additional property U ad ⊂ U bounded originates from the design of OCPs in PKPD modeling. Assumption 1) holds by definition of U ad , 2) and 3) are fulfilled due to the smoothness assumptions ofg, h and the properties of the step functionr . Computing the Fréchet derivative in assumption 4) for arbitrary but fixed u ∈Ũ yields the operator: Obviously, T (u) is a linear operator which is bijective due to the Picard-Lindelöf theorem and the approximation of L 2 -functions by continuous functions. Moreover, the Lipschitz continuity of g yields that ∂ ∂ y g(·, u, y(u)) is bounded, and therefore, T (u) is continuous. Then assumption 4) follows by the bounded inverse theorem, cf. [22].

Existence of Optimal Controls
Definition 3.1 A controlū ∈ U ad is called optimal for (P) andȳ = y(ū) ∈ Y is called the associated optimal state if The assumptions of the OCP admit to introduce the reduced cost functional and the reduced optimal control problem minĴ (u) subject to u ∈ U ad . (P) The existence of an optimal controlū then follows from the compactness of U ad ⊂ R m and the Weierstraß theorem, since the mapĴ : U ad → R is continuously differentiable as the solution operator of the state equation u ∈Ũ → y(u) ∈ Y is continuously differentiable by the implicit function theorem and the assumptions of the OCP.

Remark 3.1
Due to the nonlinearity of the PKPD model and the lacking strictness of the convexity of J , we cannot guarantee uniqueness of the optimal control in general.
Theorem 5.2.2 in [23] ensures that ifū ∈ U ad is a local solution of the reduced problem (P), thenū satisfies the variational inequality Now, we will derive necessary optimality conditions for a local solutionū ∈ U ad .

Necessary First-Order Optimality Conditions
Following the notation in [21, Ch. 1.6.4], we define the Lagrange function associated with (P): Here, we identified Z with its dual space Z * and p = (p,p 0 ) is the so-called adjoint state. Moreover, asp 0 =p(0) holds we write p =:p. The inner product ·, · Z is given by If (ȳ,ū) is an optimal solution to the problem (P), then there exists a Lagrange multiplierp ∈ Z such that the following optimality conditions hold, also called the KKT conditions after Karush, Kuhn and Tucker [21, Ch. 1.7.2]: For arbitrary u δ ∈ R m , we compute the Fréchet derivative with N = diag(n 1 , . . . , n m ), and analogously for arbitrary y δ ∈ Y we have From the second KKT condition, we can derive the adjoint equation for t ∈ [0, T ] almost everywhere: Using adjoint techniques and (6), it can be shown [21, Ch. 1.6.4] that for all u δ ∈ R m , and therefore, the variational inequality (9) is the first KKT condition with the derivative computed in (12). Having a formula for ∇Ĵ (u), we can now solve (P) numerically using well-known algorithms from finite-dimensional optimization, e.g., quasi-Newton methods.

Remark 3.2 (a)
In contrast with the general, infinite-dimensional setting discussed in [21] we construct a finite-dimensional OCP by exploiting the design of PKPD models, see (6), (7). Therefore, the evaluation of ∇Ĵ (u) comes at very low computational cost. (b) Second-order sufficient conditions for a minimum can be verified by approximating the reduced Hessian [23, Thm. 5.3.2] with central differences of the gradients and computing the eigenvalues.

Open-Loop Problems and Convergence Properties
Assuming that all model parameters are known prior to the optimization, we can solve (P) in a so-called open-loop process in which the controls are computed iteratively until a certain stopping criterion is satisfied. It should be noted that during the iteration process there is no option to update parameters or react to external perturbations. We will use a quasi-Newton method with Armijo step size control where the Hessian ∇ 2Ĵ (u) is approximated by projected BFGS updates named after Broyden, Fletcher, Goldfarb and Shanno. For more details on the step size strategy, the algorithm, local and global convergence results and second-order sufficient conditions we refer the reader to [23, Ch. 5.5.3].

Numerical Adaptions for Impulse Optimal Control Problems
In case of oral / SC or IV bolus administration, the solution y = y(t) of the impulse state equations (2) Consequently, the impulse OCP is solved.

The Nonlinear Model Predictive Control Method
For long-term treatments over several years, patient parameters can change over time, e.g., for pediatric patients due to developmental changes. Suppose we are at time t 0 i for i ≤ m − + 1 and consider the horizon I i . Previously, the optimal doses u 1 , . . . , u i−1 and the associated optimal state trajectory y i−1 from the last iteration were computed. Therefore, y i 0 := y i−1 (t 0 i ) is set as initial value for the state equation on I i (y i ) (t) = g(t, u (i) , y i (t)), for t ∈ I i almost everywhere ad and being the corresponding admissible set. On the prediction horizon I i , the cost functional is used. As before, the reduced cost functional with the unique solution to the state equation y i (u (i) ) is introduced. The open-loop problem is solved as in Sect. 3.3 by computing an optimal control vector and predicting the dynamics on the time horizon [t 0 i , t f i ]. Now, only the first component of the optimal control vector u i = (u (i) ) 1 is applied and the time horizon is shifted to the next interval I i+1 with initial condition y i+1 0 := y i (t 0 i+1 ). Before solving (P i+1 ), it is possible to update parameters or react to unforeseen perturbations such as dosing errors or missed doses.
In Algorithm 1, a pseudocode for the described closed-loop technique is displayed.

Application of OptiDose to Relevant Examples of Pharmacokinetic-Pharmacodynamic Models and Presentation of Numerical Results
We present three relevant examples of PKPD models used in drug development and clinical pharmacology and apply the developed OptiDose algorithm to compute the optimal dosing regimen.

The OptiDose Software
For the software OptiDose, the presented open-and closed-loop algorithms were implemented in MATLAB [24] using the built-in solver ode15s to solve the arising initial value problems. The open-loop problems were solved with the projected BFGS-Armijo method [23,Ch. 5.5.3]. All computations were performed on an ASUSTek computer with Intel(R) Core(TM) i7-7700HQ CPU processor with 2.80GHz and 16GB RAM.

Biomarker Indirect Response Models: a Test Model for OptiDose
In Fig. 1 a, we display the model schematic for a test example where a biomarker B is elevated and a drug is administered to return those high biomarker levels to the normal range. We further assume that the disease cannot be cured, i.e., in absence of the drug the biomarker will return to its initial state B 0 at diagnosis. The test model is a so-called indirect response model (IDR) which is a fundamental tool in PKPD modeling, cf., e.g., [25,26]. IDR models consist of a zero-order production rate k in and a first-order elimination rate k out , and the drug stimulates or inhibits these rates usually with Michaelis-Menten type of terms [27]. Here the drug will be administered via IV bolus into the central compartment according to (4). The drug concentration C is described by a linear one-compartment model with drug elimination rate k el . Maximal stimulating effect of the drug is E max , and the drug concentration to produce the half-maximal effect is EC 50 . The PKPD model reads A clinically realistic dosing scenario for non-hospitalized patients is that the doses are administered daily, but they change only every week (n i = 7, i = 1, . . . , m). The aim is to control the biomarker B to follow a predefined reference function characterized by providing a slow quadratic approach toward the target biomarker level B tar . We choose α = 0 in the cost functional, a target value of B tar = 10, a loading phase of m 1 = 2 weeks and m = 6 weeks in total. Figure 2 shows the optimal solution on the left the actual and desired biomarker levels and on the right the optimal doses and the resulting drug concentration. Starting from an initial guess of u 0 = 1 for all doses, the optimal open-loop solution u * was computed within 85 seconds and a cost functional value ofĴ (u * ) = 3.89 with norm of the gradient 5.9 · 10 −6 and all eigenvalues of the Hessian at the optimal solution are positive. The same optimal solution is found for different initial guesses on the doses.
However, in a real scenario the treatment will not stop after the above considered six weeks as we assumed the disease cannot be cured. The closed-loop algorithm with an observation horizon of, e.g., three weeks that gets shifted weekly reproduces the open-loop solution up to neglectable differences.

Tumor Growth Inhibition Model
Proliferating tumor cells usually grow exponentially in the beginning and transition later to a linear growth, cf. [28,29]. Depending on the tumor type, size and environment, a saturation of the growth can be observed; however, this is neglected in the following example. Many drugs act in a cytotoxic manner, meaning that the tumor cells are attacked by the drug. Attacked cells then undergo apoptosis until they eventually die. The presented example is based on preclinical oncology drug development in mice, see [29] for details and Fig. 1 b for the model schematic. The structure of the model is widely applied in industry and academia for these type of experiments.
Let P be the proliferating tumor cells with an exponential growth rate λ 0 and a linear growth rate λ 1 [29]. The drug C, orally administered into an absorption compartment Abs with absorption rate k a , acts on the proliferating cells with a linear drug effect term with potency k pot . The apoptotic cell population is described with three transit compartments D 1 , D 2 , D 3 [30], each reflecting a certain age stage of the apoptotic cells with transit rate k t . The sum of the proliferating and apoptotic cells is the total tumor weight W = P + D 1 + D 2 + D 3 . The PKPD model reads d dt Abs = I n(t, u) − k a Abs, First, the tumor is grown to a specific size, then the drug is administered daily (n i = 1 for all i) from day 12 to 28. The aim is to decrease the tumor weight W toward zero. For the reference function, we choose a sigmoid shape starting at 0.5 on day 12 and tending to zero: As the drug is acting on the proliferating tumor cells via k pot C · P in the third equation in the PKPD model, its impact decreases as the tumor size shrinks toward 0. In fact, the problem loses its controllability meaning significantly different doses, e.g., 10, 100, 1000, achieve nearly the same pharmacodynamic behavior. Naturally, small doses are preferred which is why we choose α = 10 −7 > 0 in the cost functional in favor of lower doses.
Moreover, we observe that large changes in the doses result in only very small changes in the cost functional as well as in the gradient which might result in stopping criteria being satisfied already despite the minimum was not reached yet. Therefore, for numerical reasons, we minimize the theoretically equivalent cost functional by including a scaling factor γ > 0 which pulls through to the gradient and hereby tightens the stopping criterion that the norm of the projected gradient is small enough. For the presented example, we choose a scaling factor of γ = 100.
The initial guess for each dose is u 0 = 0. We solve the OCP with the NMPC (i.e., closed-loop) method as described in Algorithm 1 with an observation horizon of = 7 doses which is shifted daily after applying the first of the computed optimal doses. Altogether, we find the closed-loop solutionû, see provides higher doses on days 16 and especially 17 whose necessity the shorter horizon does not see, but the closed loop (Fig. 3) compensates this with larger doses in the following days. The difference in the cost functional values is small. Numerical computations with a variety of different initial values confirm that u * is the unique optimal control. In addition, the algorithm shows good non-local convergence properties. Both findings fit in the framework of optimal control problems with convex cost functionals.

Binding Kinetics of a Bispecific Antibody and Formation of the Drug-Receptor Complex
Bispecific antibodies (BsAbs) are promising drug candidates in immuno-oncology [31], and currently extensive research is performed by both academia and pharmaceutical companies. A BsAb is an artificial protein and exerts the effect, e.g., by bridging effector T cells and cancer cells. The idea of BsAbs is to use the own human immune system to attack tumor cells instead of administering cytotoxic drugs as in the previous example.
Mechanistic modeling of a BsAb is a sophisticated task and includes characterization of many different binding kinetics [32,33]. A BsAb C binds to two different receptors R A and R B forming the two binary drug-receptor complexes RC A and RC B . Both of these complexes further cross-bind to the other receptors to form the ternary complex RC AB which finally drives the drug effect. The absorption process of the BsAb is described by Abs. The full BsAb model is schematically displayed in Fig. 1 c and reads (see [13] for details): (t, u) and the equations for the complexes Briefly, the k on and k off are binding rates, k syn and k deg production, resp. degradation rates, k int internalization rates, k el is the elimination rate of the BsAb and k 12 , k 21 describe the transfer to a peripheral compartment AP. Antibody drugs have a very long half-life and are often administered via SC injection, cf. (4). Therefore, we consider the time interval [0, 140] days with administration of the same dose into the absorption compartment Abs at days 0, 48, 96, i.e., n 1 = 3. Initially, we assume the system to be at baseline, i.e., R A (0) = k syn A /k degA and R B (0) = k synB /k degB and all others are 0. Throughout the whole interval, our goal is to keep the ternary complex RC AB at the best possible reference value min(k syn A /k degA , k synB /k degB ). In the cost functional, we choose α = 0. Starting from an initial guess for the dose of 800, we find the optimal (open-loop) solution u * = 663.78 within 6 iterations of BFGS-Armijo method and 21 seconds. The optimal doses and resulting ternary complex are shown in Fig. 5. The cost functional value at the optimal solution isĴ (u * ) = 5.0, the norm of its gradient is 1.2 · 10 −8 , and its Hessian is positive. Starting from an initial guess far from the optimal solution resulting in a large cost functional value, e.g., u 0 = 0 withĴ (u 0 ) = 7000 yields the same optimal solution within 22 iterations and 32 seconds, i.e., the algorithm shows numerical robustness.

Conclusions
We have set up an OCP which is especially designed for PKPD models using the doses as finite-dimensional control variables (instead of optimizing the drug concentration). Therefore, the reduced OCP can be solved by robust algorithms from finite-dimensional optimization such as quasi-Newton methods. An efficient calculation of the required derivatives is ensured by the use of adjoint techniques. In addition, we incorporated closed-loop (NMPC) algorithms which provide, in particular in case of long-term treatments, the potential to guide patients safely along a desired reference function h ref . Application to a variety of PKPD models shows the robustness and efficiency of the software. Thus, the presented OptiDose code is a widely usable tool for computing the individualized optimal drug dosing regimen in PKPD and, to our knowledge, the first calculating the doses directly.
On the other hand, we still need to address some PKPD issues, e.g., a rigorous sensitivity analysis of the computed optimal doses with respect to uncertainty in the estimation of the parameter values. Suitable measures to identify sensitive parameters should be discussed.
Another important issue will be how to handle negative side effects from drug treatment such as myelosuppression in oncology: The administration of cytotoxic drugs to attack a tumor also suppresses the production of new blood cells in the bone marrow leading to low levels of circulating blood cells. As white blood cells play an important role in the immune system, they need to stay above a certain threshold in order to not risk the life of the patient. Mathematically, this means to include socalled state constraints in the OCP making it usually much harder to solve numerically. A promising way is to apply augmented Lagrangian techniques using an additional penalization term in the cost functional.
Furthermore, in daily clinical routine the available oral doses are typically restricted to certain sizes which leads to discrete ODE-constrained OCPs. The presented open issues are both mathematically challenging and of huge interest in applications and provide a wide field for future research.