Optimal EPO dosing in hemodialysis patients using a non-linear model predictive control approach

Anemia management with erythropoiesis stimulating agents is a challenging task in hemodialysis patients since their response to treatment varies highly. In general, it is difficult to achieve and maintain the predefined hemoglobin (Hgb) target levels in clinical practice. The aim of this study is to develop a fully personalizable controller scheme to stabilize Hgb levels within a narrow target window while keeping drug doses low to mitigate side effects. First in-silico results of this framework are presented in this paper. Based on a model of erythropoiesis we formulate a non-linear model predictive control (NMPC) algorithm for the individualized optimization of epoetin alfa (EPO) doses. Previous to this work, model parameters were estimated for individual patients using clinical data. The optimal control problem is formulated for a continuous drug administration. This is currently a hypothetical form of drug administration for EPO as it would require a programmable EPO pump similar to insulin pumps used to treat patients with diabetes mellitus. In each step of the NMPC method the open-loop problem is solved with a projected quasi-Newton method. The controller is successfully tested in-silico on several patient parameter sets. An appropriate control is feasible in the tested patients under the assumption that the controlled quantity is measured regularly and that continuous EPO administration is adjusted on a daily, weekly or monthly basis. Further, the controller satisfactorily handles the following challenging problems in simulations: bleedings, missed administrations and dosing errors.


Introduction
According to the 2018 United States Renal Data System annual data report (USRDS 2018), approximately 2.5 million patients were treated world-wide for end-stage renal disease in 2016. In nearly all reporting countries hemodialysis (HD) was the predominant form of dialysis therapy. On December 31, 2016, there were 726,331 prevalent cases of end-stage kidney disease in the U.S., of which 63.1% were receiving HD. Due to reduced erythropoietin production in the kidneys almost all HD patients suffer from a chronically decreased number of circulating red blood cells (RBCs) and associated low hemoglobin (Hgb) levels. This condition is called anemia. Untreated anemia is associated with poor quality of life and increased morbidity and mortality. Therefore, physicians aim for a partial correction of anemia with erythropoiesis stimulating agents (ESA). In this paper we consider the treatment with epoetin alfa (EPO), a human recombinant erythropoietin produced in cell culture.
Currently, the recommended Hgb target range for anemia management is 10-12 g/dl (see Mactier et al. 2011). Albeit, the KDIGO (Kidney Disease Improving Global Outcomes) Work Group, which provides clinical practice guidelines for anemia in chronic kidney disease and guidance on diagnosis, evaluation, management and treatment of HD patients, recommends not to exceed the limit of 11.5 g/dl in general but suggests to individualize therapy for patients whose quality of life may be improved at Hgb levels above 11.5 g/dl. Usually, dialysis facilities use dosing protocols that work as follows: A starting dose gets specified and based on the resulting Hgb change and depending on whether the patient is within the Hgb target range the ESA dose gets adjusted. This one-size-fits-all approach results in about 65% of patients achieving the set Hgb target. The Dialysis Outcomes and Practice Patterns Study (DOPPS) Practice Monitor reports that in the US, since December 2014, the percentage of patients with Hgb above 12 g/dl is 14-15% while 18-20% of patients have a Hgb below 10 g/dl. Moreover, Hgb variability and cycling are well known to occur in HD patients treated with ESA (Berns et al. 2003;Berns 2005, 2007). According to Yang et al. (2007) a greater Hgb variability is independently associated with higher mortality. The challenge in anemia treatment is the patients' difference in long-term Hgb response to ESA. The drug concentration in plasma influences the maturation, proliferation and apoptosis of cells in the erythroid lineage. However, these cells remain about two weeks unobservable in the bone marrow before being released into the bloodstream. Consequently, it is difficult to anticipate the resulting delayed effect of the drug administration.
The mathematical model of erythropoiesis presented in Fuertinger et al. (2013) predicts patients' erythropoietic response. We utilize this model to design a modelbased feedback controller. The successful use of control algorithms for drug dosing has been shown, for example, in Magni et al. (2007) and in Bequette (2013). Both works are about closed-loop insulin dosing (i.e., the artificial pancreas). The control strategy we use is called model predictive control (MPC), also known as moving or receding horizon control. In 2011, Brier and Gaweda have already published an MPCbased algorithm for improved anemia management that has been tested and validated in clinical studies. Unlike our approach their predictive model is based on the concept of artificial neural networks (see also Barbieri et al. 2016;Brier et al. 2010). One of the limitations of using an artificial neural network approach is the need for large training and validation data sets making it very difficult to fully tailor this approach to individual patients and to personalize target Hgb ranges for patients whose quality of life would improve from a Hgb level higher than 11.5 g/dl as suggested by the KDIGO work group.
The purpose of this work is to develop a controller scheme that is fully personalizable. Previous to the study, the used model was adapted to individual patients using clinical data on Hgb levels and EPO administration. The details of the parameter estimation procedure and its results were published in Fuertinger et al. (2018). Note that insufficient iron availability is not modelled explicitly. However, several bone marrow parameters of the erythropoiesis model that are influenced by iron availability were estimated on a per patient level. The presented feedback controller is tested on various patient parameter sets. Throughout this manuscript we assume that EPO is administered continuously. This is currently not done in clinical practice but could be achieved with an "EPO pump" similar to the programmable insulin pumps used to treat diabetes mellitus. The approach is chosen nonetheless since it yields a continuous control which provides the best situation for stabilizing a system. Having developed a functional controller scheme it can then be adapted to actual dosing schedules and the effect of reducing the dosing frequency on the achievable stability can be analyzed. The given model equations are coupled hyperbolic partial differential equations (PDEs) and the control variable enters these equations non-linearly. In the presence of a nonlinear model MPC is referred to as non-linear MPC (NMPC). More details on MPC can be found, e.g., in the books by Grüne and Pannek (2011) and by Rawlings and Mayne (2009). The basic principle of MPC consists of repeatedly solving finite horizon open loop optimal control problems. In each step, an open loop problem is solved. Then, only the first component of the obtained optimal control is applied and the optimization horizon gets pushed. This allows to include measurements and to react to unforeseen disturbances or complications. Here, we assume to know the true model and that we are able to measure Hgb perfectly. We simulate (gastro-intestinal) bleedings which are a common complication in HD patients and consider a malfunction of the pump by simulating the complete failure to administer EPO for an entire day or by applying an incorrect dose. Further, we simulate different frequencies of rate change with which the pump is programmed, ranging from daily to only once a month.
The paper is organized as follows: In Sect. 2 we introduce the control variable and present the model equations. The numerical approximation of these so-called state equations is investigated in Sect. 3. Both the model and its numerical approximation are recalled from Fuertinger et al. (2013). In addition, we regularize the erythrocytes model equation to obtain differentiability required for numerically solving the optimal control problem utilizing first-order optimality conditions. In Sect. 4 we formulate the optimal control problem and the NMPC algorithm is described. In Sect. 5 we present our numerical results of the following in-silico experiments: bleedings, missed administrations or wrongly administered doses and the restriction of EPO administration rates to be constant over several weeks. We draw some conclusions in Sect. 6. Finally, all parameters used for simulations are presented in Appendix A.

The model of erythropoiesis
We start by describing the structure of the control process; compare Fig. 1. The control which is a vector of EPO administration rates can be altered to change a patient's Hgb being the outcome. First, the EPO rates change the patient's EPO concentration in plasma which affects the production of RBCs. The process of erythropoiesis is described by a mathematical model presented in Fuertinger et al. (2013). The different cell types during erythropoiesis are grouped into five population classes and the model allows to calculate their densities y 1 , y 2 , . . . , y 5 . From a control perspective these are our states. Given the density y 5 of erythrocytes we can calculate their number and finally the Hgb. In the following we explain the named components step by step. Note that the lemmas of this section will allow to utilize first-order optimality conditions for solving the optimal control problem numerically.

The dosing of EPO as the control variable
We first write the time-varying EPO concentration in plasma as a function of EPO administration rates. Let us consider the time interval [0, T ] with large final time T 1. We assume that EPO can be applied continuously, with a constant administration rate per day or per multiple days. The EPO rates are given in U/day, where U stands for units. The EPO concentration E(t), t ∈ [0, T ], in plasma is separated into a constant summand E end > 0 modeling the patient's remaining endogenous erythropoietin level and a time-dependent summand E ex (t) resulting from the administered EPO: (1) Note that the endogenous EPO production E end is estimated for each patient individually when adapting the model to patient data. For the number n u ∈ N let the days {t . . , n u , be given as We introduce the associated finite-dimensional control space U = R n u . Only nonnegative EPO rates with a given upper positive limit u max ∈ R can be applied. Therefore, we are considering the (convex and compact) admissible set where '≤' is interpreted componentwise. Throughout, vectors are denoted by boldface letters. Let u ∈ U ad be given. Then, the time-and control-dependent summand E ex = E ex (·; u) satisfies the following initial value problem (cf. Fuertinger et al. 2013, equation (12)): where ), c tbv > 0 stands for the total blood volume, E ex • is a non-negative initial condition for the exogenous EPO level and λ = log(2)/T 1/2 > 0 is the EPO degradation rate with half-life time T 1/2 . Thus, for t ∈ [t j u , t j+1 u ), j = 1, . . . , n u , the EPO concentration is given by It follows from (4) that u → E(·; u) is a function mapping from the admissible set is the space of all continuous functions from [0, T ] to R. From (4) and λ > 0 we infer that Thus, we define the interval of possible EPO concentrations and observe that E(t; u) ∈ E ad holds for all t ∈ [0, T ] and u ∈ U ad .

Lemma 1
The mapping E introduced in (4), has the following properties.
(2) The mapping E(t; ·) : U ad → R is twice continuously differentiable for any t ∈ [0, T ]. Its gradient is given as ), j = 1, . . . , n u , and u ∈ U ad . Further, the hessian matrix ∇ 2 u E(t; u) ∈ R n u ×n u is zero.
Proof The claims follow directly from formula (4). Since ∇ u E(t; u) is independent of u, the hessian ∇ 2 u E(t; u) is zero.

The PDE model of erythropoiesis
A schematic of the model is shown in Fig. 2. For the underlying assumptions and more details we refer to Fuertinger (2012), Fuertinger et al. (2013). Stem cells commit to the erythroid lineage at a constant rate S 0 > 0. Once a stem cell has committed it passes the five shown cell classes/stages over time (if it does not die along the way): BFU-E (burst-forming unit erythroids), CFU-E (colony-forming unit erythroids), erythroblasts, marrow reticulocytes and erythrocytes (including blood reticulocytes). This means that there is a flux of cells from each population class to the subsequent one. For example, when a CFU-E cell has reached maximum age of that class it leaves the class and becomes an erythroblast with minimum maturity. The population densities depend on maturity and time. For each class an age-structured population model is given which describes the development of the respective class subject to a given EPO concentration in plasma. EPO has a direct effect on the rate of apoptosis of CFU-E cells (α 2 ), the maturation velocity of marrow reticulocytes (ν) and the mortality rate of erythrocytes (α 5 ). For each class we are given an individual maturity The interval boundaries are given by Ma et al. (2017) have measured this shortened RBC lifespan in HD patients to be in the range of 37.7 to 115.8 days. Note that the first four cell classes are in bone marrow while the fifth class describes cells circulating in blood. That is why x 5 is set to 0. For conciseness , we write all five state equations in one form. Suppose that for given u ∈ U ad and E ex • ≥ 0 the EPO concentration E = E(t; u) is given as (4). Then, the equation reads where β i > 0 describes the profileration rate and α i the rate of apoptosis. Actually, the function α 5 and the sigmoid functions α 2 and ν depend on the bounded (patient-dependent) parameter vector µ = (μ i ) ∈ R 10 + with R + = {s ∈ R | s > 0}. To simplify the notation we do not indicate dependencies on µ. We refer to Appendix A, where all fixed and all individualized parameters, which we utilize in our numerical experiments, are listed. The individualized parameters are obtained via parameter estimation; see Fuertinger et al. (2018). The functions for the different classes read as follows: where, for E ∈ E ad , the real-valued functions α 2 and ν are given by and the function stands for the erythrocytes mortality rate with an EPO threshold τ E > 0 for neocytolysis. The (closed) non-empty interval Ω 5 Ω 5 denotes the age interval, where neocytolysis is possible.
Proof The claim follows directly from (6).

Total RBC population
If the erythrocytes population density y 5 is known, the total RBC population P = P(t), t ∈ [0, T ], is given as where y = (y i ) 1≤i≤5 solve the state system (S).

Hgb concentration
Given a patient's total blood volume c tbv (in ml) together with the total number of RBCs P, the Hgb concentration (in g/dl) is calculated as where MCH = 29 pg denotes the mean corpuscular hemoglobin.

Regularization of the equation for the erythrocytes
In Sect. 4 we formally introduce the non-linear optimal control problem. In order to solve it numerically by utilizing first-order necessary optimality conditions we have to differentiate the state system (S) with respect to the state variable y = (y i ) 1≤i≤5 and the control variable u = (u j ) 1≤ j≤n u . From Lemma 1 we already know that u → E(t; u) is continuously differentiable for every t ∈ [0, T ]. Moreover, the mappings α 2 and ν are continuously differentiable by Lemma 2. However, the mapping E ad E → α 5 (x; E) is non-differentiable for every x ∈ Ω 5 . Therefore, we have to regularize α 5 in order to get smooth state equations.
For that reason we introduce the Heaviside function H : R → R defined as Then, the mortality rate can equivalently be written as For ε > 0 we utilize the following regularized Heaviside function H ε : which is twice continuously differentiable. Notice that the function is an approximation of min(s, τ ). Thus, the function R defined in (10) can be regularized as which allows us to replace the non-smooth coefficient function α 5 by the smooth (with respect to E) mapping Proof The claim follows directly from (11) because of E min > 0.

Numerical approximation of the state equations
The numerical solution of the age-structured population models is based on semigroup theory. We formulate the five state equations as abstract Cauchy problems which are then approximated by semigroups acting on finite dimensional subspaces. We have compared this discretization to an upwind finite difference scheme (cf. Strikwerda 2004). Due to computational speed while obtaining a similar approximation quality we have favored the semigroup based approach. We refer the reader to, e.g., Ito and Kappel (2002) and Kappel and Zhang (1993) for results on evolution operators and their approximation.
In the following we derive the discretization of the state equations by means of the general form (S.i) where we omit the dependency on i but replace κ by κ ε .

The state equations as abstract Cauchy problems
For n ∈ N we introduce the function where (x, x) = Ω denotes the maturity interval of the respective cell class. Hence, the sequence {δ n } n∈N approximates the δ-distribution. Let y n , n ∈ N, be a mild solution of the non-homogeneous Cauchy problem It can be shown that lim n→∞ y n (t) = y(t) in L 2 (Ω) holds for t ∈ [0, T ], where y is the solution to the corresponding state equation (S.1)-(S.4) or (S.5 ε ). Since the range for the attribute is different for the cell populations considered, it is useful to normalize these attributes such that the range of the normalized attribute ξ is [0, 1]. In order to achieve this we set w = x − x > 0 and define the mapping with its inverse given by We denote by L 2 w (0, 1) the Hilbert space L 2 (0, 1) endowed with the weighted inner product The induced norm is · w = w 1/2 · L 2 (0,1) . The Hilbert spaces L 2 ω (0, 1) and L 2 (Ω) are isomorphic with the isomorphism Ξ : L 2 (Ω) → L 2 w (0, 1) and its inverse Ξ −1 : L 2 w (0, 1) → L 2 (Ω) given by Applying the operator Ξ to the sequence {δ n } n∈N yields Then, from (12), we derive the normalized Cauchy problem in L 2 w (0, 1): (12), whereỹ n is the mild solution of the Cauchy problem (13).

Approximation of the abstract Cauchy problems
In the following we derive a discretization of the Cauchy problem (13) based on shifted Legendre polynomials. This approach was originally presented in Kappel and Zhang (1993) where they apply it to a very similar type of equation. In the considered 1D case only few basis elements are needed which yields a fast approximation. However, there are situations where one would expect diffculties due to the (oscillating) characteristics of Legendre polynomials. Higher spatial dimension of the hyperbolic equation or large maturation velocities are examples for such situations. In the following, we first recall certain characteristics and further utilized features of Legendre polynomials. For more details we refer the reader to the book by Abramowitz and Stegun (1970).
The polynomials can be computed with Bonnet's recursion formula where L 0 (x) = 1, and L 1 (x) = x. Thus, the first five Legendre polynomials are of the following form To compute the approximations for the partial differential equations we use the following formula representing the derivatives of Legendre polynomials in terms of Legendre polynomials: Furthermore, the following consequences from Bonnet's formula (eq. (14)) are needed for the approximation: Another property of the Legendre polynomials which is used is: Approximation Let us define the basis functions The sequence {e j } j∈N is an orthogonal sequence in L 2 w (0, 1) with e j , e k w = 0 for j = k and e j 2 w = e j , e j w = For N ∈ N we introduce the N -dimensional subspace X N ⊂ L 2 w (0, 1) by Let P N : L 2 w (0, 1) → X N be the orthogonal projection defined as The approximating delta distributionδ N on X N is defined by For any E ∈ E ad we define the approximating linear operator Note that the projection P N needs only be applied on κ ε h(·); E ϕ in case of the erythrocytes model equation due to ϕ ∈ X N for ϕ ∈ X N . Now, the discretization of (13) is given as Let whereỹ N is a solution to (19). It follows that lim N →∞ y N (t) = y(t) in L 2 (Ω) for t ∈ [0, T ]. A proof for the case g ≡ 0 can be found in Kappel and Zhang (1993, Theorem 4.3). Using (17), (18) and we derive the Galerkin scheme ; u))e j , e i w + g(t; E(t; u))e i (0), for i = 0, . . . , N − 1. Setting we can express (20) as the following system of N ordinary differential equations Theorem 1 Let E(·; u) given by (4) for an arbitrarily chosen u ∈ U ad . Moreover, g(·; E) : [0, T ] → R is assumed to be piecewise continuous for every E ∈ E ad . Then, the functioñ is the only one that satisfies (21) at every time instance, where g(·; E) is continuous for every E ∈ E ad . Furthermore,ỹ ∈ H 1 (0, T ; R N ) holds.

The control-to-state operator
Let u ∈ U ad be a given control variable and E(·; u) be given by (4). Then, the five state variables (y i ) 1≤i≤5 solving (S ε ) are approximated by the functions where the coefficient vectorỹ This is Eq. (21) with added index i. Due to Theorem 1 we define the non-linear solution operator where×is the generalized Cartesian product. Hence,ỹ = (ỹ i ) 1≤i≤5 = S N (u) satisfies ( S 1 )-( S 5 ). Then, the discretized total RBC population is given by (cf. (7)) where againỹ = (ỹ i ) 1≤i≤5 satisfies ( S 1 )-( S 5 ). Note that the equations ( S i ), i = 1, 2, . . . , 5 can be solved consecutively. At any time t ∈ [0, T ] the discrete state vectorỹ(t) is of length 5N .

The optimal EPO dosing
In this section we formulate the optimal EPO dosing as an optimal control problem on the long time horizon [0, T ]. For the number n u ∈ N let the days {t j u } n u +1 j=1 with a constant EPO rate in [t j u , t j+1 u ), j = 1, . . . , n u , be given as

The optimal control problem
The goal is to stabilize the Hgb around a desired value of 10.5 g/dl in order to bring and keep the Hgb into the target window of 10-12 g/dl. Values in this range are considered as safe. We choose this lower value within the range because in case of an overshoot the Hgb can not be pulled down actively but one has to wait till it decreases of itself.
The optimal control problem is formulated for the number of RBCs and not for the Hgb. This means that, for each patient, prior to optimization a desired total amount P d of RBCs is calculated using formula (8).
In the cost functional, we measure and penalize the deviation of the total RBC population from the desired population P d over the whole time horizon [0, T ] where the discretized total RBC population P N [ỹ] has been introduced in (24) and y = S N (u) satisfies ( S 1 )-( S 5 ). Further, γ 1 , . . . , γ n u > 0 are regularization parameters and σ Ω is a non-negative weight. Now, the optimal control problem is formulated as follows: Remark 2 (1) Note that (P) is a non-linear optimization problem. Hence, (P) is nonconvex, so that several local minima might exist. ♦ (2) It can be shown that (P) possesses at least one optimal control in U ad . For the sake of brevity, we do not present the proof here.

The NMPC method
Problem (P) can not be treated as an open-loop problem since unforeseen events and disturbances can occur. In reality, predicted and measured Hgb values will differ which has to be taken into account by means of a closed-loop controller. Moreover, patient parameters will not be constant over time which even makes readaptations of the model necessary in actual application.
Let all time intervals of constant administration rates be of same length Δ t > 0 and T = LΔ t for L ∈ N.
Suppose we are at time t • ∈ [0, T − MΔ t ] and consider the time horizon [t • , t f ] with t f = t • + MΔ t , M ∈ N and MΔ t T . Next we introduce a notation for the days which belong to the current horizon [t • , t f ]; cf.
(2). At t • we are given the initial where the coefficient vectorỹ = (ỹ i ) 1≤i≤5 satisfies ( S 1 )-( S 5 ) on the time horizon [t • , t f ] with initial conditionsỹ •i , 1 ≤ i ≤ 5. In (26) we have added a summand penalizing the deviation of the population at the final time t f for stability reasons. For Considering a constant EPO rate per day in our numerical experiments we set γ j ≡ c γ MΔ t for some constant c γ > 0. Note that MΔ t is the length of the prediction horizon. Now, the controller predicts the future evolution of the system under control over this prediction horizon, the cost functionalĴ N (u; t • ,ỹ • , E ex • ) gets minimized and we obtain an optimal control vector for the given time period. Then, (only) the first component of the optimal control vector is applied and yields new initial conditions for the next initial time point t • + Δ t , to where the finite horizon gets pushed. In Algorithm 1 we summarize the NMPC method. The iterative computation of the control u can be seen in line 5 where in each iteration the first component of the optimal open loop solution to problem (P(t • )) from line 4 is appended to the already given control vector. Then, this control is applied on the corresponding time interval [t • , t • + Δ t ] and the initial condition for the next iteration is set toỹ(t • + Δ t ); see line 6. If the state vectors can be measured or estimated based on measurements, the new initial conditionỹ • in line 6 can be set therewith.

Numerical solution of the open-loop problem (P(t • ))
It remains to discuss how the open-loop problem in line 4 of Algorithm 1 is numerically solved. In this work we apply a projected BFGS method with Armijo linesearch as described in Kelley (1999, Sect. 5.5.3). This requires to compute the gradient . This is done using a Lagrangian-based approach as shown in Hinze et al. (2009)[Sect. 1.6.4]. The gradient is given by Note that the adjoint statep is the Lagrange multiplier associated with the state equation.
We have presented the gradient and the adjoint equations ( A 1 )-( A 5 ) in a very compact format. So let us explain what it actually means to calculate the gradient (1) Solve the state equations (

Numerical results
For our numerical experiments we have chosen the data sets from five patients which capture the main occurring characteristics. The constant endogenous erythropoietin concentration E end for example is once far above the threshold for neocytolysis (τ E = 80), twice just slightly below and twice clearly smaller. The aim was to find a general optimization setting that works for diverse data sets. The parameters can be looked up in Table 8.
For computation of the population densities we scale the hyperbolic equations by 10 8 which is legitimate since the equations are linear with respect to the state variables y i , i = 1, . . . , 5. For discretization we use N = 15 Legendre polynomials and the time step size Δt = 0.01.
We first have a look at the predicted Hgb curves when no EPO is administered. Then, we show how the NMPC algorithm is able to correct anemia under the assumption of knowing the true model and patient parameters and being able to continuously and perfectly measure Hgb.

Uncontrolled Hgb concentration
We begin by taking a look at the predicted Hgb concentrations without EPO administration. As can be seen exemplarily in Fig. 3 the Hgb levels are running in a steady state far below the target range. This state of anemia would be critical for patients since it increases cardiovascular disease and death risk (Strippoli et al. 2004).

Settings
Throughout we consider a total time period of 24 weeks, i.e. T = 168 days. Excluding our last experiment, the days {t j u } n u +1 j=1 are given by {0, 1, . . . , 168} and the length of the NMPC horizon is four weeks. This means, we set Δ t = 1, M = 28 and L = 168. Note that the length of the horizon determines the computational effort of the NMPC method: the longer the horizon, the more time it takes to calculate the numerical solution to the open-loop problem in each step. Hence, one is searching for the minimal horizon that provides stability of the NMPC closed-loop. In our simulations we have observed that four weeks serves the purpose for the named setting. The fact that the cells remain about two weeks in the bone marrow (first four classes) before released into the bloodstream explains the need for this length. In pratice, the maximum cumulative weekly EPO dose is 60,000 U with up to three administrations a week. Given our higher administration frequency we set the maximum weekly EPO dose to 7000 U which leads us to set u max = 1000 U/day. Given the time horizon MΔ t we set where only the parameter c γ is chosen in an individualized manner. We start by testing the NMPC algorithm for different constants c γ . This parameter penalizes control costs in the objective functional. A larger c γ results in a stronger penalization of control costs, i.e. the controller must try to bring and keep the Hgb curve in the target range with less EPO. In Fig. 4 we present the results for patients 1 and 2. For both patients the value c γ = 0.1 is the right choice for control costs penalization in the sense that it allows to accurately stabilize the Hgb around 10.5 g/dl. The corresponding EPO rates can be seen in Figs. 5 and 6. Interestingly, the smallest penalization in patient 1 results in a very specific and time sensitive administration pattern. Patient 2 requires a lot higher doses than patient 1. It is actually about 2.5  Table 1. When looking at the uncontrolled Hgb levels in Fig. 3 patient 2 shows a higher asymptotic Hgb level than patient 1. So the higher doses are actually counterintuitive which underlines the need for a closer look into the time dynamics. For c γ = 10 the Hgb concentration for patient 1 is still within the target range while this penalization results in a drop below the target range for patient 2. Still, for c γ = 1 the total population for patient 2 remains as well within the target range. Using c γ = 100 for patient 1 leads to a Hgb level below the target range. This  indicates that the penalization of the control costs needs to be chosen carefully on a subgroup or even per-patient level. Our primary goal is Hgb in target. Thus, we choose c γ such that the penalization of control costs does not block the NMPC algorithm from controlling the Hgb levels into the target range. The above test we have also done for the other patients. For patient 5 the value c γ = 0.1 is fine as well while patients 3 and 4 require a weaker penalization of c γ = 0.01. Concluding, with c γ = 0.01 the Hgb levels of all patients can be brought into target only that for patients 1,2 and 5 this can as well be achieved with c γ = 0.1 and consequently a lower total EPO dose.

Bleeding
In the following we analyze how the NMPC algorithm handles a sudden and unforeseen (gastro-intestinal) bleeding. This is a frequent complication in HD patients. Note that the NMPC algorithm learns about the bleeding first after it has happened by an assumed measurement. In line 6 of Algorithm 1 the initial conditionỹ • is overwritten based on that measurement. The results for patients 3 and 4 for a bleeding which corresponds to a drop in Hgb down to 7.5 g/dl are shown in Fig. 7. Patient 4 has a fast recovery time so that we add a later smaller bleeding to 9.0 g/dl. According to Pottgiesser et al. (2008) the mean recovery time period after a blood donation of 550 ml is 36 days (range 20-59). This average recovery time is added as a vertical arrow to Fig. 7. We run the optimization such that we keep the maximum EPO rate at 1000 U/day or after the first bleeding we increase u max to analyze by how much the recovery time could be decreased. In addition, we set c γ = 10 −10 and u max = 25, 000 U/day after the first bleeding to see if this results in an overshoot. We first analyze the results for a constant maximum EPO rate of 1000 U/day and when that rate is increased to 2000 U/day after the first bleeding; upper plots in Figs. 8 and 9. We observe that the NMPC algorithm is reacting to the bleeding by directly administering the maximum available EPO dose over a certain period of time. For u max = 2000 U/day this higher amount is administered but for a shorter period of time. Hence, the total drug amount, see Table 2, is only 13% and 16% higher. The Hgb curves are shown in Fig. 7. The recovery time after the first bleeding for patient 4 can be significantly reduced from about 36 to about 25 days by allowing higher administration rates. For patient 3 the differences are minor. The reason is that a reduction of the recovery time for patient 3 comes along with an increase of the overshoot because the constant endogenous EPO concentration E end for patient 3 is around 280. For patient 4 it is only 43. Hence, by reducing the administration rate for patient 4 the controller can abruptly let drop down the EPO concentration in plasma  Up to the first bleeding u max equals 1000. For patient 3 the first bleeding is at day 22 (7.5 g/d) and the second at day 80 (9.0 g/d). In case of patient 3 there is only one bleeding at day 60 (7.5 g/dl) which allows a fast recovery time without any overshoot. Even for the high maximum EPO rate of 25,000 U/day together with a minor control penalization (c γ = 10 −10 ) we do not observe an overshoot in patient 4. Note that the maximum rate is never reached; see Fig. 8c. But for patient 3 the overshoot gets more pronounced while the target range is reached some days earlier; Fig. 7b. Again, the maximum rate is not hit; Fig. 9c.

Missed administrations/dosing errors
In this section we consider a malfunction of the EPO pump. We simulate a complete failure to administer EPO for an entire day or that and incorrect rate is applied (not known in advance). In Fig. 10 we present the results for patient 1 with missed adminis- trations on days 23-25, 35-39 and 110-114. In addition, on days 80-84 the maximum amount u max = 1000 U/day gets administered by mistake. Missing an administration leads to a certain direct drop in the Hgb level. After every period of missed administrations the algorithm compensates for these by applying a higher EPO rate which is then gradually reduced in order to smoothly reincrease the Hgb concentration. After the period when the maximum rate is wrongly applied the algorithm administers nothing so that the Hgb level decreases and that an overshoot is avoided. But interestingly, it then restarts with the maximum amount before dropping on the desired level and a minor reincrease is even accepted. In doing so it achieves to balance the delayed effect of having administered no drug before and a later drastic drop-down is avoided. The total administered EPO dose is 50,853 U.
In Fig. 11 we present the results for periodically missed EPO administrations. More precisely, starting at day ten, every two weeks three days drop out. This scheme is chosen for analyzing if a periodic missing of administrations results in periodic administration rates. But as can be seen, the EPO rates after each period of missed treatments look different and the NMPC alogithm achieves to keep the Hgb level within the desired range if u max is set to 1500 U/day. Patient 5 demands comparatively high EPO rates so that the Hgb concentration falls below the target range if u max equals 1000 U/day. Note that the Hgb curve can be kept within the target range for  10-12, 30-32, 50-52, 70-72, 90-92 and u max = 1500. In addition, the optimal Hgb curve for u max = 1000 is shown in the upper plot u max = 1000 if there are no missed treatments. The total administered EPO doses belonging to Fig. 11 are 167,848 (u max = 1500) and 150,921 U/day (u max = 1000).

Constant EPO rates
Finally, we simulate different frequencies of rate change for the EPO pump. Note that less frequent adaptations could be a consequence of less frequent Hgb measurements. We investigate the effect of enlarging the period of a constant EPO rate from 1 day (Δ t = 1) to 1 week (Δ t = 7), 2 weeks (Δ t = 14), 3 weeks (Δ t = 21) and 4 weeks (Δ t = 28). This requires to adjust the length of the NMPC horizon to 4 weeks (constant period 1 week), 6 weeks (constant period 2 or 3 weeks) and 8 weeks (constant period The results for patient 4 are exemplarily shown in Figs. 12 and 13. The Hgb curves for the different constant periods are all stabilized within the target range. The same holds true for the other patients. From a mathematical point of view a longer constant period means a restriction of degrees of freedom. Therefore, the Hgb level is best controllable for the shortest constant period. This is why the Hgb level is closest to the line of 10.5 g/dl when the controller can adjust the doses on a daily basis. And the longer the constant period is, the larger are the oscillations around this line. The total EPO doses, see Table 3, differ only little. The results for patient 4 look similar to those from patient 1 which is very interesting. Figure 5 can lead one to assume that the patient requires a very specific and time sensitive administration pattern to stabilize the Hgb level. But even for a constant period of a whole month the Hgb can

Conclusion
The presented NMPC algorithm to correct anemia in HD patients was tested in various in-silico experiments and showed excellent performance in stabilizing simulated Hgb levels. The introduced framework uses a system of non-linear hyperbolic PDEs, which previously has been adapted to individual patients using clinically measured Hgb levels to predict individual patient response to EPO treatment (Fuertinger et al. 2018). The proposed NMPC would allow to optimize anemia treatment based on single patient data sets only. It would further allow to set individual Hgb targets for certain patient groups as proposed by the KDIGO work group. Thus, the presented work is a first step towards fully individualized anemia therapy. The conducted in-silico experiments show that a fixed optimization setting for the controller scheme is sufficient to correct the anemia of patients with very different characteristics in the underlying prediction model. However, the penalization of the control costs needs to be tuned on a subgroup or even per-patient level to minimize the amount of administered EPO. The NMPC method requires a comparatively long horizon to account for the system's large time delay. We have determined the required horizon length experimentally. Given this horizon length, the controller can handle the delay of the system response to treatment and achieves to stabilize Hgb levels even when presented with simulated events such as bleedings, missed administrations and EPO dosing errors. The presented controller has been tested under the assumption that treatment with EPO can be provided continuously. While this is an interesting in-silico experiment, such a therapy is currently clinically not possible as there are no "EPO pumps", similar to the insulin pumps used for diabetes treatment, available. As a first restriction on the control we have investigated to allow changes in the EPO rate less frequently resulting in constant EPO rates over several weeks. These lead to slight oscillations of the total RBC population around the target state. However, for periods of up to four weeks the oscillations are negligible.
Different controller schemes to correct anemia in HD patients have been proposed over the last years by various groups (Barbieri et al. 2015(Barbieri et al. , 2016Brier and Gaweda 2011;Brier et al. 2010;Martínez-Martínez et al. 2014;McAllister 2017;Nichols et al. 2011). Some of them have been successfully tested in clinical trials (Brier and Gaweda 2011;Brier et al. 2010) or even been implemented in the clinical routine (Barbieri et al. 2015(Barbieri et al. , 2016. Most of the proposed solutions to correct anemia are based on MPC techniques. In general, the time-varying nature of the process, long time delays in the system, high inter-individual variability in the specifics or the response of the system to treatment and the need to react to unforeseen events such as bleedings and missed doses renders the problem of correcting and stabilizing anemia in HD patients more suitable to MPC based controller approaches than PID (proportional-integral-derivative) ones. Previously presented and clinically validated MPC approaches have been based on neural networks as the underlying prediction model (Barbieri et al. 2015(Barbieri et al. , 2016Brier and Gaweda 2011;Brier et al. 2010). Such models require large training and validation sets to find the optimal weights for the neural network. The presented approach, where the underlying model is a system of coupled PDEs, although one of the more complex ones currently proposed, allows to optimize anemia treatment based on single patient data sets only. The required data from a single patient is routinely measured in clinics (as presented in Fuertinger et al. 2018). Thus, the proposed NMPC approach would provide a fully personalized anemia treatment strategy that even allows the setting of individual Hgb goals as suggested by the KDIGO guidelines. Further, estimated model parameters and their longitudinal development in individual patients potentially allow to gain further insights in the specifics of renal anemia. It might allow to better understand why some patients do not respond to treatment (e.g. short red blood cell life span versus insufficient bone marrow reaction to treatment).
In summary, the presented NMPC algorithm has the potential to bring more patients in the Hgb target range while decreasing Hgb variability and EPO utilization. However, we are still two major steps away from clinically testing the proposed NMPC approach: First, the control structure needs to be changed such that EPO is only administered during dialysis treatments (in general three times per week). With the chosen approach we will be able to analyze the effect of reducing administration times on Hgb stability. Second, the patient-model mismatch and uncertainty in parameter estimates together with measurement noise need to be addressed. In order to deal with parameter uncertainty in the underlying model, for instance, so called "robust MPC" methods would need to be incorporated into the framework. In addition, model estimates will need to be updated on a regular basis using the measured Hgb data. This is currently under investigation by the group but is beyond the scope of this publication.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

A Appendix: Parameters for the model and the methods
In Tables 4, 5, 6, 7 and 8 we present all parameters and units utilized in our numerical experiments.