Model reduction in mathematical pharmacology

In this paper we present a framework for the reduction and linking of physiologically based pharmacokinetic (PBPK) models with models of systems biology to describe the effects of drug administration across multiple scales. To address the issue of model complexity, we propose the reduction of each type of model separately prior to being linked. We highlight the use of balanced truncation in reducing the linear components of PBPK models, whilst proper lumping is shown to be efficient in reducing typically nonlinear systems biology type models. The overall methodology is demonstrated via two example systems; a model of bacterial chemotactic signalling in Escherichia coli and a model of extracellular regulatory kinase activation mediated via the extracellular growth factor and nerve growth factor receptor pathways. Each system is tested under the simulated administration of three hypothetical compounds; a strong base, a weak base, and an acid, mirroring the parameterisation of pindolol, midazolam, and thiopental, respectively. Our method can produce up to an 80% decrease in simulation time, allowing substantial speed-up for computationally intensive applications including parameter fitting or agent based modelling. The approach provides a straightforward means to construct simplified Quantitative Systems Pharmacology models that still provide significant insight into the mechanisms of drug action. Such a framework can potentially bridge pre-clinical and clinical modelling - providing an intermediate level of model granularity between classical, empirical approaches and mechanistic systems describing the molecular scale. Electronic supplementary material The online version of this article (doi:10.1007/s10928-018-9584-y) contains supplementary material, which is available to authorized users.

1 Overview of model reduction methods 1

.1 Conservation analysis
To understand the nature of conservation relations and how they might be found computationally, first note that a common means for representing the network structure underlying a system of chemical equations is that of the stoichiometry matrix. This is an n × m matrix S, with each of the rows corresponding to a single species and each of the columns to a reaction. The matrix is populated such that its entries s ij give the net value of the stoichiometric coefficients (product minus reactant) of the i-th species in the j-th reaction. If the concentration of a particular species is not affected by a reaction the corresponding entry is populated with a 0. Hence, the sign of the entry indicates whether the species is a net reactant or a net product in the relevant reaction. A positive sign implies that the species is a product (i.e. the number of molecules is increased by the reaction), whilst a negative sign indicates that the species is a reactant (i.e. the number of molecules is decreased).
This matrix can be considered as mapping the vector of reaction rates v(x(t)) to the change in species concentration. Hence it is possible to represent a system of ODEs describing such chemical interactions in the formẋ (t) = Sv(x(t)).
Now turning to conservation analysis and following the ideas outlined by Reder [1], the existence of conservation relations in a model implies that Γẋ(t) = 0 (2) where Γ is an h × n matrix that will be referred to here as the conservation matrix, the rows of which represent the linear combinations of species that are constant in time. Alternatively, by integration, the h individual elements of which are known as conservation relations, with c ∈ R h representing a set of constants known as conserved values. It is hence possible to express a model containing conservation relations in the form of a system of differential algebraic equations (DAEs). To see this, first partition x into two subsets: x d an h dimensional subset of the species with each element corresponding to a single species involved in a given conservation relation termed the dependent species. And x i an n − h dimensional subset accommodating all remaining state-variables, termed the independent species, such that Then from equation (3) Γ This is a system of linear equations and hence if Γ is expressed in reduced row echelon form, such that with I h representing the h dimensional identity matrix and N 0 a h × (n − h) matrix, it becomes apparent that This implies that the subset of dependent species x d can be eliminated from the governing system of ODEs by substituting in the appropriate element of equation (7). Hence, given the stoichiometric form given in equation (1), a system exhibiting conservation relations can be expressed in the form of a semi-explicit system of DAEs, such thatẋ where equation (8b) has been exploited in equation (8a) to obtain a system of ODEs such that state-variables x d are no longer explicitly given. Additionally, S i here represents the rows of the stoichiometric matrix corresponding to the independent state-variables x i . Obtaining the conservation matrix Γ, particularly for large systems, is often not feasible from simple inspection. To understand a more algorithmic approach for obtaining this matrix, begin by recalling the stoichiometric form of a model. Decomposing the stoichiometric matrix via the same partition as the set of species leads to the system However, via differentiation of equation (7) x Hence, S d = −N 0 S i and therefore each conservation relation can be seen as corresponding to a linear dependency in the stoichiometry matrix. As such, conservation relations can be found by seeking the left null space Z n of S (i.e. via finding the null space of S T ) such that and hence Z n S = 0. This implies that and therefore via comparison to equation (2) it is clear that such that the conservation matrix is equal to the transpose of the left null space of the stoichiometry matrix. Hence, a more mathematically rigorous approach for finding conservation relations has been provided. For very large systems (i.e. n > 100) S will be a large and, typically, sparse matrix. As a result, solving the system of linear equations S z = 0 for each conservation relation may not always be numerically stable or efficient under traditional approaches such as Gaussian elimination. Therefore, the combined algorithm employs QR factorisation via Householder reflections, as is discussed in the main text.

PBPK model summary
The specific PBPK modelling framework analysed in the main manuscript was originally described in Jones and Rowland-Yeo [2]. Letting C T issue (t) represent the instantaneous concentration of the drug in each named tissue compartment, this framework can be used to yield the following system of ODEs dC dC dC dC where Q T issue terms represent the total blood flows associated with each of the tissues, V T issue terms describe the physiological volumes of each of the associated tissue compartments, and Kp T issue terms relate to the compound specific tissue to plasma partition coefficients. Additionally, there are two inputs into the model u 1 (t) and u 2 (t) representing routes of drug administration. u 1 (t) represents an oral dosing route and u 2 (t) represents intravenous dosing and can be defined to represent any pattern of dosing, including infusion or bolus. Given the ODEs defined above, this 16 dimensional model is linear, and can be expressed in a control affine state-space representation of the form where x(t) is defined as a vector representing the instantaneous tissue concentrations, such that

Application of Balanced Truncation
In this section we will describe how balanced truncation can be applied to yield a reduced description of the PBPK model described by the system of equations given in (15). Balanced truncation requires that we begin with a system in the form (17b) From the system of equations given in (15) and both the physiological parameterisation and the drug specific parameterisations given for Midazolam in the main manuscript, we obtain the following matrix If we additionally define an input u(t) to represent an oral dosing route and define 2 outputs, y(t) = (y 1 (t), y 2 (t)) representing the concentration in the gut compartment and the intravenous concentration respectively, we can additionally define the matrices B and C as Given this formulation, it is then possible to obtain the controllability and observability Gramians by solving the Lyapunov equations,  , and its inverseT such that the reduced 3 dimensional system can be constructed as Finally, this gives the following reduced system of differential equations with the approximations for the outputs given as

Chemotaxis model summary
The specific model of chemotactic signalling analysed in the main text was originally described in Tindall et al. [3]. It is an 11 dimensional, nonlinear model describing a system of 12 biochemical reactions given by CheA + CheY k9 k−9 CheA · CheY, Whilst a more detailed account of this network can be found in the original paper, the model can be broadly understood as follows: Here, CheA represents a histidine kinase whose rate of autophosphorylation is modulated via extracellular attractant-receptor binding (in this case represented in equations (20a) and (20c) by the reaction rates k 1 (L) and k 10 (L), where L represents the concentration of extracellular ligand). An increase in attractant binding results in a decreased rate of phosphorylation of CheA. Two response regulator proteins (here represented by CheB and CheY and their interactions in equations (20b), (20d), and (20h)) compete to bind with protein CheA. Once bound CheA P will transfer its phosphoryl group, hence providing the only means of phosphorylation for proteins CheB and CheY. In the case of protein CheY this phosphotransfer is also reversible via the reactions given in equation (20c). After they have been phosphorylated both response regulator proteins steadily auto-dephosphorylate as given by equations (20f) and (20g). In the case of protein CheY, however, this process can be greatly accelerated by forming a complex with the phosphatase CheZ as described by equation (20e). Note that it is the concentration of phosphorylated protein CheY that regulates the process of chemotaxis by binding with parts of the flagellar motor complex (the process is not explicitly described in this model). The model differentiates itself somewhat from previously published approaches in that explicitly describes the intermediary complexes involved in the phosphorylation of proteins CheB and CheY. Employing the Law of Mass Action, chemical equations (20) can be modelled by the following system of ODEs dx 6 (t) dt = k 11 x 8 (t)x 9 (t) − k −11 x 6 (t) − k 12 x 6 (t),  [3].
The specific parameterisation employed for this model can be found in Table 1.
To understand the input u, note that receptor binding of the extracellular chemotactic ligand (denoted by L) is described by Michealis-Menten kinetics, such that where K is a Michaelis-Menten constant and h is a Hill-coefficient. For the sake of simplicity the input u(t) is thus defined here to be such that k 1 (L) = k 10 (L) = k 1 u(t).

Reduction
In the following sections we will describe how the model of bacterial chemotaxis can be reduced via nondimensionalisation, conservation analysis and lumping to yield the results given in the main manuscript.

Conservation Analysis
Now we select species CheA, CheA · CheY, CheZ, and CheB P for algebraic replacement via the application of conservation relations.This can be achieved by substituting the following algebraic equations into the set   Table 3. An additional set of nondimensionalised parameter values associated with the chemotaxis signalling pathway model in E. coli as defined by equation (29).

.1.3 Lumping
We then proceed to reduce the system via proper lumping to a 4 dimensional form. In this section we will symbolically reproduce this reduced model. Through the forward selection procedure described in the main text, the algorithm determines that the optimal scheme is to lump together state-variables z 3 (τ ), z 4 (τ ), and z 8 (τ ) as one lumped variable, and z 7 (τ ) and z 10 (τ ). These lumped variables correspond to the original species CheA P · CheY, CheA · CheY P , CheY P , and CheY, CheB, respectively. If we assume a vector of state-variables, such that y(τ ) = [y 2 (τ ), y 3 (τ ), y 4 (τ ), y 6 (τ ), y 7 (τ ), y 8 (τ ), y 10 (τ )] , then this scheme corresponds to the lumping matrix Hence we have a set of reduced state-variablesx(τ ) = Lz(τ ) such that Following the general descriptions provided in the main text it is now necessary to construct a generalised right inverseL of the lumping matrix prior to application of the Galerkin projection. Such a matrix can be constructed by computingL where X represents a diagonal matrix, typically containing steady-state values of the system. In the case of the nondimensionalised model described above, this would yield a matrix of the form Given these matrices it is then possible to construct a reduced description of the system's dynamical behaviour via the Galerkin projection, this yields the analytically reduced system where the new parameters (γ 1 , γ 2 , γ 3 , γ 4 , and γ 5 ) are defined in Table 4.

ERK activation model
An SBML formatted version of the ERK activation can be obtained at at www.ebi.ac.uk/biomodels-main/ BIOMD0000000049. Many models such as this can commonly be obtained from online model repositories stored in the form of Systems Biology Markup Language (SBML) -a standardised format for the representation, storage and easy communication of Systems Biology models [4]. Publicly open databases, such as the BioModels Database, contain thousands of such models enabling researchers to share their work in a more accessible way. The ERK activation model, outlined in Sasagawa et al. [5], describes extracellular signal-regulated kinase (ERK) activation as mediated via the epidermal growth factor (EGF) and the nerve growth factor (NGF) receptor pathways. It features 99 chemical species and 150 reactions. Due to the size and complexity of the ERK activation model it is not practical to provide the same level of detail into its reduction as was provided for the model of bacterial chemotaxis signalling and the PBPK model. The general procedure We have, however, provided a number of Matlab files that describe the form of the model at various stages of reduction along with its initial conditions, associated lumping matrices and their generalised inverses. Use of these files does require Matlab's Symbolic Math toolbox. All of this is made available in Additional File [???].