Key Points

Induction and post-induction therapeutic threshold levels associated with favourable treatment outcomes were determined in paediatric inflammatory bowel disease [1,2,3]. Model-informed precision dosing is a necessary tool to reach these desired infliximab trough concentrations.

In paediatric inflammatory bowel disease, the Fasanmade model was deemed most suitable.

1 Introduction

Infliximab, a monoclonal antibody against tumour necrosis factor-α, is an effective treatment in the management of inflammatory bowel disease (IBD). However, a significant proportion of patients do not respond to infliximab initially (primary nonresponders), and around 50% of children eventually lose response over time (secondary loss of response) with a recurrence of symptoms. Almost 50% of cases with a secondary loss of response require dose intensification (increased dose or frequency) during maintenance therapy [4,5,6,7].

Response to infliximab therapy is strongly associated with drug exposure. Therapeutic drug monitoring (TDM), which involves measuring drug concentrations and interpreting these concentrations for adjusting further drug dosages, is used to optimise the effectiveness of infliximab in the induction and maintenance phases. Several paediatric studies have shown benefit in measuring trough concentrations and subsequently optimising therapy. For the target infliximab trough concentration in the maintenance phase, different thresholds were defined but ≥5.0 mg/L is a generally accepted cut-off [8, 9] except for specific situations such as perianal disease warranting higher trough concentrations.

Population pharmacokinetic (popPK) models offer a quantitative framework for individual dosing. Population pharmacokinetic models use Bayesian forecasting, where individual patient characteristics can be used to derive initial estimates of pharmacokinetics and drug exposure for the patient, even in the absence of levels. After drug concentrations become available, those estimates can then be adjusted using Bayesian methodology, which then allows dose adjustments.
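The Bayesian update described above can be illustrated with a minimal Python sketch. It is not one of the published infliximab models: it assumes a hypothetical one-compartment model with repeated intravenous bolus doses, a log-normal prior on clearance and an additive residual error, and every parameter value is a placeholder chosen for illustration only. The function names are likewise hypothetical.

```python
import math

def trough_conc(cl, v, dose, tau, n_doses):
    """Trough just before the next dose after n_doses equal IV bolus doses
    given every tau days (one-compartment model, superposition)."""
    k = cl / v  # elimination rate constant (1/day)
    return sum(dose / v * math.exp(-k * tau * i) for i in range(1, n_doses + 1))

def neg_log_posterior(log_cl, obs, cl_pop, omega, sigma, v, dose, tau, n_doses):
    """Negative log posterior (up to a constant) for the log of clearance."""
    pred = trough_conc(math.exp(log_cl), v, dose, tau, n_doses)
    misfit = 0.5 * ((obs - pred) / sigma) ** 2                 # additive residual error (assumed)
    prior = 0.5 * ((log_cl - math.log(cl_pop)) / omega) ** 2   # log-normal prior (assumed)
    return misfit + prior

def map_clearance(obs, cl_pop, omega, sigma, v, dose, tau, n_doses, n_grid=2001):
    """Maximum a posteriori clearance, found by a simple grid search."""
    centre = math.log(cl_pop)
    half_width = 4.0 * omega
    grid = [centre - half_width + 2.0 * half_width * i / (n_grid - 1)
            for i in range(n_grid)]
    best = min(grid, key=lambda x: neg_log_posterior(
        x, obs, cl_pop, omega, sigma, v, dose, tau, n_doses))
    return math.exp(best)

# Hypothetical placeholder values, not taken from any published model
CL_POP, V, OMEGA, SIGMA = 0.3, 6.0, 0.3, 0.5   # L/day, L, SD of log CL, mg/L
DOSE, TAU, N_DOSES = 200.0, 56.0, 3            # mg, days, number of doses

prior_pred = trough_conc(CL_POP, V, DOSE, TAU, N_DOSES)   # covariate-only prediction
cl_ind = map_clearance(1.0, CL_POP, OMEGA, SIGMA, V, DOSE, TAU, N_DOSES)
post_pred = trough_conc(cl_ind, V, DOSE, TAU, N_DOSES)    # prediction after the update
```

In this sketch, an observed trough of 1.0 mg/L that lies below the prior prediction pulls the individual clearance estimate above the population value, and the updated prediction falls between the prior prediction and the observation. A grid search is used only to keep the example dependency-free; dedicated software such as NONMEM uses more efficient estimation methods.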

Despite the advances of this model-informed precision dosing (MIPD) approach, its implementation in clinical practice has been limited thus far. We aimed to identify previously reported popPK models describing infliximab pharmacokinetics in paediatric patients with IBD and to evaluate the predictive performance of these models using a real practice dataset.

2 Methods

2.1 Identification and Evaluation of popPK Models of Infliximab in Children

We searched MEDLINE (through PubMed) from March 2010 to February 2023 with the following strategy:

figure a

All searches were carried out on 9 March, 2023. We included studies that focused on infliximab pharmacokinetic (PK) models in individuals with IBD, preferably aged ≤18 years. In total, 95 articles were identified. Of these, 88 were excluded after title and abstract screening. A total of seven infliximab popPK studies were thoroughly studied.

We then compared the models based on their structure and inclusion of covariates. Models were encoded and processed in NONMEM version 7.4 (Icon Development Solutions, Hanover, MD, USA). Goodness-of-fit plots for model evaluation were created in R (version 4.2.0, https://www.R-project.org/), and prediction-corrected visual predictive checks were generated with PsN (version 4.7.0, https://doi.org/10.1038/psp.2013.24) and the ‘vpc’ R package (version 1.0.1, https://cran.r-project.org/web/packages/vpc/index.html) [Figs. SI 1–3 of the Electronic Supplementary Material (ESM)].

2.2 Clinical Data Used for Model Qualification and Evaluation

Data from 70 paediatric patients with IBD who started infliximab treatment in Zuyderland Medical Center or Maastricht University Medical Center in the Netherlands were repurposed. We had previously used these data in an exposure–response study to define optimal infliximab target concentrations at 6 and 14 weeks after initiating infliximab therapy (submitted for publication).

All participating patients received a standard dose of 5 mg/kg bodyweight at weeks 0, 2 and 6, irrespective of the disease phenotype. No dose adjustments were made during the induction phase. From week 14 onwards, dose intervals varied between 4 and 8 weeks, at the physician’s discretion. All children eligible for inclusion were biologic-naïve and had Crohn’s disease (CD) or ulcerative colitis. A minimum of two trough concentrations per patient was required for inclusion in this MIPD study.

Patient characteristics were assessed retrospectively from routinely collected healthcare data and included age, sex, IBD type (CD, ulcerative colitis), location of disease, and concomitant treatment at the start and during the follow-up of infliximab therapy. Infliximab doses and intervals were noted along with patient’s anthropometric measures (body weight, height, age). C-reactive protein, faecal calprotectin, erythrocyte sedimentation rate, albumin and serum infliximab trough concentrations, and antibodies towards infliximab (ATIs) were measured before each infliximab infusion.

In the Zuyderland Medical Center, infliximab concentrations were measured with a commercially available, enzyme-linked immunosorbent assay (ELISA) kit (apDia, Turnhout, Belgium), which was implemented on an automated ELISA processor at the laboratory for clinical chemistry and haematology. When a patient’s infliximab concentration was <1.0 mg/L, ATIs were determined with the in-house ELISA kit of Sanquin Diagnostics (Amsterdam, the Netherlands) as described elsewhere [10].

Blood samples collected at Maastricht University Medical Center were sent to Sanquin Diagnostic Services (Amsterdam, the Netherlands) for evaluation of infliximab and ATI concentrations. The infliximab assays of both hospitals agreed sufficiently to be used interchangeably [11]. This study was approved by the Medical Research Ethics Committees of both Zuyderland Medical Center and Maastricht University Medical Center (approval numbers METCZ20220045 and METC 2022-3300, respectively).

2.3 Accuracy and Bias for Predicted Versus Observed Infliximab Concentrations

To compare accuracy and bias, we computed the relative root-mean-square error (rRMSE, Eq. 1) and relative bias (rBias, Eq. 2) for each model [12, 13],

$${\text{rRMSE}} = \frac{\sqrt{\frac{1}{n}\sum \left( {\text{obs}} - {\text{pred}} \right)^{2}}}{\overline{\text{obs}}} \cdot 100\% ,$$
(1)
$${\text{rBias}} = \frac{\frac{1}{n}\sum \left( {\text{obs}} - {\text{pred}} \right)}{\overline{\text{obs}}} \cdot 100\% ,$$
(2)

where n represents the total number of patients and obs and pred indicate the observed and forecasted infliximab concentrations, respectively. No rRMSE or rBias threshold for clinical acceptability was prespecified, but lower values indicated more precise predictions [14]. We did not compute the commonly reported mean absolute percentage error or mean percentage error as many observations in our dataset were close to zero. The definitions of mean percentage error and mean absolute percentage error include the observed value as denominator and as such, these parameters were inflated inappropriately and rendered unusable for the assessment of overall model bias.
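The two metrics, and the reason percentage errors were avoided, can be sketched in a few lines of Python. This is an illustrative sketch rather than the analysis code of the study: it assumes that n counts the observed-predicted pairs, that rBias is the mean error divided by the mean observation, and the example concentrations are invented.

```python
import math

def relative_errors(obs, pred):
    """Return (rRMSE, rBias) in percent for paired observed/predicted values."""
    if len(obs) != len(pred) or not obs:
        raise ValueError("obs and pred must be non-empty and of equal length")
    n = len(obs)                      # assumption: n = number of paired values
    mean_obs = sum(obs) / n
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / n)
    bias = sum(o - p for o, p in zip(obs, pred)) / n   # sign convention: obs - pred
    return 100.0 * rmse / mean_obs, 100.0 * bias / mean_obs

def mape(obs, pred):
    """Mean absolute percentage error; each term is divided by its own observation."""
    return 100.0 * sum(abs(o - p) / o for o, p in zip(obs, pred)) / len(obs)

# Invented example with one near-zero observation (mg/L)
obs = [0.1, 5.0, 10.0]
pred = [0.6, 4.5, 9.5]

rrmse, rbias = relative_errors(obs, pred)
inflated = mape(obs, pred)
```

All three predictions are off by 0.5 mg/L, yet the single observation of 0.1 mg/L contributes a 500% error to the mean absolute percentage error and pushes it well above 100%, whereas rRMSE stays at roughly 10% because it is scaled by the mean observation.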

Model comparisons were performed for three scenarios:

  1. Prior dosing, in which predictions were based solely on the covariates available for the patient, without Bayesian updating.

  2. Bayesian forecasting with all data, in which the prediction for the subsequent infliximab concentration was based on both the covariates for the patient and all infliximab trough concentrations available up to that point.

  3. Bayesian forecasting based on the last level (T-1), the last two levels (T-2) or the last three levels (T-3), in which predictions were based on both the covariates and a Bayesian update from the last level, the last two levels or the last three levels, respectively.

The analysis assumed that covariates were known at the time of the subsequent level; for example, it was known whether a patient had antibodies against infliximab at that time. This enabled evaluation of the impact of observed infliximab concentrations on the predictive performance in comparison to prior predictions, and the impact of the inclusion of multiple infliximab trough concentrations in addition to the most recent value.
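The scenarios differ only in which earlier trough levels are passed to the Bayesian update. A minimal Python sketch of that selection step follows; the function names, the scenario labels used as strings and the example levels are hypothetical, and the handling of patients with fewer earlier levels than requested is an assumption rather than a description of the study.

```python
def levels_for_update(history, scenario):
    """Return the trough levels used to update the prediction of the next level.

    history  : list of (week, concentration) tuples observed so far, oldest first
    scenario : "prior", "all", "T-1", "T-2" or "T-3"
    """
    if scenario == "prior":
        return []                      # covariates only, no Bayesian update
    if scenario == "all":
        return list(history)           # every level available up to this point
    if scenario in ("T-1", "T-2", "T-3"):
        k = int(scenario[-1])          # number of most recent levels to keep
        return list(history[-k:])      # fewer than k levels: use what exists (assumption)
    raise ValueError(f"unknown scenario: {scenario}")

def forecast_tasks(levels, scenario):
    """Pair each observed level with the earlier levels allowed by the scenario."""
    return [(levels[i], levels_for_update(levels[:i], scenario))
            for i in range(len(levels))]

# Hypothetical trough levels (week, mg/L) for one patient
levels = [(6, 12.0), (14, 4.1), (22, 3.5), (30, 5.2)]
for target, used in forecast_tasks(levels, "T-2"):
    print(target, "predicted from", used)
```

Each target level is forecast using only information that precedes it, so the first level of a patient is always a prior prediction in this sketch.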

2.4 Translated to the Clinic: Predicting Future Infliximab Concentrations Being ≥ 5 mg/L Using Various TDM Approaches

The forecasted concentrations obtained in Sect. 2.3 were dichotomised as ≥ 5 or < 5 mg/L. A receiver operating characteristic curve was created to investigate the ability of the PK model to predict whether the next level was higher or lower than this threshold. Additionally, sensitivity and specificity for the dichotomised forecasts of the PK model were calculated.
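The dichotomised evaluation can be sketched in Python as follows. The sketch assumes that the observed concentration defines the true class, that the forecast at the same 5 mg/L cut-off defines the predicted class, and that the area under the curve is computed with the rank-based (Mann-Whitney) formulation; the function names and the example values are hypothetical.

```python
def sens_spec(observed, predicted, threshold=5.0):
    """Sensitivity and specificity of forecasts for the event 'observed >= threshold'."""
    tp = fn = tn = fp = 0
    for o, p in zip(observed, predicted):
        if o >= threshold:
            if p >= threshold:
                tp += 1        # on target and forecast on target
            else:
                fn += 1        # on target but forecast below target
        else:
            if p < threshold:
                tn += 1        # below target and forecast below target
            else:
                fp += 1        # below target but forecast on target
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    return tp / (tp + fn), tn / (tn + fp)

def auroc(observed, predicted, threshold=5.0):
    """Probability that a level on target receives a higher forecast than one below target."""
    pos = [p for o, p in zip(observed, predicted) if o >= threshold]
    neg = [p for o, p in zip(observed, predicted) if o < threshold]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Invented observed and forecasted trough concentrations (mg/L)
observed = [6.2, 3.1, 8.0, 4.4, 5.5, 1.0]
predicted = [5.8, 4.0, 7.1, 5.3, 4.6, 2.2]

sensitivity, specificity = sens_spec(observed, predicted)
area = auroc(observed, predicted)
```

Sensitivity and specificity describe a single operating point at the 5 mg/L cut-off, whereas the area under the curve summarises how well the forecasts rank levels on target above levels below target across all possible cut-offs.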

3 Results

3.1 Identification of popPK Models of Infliximab in Children

We identified seven models with our search strategy in PubMed [15,16,17,18,19,20,21]. An additional model, built with data from Dutch adult patients, was also identified [22]. Although adults were not the intended population, it was considered plausible that the population parameters described in this model might fit Dutch children better than paediatric models from other countries. The eight infliximab popPK models differed with respect to the structural model (both one-compartment and two-compartment models) and the covariates included (Table 1). All models used body weight and albumin as covariates, while some models also included sex, age, erythrocyte sedimentation rate, ATI, concomitant use of immunomodulatory drugs and/or whether the patient was in the maintenance phase as covariates explaining some of the variability in drug clearance.

Table 1 Overview of the eight candidate infliximab population pharmacokinetic models and the applied covariates for paediatric patients with IBD

3.2 Clinical Data

Seventy children were included in the final analyses with boys and girls equally divided (Table 2). There were 443 infliximab concentrations available for analysis (median 6 per patient, range 2–11), and the dosing history included a median of 17 doses (range 4–25) per patient with a follow-up time of 52 weeks. There were 50 patients with ≥ 5-mg/L trough concentrations and eight patients with ≥ 10-mg/L trough concentrations. The median (interquartile range) age at the start of therapy for the cohort was 14.7 (12.2–15.8) years, median disease duration before the start of infliximab therapy was 5.6 (1.2–16.0) months and 87% of patients had CD. Thirty-three children (47%) had concomitant immunosuppression at baseline.

Table 2 Baseline characteristics of 70 paediatric patients with IBD

3.3 Evaluation of Predictive Performance (Prior and Posterior Predictions)

Providing previous trough concentrations (TL-1, TL-2 or TL-3) improved rBias and rRMSE for every selected model (Table SI-4 of the ESM, Fig. 1). Providing more than one previous trough level (TL) improved rRMSE only marginally in the Bauman and Xiong models; all other models performed best using only the last trough concentration.

Fig. 1 Posterior predictions (relative root-mean-square error [rRMSE] and relative bias [rBias]) for each model, using all available drug levels, based on the last level (T-1), the last two levels (T-2) or the last three levels (T-3). Predictions on TL14 using TL6. Whiskers indicate the parametric 95% confidence interval for rBias and rRMSE. TDM therapeutic drug monitoring

For the Fasanmade model, which showed the lowest RMSE overall, the average bias in the posterior setting was −0.7 mg/L. Individual forecasted concentrations for the Fasanmade model were within ±1.0 and ±0.0 mg/L of the observed concentrations for 35.2% and 54.9% of measurements, respectively.

We conducted a subgroup analysis, determining rBias and RMSE in the prior and posterior settings for patients with and without immunomodulators (IMs) (see Fig. SI-5 of the ESM). As IM use was high in our population, this could potentially be advantageous for the Fasanmade model. In the posterior setting, concomitant immunosuppression did not contribute to the predictions because the information from measured drug concentrations (TDM) dominated; in the prior setting, it could make a difference.

We compared our study population with the populations used for model development in the referenced published PK models to look for similarities and differences in the covariates applied in the models, sex assigned at birth and weight (Table SI-7 of the ESM). Although our patients were slightly younger and lighter, there were no other remarkable differences. Immunomodulator use varied considerably, between 22% and 100%.

3.4 Predicting That the Next Infliximab Concentration is in the Target Range (≥ 5 mg/L) Using the Fasanmade Model

Given that the Fasanmade model exhibited the highest overall precision, we present the corresponding receiver operating characteristic curves for predicting the next infliximab concentration to be ≥ 5 mg/L in Fig. 2. Sensitivity and specificity for the forecasting approaches are summarised in Table 3. The receiver operating characteristic curves clearly show that any MIPD approach using TDM is more predictive than not using TDM levels (red solid line). The area under the receiver operating characteristic curve (AUROC) for the most predictive TDM approach was significantly different from the AUROC for the approach that did not use TDM (we confirmed that the 95% confidence intervals for the AUROC values with and without TDM did not overlap, i.e. the equivalent of a z-test; normality was assumed and standard estimates were obtained) [23]. However, in this example, the differences between the various MIPD approaches were small, meaning that inputting the last infliximab concentration was almost as good as inputting all known infliximab concentrations up to that moment. For the trough concentration 6 weeks after initiating infliximab therapy, predicting any next trough concentration has low sensitivity and specificity, but predicting the trough concentration 14 weeks after initiating infliximab therapy is possible with high sensitivity (84%) and specificity (90%) [AUROC 0.914].
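The confidence-interval comparison can be illustrated with a short Python sketch. It assumes the Hanley and McNeil (1982) approximation for the standard error of an AUROC together with a normal-theory 95% interval, which may differ from the estimator actually used in the study; the AUROC values and group sizes below are invented and are not results from Table 3.

```python
import math

def auroc_ci(auc, n_pos, n_neg, z=1.96):
    """Normal-approximation confidence interval for an AUROC
    (Hanley and McNeil standard error; an assumed choice of estimator)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    var = (auc * (1.0 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    se = math.sqrt(var)
    return max(0.0, auc - z * se), min(1.0, auc + z * se)

def intervals_overlap(a, b):
    """True when two closed intervals (low, high) share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Invented values: AUROC with and without TDM, 150 levels on and below target
ci_with_tdm = auroc_ci(0.91, n_pos=150, n_neg=150)
ci_without_tdm = auroc_ci(0.70, n_pos=150, n_neg=150)
separated = not intervals_overlap(ci_with_tdm, ci_without_tdm)
```

Requiring non-overlapping intervals is a conservative criterion: two AUROC values whose intervals overlap slightly can still differ significantly in a direct test, so a lack of overlap supports a difference, while overlap alone does not rule one out.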

Fig. 2 Receiver operating characteristic (ROC) curve for the next infliximab (IFX) level ≥ 5 mg/L, for one prior and four posterior methods using the Fasanmade model

Table 3 Sensitivity, specificity and area under the receiver operating characteristic curve for iteratively predicting future IFX concentrations being ≥ 5 mg/L using various therapeutic drug monitoring approaches (Fasanmade model)

4 Discussion

4.1 Key Results

We observed a substantial level of agreement between observed and predicted infliximab concentrations in children with IBD for all eight popPK models. The Fasanmade popPK model performed best in our cohort of Dutch children. It was the most precise in predicting the next trough concentration, but differences between models were small. Using Bayesian forecasting alongside popPK models enhanced precision in predicting the next infliximab trough concentration. Inputting only the last trough concentration was almost as good as inputting all infliximab trough concentrations known at that moment. Prediction of the post-induction trough concentration (14 weeks after starting infliximab therapy) was excellent using the induction trough concentration 6 weeks after starting infliximab therapy.

4.2 Comparison with Other Studies

In a study by Samuels et al. [24], the aim was to evaluate the precision and bias of three popPK models in children with CD. They concluded that precision dosing in children with CD should be largely based on the last known infliximab concentration and the most recent covariates of drug clearance, which is in line with our findings. The three models included in their study (Fasanmade, Xiong and Dotan) performed equally well, comparable to our results. Our findings also correspond to the study of Kantasiripitak et al., which states that “predictive performances of all models greatly improved by considering at least one infliximab trough concentration, where using additional previous trough concentrations improved the predictions only marginally” [25]. In contrast, Eser et al. [13] reported that model performance improved as additional observed data were included in the forecast; they explained that additional covariate values and observations update the model and help adjust for analytical and sample timing errors.

Frymoyer et al. reported low bias and good precision for their model fits, but did not evaluate forecasting performance [26]. Bauman et al. published their model in 2020 and validated it in a cohort of 228 paediatric patients; the reported AUROC for prior predictions was 0.86 versus 0.90 when predictions were updated with drug concentrations (posterior predictions), which is comparable to, but slightly higher than, ours. However, it was not clear whether the AUROC was based on prospectively predicted or “fitted” data.

4.3 Strengths and Limitations

A strength of our study is the use of a real-world practice dataset with rich PK sampling and clinical data. Additionally, we measured trough concentrations in the induction and maintenance phases. The excellent predictive ability of the popPK models for the week 14 trough concentration with input of the week 6 trough concentration is clinically relevant, as week 14 trough concentrations are strongly associated with long-term favourable treatment outcomes in several studies [1, 27, 28]. With this information, MIPD could be used to adjust infliximab dosing in the critical induction phase.

Our study has limitations. First, the sample size was relatively small. Second, we did not have access to data on the novel biomarker that Xiong et al. used in their study and therefore assumed a value of neutrophil CD64 = 4.6 for every patient, possibly underestimating the predictive performance of the Xiong et al. model.

Third, in the Fasanmade model as reported for children, ATI has not been included as a covariate. We used the version reported by Fasanmade for the overall population (children and adults) to allow the effect of ATI and the use of immunomodulatory drugs to be integrated into predictions, which is a definite advantage over the Fasanmade children-only model. The downside, however, is that the parameters in the model might be slightly less tailored to paediatric data and more tailored to the adult population, as the adult population was about five times larger in the training dataset. In any case, we believe MIPD can be performed with and without the ATI status, provided recent infliximab concentrations are available and used in the Bayesian update, with more recent samples potentially weighted more heavily than earlier samples, or using only the last sample, as in one of the tested scenarios.

4.4 Implications for Clinical Practice and Further Research

The use of popPK models and Bayesian forecasting can help healthcare professionals tailor the dosing of infliximab for individual patients. Prospective research is necessary especially to see whether a better achievement of target trough concentrations correlates with better outcomes. Additional covariates and/or pharmacodynamic markers (such as calprotectin) would be of interest for future studies.

5 Conclusions

This work presents an evaluation of the predictive performance of eight published popPK models in paediatric patients with IBD using real-world practice data. The Fasanmade model showed the highest overall accuracy (including classification accuracy [correctly identifying the next infliximab trough concentration to be ≥5 mg/L]), although differences between models were small. Prediction of week 14 trough concentrations was excellent when the Fasanmade popPK model was used with week 6 trough concentrations as input.

These findings pave the way for more personalised and effective management of paediatric IBD with infliximab, contributing to better long-term disease control and patient well-being. However, further research and prospective clinical trials are needed to fully establish the clinical utility and widespread adoption of this approach in real-world practice.