A Combination of Machine Learning and PBPK Modeling Approach for Pharmacokinetics Prediction of Small Molecules in Humans

Li, Yuelin; Wang, Zonghu; Li, Yuru; Du, Jiewen; Gao, Xiangrui; Li, Yuanpeng; Lai, Lipeng

doi:10.1007/s11095-024-03725-y

A Combination of Machine Learning and PBPK Modeling Approach for Pharmacokinetics Prediction of Small Molecules in Humans

Review Article
Open access
Published: 25 June 2024

(2024)
Cite this article

Download PDF

You have full access to this open access article

Pharmaceutical Research Aims and scope Submit manuscript

A Combination of Machine Learning and PBPK Modeling Approach for Pharmacokinetics Prediction of Small Molecules in Humans

Download PDF

Yuelin Li¹,
Zonghu Wang¹,
Yuru Li¹,
Jiewen Du¹,
Xiangrui Gao¹,
Yuanpeng Li¹ &
…
Lipeng Lai ORCID: orcid.org/0009-0007-1147-8586¹

246 Accesses
Explore all metrics

Abstract

Purpose:

Recently, there has been rapid development in model-informed drug development, which has the potential to reduce animal experiments and accelerate drug discovery. Physiologically based pharmacokinetic (PBPK) and machine learning (ML) models are commonly used in early drug discovery to predict drug properties. However, basic PBPK models require a large number of molecule-specific inputs from in vitro experiments, which hinders the efficiency and accuracy of these models. To address this issue, this paper introduces a new computational platform that combines ML and PBPK models. The platform predicts molecule PK profiles with high accuracy and without the need for experimental data.

Methods:

This study developed a whole-body PBPK model and ML models of plasma protein fraction unbound ($f_{up}$), Caco-2 cell permeability, and total plasma clearance to predict the PK of small molecules after intravenous administration. Pharmacokinetic profiles were simulated using a “bottom-up” PBPK modeling approach with ML inputs. Additionally, 40 compounds were used to evaluate the platform’s accuracy.

Results:

Results showed that the ML-PBPK model predicted the area under the concentration-time curve (AUC) with 65.0$\%$ accuracy within a 2-fold range, which was higher than using in vitro inputs with 47.5$\%$ accuracy.

Conclusion:

The ML-PBPK model platform provides high accuracy in prediction and reduces the number of experiments and time required compared to traditional PBPK approaches. The platform successfully predicts human PK parameters without in vitro and in vivo experiments and can potentially guide early drug discovery and development.

A hybrid modeling approach for assessing mechanistic models of small molecule partitioning in vivo using a machine learning-integrated modeling platform

Article Open access 27 May 2021

Utility of physiologically based pharmacokinetic (PBPK) modeling in oncology drug development and its accuracy: a systematic review

Article 05 July 2018

Current Approaches for Predicting Human PK for Small Molecule Development Candidates: Findings from the IQ Human PK Prediction Working Group Survey

Article 19 July 2022

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Introduction

Pharmacokinetics (PK) is a critical aspect of drug development, as it describes the absorption, distribution, metabolism, and excretion(ADME) of compounds in the body. During preclinical stages, lead compounds undergo evaluation for their PK properties through in vitro and in vivo animal experiments. The results of these evaluations can be used to rank compounds or optimize their structures based on the correlation between their physicochemical and PK properties. Moreover, these in vitro and animal PK results can be leveraged to predict human PK phenomena and guide clinical trial design through allometric scaling, compartment models, or PBPK models.

Table 1 Drug-specific Input Parameters from in vitro Experiments and Prediction

Full size table

In contrast to traditional PK models with allometric scaling, PBPK models have the ability to predict drug concentrations in plasma and various tissues without the need for animal experiments. As a result, the application of PBPK models has significantly increased in drug discovery and development over the past few years [1]. Three approaches are commonly used in PBPK model prediction, including “top-down,” “middle-out,” and “bottom-up.” The top-down approach relies predominantly on observed clinical data, while the middle-out approach combines both in vitro and vivo information to determine unknown or uncertain parameters of the model [2]. The bottom-up approach, in particular, offers the potential to minimize or replace animal PK studies, as it relies solely on in vitro data for drug-related input parameters. However, IMI-Oral Biopharmaceutics Tools projects show limitations of a “bottom-up” approach in human PK predictions, with only half of the area Under the Concentration-Time Curve (AUC) predictions being within a 2-fold prediction error [3]. This accuracy may be affected by errors in in-vitro experiments or the accuracy of clearance prediction using in vitro hepatic systems. Such limitations may be overcome by using ML models to predict physicochemical properties directly from structures.

Drug-specific input parameters, such as $f_{up}$, intrinsic hepatic clearance, and volume of distribution ($Vd_{ss}$), have been well predicted using in silico models. Previously, Doha combined in vitro and ML inputs with a minimal PBPK model and evaluated 240 compounds in rats. ML inputs with fup, LogD, and CL showed only 36.1$\%$ of systemic plasma AUC within a 2-fold prediction error [4]. Vector Group then developed a high-accuracy machine learning-integrated modeling platform, a whole-body PBPK model with an optimized Vdss prediction method [5].

Several studies have been conducted to optimize the calculation methods of Vdss to improve the accuracy of PBPK predictions. However, it is important to note that clearance also plays a significant role in PK prediction. Clearance, which determines the rate of drug elimination from the body, occurs in the liver, kidney, and bile. Early drug discovery focuses primarily on liver metabolism using in vitro experiments in hepatocytes or microsomes. As a result, several PBPK models only consider hepatic clearance from the in vitro to in vivo exploration (IVIVE) approach. This approach may result in the misprediction of clearance due to the exclusion of certain renal or bile elimination processes. Bowman has reported underprediction of clearance from the IVIVE method, with a 42.2$\%$ error rate within a 2-fold margin of error in the microsome system [6]. This highlights the need for total clearance to improve the accuracy of PBPK modeling in the early discovery stage.

In our study, we have developed a rapid ML-PBPK model platform that enables the simulation of human PK from compound structures. $F_{up}$, Caco-2 cell permeability, and total plasma clearance for humans were predicted using ML models. These predicted results were then used as input parameters for a whole-body PBPK model encompassing 14 tissues. The prediction accuracy of the platform was evaluated for 40 drugs PK profiles in humans to define its applicability for use in early discovery and clinical phases.

Materials and Methods

Data Collection

The human $f_{up}$ model relies on two data sources: the Watanabe study, which provides data for 2139 compounds, and the Votano study [7, 8], which provides data for 808 compounds. Overlaps between the two datasets were checked, and compounds were removed if two records had values greater than a 2-fold difference. Compounds with values that differed by less than 2-fold were kept from Watanabe’s study due to having more significant figures. Caco-2 cell permeability data for 6083 compounds were collected from public sources [9,10,11]. The human CLt model used intravenous PK parameters from Lombardo’s study [12]. Compounds were removed if they had duplicates or invalid SMILES, a molecular weight greater than 900 Da, or where CL was none. Additionally, 40 molecules that overlapped with the experimental data were removed. Finally, we created three datasets: $f_{up}$ containing 2292 compounds, Caco-2 containing 6083 compounds, and CLt containing 1215 compounds.

The human plasma PK data for the 40 tests in Table I were extracted from previously published papers in supplementary. All PK data were digitized using the free online tool WebPlotDigitizer [13]. Figure 1 presents the statistics regarding number of PK studies and data points for which PK data were collected.

ML Model Building

Compounds SMILES were standardized using the ChEMBL standardizer [22]. Three different methods (RDKit, Mordred, and PaDEL-Descriptors) were used to calculate descriptors [23]. These methods generated molecular physicochemical properties for each molecule, resulting in 1826 (Mordred), 1444 (PaDEL), and 208 (RDKit) features used for model construction.

The datasets were split into training, validation, and test sets using random selection at an 8:1:1 ratio. Features with variance values less than 0.05 or those with the same information and a correlation coefficient higher than 0.9 were removed. The Boruta Algorithm [24] was also used to select significant features in a given data set.

Four common approaches to molecular property prediction were used to build $f_{up}$, Caco-2, and CLt prediction models. These approaches [25, 26] included Support Vector Machine Regression (SVR), Random Forest (RF), XGBoost (XGB), and Gradient Boost Machine (GBM). In contrast to traditional chemical descriptors, message-passing neural networks (MPNNs) have exhibited advancements in molecular modeling and property prediction. MPNNs are a group of graph convolutional neural networks (GCNs) variants that can learn and aggregate local information of molecules through iterative message-passing iterations. Recently, Yang et al. [27] have proposed a directed MPNN (D-MPNN) and built the open-source package Chemprop for implementation of D-MPNN. D-MPNN constructs a learned molecular representation by operating on the graph structure of the molecule and passing a message through the edge-dependent neural network. In this study, D-MPNN builds the model based on different datasets and uses RDKit descriptors incorporated into D-MPNN to further improve performance.

The hyperparameters of the regression models were optimized with Bayesian optimization search. Five-fold cross-validation was used to check the stability and predictive ability of the model. Additionally, the performance of the regression models was assessed by the coefficient of determination ($R^{2}$) and root-mean-square error (RMSE).

PBPK Model Building

Figure 2a shows the compartment model for each tissue, which includes plasma, blood cells, interstitial space, and intracellular space, as previously discussed by Kawai [28].

Molecules move between adjacent compartments through passive diffusion and connect to the circulatory system through blood flow. The passive diffusion rate of drugs in each tissue was calculated by multiplying the cell permeability (P) by a tissue compartment surface area (SA) [29]. The cell permeability rate of tissues was assumed to be the same and was obtained through the Caco-2 cell system in this PBPK model. Figure 2c illustrates the structure of blood vessels, which comprises only plasma and blood cells.

In the ML-PBPK model, three key parameters, fup, caco-2 cell permeability, and CLt, were taken from predictions by ML models, and systemic drug elimination was assumed to occur in venous blood plasma through CLt. On the other hand, in the in vitro input model, a PBPK model using in vitro inputs, all in vitro parameters were taken from experiments. Specifically, elimination processes include hepatic clearance from microsomes or Hepatocytes stability experiments using the IVIVE method and renal clearance as glomerular filtration rate. The differential equations for venous blood vessels are described below as Eqs. 1 and 2. Where parameters are expressed as Q (blood flow), PSA (permeability-surface area), K (partition coefficient), CL (plasma clearance); bc and pls represent the blood cell and plasma compartment, and C is drug concentration in specific compartment. The equations for arterial blood and the portal vein are the same as for venous blood, except for the elimination process.

$$\begin{aligned} \frac{\textrm{d}}{\textrm{d}t} \, A_{venous_{bc}}= & {} \sum _{i=tissue}(Q_{i_{bc}}*C_{i_{bc}})-Q_{lung_{bc}}*C_{lung_{bc}}\nonumber \\{} & {} +PSA_{pls\leftrightarrow bc}*(C_{venous_{pls}}-\frac{C_{venous_{bc}}}{kbc})\nonumber \\ \end{aligned}$$

(1)

$$\begin{aligned} \frac{\textrm{d}}{\textrm{d}t} \, A_{venous_{pls}}= & {} \sum _{i=tissue}Q_{i_{pls}}*C_{i_{pls}}-Q_{lung_{pls}}*C_{lung_{pls}}\nonumber \\{} & {} -PSA_{pls\leftrightarrow bc}*(C_{venous_{pls}} -\frac{C_{venous_{bc}}}{kbc})\nonumber \\{} & {} -CL_{t}*C_{venous_{pls}} \end{aligned}$$

(2)

All physiological parameters were adapted from literature [30,31,32], including tissue volumes, blood flow rates, surface areas, tissue compositions, and tissue pH. Tissue partition coefficients (Kp) and blood: plasma ratio (BP) were calculated based on the Rowland-Roger method [33]. Drug physicochemical properties such as LogP, molecular weight, and pKa values were predicted from structure using ChemAxon. In vitro parameters such as $f_{up}$, BP, Caco-2 cell permeability and intrinsic hepatic clearance were obtained from previous publications [14, 18, 34]. Physicochemical inputs for the ML-PBPK model simulation were predicted by ML models.

When administered intravenously in a short time, such as a bolus, the maximum concentration ($C_{max}$) in venous plasma is often reported to over-predict compared to the clinical PK profiles. Prediction errors may be due to the different sampling sites, as clinical samples are usually taken from a peripheral vein in the arm [35]. To avoid this, the plasma concentration profile in peripheral blood was chosen to evaluate prediction accuracy with observed PK.

Table 2 Statistics Results of ML Model of Human $f_{up}$, Caco-2 cell Permeability and CLt

Full size table

Prediction Performance Assessment

PBPK models were used to simulate concentration-time profiles of tested drugs. Inputs for the models included machine learning and in vitro experimental data. Python was used for model development, and the matplotlib package was used to generate figures.

To evaluate the predicted PK data’s accuracy, non-compartmental analyses were conducted. This involved calculating important parameters such as half-life ($T_{1/2}$), area under the curve ($AUC_{0-\infty }$), clearance (CL), and volume of distribution at steady-state ($Vd_{ss}$) using specific equations. $AUC_{0-\infty }$ and area under the moment curve ($AUMC_{0-\infty }$) were calculated using the linear-trapezoidal method. The elimination rate constant ($k_{el}$) was calculated using the linear regression method. Mean residence time (MRT) was calculated by AUMC/AUC.

$$\begin{aligned} T_{1/2}= & {} \frac{\ln 2}{k_{el}}\end{aligned}$$

(3)

$$\begin{aligned} AUC_{0-\infty }= & {} \sum \limits _{i=1}^{n-1}\frac{(C_i+C_{i+1})}{2*(t_{i+1}-t_i)} +(\frac{C_{last}}{k_{el}})\end{aligned}$$

(4)

$$\begin{aligned} CL= & {} \frac{Dose}{AUC_{0-inf}}\end{aligned}$$

(5)

$$\begin{aligned} Vd_{ss}= & {} \frac{MRT}{CL} \end{aligned}$$

(6)

The accuracy of the predicted PK data was measured by calculating the average fold error (AFE) for each PK parameter Eq. 7. The total number of testing molecules was represented by n. This metric was used to evaluate the overall prediction accuracy of the model. Additionally, the prediction accuracy of the model was assessed by determining the percentage of prediction error within a 2-fold range for each PK parameter.

$$\begin{aligned} AFE = 10{^{\frac{1}{n}\,\sum _{i=1}^{n}\log (\frac{predicted_{i}}{observed_{i}})}} \end{aligned}$$

(7)

Results

ML models

Five ML methods were used to construct models for predicting human $f_{up}$, Caco-2 cell permeability, and CLt. These methods included SVR, RF, XGB, GBM, and D-MPNN. A series of models were built for each parameter using different training sets. Table II presents the statistical evaluation results of the ML models in training and testing.

The D-MPNN models outperformed the other models for all three parameters (Fig. 3). For predicting human $f_{up}$, the D-MPNN model achieved an $R^{2}$ of 0.92 for an independent training set of 2292 compounds and predicted 77.5$\%$ (31/40) of the test set within a 2-fold prediction error. The D-MPNN model exhibited the highest $R^{2}$ value of 0.95 for the human Caco-2 training set, compared to the GBM model with an $R^{2}$ of 0.55. Additionally, the D-MPNN model demonstrated the best predictive ability for human CLt compared to the other models, with 67.5$\%$ of the 40 testing compounds predicted within a 2-fold prediction error.

Overall, the best models were used to predict the human $f_{up}$, Caco-2 cell permeability, and CLt of the forty compounds. These predicted results were then used as inputs in PBPK models.

PBPK Models

For PBPK model with in-vitro inputs, the physicochemical properties are obtained from experimental values reported in literature, such as Caco-2, and the metabolism is based on LMS. On the other hand, for ML inputs, the physicochemical properties including metabolism clearance are predicted entirely using ML models.

Figure 4 compares predicted and observed PK parameters for 40 compounds in humans. A table with details on the prediction accuracy of each drug is in the supplementary. All parameters exhibit a good correlation between observed and simulated values, except for CL. Pearson correlation coefficient values ($R^{2}$) range from 0.6-0.9. Prediction accuracy of $AUC_{0-\infty }$ is 65$\%$ (26/40), with slightly better performance in the ML-PBPK model compared to the in vitro inputs model, which had 47.5$\%$ (19/40) within 2-fold error (Table III).

The ML-PBPK model showed relatively good results for CL prediction, with AAFEs of 2.00 and 2.59 and $R^{2}$ values of 0.4 and 0.21 respectively, compared to the in vitro inputs model. The predicted/observed ratios of PK parameters show a narrow range with ML inputs (Supplementary Figure 1), indicating a good agreement between predicted and observed values. Both models showed over- or under-predicted values of CL, with median predicted/observed ratios of 1.37 and 0.68. However, drugs extensively excreted in their unchanged form in urine and with elimination rates higher than normal GFR showed better prediction results with ML inputs, such as Vinorelbine.

Table 3 Prediction Accuracy of PK Parameters

Full size table

The results of $Vd_{ss}$ in the ML-PBPK model were similar to those of the in vitro inputs model, as the same tissue partition coefficient calculation method was chosen. $Vd_{ss}$ describes the overall drug distribution in plasma and tissues. In the PBPK model mechanism, drug-related parameters such as $f_{up}$, cell permeability, and Kp values affect drug distribution into tissues. Since the Kp values for tissues were the same in both the ML-PBPK and in vitro inputs models, comparable $Vd_{ss}$ values indicate that ML prediction of $f_{up}$ and Caco-2 cell permeability was able to replace experimental values without compromising accuracy.

The ML-PBPK model was also more efficient than the in vitro inputs model, with a runtime of only 10 seconds per simulation compared to the few days it may take to collect experimental data and perform model simulations. Overall, the ML-PBPK model was found to be a fast platform that accurately predicts human PK profiles in plasma and tissues.

Discussion

The development of PBPK models has been a significant advancement in pharmacology. Initially, in vitro data were used to create these models to predict animal and human PK. However, the accuracy of PBPK models and the integrity of input parameters were limited because these measurements did not fully capture the complexity of the human body. As a result, there has been growing interest in developing PBPK models that incorporate inputs without experiments.

The integration of machine learning (ML) with traditional physiologically based pharmacokinetic (PBPK) models has been a focus of early research efforts aimed at minimizing the need for experimental data in model development [26]. The research includes using absorption (ka), elimination (CLint), distribution (Vss) parameters or physicochemical property parameters predicted based on ML as inputs to predict the in vivo exposure of oral drugs through a simplified PBPK model. Since the model only considers plasma, absorption tissues and elimination tissues, it may not meet the prediction needs that are more relevant to the target tissues for some drug effects or toxicities. Moreover, the literature has often adopted the intrinsic hepatic clearance rate for the prediction of metabolic parameters. While the reported test set demonstrates favorable outcomes, the variability in prediction accuracy when extrapolating from intrinsic hepatic clearance to whole-body hepatic serum clearance cannot be overlooked. To mitigate this discrepancy, our approach considers the use of in vivo hepatic serum clearance as an input parameter. Furthermore, the significance of unbound drug fraction ($fu_{p}$) on model predictions, particularly for clearance, has been substantiated by several studies [36, 37]. It is also recognized that fup influences drug distribution; however, for drugs with high protein binding, the precision of empirical measurements diminishes. Consequently, ML has also been deployed to predict this parameter. Previous literature has also focused on improving the prediction of tissue distribution coefficients or the impact of liver clearance on PK simulation. In a study conducted by Murad, a machine learning model was used to predict $Vd_{ss}$, showing that 58$\%$ of predictions were within a 2-fold error [38]. The Miljkovic team used machine learning to predict PK parameters directly from structure, and 48.5$\%$ of their predictions in the test set were within a 2-fold error [39]. Although the ML predictions for $Vd_{ss}$ have been promising, and some studies have directly used this parameter as an input for PBPK modeling, it must be noted that $Vd_{ss}$ represents the overall drug distribution volume, inclusive of plasma and tissues, and does not delineate the specific distribution within tissues. Therefore, our study continues to employ the Rodgers and Rowland (RR) method to calculate drug distribution across various tissues, taking into account the specific composition of each tissue. Following the optimization methods previously mentioned, our model demonstrated improved predictive capabilities for the 40 testing compounds. We achieved the accuracies within a two-fold prediction error of 55$\%$, 65$\%$, 65$\%$, and 57.5$\%$ for $T_{1/2}$, AUC, CL and $Vd_{ss}$ respectively.

This study further showed the power of machine learning in predicting relevant parameters that may involve complex physiological processes and hard to be accurately measured by in vitro experiments. A large dataset of drug properties was used to develop ML models that predict $f_{up}$, Caco-2 cell permeability, and total plasma clearance of drugs. These ML prediction values were then used as inputs for the PBPK models.

We have implemented the Deep Message Passing Neural Network (D-MPNN) for the prediction of input parameters of drugs, achieving superior performance compared to traditional machine learning approaches such as Support Vector Regression (SVR), Random Forest (RF), XGBoost (XGB), and Gradient Boost Machine (GBM). The superior performance of the D-MPNN in predicting ADMET properties of small molecular compounds in contrast to other traditional ML methods can be attributed as follows:

Structure-awareness: D-MPNN uses a graph-based representation of molecules, capturing the relationships between atoms and chemical bonds within the molecular structure. This may enable the model to understand the structural and chemical context of the molecule, which is crucial for predicting ADMET properties.
End-to-end learning: D-MPNN is an end-to-end model that learns to predict target properties directly from molecular structures without the need for manual feature engineering. This allows the model to automatically discover relevant features and patterns in the data.
Message passing mechanism: The D-MPNN architecture uses multiple rounds of message passing to update atom representations. This process enables the model to capture both local and global chemical environments, which are essential for accurately predicting ADMET properties.

These factors may enable the model to capture more relevant chemical information and make more accurate predictions.

Results on 40 drugs showed that the ML-PBPK model predicts human PK parameters with higher accuracy than the in vitro inputs model, and most of the compounds have prediction errors within 2 or 3-fold. Especially, compounds with extensive renal clearance have an average fold error (AFE) of 0.94 compared to the in vitro inputs model of 1.28. In addition, each PK prediction in general was completed within seconds on a machine with the ML-PK model, showing a higher efficiency without compromising accuracy compared to in vitro inputs model.

The PBPK model is a four-compartment permeability-limited model used to predict the distribution of drugs in various tissues. It assumes that the drug’s membrane permeability into different tissues is equal to the value measured in vitro using Caco-2 cells. The LogP values of the tested compounds range from -0.11 to 8.49, and the fup values range from 0.002 to 0.95.Based on the results of 40 molecules, the PBPK model tends to underestimate the tissue distribution volume for highly lipophilic and highly permeable drugs, such as Desipramine and Imipramine. However, the model fails to effectively predict tissue distribution coefficients for extremely lipophilic molecules with a LogP value exceeding 5, like Montelukast. In these cases, the calculation method for tissue distribution coefficients based on in vitro inputs leads to an overestimation of Vd for these compounds.The main difference between the in vitro and ML models lies in the input parameter for the elimination process. The in vitro model, which is based on microsomal experimental data and predicted using the IVIVE method, significantly underestimates the clearance for drugs like Cimetidine, Prazosin, and Metoprolol, which are eliminated primarily through renal clearance or are substrates of uptake transporters (such as OCTs). Moreover, the in vitro model also underestimates CL for drugs with high plasma protein binding, which have difficulty entering the elimination tissue according to the modeling assumption. On the other hand, the ML model significantly improves the prediction accuracy for this class of molecules.

This study showed how ML methods could improve PK prediction by predicting relevant physicochemical parameters. Future improvements on data quantity and quality that are used to train ML models worth more work. For example, using data from GI organoids may help us train better models to aid the PK prediction for drugs with oral administration. In this study, we predicted total clearance for PK prediction. With more data, we may have separate models for distinct clearance routes, e.g., separate models for hepatic and renal clearance, that provides more information in drug reaserch and development. Thus development of better models as our understanding of deep learning progresses is another valuable direction for future work.

Conclusions

We evaluated the accuracy of our developed ML-PBPK model platform on 40 compounds by comparing the accuracy of in vitro inputs and ML prediction inputs. The commonly used IVIVE method has limitations in predicting hepatic clearance, and there is a limited experimental exploration of clearance pathways outside the liver in the early stages of drug discovery. As drug clearance is crucial for PK prediction, we used an ML model to predict total human plasma clearance as inputs into the PBPK model for predicting drug concentrations in plasma and tissues. This method was able to guide the development and prioritization of lead compounds based on molecular structure for PK prediction before in vitro experiments. In the future, the accuracy of the ML-PBPK model can be further improved or optimized for specific molecular structures by expanding the training set. Methods such as graph-based multi-task learning, pre-trained models, and model ensembles will be employed to improve accuracy. Furthermore, studying the interpretability of the prediction results is also essential.

References

Poggesi I, Snoeys J, Van Peer A. The successes and failures of physiologically based pharmacokinetic modeling: there is room for improvement. Expert Opinion on Drug Metabolism & Toxicology. 2014;10(5):631–5. https://doi.org/10.1517/17425255.2014.888058. Accessed 2023-06-12.
Tylutki Z, Polak S, Wiśniowska B. Top-down, Bottom-up and Middle-out Strategies for Drug Cardiac Safety Assessment via Modeling and Simulations. Current Pharmacology Reports. 2016;2(4):171–7. https://doi.org/10.1007/s40495-016-0060-3. Accessed 2023-06-13.
...Ahmad A, Pepin X, Aarons L, Wang Y, Darwich AS, Wood JM, Tannergren C, Karlsson E, Patterson C, Thörn H, Ruston L, Mattinson A, Carlert S, Berg S, Murphy D, Engman H, Laru J, Barker R, Flanagan T, Abrahamsson B, Budhdeo S, Franek F, Moir A, Hanisch G, Pathak SM, Turner D, Jamei M, Brown J, Good D, Vaidhyanathan S, Jackson C, Nicolas O, Beilles S, Nguefack JF, Louit G, Henrion L, Ollier C, Boulu L, Xu C, Heimbach T, Ren X, Lin W, Nguyen-Trung AT, Zhang J, He H, Wu F, Bolger MB, Mullin JM, Van Osdol B, Szeto K, Korjamo T, Pappinen S, Tuunainen J, Zhu W, Xia B, Daublain P, Wong S, Varma MVS, Modi S, Schäfer KJ, Schmid K, Lloyd R, Patel A, Tistaert C, Bevernage J, Nguyen MA, Lindley D, Carr R, Rostami-Hodjegan A. IMI-Oral biopharmaceutics tools project-Evaluation of bottom-up PBPK prediction success part 4: Prediction accuracy and software comparisons with improved data and modelling strategies. Eur J Pharm Biopharm. 2020;156:50–63. https://doi.org/10.1016/j.ejpb.2020.08.006. Accessed 2023-06-12.
Naga D, Parrott N, Ecker GF, Olivares-Morales A. Evaluation of the Success of High-Throughput Physiologically Based Pharmacokinetic (HT-PBPK) Modeling Predictions to Inform Early Drug Discovery. Mol Pharm. 2022;19(7):2203–16. https://doi.org/10.1021/acs.molpharmaceut.2c00040. Accessed 2023-06-12.
Antontsev V, Jagarapu A, Bundey Y, Hou H, Khotimchenko M, Walsh J, Varshney J. A hybrid modeling approach for assessing mechanistic models of small molecule partitioning in vivo using a machine learning-integrated modeling platform. Sci Rep. 2021;11(1):11143. https://doi.org/10.1038/s41598-021-90637-1. Accessed 2023-06-12.
Bowman CM, Benet LZ. In Vitro-In Vivo Extrapolation and Hepatic Clearance-Dependent Underprediction. J Pharm Sci. 2019;108(7):2500–4. https://doi.org/10.1016/j.xphs.2019.02.009. Accessed 2023-06-12.
Watanabe R, Esaki T, Kawashima H, Natsume-Kitatani Y, Nagao C, Ohashi R, Mizuguchi K. Predicting Fraction Unbound in Human Plasma from Chemical Structure: Improved Accuracy in the Low Value Ranges. Mol Pharm. 2018;15(11):5302–11. https://doi.org/10.1021/acs.molpharmaceut.8b00785. Accessed 2023-06-13.
Votano JR, Parham M, Hall LM, Hall LH, Kier LB, Oloff S, Tropsha A. QSAR Modeling of Human Serum Protein Binding with Several Modeling Techniques Utilizing Structure-Information Representation. J Med Chem. 2006;49(24):7169–81. https://doi.org/10.1021/jm051245v. Accessed 2023-06-13.
Orwat MJ, Qiao JX, He K, Rendina AR, Luettgen JM, Rossi KA, Xin B, Knabb RM, Wexler RR, Lam PYS, Pinto DJP. Orally bioavailable factor Xa inhibitors containing alpha-substituted gem-dimethyl P4 moieties. Bioorganic & Medicinal Chemistry Letters. 2014;24(15):3341–5. https://doi.org/10.1016/j.bmcl.2014.05.101. Accessed 2023-07-17.
Kotoku M, Maeba T, Fujioka S, Yokota M, Seki N, Ito K, Suwa Y, Ikenogami T, Hirata K, Hase Y, Katsuda Y, Miyagawa N, Arita K, Asahina K, Noguchi M, Nomura A, Doi S, Adachi T, Crowe P, Tao H, Thacher S, Hashimoto H, Suzuki T, Shiozaki M. Discovery of Second Generation ROR$\gamma $ Inhibitors Composed of an Azole Scaffold. Journal of Medicinal Chemistry. 2019;62(5):2837–2842. https://doi.org/10.1021/acs.jmedchem.8b01567 . Accessed 2023-07-17.
Ernst JT, Thompson PA, Nilewski C, Sprengeler PA, Sperry S, Packard G, Michels T, Xiang A, Tran C, Wegerski CJ, Eam B, Young NP, Fish S, Chen J, Howard H, Staunton J, Molter J, Clarine J, Nevarez A, Chiang GG, Appleman JR, Webster KR, Reich SH. Design of Development Candidate eFT226, a First in Class Inhibitor of Eukaryotic Initiation Factor 4A RNA Helicase. J Med Chem. 2020;63(11):5879–955. https://doi.org/10.1021/acs.jmedchem.0c00182. Accessed 2023-07-17.
Lombardo F, Berellini G, Obach RS. Trend Analysis of a Database of Intravenous Pharmacokinetic Parameters in Humans for 1352 Drug Compounds. Drug Metab Dispos. 2018;46(11):1466–77. https://doi.org/10.1124/dmd.118.082966.
WebPlotDigitizer. 2023. https://apps.automeris.io/wpd/.
Sohlenius-Sternbeck AK, Afzelius L, Prusis P, Neelissen J, Hoogstraate J, Johansson J, Floby E, Bengtsson A, Gissberg O, Sternbeck J, Petersson C. Evaluation of the human prediction of clearance from hepatocyte and microsome intrinsic clearance for 52 drug compounds. Xenobiotica. 2010;40(9):637–49. https://doi.org/10.3109/00498254.2010.500407. Accessed 2023-06-14.
Mamada H, Iwamoto K, Nomura Y, Uesawa Y. Predicting blood-to-plasma concentration ratios of drugs from chemical structures and volumes of distribution in humans. Mol Diversity. 2021;25(3):1261–70. https://doi.org/10.1007/s11030-021-10186-7. Accessed 2023-09-26.
Murad N, Pasikanti KK, Madej BD, Minnich A, McComas JM, Crouch S, Polli JW, Weber AD. Predicting Volume of Distribution in Humans: Performance of In Silico Methods for a Large Set of Structurally Diverse Clinical Compounds. Drug Metab Dispos. 2021;49(2):169–78. https://doi.org/10.1124/dmd.120.000202. Accessed 2023-09-26.
Pham-The H, González-Álvarez I, Bermejo M, Garrigues T, Le-Thi-Thu H, Cabrera-Pérez MA. The use of rule-based and qspr approaches in adme profiling: A case study on caco-2 permeability. Mol Inf. 2013;32(5–6):459–79. https://doi.org/10.1002/minf.201200166. Accessed 2023-09-26.
O’Hagan S, Kell DB. The apparent permeabilities of Caco-2 cells to marketed drugs: magnitude, and independence from both biophysical properties and endogenite similarities. PeerJ. 2015;3:1405. https://doi.org/10.7717/peerj.1405. Accessed 2023-06-12.
Bittermann K, Goss KU. Predicting apparent passive permeability of Caco-2 and MDCK cell-monolayers: A mechanistic model. PLoS ONE. 2017;12(12):0190319. https://doi.org/10.1371/journal.pone.0190319. Accessed 2023-09-26.
Hallifax D, Foster JA, Houston JB. Prediction of Human Metabolic Clearance from In Vitro Systems: Retrospective Analysis and Prospective View. Pharm Res. 2010;27(10):2150–61. https://doi.org/10.1007/s11095-010-0218-3. Accessed 2023-09-26.
Williamson B, Harlfinger S, McGinnity DF. Evaluation of the Disconnect between Hepatocyte and Microsome Intrinsic Clearance and In Vitro In Vivo Extrapolation Performance. Drug Metab Dispos. 2020;48(11):1137–46. https://doi.org/10.1124/dmd.120.000131. Accessed 2023-09-26.
Bento AP, Hersey A, Felix E, Landrum G, Gaulton A, Atkinson F, Bellis LJ, Veij MD, Leach AR. An Open Source Chemical Structure Curation Pipeline Using RDKit. J Cheminform. 2020;12:51. https://doi.org/10.1186/s13321-020-00456-1.
Yap CW. PaDEL-descriptor: An open source software to calculate molecular descriptors and fingerprints. J Comput Chem. 2011;32(7):1466–74. https://doi.org/10.1002/jcc.21707.
Kursa MB, Rudnicki RW. Feature Selection with the Boruta Package. Journal of Statistical Software. 2010;36(11):1–13. https://doi.org/10.18637/jss.v036.i11.
Wang Y, Liu H, Fan Y, Chen X, Yang Y, Zhu L, Zhao J, Chen Y, Zhang Y. In silico prediction of human intravenous pharmacokinetic parameters with improved accuracy. J Chem Inf Model. 2019;59(9):3968–80. https://doi.org/10.1021/acs.jcim.9b00300.
Chou WC, Lin Z. Machine learning and artificial intelligence in physiologically based pharmacokinetic modeling. Toxicol Sci. 2023;191(1):1–14. https://doi.org/10.1093/toxsci/kfac101.
Article CAS PubMed Google Scholar
Yang K, Swanson K, Jin W, Coley C, Eiden P, Gao H, Guzman-Perez A, Hopper T, Kelley B, Mathea M, Palmer A, Settels V, Jaakkola T, Jensen K, Barzilay R. Analyzing Learned Molecular Representations for Property Prediction. J Chem Inf Model. 2019;59(8):3370–88. https://doi.org/10.1021/acs.jcim.9b00237.
Kawai R, Lemaire M, Steimer JL, Bruelisauer A, Niederberger W, Rowland M. Physiologically based pharmacokinetic study on a cyclosporin derivative, SDZ IMM 125. J Pharmacokinet Biopharm. 1994;22(5):327–65. https://doi.org/10.1007/BF02353860. Accessed 2023-06-12.
Burt HJ, Neuhoff S, Almond L, Gaohua L, Harwood MD, Jamei M, Rostami-Hodjegan A, Tucker GT, Rowland-Yeo K. Metformin and cimetidine: Physiologically based pharmacokinetic modelling to investigate transporter mediated drug-drug interactions. Eur J Pharm Sci. 2016;88:70–82. https://doi.org/10.1016/j.ejps.2016.03.020. Accessed 2023-09-26.
Valentin J. Basic anatomical and physiological data for use in radiological protection: reference values. A report of age-and gender-related differences in the anatomical and physiological characteristics of reference individuals. ICRP Publication 89. Annals of the ICRP. 2002;32(3–4):5–265.
Deurenberg P, Weststrate JA, Seidell JC. Body mass index as a measure of body fatness: age- and sex-specific prediction formulas. Br J Nutr. 1991;65(2):105–14. https://doi.org/10.1079/BJN19910073. Accessed 2023-06-12.
Pilari S, Gaub T, Block M, Görlitz L. Development of Physiologically Based Organ Models to Evaluate the Pharmacokinetics of Drugs in the Testes and the Thyroid Gland: Development of Physiologically Based Organ Models. CPT: Pharmacometrics & Systems Pharmacology. 2017;6(8):532–542. https://doi.org/10.1002/psp4.12205 . Accessed 2023-06-12.
Rodgers T, Leahy D, Rowland M. Physiologically Based Pharmacokinetic Modeling 1: Predicting the Tissue Distribution of Moderate-to-Strong Bases. J Pharm Sci. 2005;94(6):1259–76. https://doi.org/10.1002/jps.20322. Accessed 2023-06-13.
Sohlenius-Sternbeck AK, Terelius Y. Evaluation of ADMET Predictor in Early Discovery Drug Metabolism and Pharmacokinetics Project Work. Drug Metab Dispos. 2022;50(2):95–104. https://doi.org/10.1124/dmd.121.000552. Accessed 2023-06-14.
Musther H, Gill KL, Chetty M, Rostami-Hodjegan A, Rowland M, Jamei M. Are Physiologically Based Pharmacokinetic Models Reporting the Right Cmax? Central Venous Versus Peripheral Sampling Site. AAPS J. 2015;17(5):1268–79. https://doi.org/10.1208/s12248-015-9796-7. Accessed 2023-06-12.
Kamiya Y, al. In silico prediction of input parameters for simplified physiologically based pharmacokinetic models for estimating plasma, liver, and kidney exposures in rats after oral doses of 246 disparate chemicals. Chem. Res. Toxicol. 2021;34:507–513. https://doi.org/10.1021/acs.chemrestox.0c00457.
Habiballah S, Reisfeld B. Adapting physiologically-based pharmacokinetic models for machine learning applications. Sci Rep. 2023;13:14934. https://doi.org/10.1038/s41598-023-14487-4.
Murad N, Pasikanti KK, Madej DB, Minnich A, McComas MJ, Crouch S, Polli WJ, Weber DA. Predicting Volume of Distribution in Humans: Performance of In Silico Methods for a Large Set of Structurally Diverse Clinical Compounds. 2021;49(2):169–278. https://doi.org/10.1124/dmd.120.000202.
Miljkovic F, Martinsson A, Obrezanova O, Williamson B, Johnson M, Sykes A, Bender A, Greene N. Machine Learning Models for Human In Vivo Pharmacokinetic Parameters with In-House Validation. Mol Pharm. 2021;18(12):4520–30. https://doi.org/10.1021/acs.molpharmaceut.1c00718.

Download references

Author information

Authors and Affiliations

XtalPi Innovation Center, XtalPi Inc., Beijing, 100080, China
Yuelin Li, Zonghu Wang, Yuru Li, Jiewen Du, Xiangrui Gao, Yuanpeng Li & Lipeng Lai

Authors

Yuelin Li
View author publications
You can also search for this author in PubMed Google Scholar
Zonghu Wang
View author publications
You can also search for this author in PubMed Google Scholar
Yuru Li
View author publications
You can also search for this author in PubMed Google Scholar
Jiewen Du
View author publications
You can also search for this author in PubMed Google Scholar
Xiangrui Gao
View author publications
You can also search for this author in PubMed Google Scholar
Yuanpeng Li
View author publications
You can also search for this author in PubMed Google Scholar
Lipeng Lai
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Lipeng Lai.

Ethics declarations

Conflict of Interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 894 KB)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Li, Y., Wang, Z., Li, Y. et al. A Combination of Machine Learning and PBPK Modeling Approach for Pharmacokinetics Prediction of Small Molecules in Humans. Pharm Res (2024). https://doi.org/10.1007/s11095-024-03725-y

Download citation

Received: 21 October 2023
Accepted: 02 June 2024
Published: 25 June 2024
DOI: https://doi.org/10.1007/s11095-024-03725-y

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

A Combination of Machine Learning and PBPK Modeling Approach for Pharmacokinetics Prediction of Small Molecules in Humans