Introduction

Pharmacokinetics (PK) is a critical aspect of drug development, as it describes the absorption, distribution, metabolism, and excretion (ADME) of compounds in the body. During preclinical stages, lead compounds are evaluated for their PK properties through in vitro and in vivo animal experiments. The results of these evaluations can be used to rank compounds or optimize their structures based on the correlation between their physicochemical and PK properties. Moreover, these in vitro and animal PK results can be leveraged to predict human PK and guide clinical trial design through allometric scaling, compartment models, or physiologically based pharmacokinetic (PBPK) models.

Table 1 Drug-specific Input Parameters from in vitro Experiments and Prediction

In contrast to traditional PK models with allometric scaling, PBPK models can predict drug concentrations in plasma and various tissues without the need for animal experiments. As a result, the application of PBPK models in drug discovery and development has increased significantly over the past few years [1]. Three approaches are commonly used in PBPK model prediction: “top-down,” “middle-out,” and “bottom-up.” The top-down approach relies predominantly on observed clinical data, while the middle-out approach combines in vitro and in vivo information to determine unknown or uncertain parameters of the model [2]. The bottom-up approach, in particular, offers the potential to minimize or replace animal PK studies, as it relies solely on in vitro data for drug-related input parameters. However, the IMI Oral Biopharmaceutics Tools project showed the limitations of the bottom-up approach in human PK prediction, with only half of the area under the concentration-time curve (AUC) predictions falling within a 2-fold prediction error [3]. This accuracy may be affected by errors in in vitro experiments or by the accuracy of clearance prediction using in vitro hepatic systems. Such limitations may be overcome by using machine learning (ML) models to predict physicochemical properties directly from structures.

Drug-specific input parameters, such as the fraction unbound in plasma (\(f_{up}\)), intrinsic hepatic clearance, and volume of distribution (\(Vd_{ss}\)), have been well predicted using in silico models. Previously, Doha combined in vitro and ML inputs with a minimal PBPK model and evaluated 240 compounds in rats. With ML-predicted \(f_{up}\), LogD, and CL as inputs, only 36.1\(\%\) of systemic plasma AUC predictions were within a 2-fold prediction error [4]. Vector Group then developed a high-accuracy machine learning-integrated modeling platform, a whole-body PBPK model with an optimized \(Vd_{ss}\) prediction method [5].

Several studies have optimized the calculation of \(Vd_{ss}\) to improve the accuracy of PBPK predictions. However, clearance also plays a significant role in PK prediction. Clearance, which determines the rate of drug elimination from the body, occurs in the liver, kidney, and bile. Early drug discovery focuses primarily on liver metabolism using in vitro experiments in hepatocytes or microsomes. As a result, several PBPK models consider only hepatic clearance obtained from the in vitro to in vivo extrapolation (IVIVE) approach. This approach may mispredict clearance because it excludes certain renal or biliary elimination processes. Bowman has reported underprediction of clearance from the IVIVE method, with 42.2\(\%\) of predictions within a 2-fold margin of error in the microsome system [6]. This highlights the need for total clearance to improve the accuracy of PBPK modeling in the early discovery stage.

In our study, we developed a rapid ML-PBPK model platform that enables the simulation of human PK from compound structures. Human \(f_{up}\), Caco-2 cell permeability, and total plasma clearance (CLt) were predicted using ML models. These predictions were then used as input parameters for a whole-body PBPK model encompassing 14 tissues. The prediction accuracy of the platform was evaluated against the human PK profiles of 40 drugs to define its applicability in early discovery and clinical phases.

Materials and Methods

Data Collection

The human \(f_{up}\) model relies on two data sources: the Watanabe study, which provides data for 2139 compounds, and the Votano study, which provides data for 808 compounds [7, 8]. Overlaps between the two datasets were checked, and a compound was removed if its two records differed by more than 2-fold. For compounds whose values differed by less than 2-fold, the value from Watanabe’s study was kept because it was reported with more significant figures. Caco-2 cell permeability data for 6083 compounds were collected from public sources [9,10,11]. The human CLt model used intravenous PK parameters from Lombardo’s study [12]. Compounds were removed if they were duplicates, had invalid SMILES, had a molecular weight greater than 900 Da, or had no reported CL value. Additionally, 40 molecules that overlapped with the experimental data were removed. Finally, we created three datasets: \(f_{up}\) containing 2292 compounds, Caco-2 containing 6083 compounds, and CLt containing 1215 compounds.
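The overlap rule for the two \(f_{up}\) sources can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `merge_fup_records`, the dictionary layout, and the example values are hypothetical, and treating a ratio of exactly 2 as concordant is an assumption, since the text does not specify the boundary case.

```python
def merge_fup_records(primary, secondary, max_fold=2.0):
    """Merge two {compound_id: fup} mappings (fup values must be > 0).

    A compound present in both sources is kept, with the primary value,
    only if the two values agree within `max_fold`; otherwise it is dropped.
    """
    merged = {}
    for cid in set(primary) | set(secondary):
        in_primary, in_secondary = cid in primary, cid in secondary
        if in_primary and in_secondary:
            a, b = primary[cid], secondary[cid]
            fold = max(a, b) / min(a, b)
            if fold <= max_fold:
                # Concordant records: keep the primary source value.
                merged[cid] = a
            # Discordant records (> max_fold apart) are discarded.
        else:
            merged[cid] = primary[cid] if in_primary else secondary[cid]
    return merged


# Hypothetical example values, for illustration only.
watanabe = {"A": 0.10, "B": 0.50, "C": 0.02}
votano = {"B": 0.45, "C": 0.10, "D": 0.30}
merged = merge_fup_records(watanabe, votano)
```

Here compound B is kept with the primary value, compound C is dropped because its two records differ 5-fold, and A and D pass through unchanged.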

The human plasma PK data for the 40 test compounds in Table 1 were extracted from previously published papers (see the supplementary material). All PK data were digitized using the free online tool WebPlotDigitizer [13]. Figure 1 summarizes the number of PK studies and data points for which PK data were collected.

Fig. 1

The statistics regarding number of PK studies and data points.

ML Model Building

Compound SMILES were standardized using the ChEMBL standardizer [22]. Three different methods (RDKit, Mordred, and PaDEL-Descriptors) were used to calculate descriptors [23]. These methods generated molecular physicochemical properties for each molecule, resulting in 1826 (Mordred), 1444 (PaDEL), and 208 (RDKit) features used for model construction.

The datasets were randomly split into training, validation, and test sets at an 8:1:1 ratio. Features with a variance below 0.05 were removed, as were redundant features whose pairwise correlation coefficient exceeded 0.9. The Boruta algorithm [24] was also used to select significant features in each dataset.
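The split and the two filtering thresholds can be illustrated with the standard library alone. The sketch below is an assumption-laden example rather than the pipeline used in the study: the helper names (`split_811`, `filter_features`), the fixed seed, and the greedy "keep the first of each correlated pair" rule are all hypothetical choices, and the Boruta step is not shown.

```python
import random
from statistics import mean, pvariance


def split_811(items, seed=0):
    """Randomly split items into training/validation/test sets at 8:1:1."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 8 // 10
    n_val = len(items) // 10
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def filter_features(columns, min_var=0.05, max_corr=0.9):
    """Return names of features that pass the variance and correlation filters.

    columns: dict mapping feature name -> list of values (one per compound).
    """
    kept = []
    for name, values in columns.items():
        if pvariance(values) < min_var:
            continue  # near-constant feature
        if any(abs(pearson(values, columns[k])) > max_corr for k in kept):
            continue  # redundant with a feature that was already kept
        kept.append(name)
    return kept


train, val, test = split_811(range(100))
features = {
    "f1": [1, 2, 3, 4],
    "f2": [2, 4, 6, 8.1],   # almost perfectly correlated with f1
    "f3": [1, 1, 1, 1.01],  # near-zero variance
    "f4": [4, 1, 3, 2],
}
selected = filter_features(features)
```

In practice the thresholds would typically be fitted on the training set only and then applied to the validation and test sets, although the text does not state which convention was followed.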

Four common approaches to molecular property prediction were used to build \(f_{up}\), Caco-2, and CLt prediction models: Support Vector Machine Regression (SVR), Random Forest (RF), XGBoost (XGB), and Gradient Boosting Machine (GBM) [25, 26]. As an alternative to traditional chemical descriptors, message-passing neural networks (MPNNs) have shown advances in molecular modeling and property prediction. MPNNs are a family of graph convolutional neural network (GCN) variants that learn and aggregate local information about molecules through iterative message passing. Recently, Yang et al. [27] proposed a directed MPNN (D-MPNN) and built the open-source package Chemprop to implement it. D-MPNN constructs a learned molecular representation by operating on the graph structure of the molecule and passing messages along directed edges through an edge-dependent neural network. In this study, D-MPNN models were built for each dataset, and RDKit descriptors were incorporated into D-MPNN to further improve performance.

The hyperparameters of the regression models were optimized with Bayesian optimization search. Five-fold cross-validation was used to check the stability and predictive ability of the model. Additionally, the performance of the regression models was assessed by the coefficient of determination (\(R^{2}\)) and root-mean-square error (RMSE).
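For reference, the two evaluation metrics and a five-fold partition can be written out explicitly. This is a minimal sketch using the usual definitions of the coefficient of determination and RMSE; the function names and the interleaved fold assignment are illustrative assumptions, and the Bayesian hyperparameter search is not shown.

```python
def r2_score(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root-mean-square error."""
    n = len(observed)
    return (sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n) ** 0.5


def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, validation_indices) for k-fold cross-validation."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
r2 = r2_score(obs, pred)
err = rmse(obs, pred)
splits = list(k_fold_indices(10, k=5))
```

Each sample appears in exactly one validation fold, so averaging the metric over the five folds gives an estimate of model stability. Indices should be shuffled beforehand if the data are ordered.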

PBPK Model Building

Figure 2b shows the compartment model for each tissue, which includes plasma, blood cells, interstitial space, and intracellular space, as previously discussed by Kawai [28].

Fig. 2

Structure of the whole-body PBPK model (a). Each tissue is divided into blood cells, plasma, interstitial and intracellular spaces (b), and each blood vessel only has vascular compartments (c). Details of the model are presented in the Methods section.

Molecules move between adjacent compartments through passive diffusion and connect to the circulatory system through blood flow. The passive diffusion rate of drugs in each tissue was calculated by multiplying the cell permeability (P) by the tissue compartment surface area (SA) [29]. Cell permeability was assumed to be the same across tissues and was taken from the Caco-2 cell system in this PBPK model. Figure 2c illustrates the structure of blood vessels, which comprise only plasma and blood cells.

In the ML-PBPK model, three key parameters, \(f_{up}\), Caco-2 cell permeability, and CLt, were taken from ML model predictions, and systemic drug elimination was assumed to occur in venous blood plasma through CLt. In the in vitro input model, by contrast, all in vitro parameters were taken from experiments. Specifically, elimination comprises hepatic clearance, extrapolated by the IVIVE method from microsome or hepatocyte stability experiments, and renal clearance, taken as the glomerular filtration rate. The differential equations for venous blood vessels are given as Eqs. 1 and 2, where A is the drug amount and C the drug concentration in a specific compartment, Q is blood flow, PSA is the permeability-surface area product, K is the partition coefficient, and CL is plasma clearance; the subscripts bc and pls denote the blood cell and plasma compartments. The equations for arterial blood and the portal vein are the same as for venous blood, except for the elimination process.

$$\begin{aligned} \frac{\textrm{d}}{\textrm{d}t} \, A_{venous_{bc}} &= \sum _{i=tissue}\left( Q_{i_{bc}} \cdot C_{i_{bc}}\right) -Q_{lung_{bc}} \cdot C_{lung_{bc}} \\ &\quad +PSA_{pls\leftrightarrow bc} \cdot \left( C_{venous_{pls}}-\frac{C_{venous_{bc}}}{K_{bc}}\right) \end{aligned}$$
(1)
$$\begin{aligned} \frac{\textrm{d}}{\textrm{d}t} \, A_{venous_{pls}} &= \sum _{i=tissue}\left( Q_{i_{pls}} \cdot C_{i_{pls}}\right) -Q_{lung_{pls}} \cdot C_{lung_{pls}} \\ &\quad -PSA_{pls\leftrightarrow bc} \cdot \left( C_{venous_{pls}}-\frac{C_{venous_{bc}}}{K_{bc}}\right) \\ &\quad -CL_{t} \cdot C_{venous_{pls}} \end{aligned}$$
(2)
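To make the structure of Eqs. 1 and 2 concrete, the right-hand sides can be coded term by term. The function below is a sketch that mirrors the equations as printed; the name `venous_rhs`, the dictionary keys, and the numerical values are hypothetical, and consistent units (for example L/h and mg/L) are assumed.

```python
def venous_rhs(c_ven_bc, c_ven_pls, tissues, lung, psa, k_bc, cl_t):
    """Return (dA_bc/dt, dA_pls/dt) for the venous blood compartments.

    tissues: list of dicts with keys q_bc, q_pls, c_bc, c_pls, one per tissue
             term in the summation of Eqs. 1 and 2.
    lung:    dict with the same keys for the lung term.
    psa:     permeability-surface area product between plasma and blood cells.
    k_bc:    blood cell partition coefficient.
    cl_t:    total plasma clearance (elimination acts on venous plasma only).
    """
    # Net passive flux from plasma into blood cells.
    exchange = psa * (c_ven_pls - c_ven_bc / k_bc)

    d_bc = (
        sum(t["q_bc"] * t["c_bc"] for t in tissues)
        - lung["q_bc"] * lung["c_bc"]
        + exchange
    )
    d_pls = (
        sum(t["q_pls"] * t["c_pls"] for t in tissues)
        - lung["q_pls"] * lung["c_pls"]
        - exchange
        - cl_t * c_ven_pls
    )
    return d_bc, d_pls


# Hypothetical values, for illustration only.
tissues = [
    {"q_bc": 1.0, "q_pls": 2.0, "c_bc": 0.5, "c_pls": 1.0},
    {"q_bc": 0.5, "q_pls": 1.0, "c_bc": 0.2, "c_pls": 0.4},
]
lung = {"q_bc": 1.5, "q_pls": 3.0, "c_bc": 0.3, "c_pls": 0.6}
d_bc, d_pls = venous_rhs(0.6, 0.5, tissues, lung, psa=2.0, k_bc=1.2, cl_t=0.4)
```

The exchange term appears with opposite signs in the two equations, so it cancels when the two derivatives are added. A full model would pass functions of this kind for every compartment to an ODE solver.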

All physiological parameters, including tissue volumes, blood flow rates, surface areas, tissue compositions, and tissue pH, were adapted from the literature [30,31,32]. Tissue partition coefficients (Kp) and the blood:plasma ratio (BP) were calculated using the Rodgers and Rowland method [33]. Drug physicochemical properties such as LogP, molecular weight, and pKa values were predicted from structure using ChemAxon. In vitro parameters such as \(f_{up}\), BP, Caco-2 cell permeability, and intrinsic hepatic clearance were obtained from previous publications [14, 18, 34]. Physicochemical inputs for the ML-PBPK model simulation were predicted by ML models.

After rapid intravenous administration, such as a bolus, the simulated maximum concentration (\(C_{max}\)) in venous plasma is often reported to be over-predicted relative to clinical PK profiles. This prediction error may arise from differences in sampling site, as clinical samples are usually taken from a peripheral vein in the arm [35]. To avoid this, the plasma concentration profile in peripheral blood was used to evaluate prediction accuracy against observed PK.

Table 2 Statistical Results of the ML Models for Human \(f_{up}\), Caco-2 Cell Permeability, and CLt
Fig. 3

Plots of the observed and predicted \(f_{up}\), Caco-2 cell permeability and CLt of the training set and the test set of the D-MPNN models. The dashed line indicates the line of unity (x=y).

Prediction Performance Assessment

PBPK models were used to simulate concentration-time profiles of tested drugs. Inputs for the models included machine learning and in vitro experimental data. Python was used for model development, and the matplotlib package was used to generate figures.

To evaluate the accuracy of the predicted PK data, non-compartmental analyses were conducted. Half-life (\(T_{1/2}\)), area under the curve (\(AUC_{0-\infty }\)), clearance (CL), and volume of distribution at steady state (\(Vd_{ss}\)) were calculated using Eqs. 3 to 6. \(AUC_{0-\infty }\) and area under the moment curve (\(AUMC_{0-\infty }\)) were calculated using the linear-trapezoidal method. The elimination rate constant (\(k_{el}\)) was calculated by linear regression. Mean residence time (MRT) was calculated as \(AUMC_{0-\infty }/AUC_{0-\infty }\).

$$\begin{aligned} T_{1/2} &= \frac{\ln 2}{k_{el}} \end{aligned}$$
(3)
$$\begin{aligned} AUC_{0-\infty } &= \sum \limits _{i=1}^{n-1}\frac{(C_i+C_{i+1})}{2}\,(t_{i+1}-t_i) +\frac{C_{last}}{k_{el}} \end{aligned}$$
(4)
$$\begin{aligned} CL &= \frac{Dose}{AUC_{0-\infty }} \end{aligned}$$
(5)
$$\begin{aligned} Vd_{ss} &= MRT \cdot CL \end{aligned}$$
(6)
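The non-compartmental calculation can be sketched as a single function for an intravenous bolus profile. Several details are assumptions not stated in the text: \(k_{el}\) is estimated by log-linear regression on the last `n_terminal` points (a hypothetical parameter), the AUMC tail uses the standard extrapolation \(C_{last} t_{last}/k_{el} + C_{last}/k_{el}^{2}\), and \(Vd_{ss}\) uses the standard relation \(MRT \cdot CL\).

```python
import math


def nca_iv_bolus(times, concs, dose, n_terminal=3):
    """Non-compartmental PK parameters from an IV bolus concentration profile."""
    # Linear-trapezoidal AUC and AUMC from the first to the last time point.
    auc = aumc = 0.0
    for i in range(len(times) - 1):
        dt = times[i + 1] - times[i]
        auc += 0.5 * (concs[i] + concs[i + 1]) * dt
        aumc += 0.5 * (times[i] * concs[i] + times[i + 1] * concs[i + 1]) * dt

    # Terminal slope from linear regression of ln(C) against t.
    t = times[-n_terminal:]
    y = [math.log(c) for c in concs[-n_terminal:]]
    t_mean, y_mean = sum(t) / len(t), sum(y) / len(y)
    slope = sum((a - t_mean) * (b - y_mean) for a, b in zip(t, y)) / sum(
        (a - t_mean) ** 2 for a in t
    )
    kel = -slope

    # Extrapolate both areas to infinity.
    c_last, t_last = concs[-1], times[-1]
    auc_inf = auc + c_last / kel
    aumc_inf = aumc + c_last * t_last / kel + c_last / kel ** 2

    cl = dose / auc_inf
    mrt = aumc_inf / auc_inf
    return {
        "kel": kel,
        "t_half": math.log(2) / kel,
        "auc_inf": auc_inf,
        "cl": cl,
        "mrt": mrt,
        "vd_ss": mrt * cl,
    }


# Synthetic one-compartment profile: C(t) = 10 * exp(-0.1 t), dose = 100.
times = [float(h) for h in range(49)]
concs = [10.0 * math.exp(-0.1 * h) for h in times]
pk = nca_iv_bolus(times, concs, dose=100.0)
```

For this synthetic profile the true values are CL = 1 and \(Vd_{ss}\) = 10 in the corresponding units, so the result gives a quick check of the trapezoidal approximation error.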

The accuracy of the predicted PK data was measured by calculating the average fold error (AFE) for each PK parameter (Eq. 7), where n is the total number of test molecules. This metric was used to evaluate the overall prediction accuracy of the model. Additionally, prediction accuracy was assessed as the percentage of predictions within a 2-fold error for each PK parameter.

$$\begin{aligned} AFE = 10{^{\frac{1}{n}\,\sum _{i=1}^{n}\log (\frac{predicted_{i}}{observed_{i}})}} \end{aligned}$$
(7)
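Eq. 7 and the 2-fold criterion are short enough to state in code. The sketch below assumes base-10 logarithms, consistent with the base-10 exponent in Eq. 7. It also includes the absolute average fold error (AAFE) under its usual definition (the same expression with absolute values of the log ratios), which is an assumption because the text does not define it. The function names and example values are illustrative.

```python
import math


def afe(predicted, observed):
    """Average fold error (Eq. 7): values above 1 indicate over-prediction."""
    logs = [math.log10(p / o) for p, o in zip(predicted, observed)]
    return 10 ** (sum(logs) / len(logs))


def aafe(predicted, observed):
    """Absolute average fold error: always >= 1, measures spread, not bias."""
    logs = [abs(math.log10(p / o)) for p, o in zip(predicted, observed)]
    return 10 ** (sum(logs) / len(logs))


def fraction_within_fold(predicted, observed, fold=2.0):
    """Fraction of predictions with 1/fold <= predicted/observed <= fold."""
    hits = sum(1 for p, o in zip(predicted, observed) if 1 / fold <= p / o <= fold)
    return hits / len(predicted)


# Hypothetical values, for illustration only.
pred = [2.0, 0.5, 1.0, 4.0]
obs = [1.0, 1.0, 1.0, 1.0]
bias = afe(pred, obs)
spread = aafe(pred, obs)
within_2fold = fraction_within_fold(pred, obs)
```

Because over- and under-predictions cancel in the AFE, it is best read together with the percentage within 2-fold: an AFE close to 1 can still hide large individual errors.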

Results

ML models

Five ML methods were used to construct models for predicting human \(f_{up}\), Caco-2 cell permeability, and CLt: SVR, RF, XGB, GBM, and D-MPNN. A series of models was built for each parameter using different training sets. Table 2 presents the statistical evaluation results of the ML models in training and testing.

The D-MPNN models outperformed the other models for all three parameters (Fig. 3). For predicting human \(f_{up}\), the D-MPNN model achieved an \(R^{2}\) of 0.92 for an independent training set of 2292 compounds and predicted 77.5\(\%\) (31/40) of the test set within a 2-fold prediction error. The D-MPNN model exhibited the highest \(R^{2}\) value of 0.95 for the human Caco-2 training set, compared to the GBM model with an \(R^{2}\) of 0.55. Additionally, the D-MPNN model demonstrated the best predictive ability for human CLt compared to the other models, with 67.5\(\%\) of the 40 testing compounds predicted within a 2-fold prediction error.

Overall, the best models were used to predict the human \(f_{up}\), Caco-2 cell permeability, and CLt of the forty compounds. These predicted results were then used as inputs in PBPK models.

Fig. 4

Scatter plots comparing predicted and observed PK parameters after IV dosing in humans using ML inputs (left) and in vitro inputs (right). The two red dashed lines represent ± 2-fold errors. \(R^{2}\) values are the Pearson correlation coefficient values.

PBPK Models

For the PBPK model with in vitro inputs, the physicochemical properties, such as Caco-2 permeability, are experimental values reported in the literature, and metabolism is based on LMS. For the model with ML inputs, by contrast, the physicochemical properties, including metabolic clearance, are predicted entirely by ML models.

Figure 4 compares predicted and observed PK parameters for 40 compounds in humans. A table with details on the prediction accuracy for each drug is provided in the supplementary material. All parameters exhibit a good correlation between observed and simulated values, except for CL. Pearson correlation coefficient values (\(R^{2}\)) range from 0.6 to 0.9. With the ML-PBPK model, 65\(\%\) (26/40) of \(AUC_{0-\infty }\) predictions were within a 2-fold error, compared with 47.5\(\%\) (19/40) for the in vitro inputs model (Table 3).

The ML-PBPK model showed relatively good results for CL prediction compared to the in vitro inputs model, with absolute average fold errors (AAFEs) of 2.00 and 2.59 and \(R^{2}\) values of 0.4 and 0.21, respectively. The predicted/observed ratios of PK parameters span a narrow range with ML inputs (Supplementary Figure 1), indicating good agreement between predicted and observed values. Both models over- or under-predicted CL, with median predicted/observed ratios of 1.37 and 0.68. However, for drugs that are extensively excreted unchanged in urine and have elimination rates higher than normal GFR, such as Vinorelbine, ML inputs gave better predictions.

Table 3 Prediction Accuracy of PK Parameters

The results of \(Vd_{ss}\) in the ML-PBPK model were similar to those of the in vitro inputs model, as the same tissue partition coefficient calculation method was chosen. \(Vd_{ss}\) describes the overall drug distribution in plasma and tissues. In the PBPK model mechanism, drug-related parameters such as \(f_{up}\), cell permeability, and Kp values affect drug distribution into tissues. Since the Kp values for tissues were the same in both the ML-PBPK and in vitro inputs models, comparable \(Vd_{ss}\) values indicate that ML prediction of \(f_{up}\) and Caco-2 cell permeability was able to replace experimental values without compromising accuracy.

The ML-PBPK model was also more efficient than the in vitro inputs model, with a runtime of only 10 seconds per simulation compared to the few days it may take to collect experimental data and perform model simulations. Overall, the ML-PBPK model was found to be a fast platform that accurately predicts human PK profiles in plasma and tissues.

Discussion

The development of PBPK models has been a significant advancement in pharmacology. Initially, in vitro data were used to create these models to predict animal and human PK. However, the accuracy of PBPK models and the integrity of input parameters were limited because these measurements did not fully capture the complexity of the human body. As a result, there has been growing interest in developing PBPK models that incorporate inputs without experiments.

The integration of machine learning (ML) with traditional physiologically based pharmacokinetic (PBPK) models has been a focus of early research efforts aimed at minimizing the need for experimental data in model development [26]. This research includes using ML-predicted absorption (ka), elimination (CLint), and distribution (Vss) parameters or physicochemical property parameters as inputs to predict the in vivo exposure of oral drugs through a simplified PBPK model. Because such a model considers only plasma, absorption tissues, and elimination tissues, it may not meet prediction needs for the target tissues that are relevant to the efficacy or toxicity of some drugs. Moreover, the literature has often adopted the intrinsic hepatic clearance rate for the prediction of metabolic parameters. While the reported test sets show favorable outcomes, the variability in prediction accuracy when extrapolating from intrinsic hepatic clearance to whole-body hepatic serum clearance cannot be overlooked. To mitigate this discrepancy, our approach considers the use of in vivo hepatic serum clearance as an input parameter. Furthermore, the significance of the unbound drug fraction (\(f_{up}\)) for model predictions, particularly for clearance, has been substantiated by several studies [36, 37]. It is also recognized that \(f_{up}\) influences drug distribution; however, for drugs with high protein binding, the precision of experimental measurements diminishes. Consequently, ML has also been deployed to predict this parameter. Previous literature has also focused on improving the prediction of tissue distribution coefficients or the impact of liver clearance on PK simulation. In a study conducted by Murad, a machine learning model was used to predict \(Vd_{ss}\), showing that 58\(\%\) of predictions were within a 2-fold error [38]. The Miljkovic team used machine learning to predict PK parameters directly from structure, and 48.5\(\%\) of their predictions in the test set were within a 2-fold error [39]. Although the ML predictions for \(Vd_{ss}\) have been promising, and some studies have directly used this parameter as an input for PBPK modeling, \(Vd_{ss}\) represents the overall drug distribution volume, inclusive of plasma and tissues, and does not delineate the specific distribution within tissues. Therefore, our study continues to employ the Rodgers and Rowland (RR) method to calculate drug distribution across various tissues, taking into account the specific composition of each tissue. Following the optimization methods previously mentioned, our model demonstrated improved predictive capabilities for the 40 test compounds. The proportions of predictions within a 2-fold error were 55\(\%\), 65\(\%\), 65\(\%\), and 57.5\(\%\) for \(T_{1/2}\), AUC, CL, and \(Vd_{ss}\), respectively.

This study further showed the power of machine learning in predicting relevant parameters that may involve complex physiological processes and are hard to measure accurately with in vitro experiments. A large dataset of drug properties was used to develop ML models that predict \(f_{up}\), Caco-2 cell permeability, and total plasma clearance of drugs. These ML-predicted values were then used as inputs for the PBPK models.

We used the directed message-passing neural network (D-MPNN) to predict the input parameters of drugs, achieving superior performance compared to traditional machine learning approaches such as Support Vector Regression (SVR), Random Forest (RF), XGBoost (XGB), and Gradient Boosting Machine (GBM). The superior performance of the D-MPNN over these traditional ML methods in predicting ADMET properties of small-molecule compounds can be attributed to the following factors:

  • Structure-awareness: D-MPNN uses a graph-based representation of molecules, capturing the relationships between atoms and chemical bonds within the molecular structure. This may enable the model to understand the structural and chemical context of the molecule, which is crucial for predicting ADMET properties.

  • End-to-end learning: D-MPNN is an end-to-end model that learns to predict target properties directly from molecular structures without the need for manual feature engineering. This allows the model to automatically discover relevant features and patterns in the data.

  • Message passing mechanism: The D-MPNN architecture uses multiple rounds of message passing to update atom representations. This process enables the model to capture both local and global chemical environments, which are essential for accurately predicting ADMET properties.

These factors may enable the model to capture more relevant chemical information and make more accurate predictions.
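The message-passing mechanism described above can be illustrated with a toy example. The sketch below is not Chemprop and contains no learned weights: hidden states live on directed bonds, each update sums the states of bonds entering the source atom while excluding the reverse bond, and a sum readout gives one number per molecule. Scalar features, the fixed `weight`, and the function name `dmpnn_toy` are simplifying assumptions made for illustration.

```python
def relu(x):
    return max(0.0, x)


def dmpnn_toy(atom_feats, bonds, steps=3, weight=0.5):
    """Toy directed message passing on a molecular graph.

    atom_feats: list of floats, one scalar feature per atom.
    bonds:      list of undirected (i, j) atom index pairs.
    Returns a single scalar "molecule representation".
    """
    # Every undirected bond becomes two directed bonds.
    directed = [(i, j) for i, j in bonds] + [(j, i) for i, j in bonds]

    # Initial hidden state of bond v->u comes from its source atom.
    h0 = {(v, u): relu(atom_feats[v]) for v, u in directed}
    h = dict(h0)

    for _ in range(steps):
        new_h = {}
        for v, u in directed:
            # Message for v->u: states of bonds k->v, skipping the reverse bond u->v.
            msg = sum(h[(k, t)] for k, t in directed if t == v and k != u)
            new_h[(v, u)] = relu(h0[(v, u)] + weight * msg)
        h = new_h

    # Atom readout: own feature plus all incoming bond states; then sum over atoms.
    atom_hidden = [
        atom_feats[a] + sum(h[(k, t)] for k, t in directed if t == a)
        for a in range(len(atom_feats))
    ]
    return sum(atom_hidden)


# Three-atom chain 0-1-2 with hypothetical scalar atom features.
chain = dmpnn_toy([1.0, 2.0, 3.0], [(0, 1), (1, 2)], steps=1)
# The same chain with atoms numbered in reverse order.
chain_relabelled = dmpnn_toy([3.0, 2.0, 1.0], [(0, 1), (1, 2)], steps=1)
```

Excluding the reverse bond prevents a message from bouncing straight back to the atom it came from, and the sum readout makes the output independent of atom numbering. With more steps, information from atoms further away reaches each bond, which is how local and more global environments are both captured.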

Results on 40 drugs showed that the ML-PBPK model predicts human PK parameters with higher accuracy than the in vitro inputs model, and most of the compounds have prediction errors within 2- or 3-fold. In particular, for compounds with extensive renal clearance, the average fold error (AFE) was 0.94 with ML inputs compared to 1.28 with in vitro inputs. In addition, each PK prediction with the ML-PBPK model was generally completed within seconds, showing higher efficiency without compromising accuracy compared to the in vitro inputs model.

The PBPK model is a four-compartment permeability-limited model used to predict the distribution of drugs in various tissues. It assumes that the drug’s membrane permeability into different tissues is equal to the value measured in vitro using Caco-2 cells. The LogP values of the tested compounds range from -0.11 to 8.49, and the \(f_{up}\) values range from 0.002 to 0.95. Based on the results of 40 molecules, the PBPK model tends to underestimate the tissue distribution volume for highly lipophilic and highly permeable drugs, such as Desipramine and Imipramine. In contrast, the model fails to effectively predict tissue distribution coefficients for extremely lipophilic molecules with a LogP value exceeding 5, like Montelukast. In these cases, the calculation method for tissue distribution coefficients based on in vitro inputs leads to an overestimation of Vd.

The main difference between the in vitro and ML models lies in the input parameter for the elimination process. The in vitro model, which is based on microsomal experimental data extrapolated using the IVIVE method, significantly underestimates the clearance for drugs like Cimetidine, Prazosin, and Metoprolol, which are eliminated primarily through renal clearance or are substrates of uptake transporters (such as OCTs). Moreover, the in vitro model also underestimates CL for drugs with high plasma protein binding, which have difficulty entering the elimination tissue according to the modeling assumption. The ML model, on the other hand, significantly improves the prediction accuracy for this class of molecules.

This study showed how ML methods could improve PK prediction by predicting relevant physicochemical parameters. Improving the quantity and quality of the data used to train ML models deserves further work. For example, data from GI organoids may help train better models to aid PK prediction for orally administered drugs. In this study, we predicted total clearance for PK prediction. With more data, separate models could be built for distinct clearance routes, e.g., hepatic and renal clearance, providing more information for drug research and development. Developing better models as our understanding of deep learning progresses is another valuable direction for future work.

Conclusions

We evaluated the accuracy of our ML-PBPK model platform on 40 compounds by comparing predictions made with in vitro inputs and with ML-predicted inputs. The commonly used IVIVE method has limitations in predicting hepatic clearance, and there is limited experimental exploration of clearance pathways outside the liver in the early stages of drug discovery. As drug clearance is crucial for PK prediction, we used an ML model to predict total human plasma clearance as an input to the PBPK model for predicting drug concentrations in plasma and tissues. This method can guide the development and prioritization of lead compounds based on molecular structure, before in vitro experiments are performed. In the future, the accuracy of the ML-PBPK model can be further improved or optimized for specific molecular structures by expanding the training set. Methods such as graph-based multi-task learning, pre-trained models, and model ensembles will be employed to improve accuracy. Furthermore, studying the interpretability of the prediction results is also essential.