Computational approaches for modeling human intestinal absorption and permeability

Subramanian, Govindan; Kitchen, Douglas B.

doi:10.1007/s00894-005-0065-z

Original Paper
Open access
Published: 01 April 2006

Volume 12, pages 577–589, (2006)

Journal of Molecular Modeling

Govindan Subramanian¹ &
Douglas B. Kitchen¹

Abstract

Human intestinal absorption (HIA) is an important roadblock in the formulation of new drug substances. Computational models are needed for the rapid estimation of this property. The measurements are determined via in vivo experiments or in vitro permeability studies. We present several computational models that are able to predict the absorption of drugs by the human intestine and the permeability through human Caco-2 cells. The training and prediction sets were derived from literature sources and carefully examined to eliminate compounds that are actively transported. We compare our results to models derived by other methods and find that the statistical quality is similar. We believe that models derived from both sources of experimental data would provide greater consistency in predictions. The performance of several QSPR models that we investigated to predict outside the training set for either experimental property clearly indicates that caution should be exercised while applying any of the models for quantitative predictions. However, we are able to show that the qualitative predictions can be obtained with close to a 70% success rate.

Introduction

A poor understanding of the physicochemical and pharmacokinetic properties of known drugs and potential drug candidates is a major contributor to the low success rate of compounds in clinical development [1]. Traditional structure–activity relationship (SAR) studies optimize the potency and efficacy of a congeneric compound series on a protein target. The difficulty of rapidly characterizing the absorption, distribution, metabolism, and excretion (ADME) properties of compounds still impedes significant progress in this area [2–4]. For instance, oral dosing is usually the most desirable way to administer drugs, so the therapeutic efficacy of a compound often depends on the efficient transport or absorption of the drug into the blood stream. Consequently, plasma solubility, membrane permeability, protein binding, transport properties, and diffusion kinetics are some of the components influencing the overall bioavailability of the drug. Metabolism or elimination of a drug may also decrease the efficacy of a compound or increase toxicity or unwanted side effects.

The primary barrier to good bioavailability is human intestinal absorption (HIA). Therefore, there is an increasing need to understand and measure the effect of the physicochemical properties of a drug on the intestinal absorption process [5, 6]. Absorption requires dissolution, passage through the gut, and finally diffusion or transport into the blood stream. For example, P-glycoprotein (P-gp) activity in the apical cell membrane may limit bioavailability, while absorption through transcellular (membrane diffusion, carrier-mediated) or paracellular routes alters the pharmacokinetic profile of the compounds [7, 8]. The cost and time involved in in vivo and in situ experiments [9–11] make it difficult to collect sufficient data to analyze structural contributions to the rate of intestinal absorption. Recent advances in high-throughput screening and combinatorial chemistry synthesis have increased the need for a priori information on the intestinal absorption of a variety of new chemical entities. Determining the extent of HIA requires expensive experiments, and hence simpler in vitro models have been developed.

In order to obtain pharmacokinetic information earlier in drug discovery projects, in vitro Caco-2 (immortalized human colon adenocarcinoma cell line) monolayers are now used because they exhibit remarkable morphological and functional similarity to the small intestinal columnar epithelium [12–15]. The permeability measured from these experiments agrees, in general, with HIA and hence is useful for experimental modeling of in vivo absorption. The measured apparent permeabilities (P _app) depend on the cell culture, growth factors, and other experimental conditions, resulting in slightly different values reported by various groups (see supporting information) [16–23]. There is a need to develop quantitative and predictive mathematical models that relate various physicochemical properties to intestinal permeability and absorption so that poorly absorbed compounds are eliminated in the initial stages of drug discovery. The primary goal of this work is to develop computational models to describe the initial absorption of compounds from the gut into the blood stream.

Cellular permeability can be understood as a series of partitioning and associated diffusion of a molecule from one region to another in a lipid bilayer that surrounds the cell. Therefore, early physical and computational studies intended to explain intestinal absorption of drugs focused mainly on a single physicochemical property like octanol–water partition coefficient or the distribution coefficient [24–29]. Reasonable correlations were obtained for a homologous series of compounds, although structural diversity impeded the model’s predictivity. Other properties such as the molecular weight, size, and shape, polar van der Waals surface area (PSA), and H-bonding capability are believed to be important for modeling intestinal permeability [30]. Hence, Lipinski proposed a scheme to classify the intestinal absorption using simpler yet powerful 1D-descriptors such as the count of polar atoms, logP, and the molecular weight [31]. Similarly, Clark used the computed PSA to categorize (good, medium, poor) the extent of intestinal absorption for structurally diverse compounds [32].

A number of quantitative structure–property relationship (QSPR) models have also been proposed in the literature for predicting logP _app and %HIA [33–35]. Most of the reported QSPR models utilize PSA as an indicator of intestinal permeability and absorption. For a diverse set of 17 compounds, van de Waterbeemd and Camenisch showed that the log of the apparent permeability (logP _app) measured from Caco-2 monolayers correlates well with the molecular weight and the PSA obtained for a single conformation [36]. By considering multiple conformations for the β-adrenoreceptor antagonists series, a sigmoidal relationship between the dynamic PSA and the fractional absorption (%HIA) was derived [37]. Similarly, the Boltzmann-averaged dynamic solvent-accessible surface area obtained from molecular dynamics simulations was related to the intestinal absorption [22].

Artursson and coworkers [38] extended the PSA model [17] by including the dynamic non-polar surface area (NPSA) component and showed a good correlation with the intestinal permeability of 19 oligopeptides. However, the PSA, NPSA and H-bond atom counts were not determined to be the critical elements responsible for the observed cellular permeability of the 21 peptide and peptidomimetic compounds considered by a different group [39]. This suggests that QSPR models based on logP, PSA or NPSA, while helpful for deriving meaningful correlations within a narrow structural class, may not extend universally. Including other physicochemical variables in the QSPR model would remove the over-dependence of P _app on PSA. Alternatively, descriptors replacing PSA can be mapped to HIA so that the intestinal absorption of both polar and apolar molecules can be modeled simultaneously.

QSPR efforts along these directions reveal similar or better performance compared to models that use PSA as the descriptor. In fact, the statistical results obtained by employing multivariate partial least squares (PLS) fitting protocols with several molecular descriptors excluding PSA [40], and also by using molecular hash keys [41], were comparable to those derived using PSA [37] for the same set of compounds (Table 1). Very recently, a quantitative model for predicting the %HIA was proposed by combining a genetic algorithm and a neural network scoring function [42]. The significance of the model can be appreciated from the small (9.4%) root-mean-square error (rmse) obtained for a training set of 67 compounds. However, the major drawback rests in the compound selection, since a single model was used to describe both passive and carrier-mediated absorption mechanisms. In addition, with 52/67 compounds exhibiting >75% HIA (strongly absorbed), the training set primarily contains compounds with high absorption and may therefore result in a biased model.

Table 1 Statistical results for logP_app (Eqs. 3, 8, 9, 10 and 11) and logHIA^a (Eqs. 4, 5, 6, 7 and 12) for training and prediction (sub)set data and comparisons with QSPR models reported in literature

One of the primary goals of this study is to develop QSPR models for estimating both logP _app and %HIA following the procedure outlined in Fig. 1. As a first step, experimental P _app measurements and %HIA data reported in literature were compiled. Subsequently, actively transported compounds were identified and separated from the dataset. The remaining molecules were then divided into training and prediction sets. The training set was composed such that the experimental value was distributed uniformly with respect to the measured P _app and %HIA. Using these evenly distributed datasets, QSPR models for computing logP _app and %HIA are derived independently. By deriving both simple and complex QSPR equations, the influence of certain independent variables was assessed. The models were then extended to predict the logP _app and %HIA for an external validation set of passively absorbed compounds and compared to reported QSPR models [32, 36, 37, 40, 42]. Deriving similar mathematical relations for logP _app and %HIA and comparing the results obtained with the present training set data demonstrate the limitations with the oft-used training set compounds in the literature. As a next step, the logP _app and %HIA are predicted for molecules absorbed through active transport processes to see if these external influences are manifested in the molecular descriptors identified. This will also help determine if simple relationships exist between the active and passive absorption mechanisms. The quantitative logP _app and %HIA results are then extended to categorize the degree of intestinal permeability and absorption and compared to literature predictions [31, 32].

Dataset

Experimental P _app values determined using Caco-2 cell lines were pooled from several literature sources resulting in 117 compounds [16–23]. The data revealed that the P _app measurements reported by different laboratories for the same compounds varied significantly in magnitude. For example, the experimental logP _app for alprenolol ranges from −3.9 to −5.8, while that of atenolol varies between −5.7 and −7.5. For compounds where inter-laboratory experimental determinations were available, the values most consistent among the reported ones were used (see supporting information). The drawback of models derived based on the Caco-2 P _app values determined from a single laboratory is the potential bias in the results that limits extension to other compound sets reported by different groups. Alternatively, by using experimental P _app values that are consistent across different groups, realistic estimates of the experimental uncertainty in the various measurements can be ascertained for a larger set of compounds. This also provides a confidence limit for the experimental values reported for compounds from that group only. By following this procedure, some reported P _app values [43] were observed to deviate considerably from other literature reports for the same compounds and so were not included while generating the final training set. However, some of these compounds are used in the second logP _app prediction set but are not discussed in the text. The chemical structures used in this work are available from the authors upon request.

We believe that it is important to use compounds that permeate through passive absorption process in the dataset, since actively transported molecules are influenced by external environments that may not be modeled accurately. Prior to generating the final dataset, molecules reported to be substrates for various transport mechanisms or carrier-mediated processes (P-gp, peptide, nucleoside, etc.) were separated from those that are likely to undergo passive intestinal permeation. This division of compounds (Fig. 1) results in a dataset of only 59 compounds that most likely permeate through diffusion-controlled processes. A similar protocol was employed to gather the %HIA data from literature [19–21, 37, 42–45]. Although most of the fractional intestinal absorption data had been compiled [42], additional %HIA values reported in literature [43, 44] were also included, resulting in 121 compounds of which 76 molecules are probably passively absorbed. The training and prediction sets were chosen so that the logP _app and %HIA values span the whole range. Also, in the case of %HIA, compounds with values of 100% or 0% were used only in the prediction set.

Materials and methods

All the compounds were energy minimized using the universal force field [46] implemented in the Cerius² molecular modeling package (Accelrys, San Diego, CA) and imported into a Cerius² study table. Structural, constitutional, topological, and other calculated descriptors available through the Cerius² QSAR module were then included. In addition, the computed PSA [47] and the cube root of the gravitational index (GRAVIND) [42] were added. Since the passive transport of a compound across the intestinal epithelium is largely diffusion-controlled, and since the diffusion coefficient is inversely proportional to the square root (SQINMW) or the cube root (CBINMW) of the molecular weight [48], descriptors representing these features were also included. By inspecting the independent variables in the study table, descriptors that did not have sufficient non-zero values or adequate variation were identified and removed.

A structurally heterogeneous logP_app dataset containing 22 compounds was used as the training set for predicting the apparent permeability of the remaining 37 molecules. Similarly, the training set for %HIA consisted of 30 compounds, leaving an external prediction set of the remaining 46 molecules. The training set for modeling %HIA was selected such that molecules with 0 or 100 %HIA were not included. We believe that these limiting values do not correspond to actual measurements, but instead indicate that the compound has crossed a threshold value. Using these compounds in the training set would therefore affect the overall performance of the derived QSPR model. Hence, molecules with 0% and 100% HIA values are used in the prediction set and not in our training set. To allow a direct comparison with reported QSPR models, additional equations were derived for the 17- and 20-compound training sets used in the literature for predicting logP_app and %HIA, respectively (Fig. 1), and the results are provided in the supporting information accompanying the paper.

Several studies in the literature have pointed out that P _app and %HIA are not likely to be linearly related to the computed descriptors [24–29, 36]. Therefore, the experimental P _app and the reported %HIA were transformed to logarithmic units. Since the %HIA has a closed scale with limiting values, fitting the data using linear approximations may lead to a statistically incorrect model. As in previous studies [40], a logit transformation was performed on the dependent variable (%HIA) using Eq. (1). The computed logitHIA was transformed back to %HIA using Eq. (2), so that direct comparisons with experimental values could be achieved.

$$\text{logit}(\text{HIA}) = \ln\left[ \frac{\%\text{HIA} + 10}{110 - \%\text{HIA}} \right]$$

(1)

$$\%\text{HIA} = \frac{110 \times \exp(\text{logitHIA}) - 10}{1 + \exp(\text{logitHIA})}$$

(2)
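The pair of transforms in Eqs. (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration only; the function names `logit_hia` and `pct_hia_from_logit` are hypothetical and do not come from the original work.

```python
import math


def logit_hia(pct_hia: float) -> float:
    """Eq. (1): logit transform of %HIA using the +10 / 110 offsets."""
    # The offsets keep the logarithm finite at 0% and 100% HIA.
    if not -10.0 < pct_hia < 110.0:
        raise ValueError("%HIA must lie strictly between -10 and 110")
    return math.log((pct_hia + 10.0) / (110.0 - pct_hia))


def pct_hia_from_logit(logit: float) -> float:
    """Eq. (2): back-transform a logitHIA value to %HIA."""
    e = math.exp(logit)
    return (110.0 * e - 10.0) / (1.0 + e)
```

Because Eq. (2) is the algebraic inverse of Eq. (1), applying the two functions in sequence should return the starting %HIA value, and a %HIA of 50 maps to a logit of zero.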

Using Cerius², the training-set molecules were initially subjected to a Genetic/Partial Least Squares (G/PLS) statistical fitting procedure [49, 50] to fit the dependent function (logP_app or logitHIA) using 15 variables and up to four PLS components. In this way, a plausible subset of descriptors that best describes logP_app and logitHIA was obtained from the pool of independent variables. Subsequent G/PLS runs on the same training-set data used only the descriptor subset and seven terms in the regression equation. The initial random population of 100 equations was evolved for 20,000 generations, resulting in a final set of 100 model equations that characterize the observed logP_app and logitHIA, with the square of the correlation coefficient (r²) as the fitness function. All the variables involved in the G/PLS regression were scaled and a maximum of three principal components were allowed. To determine the role of certain variables such as PSA, Jurs terms [51], electrotopological indices [52], and AlogP [53], four different models were obtained for logitHIA by including or removing some descriptors from the initial descriptor subset used in the G/PLS run. In addition, QSPR models of varying complexity, and models for the standard training set used in the literature, were obtained for direct comparison with reported models (Table 1).

The internal cross-validated correlation coefficient, r²(cv), obtained using the leave-one-out method, was used to ascertain the statistical quality of the derived functions. The accuracy of the estimations was evaluated by comparing the results obtained using the best model equation against the experimental logP_app and logitHIA values for each of the prediction sets. The quantitative logP_app and %HIA results were also used to categorize the molecules based on the extent of intestinal permeability and absorption using the following threshold values. Compounds with computed logP_app < −6 are classified as poorly permeable, and those with −6 ≤ logP_app < −5 as moderately permeable. All the remaining molecules are considered to possess good intestinal permeability. Similarly, a %HIA of <30 suggests poor intestinal absorption, while molecules exhibiting HIA >70% possess good intestinal absorption. Compounds with HIA between 30 and 70% are classified as medium.
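The threshold scheme described above can be written as two small classifiers. This is a sketch under stated assumptions: the function names are hypothetical, and values falling exactly on a boundary (30% or 70% HIA) are assigned to the medium class, a choice the text does not spell out.

```python
def classify_permeability(log_papp: float) -> str:
    """Bin a computed logP_app using the -6 and -5 thresholds."""
    if log_papp < -6.0:
        return "poor"
    if log_papp < -5.0:
        return "moderate"
    return "good"


def classify_absorption(pct_hia: float) -> str:
    """Bin a computed %HIA using the 30% and 70% thresholds."""
    if pct_hia < 30.0:
        return "poor"
    if pct_hia > 70.0:
        return "good"
    # Assumption: the boundary values 30 and 70 count as medium.
    return "medium"
```

Comparing the predicted class with the class of the experimental value, compound by compound, gives the kind of qualitative success rate discussed in the abstract.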

Results

Table 1 summarizes the performance of the various QSPR models derived using the experimental logP_app and %HIA for the passively absorbed compounds listed in Tables 2 and 3. The computed logP_app and %HIA are also given for molecules that are absorbed through various transport processes and were not used as part of the training set.

Table 2 Experimental and computed logP_app for training set, prediction set, and compounds permeating through carrier-mediated processes^a

Table 3 Experimental and computed %HIA values for the training set, prediction set, and compounds absorbed through carrier-mediated mechanisms

Statistical results for logP_app

The G/PLS analysis on the 22 training-set molecules (Table 2) resulted in Eq. (3), with an r² of 0.89 for logP_app and an r²(cv) of 0.54. The root-mean-square error (rmse) of 0.3 log units for the training set is within experimental error limits.

$$\log P_{\text{app}} = -4.628 + 0.399 \times \text{Hbond acceptor} + 0.450 \times \text{AlogP} - 0.329 \times \text{Hbond donor} + 0.698 \times \text{Jurs\_RPCS} - 0.166 \times \text{Kappa\_1} + 0.00161 \times \text{Jurs\_WNSA\_1}$$

(3)
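The r²(cv) value quoted for Eq. (3) is a leave-one-out statistic. The sketch below shows how such a value can be computed, under two loudly stated assumptions: ordinary least squares stands in for the G/PLS fit (which is not reproduced here), and r²(cv) is taken as 1 − PRESS/SS_tot, a common convention that the text itself does not define. The function name `loo_q2` is hypothetical.

```python
import numpy as np


def loo_q2(X, y):
    """Leave-one-out cross-validated r^2 for a linear model.

    Assumption: ordinary least squares replaces the G/PLS procedure,
    and r^2(cv) is defined as 1 - PRESS / SS_tot.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])  # prepend an intercept column
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i  # drop compound i from the fit
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        press += (y[i] - A[i] @ coef) ** 2  # error on the held-out compound
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - press / ss_tot
```

A large gap between the fitted r² and this cross-validated value, such as the 0.89 versus 0.54 reported for Eq. (3), is usually read as a sign that the model depends strongly on individual training compounds.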

The computed r²(pr) and rmse obtained for the 37 prediction-set compounds suggest a satisfactory performance of the model when applied to an external dataset, but with a significant rms error. However, closer examination reveals that the predicted logP_app deviates by more than one log unit from the experimental value for six compounds identified in Table 2. Since the measured apparent permeabilities for these compounds are available from one literature source only (see supporting information), it is difficult to ascertain whether the observed logP_app values carry large uncertainty or whether the compounds should be treated as prediction-set outliers. Assuming the latter, a significant improvement in the predicted r² and rmse is achieved for the remaining 31 molecules in the prediction subset (Table 1). Clearly, many other interpretations are possible for the outliers in this case, including structural diversity outside the training set or alternate absorption mechanisms in these compounds.
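The outlier treatment described here (flag compounds whose prediction misses the experimental value by more than one log unit, then recompute the error on the rest) can be sketched as follows. The helper names and the four data points are hypothetical and serve only to illustrate the bookkeeping; they are not values from Table 2.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    sq = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(sq) / len(sq))


def split_outliers(observed, predicted, cutoff=1.0):
    """Return (kept, outliers) index lists using an absolute-error cutoff."""
    kept, outliers = [], []
    for i, (o, p) in enumerate(zip(observed, predicted)):
        (outliers if abs(o - p) > cutoff else kept).append(i)
    return kept, outliers


# Hypothetical logP_app values, for illustration only.
obs = [-4.5, -5.2, -6.1, -5.0]
pred = [-4.7, -5.0, -4.8, -5.3]
kept, outliers = split_outliers(obs, pred)
rmse_all = rmse(obs, pred)
rmse_subset = rmse([obs[i] for i in kept], [pred[i] for i in kept])
```

Removing compounds on the basis of their prediction error always lowers the rmse, so the improved subset statistics should be read with that in mind.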

Statistical results for logitHIA

The statistical quality of logitHIA (Eqs. 4, 5, 6, 7) and the quantitative estimations of %HIA for the 30 training-set compounds (Table 3) show that good QSPR models are achieved for %HIA. Among the four models, the best results are obtained through Eq. (4) with an r ²(tr) of 0.907 while the logitHIA estimations obtained using Eq. (7) account for only 67% of the variance in the training set (Table 1). This is also reflected in the larger rmse obtained using Eq. (7) compared to the 9% error in HIA associated with Eq. (4). The subtle changes in the r ²(tr) obtained through Eqs. (4, 5, 6 and 7) are attributed to the effect of certain independent variables like PSA, Jurs term, etc. in modeling the intestinal absorption.

$$\text{logitHIA} = 10.342 + 0.783 \times \text{S\_dssC} - 104.493 \times \text{SQINMW} - 0.0262 \times \text{S\_dO} - 0.022 \times \text{PSA} - 0.164 \times \text{S\_ssCH2} - 0.366 \times \text{AlogP}$$

(4)

$$\text{logitHIA} = 3.849 + 10.991 \times \text{Jurs\_RNCG} + 0.686 \times \text{S\_dssC} + 0.114 \times \text{S\_ssO} + 0.883 \times \text{JX} - 95.220 \times \text{SQINMW} - 0.232 \times \text{Hbond acceptor}$$

(5)

$$\text{logitHIA} = 11.324 - 0.015 \times \text{PSA} - 55.141 \times \text{CBINMW} + 1.032 \times \text{Jurs\_FNSA\_2} + 0.0234 \times \text{Jurs\_WNSA\_3} + 11.329 \times \text{Jurs\_RNCG} - 0.171 \times \text{AlogP}$$

(6)

$$\text{logitHIA} = 8.029 - 0.00379 \times \text{Area} + 0.735 \times \text{GRAVIND} - 0.0577 \times \text{PSA} - 61.408 \times \text{CBINMW} + 0.724 \times \text{Hbond donor} - 0.0142 \times \text{MW}$$

(7)

The computed r ²(pr) and rmse associated with logitHIA for the 46 prediction-set compounds suggest a weaker performance of the above models for quantitative logitHIA predictions. In striking contrast to the training set results, the predictions obtained through Eqs. (6) and (7) are the best among the four G/PLS models derived here for estimating %HIA. The poor performance of Eqs. (4) and (5) is attributed to the use of certain electrotopological indices in these models that are not represented in some of the validation-set compounds. For example, the absence of methylene carbons (>CH₂) in ketoprofen, ibuprofen, and coumarin is partially responsible for the >40% rmse observed for these molecules by using Eq. (4) (Table 3). Similarly, the lack of ether-like oxygen (–O–) in ticrilast, caffeine, and ciprofloxacin contributes to the smaller r ² observed by using Eq. (5), while compounds lacking a carbonyl carbon (nadolol, chlorothiazide, etc.) are predicted poorly by Eqs. (4) and (5). When molecules predicted with large rmse (>40%) are excluded from the validation set, the correlation coefficient for the validation subset is improved significantly along with a substantial decrease in the rmse (∼10%). Considering the large deviations in the reported inter-laboratory %HIA values for certain compounds (see supporting information), the predictions are still quantitative and within acceptable experimental error limits.

Discussion

The G/PLS results presented above reveal that reliable models have been obtained for estimating logP _app and %HIA across a heterogeneous dataset (Tables 2 and 3). However, the choice of the training-set data and the complexity of the regression equations (Eqs. 3, 4, 5, 6, 7) obtained in this study do not allow a direct comparison of the present results with reported QSPR models. Consequently, additional models were derived for the standard 17 and 20 training-set compounds used in the literature for estimating logP_app and %HIA and compared with the existing models (Table 1) derived using 22 and 30 compound sets, respectively (see supporting information).

Comparison with reported logP_app models

Palm et al. reported one of the earliest logP_app models by correlating it with the dynamic PSA of six β-adrenoreceptor-blocking agents [17, 23], and subsequently extended it to model nine homologues [23] with considerable success. Our initial attempts to obtain a linear relationship between the computed PSA and logP_app for the 22 training-set compounds yielded an r² of 0.583, considerably lower than the result obtained with Eq. (3) (Table 1). This reveals the limitation of using a single variable for quantitative logP_app predictions across a diverse dataset. However, a statistically significant correlation coefficient (0.833) was obtained for the literature training set of 17 compounds when logP_app was modeled using molecular weight and PSA (Eq. 8) [32]. This equation also agrees with the general notion that intestinal permeability is enhanced by increasing the molecular weight and by decreasing PSA. Since Eq. (8) was not validated through predictions for an external dataset, the generality of the model could not be assessed. Therefore, we refit a model using these terms to verify its extension to all the molecules considered in this study (Table 2). The comparable statistical quality (Table 1) and the coefficients on the variables obtained through Eq. (9) for the same 17 training-set compounds reflect the similarity of our model to Eq. (8). The marginal difference in the coefficients in Eqs. (8) and (9) is attributed to the use of slightly different PSA values and also to the use of consensus logP_app values (see supporting information), rather than the experimental values reported in the previous work [9].

$$\log P_{\text{app}} = 0.008 \times \text{MW} - 0.043 \times \text{PSA} - 5.165 \quad (n = 17)$$

(8)

$$\log P_{\text{app}} = 0.00443 \times \text{MW} - 0.0288 \times \text{PSA} - 4.459 \quad (n = 17)$$

(9)

$$\log P_{\text{app}} = 3.238 - 0.695 \times \text{CHI\_3\_C} + 0.00129 \times \text{Jurs\_PPSA\_2} - 0.309 \times \text{Kappa\_3} - 0.00730 \times \text{Jurs\_TPSA} - 0.149 \times \text{Hbond donor} + 0.0655 \times \text{Rotlbonds}$$

(10)

$$\log P_{\text{app}} = -0.00291 \times \text{MW} - 0.0150 \times \text{PSA} - 3.052 \quad (n = 22)$$

(11)

The simplicity and the easily computable variables suggest Eqs. (8) or (9) to be versatile in modeling the apparent intestinal permeability of virtual compound-libraries. Applying Eq. (9) to a validation set of 50 molecules resulted in an inferior performance with an r ²(pr) of 0.455 and an rmse of 0.73 log units. To determine if the predictions are affected by the choice of descriptors, additional regressions were performed and compared with a reported PLS model [54] (r ²=0.91 versus logP _app and using nine variables). The G/PLS model represented by Eq. (10) for the same 17 compounds demonstrates a much better performance with fewer variables (Table 1). In contrast, the correlation coefficient obtained for the 50 validation-set molecules clearly invalidates the extension of Eq. (10) for logP _app predictions on unknowns.

In comparing previous QSPR models of logP_app and Eq. (10), it is clear that neither the choice of descriptors nor the number of terms in the QSPR equation is a major contributor to the weaker correlations of the models. One of the key reasons for the failure of Eqs. (9) and (10), derived using the standard 17 compounds in the literature, is the lack of structural diversity in the training set. Molecules like alprenolol, atenolol, practolol, and metoprolol fall within a homologous series, as do the steroids corticosterone, hydrocortisone, and testosterone. In addition, recent pharmacokinetic studies show some of the literature training-set compounds, like dexamethasone and sulfasalazine, to be transported through efflux mechanisms [43]. Since our aim was to derive a QSPR model for passively permeable compounds, a training set that spans more of the diversity space was essential. The 22 training-set compounds (Table 2) used for deriving the logP_app model in the present study avoid some of these contaminations in the dataset and represent diverse compounds that appear to permeate predominantly through passive processes.

Equation (11), derived from 22 training-set compounds, yielded weaker correlations with an increased rmse compared to equations obtained with the 17-compound training set (Table 1). However, the contrasting interpretations of the effect of molecular weight (compare the coefficients in Eqs. (9) and (11)) suggest that the independent variables are sensitive to the number and choice of the compounds in the dataset. On the other hand, the comparable statistical quality of Eq. (3) and the reported PLS model [54] (Table 1) clearly demonstrates that the lack of predictive ability of Eq. (11) is not due to the different selection of the training-set molecules but is caused by the incomplete representation of the descriptors. This incomplete representation is further substantiated by the fact that models of HIV-protease-inhibitor uptake by Caco-2 monolayers required multivariate regressions with a proper choice of independent variables [55].
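The opposite sign of the molecular-weight coefficient in Eqs. (9) and (11) is easy to see by evaluating both two-descriptor models side by side. The sketch below uses the coefficients as printed; the function names and the example MW and PSA values are hypothetical.

```python
def log_papp_eq9(mw: float, psa: float) -> float:
    """Eq. (9): two-descriptor model fitted to the 17-compound set."""
    return 0.00443 * mw - 0.0288 * psa - 4.459


def log_papp_eq11(mw: float, psa: float) -> float:
    """Eq. (11): the same two descriptors fitted to the 22-compound set."""
    return -0.00291 * mw - 0.0150 * psa - 3.052


# Hypothetical compound: raise MW from 300 to 400 at a fixed PSA of 60.
delta_eq9 = log_papp_eq9(400.0, 60.0) - log_papp_eq9(300.0, 60.0)
delta_eq11 = log_papp_eq11(400.0, 60.0) - log_papp_eq11(300.0, 60.0)
```

For the same increase in molecular weight, Eq. (9) predicts a higher logP_app and Eq. (11) a lower one, which is the sensitivity to training-set composition noted in the text.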

Comparison with reported %HIA models

Similar to the logP _app models, QSPR efforts in modeling %HIA involved a standard dataset of 20 molecules in the literature (see supporting information). Using only PSA as the independent variable, a simple non-linear model was obtained with an r ² of 0.90 [23, 29]. Similarly, a linear model was derived for logitHIA by using multivariate statistics [40]. In spite of a good correlation [r ²(tr)=0.916] for the 20 training-set compounds, neither of these models has been used for quantitative %HIA prediction on an external validation set. Our attempts to model the %HIA using the Boltzmann sigmoidal curve function (Eq. 12) and PSA for the same 20 compounds yielded a correlation coefficient of 0.941, in close agreement to reported QSPR models (Table 1).

$$\%\text{HIA} = \frac{100}{1 + \exp\left[ \left( \text{PSA}_{50} - \text{PSA} \right) / \text{slope} \right]} \quad \text{where } \text{PSA}_{50} \text{ is the value at which HIA} = 50\%$$

(12)
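As a minimal sketch of how Eq. (12) can be evaluated, the following Python function computes %HIA from PSA. The function name `percent_hia` and the parameter values `PSA50 = 120.0` and `SLOPE = -15.0` are hypothetical choices made only for illustration; they are not the fitted values reported in Table 1. A negative slope is assumed so that the computed absorption falls as PSA increases.

```python
import math


def percent_hia(psa, psa50, slope):
    """Evaluate Eq. (12): %HIA = 100 / (1 + exp((PSA50 - PSA) / slope))."""
    return 100.0 / (1.0 + math.exp((psa50 - psa) / slope))


# Hypothetical parameters for illustration only (not taken from Table 1).
PSA50 = 120.0   # PSA (in Å²) at which the computed %HIA equals 50
SLOPE = -15.0   # assumed negative, so %HIA decreases with increasing PSA

for psa in (60.0, 120.0, 180.0):
    print(f"PSA = {psa:5.1f}  ->  %HIA = {percent_hia(psa, PSA50, SLOPE):5.1f}")
```

Because PSA is the only input, two compounds with the same PSA always receive the same computed %HIA under this form, whatever their other structural differences.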

However, the use of both actively transported and passively diffused compounds in the literature %HIA and logitHIA models can affect the predictive ability. Although stringent conditions were used in obtaining the 20 training-set compounds in the literature [23], pharmacokinetic studies reveal that foscarnet and sulfasalazine are absorbed through various transport mechanisms [7, 8, 43]. Consequently, Eq. (12) was fitted against the experimental %HIA values for the 30 training-set compounds used in this study (Table 3) and compared with the G/PLS results obtained through Eqs. (4, 5, 6 and 7). Although it explains only 55% of the variance in the training-set data, this single-descriptor non-linear model outperforms all the G/PLS models in predictive ability. The reversal in the r² and rmse trends between the training and prediction sets (Table 1) for the %HIA models also suggests that the G/PLS procedure overfits the 30 training-set compounds considerably, and more so when electrotopological descriptors are used (compare Eqs. 4 and 5 with 6 and 7). Given the similar r²(ss) and rmse obtained through Eqs. (6) and (7) and through Eq. (12), it is tempting to use the PSA model for the sake of simplicity. However, this non-linear model breaks down when apolar molecules are considered and when predicting the %HIA of a congeneric series in which hydrophobic substitutions (–CH₃, –C₂H₅, –Ph, etc.) are made. In both cases, the computed %HIA will not necessarily change, although the intestinal absorption process could be affected significantly. Moreover, by virtue of the sigmoidal relationship of PSA with %HIA, the PSA₅₀ and slope values in Eq. (12) are highly sensitive to the number of compounds used in the training set (Table 1).
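The training-set versus prediction-set comparison above rests on two statistics, r² and rmse. A minimal Python sketch of both is given below. It assumes r² is taken as the coefficient of determination (1 − SS_res/SS_tot), which may differ from the definition used for Table 1, and the observed and predicted values are placeholders invented for illustration, not data from this study.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Placeholder %HIA values, for illustration only.
observed = [10.0, 50.0, 90.0]
predicted = [20.0, 50.0, 80.0]
print(round(r_squared(observed, predicted), 4), round(rmse(observed, predicted), 3))
```

Computing both statistics separately for the training and prediction sets makes a reversal of the kind described above visible: a model that fits the training data closely but degrades on the prediction set shows a high training r² alongside a markedly larger prediction rmse.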

Interpretation of the physicochemical descriptors

Several studies have demonstrated that intestinal absorption and permeability are governed by a number of factors, including lipophilicity, molecular size and shape, and hydrogen-bonding capability [27]. The G/PLS models derived here (Eqs. 3, 4, 5, 6, 7 and 10) utilize all these variables in the regression equations, showing that the descriptors selected to explain the dependent property are not arbitrary. For example, three of the five variables considered by Lipinski [31] as features consistent with drug-like compounds for classifying the intestinal absorption process appear in Eq. (3). The diversity in the physicochemical variables describing logP_app also shows that the H-bonding and Jurs terms in Eq. (3) capture the effects of PSA in Eq. (11). Eq. (3) further reveals that H-bond acceptor atoms and lipophilic substitutions improve intestinal permeability, while H-bond donors decrease it.

The multiple QSPR models for logitHIA differ in the Jurs descriptors that are substituted for one another (compare Eq. 4 with Eq. 5) and in the replacement of electrotopological indices by components of Jurs terms, gravitational index, and area (compare Eqs. 4 and 5 with Eqs. 6 and 7). The use of molecular weight in all the G/PLS equations agrees with the general notion that molecular weight and diffusion are interrelated.

Qualitative predictions of intestinal permeability and absorption

Although quantitative estimates of logP_app and %HIA are effective for comparisons across a homologous series, such accuracy is not essential for screening large compound libraries. Furthermore, given the considerable uncertainty in the experimental %HIA measurements and the sizeable errors in the computed values, it would be beneficial if the QSPR models reported in this study reproduced qualitative features correctly. The coarse filters (poor, medium, good) used to define the extent of intestinal permeability (see Materials and methods) demonstrate that reasonable hit rates are observed when the logP_app values obtained from multivariate analysis are used (Table 4; Eq. 3). The results also show that the 2-term logP_app model is approximately 20% less accurate in qualitatively classifying the apparent permeability of compounds in the training and the prediction sets. In spite of the relatively poor r²(pr) observed for the four %HIA models (Eqs. 4, 5, 6 and 7), the consistent classification of the degree of intestinal absorption adds confidence in extending these equations to qualitative %HIA predictions for molecules outside the prediction set. Although the classifications obtained through the non-linear %HIA model (Eq. 12) perform better for the prediction-set compounds, Eqs. (6) and (7) perform well when outliers are removed. Also, Eq. (12) may not perform well for a homologous series where the analogues differ by their hydrophobic substitutions (–CH₃, –C₂H₅, –CH(CH₃)₂, etc.), while Eqs. (6) and (7) should predict trends in these series more consistently. The <50% hit rates obtained in correctly classifying carrier-mediated compounds (Table 4) suggest that Eqs. (6), (7) and (12) discriminate molecules transported through different transport mechanisms from those that are passively diffused. In addition, the QSPR models derived here indirectly suggest that the intestinal permeability and absorption of actively transported compounds are modeled reasonably, except for the inability to account for the factors contributed by the solute environment and the kinetics of the mediators.
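The qualitative scoring described above can be sketched as a three-bin classification followed by a hit-rate count. The cutoffs used below (`low=30.0`, `high=70.0`) are hypothetical placeholders, since the actual class boundaries are defined in Materials and methods, and the function names `classify_hia` and `hit_rate` as well as the example values are likewise illustrative assumptions rather than results of this study.

```python
def classify_hia(percent_hia, low=30.0, high=70.0):
    """Bin a %HIA value as poor / medium / good using assumed cutoffs."""
    if percent_hia < low:
        return "poor"
    if percent_hia <= high:
        return "medium"
    return "good"


def hit_rate(observed, predicted, **cutoffs):
    """Percentage of compounds whose predicted class matches the observed class."""
    hits = sum(
        classify_hia(o, **cutoffs) == classify_hia(p, **cutoffs)
        for o, p in zip(observed, predicted)
    )
    return 100.0 * hits / len(observed)


# Placeholder %HIA values, for illustration only.
observed = [95.0, 50.0, 10.0, 80.0]
predicted = [85.0, 75.0, 20.0, 60.0]
print(hit_rate(observed, predicted))  # two of four classes agree
```

A prediction that is numerically off by 10 to 20 %HIA units can still count as a hit if it stays within the same bin, which is why a model with a modest r²(pr) may nonetheless classify reasonably well.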

Table 4 Total and percent correct classification (%, in parenthesis) of HIA based on Eqs. (4, 5, 6, 7, and 12) for the training set, prediction set, and the molecules considered to be absorbed through carrier-mediated processes

Since the five QSPR models derived for predicting the %HIA perform only moderately well for the prediction-set data, a consensus approach was designed to minimize the number of false positives in the classification scheme. In addition, comparing the results obtained using all five %HIA models provides an indirect means of assessing the reliability of the computed predictions. The results in Table 5 show that 21 of the 32 training-set compounds are classified correctly by all the models, resulting in 100% confidence in the computed values. The predictions are 80% accurate if four out of the five models classify a molecule similarly. Borderline cases are identified when only three out of the five models predict similarly. These results show improved performance for qualitative classification when the results of all five models are used collectively (Table 5) rather than individually (Table 4).
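The consensus scheme can be illustrated with a simple majority vote over the class labels returned by the five models. The sketch below is an assumed reading of the procedure, not the implementation used to produce Table 5. The function name `consensus` and the status labels are hypothetical, and the "no consensus" branch (fewer than three agreeing models) is an addition for completeness.

```python
from collections import Counter


def consensus(labels):
    """Return (majority class, number of votes, status) for one compound.

    `labels` holds the class predicted by each model, e.g. five entries
    for Eqs. (4), (5), (6), (7) and (12).
    """
    label, votes = Counter(labels).most_common(1)[0]
    n = len(labels)
    if votes == n:
        status = "unanimous"       # all models agree
    elif votes == n - 1:
        status = "strong"          # e.g. four of five models agree
    elif 2 * votes > n:
        status = "borderline"      # e.g. three of five models agree
    else:
        status = "no consensus"    # no class has a majority
    return label, votes, status


print(consensus(["good", "good", "good", "good", "good"]))
print(consensus(["good", "good", "good", "medium", "poor"]))
```

Reporting the number of agreeing models alongside the class gives a rough reliability flag for each prediction, so compounds in the borderline group can be set aside for closer inspection.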

Table 5 Total and percent correct consensus prediction and success rate (%, in parenthesis) based on Eqs. (4, 5, 6, 7 and 12) for the training set, prediction set, and the molecules considered to be absorbed through other mechanisms (Transport Set)

Conclusions

To derive the logP_app and logitHIA models, linear relationships were assumed to exist with the descriptors investigated. However, non-linear behavior is evident when %HIA is modeled using PSA. In comparing the QSPR models derived for logP_app and %HIA, we believe that the multivariate-statistics approach performs best for estimating P_app, while a single variable like PSA describes the human intestinal absorption process satisfactorily. The logP_app models derived in the present study also demonstrate that intestinal permeability is governed by several parameters. The varied datasets used in the reported QSPR models restrict a direct comparison of the performance of the present models with that of the literature models for logP_app and HIA. However, QSPR models derived using the standard training set used in the literature resulted in weaker correlations when extended to an external prediction set. This is attributed to the lack of structural diversity and to the inclusion of actively transported (non-passively absorbed) compounds in the literature training set. In contrast, the use of training-set data containing more passively absorbed compounds resulted in models with good predictive ability. In the absence of a quantitative %HIA comparison of the models derived here with reported QSPR results, qualitative features of the extent of intestinal absorption can be used as an index to assess performance.

We believe that Eqs. (4, 5, 6 and 7) describe %HIA well and that Eqs. (9, 10 and 11) describe logP_app well. The complexity of both experiments means that any model for these properties must be applied with care. We have attempted to eliminate actively transported compounds from the data. Further work on computational models will be required as additional measurements are reported. We do find that we can obtain predictive models without large numbers of variables and that these models are predictive outside the training set.

The fact that the coefficients of variables common to the two sets of equations are not always consistent is problematic. P_app is often used as a surrogate test for HIA, yet some of our equations imply that the physical processes involved in the two experiments are not the same. Because the correlations and validations are better for P_app, and because it is a simpler experiment, we believe that the P_app models are more likely to extend to new compounds.

References

1. Smith DA, van de Waterbeemd H (1999) Curr Opin Chem Biol 3:373–378
2. Clark DE, Pickett SD (2000) Drug Discov Today 5:49–58
3. Fecik RA, Frank KE, Gentry EJ, Menon SR, Mitscher LA, Telikepalli H (1998) Med Res Rev 18:149–185
4. Tarbit MH, Berman J (1998) Curr Opin Chem Biol 2:411–416
5. Navia MA, Chaturvedi PR (1996) Drug Discov Today 1:179–189
6. Chan OH, Stewart BH (1996) Drug Discov Today 1:461–473
7. Tsuji A, Tamai I (1996) Pharm Res 13:963–977
8. Lin JH, Lu AYH (1997) Pharmacol Rev 49:403–449
9. Stewart BH, Chan OH, Lu RH, Reyner EL, Schmid HL, Hamilton HW, Steinbaugh BA, Taylor MD (1995) Pharm Res 12:693–699
10. Stewart BH, Chan OH, Jezyk N, Fleisher D (1997) Adv Drug Deliv Rev 23:27–45
11. Barthe L, Woodley J, Houin G (1999) Fundam Clin Pharmacol 13:154–168
12. Pinto M, Robine-Leon S, Appay M-D, Kedinger M, Triadou N, Dussaulx E, LaCroix B, Simon-Assmann P, Haffen K, Fogh J, Zweibaum A (1983) Biol Cell 47:323–330
13. Hidalgo IJ, Raub TJ, Borchardt RT (1989) Gastroenterology 96:736–749
14. Wilson G, Hassan IF, Dix CJ, Williamson I, Shah R, Mackay M (1990) J Control Release 11:25–40
15. Delie F, Rubas WA (1997) Crit Rev Ther Drug Carrier Syst 14:221–286
16. Artursson P, Karlsson J (1991) Biochem Biophys Res Commun 175:880–885
17. Palm K, Luthman K, Ungell A-L, Strandlund G, Artursson P (1996) J Pharm Sci 85:32–39
18. Rubas W, Cromwell MEM (1997) Adv Drug Deliv Rev 23:157–162
19. Yee S (1997) Pharm Res 14:763–766
20. Lennernäs H (1998) J Pharm Sci 87:403–410
21. Grès M-C, Julian B, Bourrié M, Meunier V, Roques C, Berger M, Boulenc X, Berger Y, Fabre G (1998) Pharm Res 15:726–733
22. Krarup LH, Christensen IT, Hovgaard L, Frokjaer S (1998) Pharm Res 15:972–978
23. Palm K, Luthman K, Ungell A-L, Strandlund G, Beigi F, Lundahl P, Artursson P (1998) J Med Chem 41:5382–5392
24. Martin YC (1981) J Med Chem 24:229–237
25. Nook T, Doelker E, Buri P (1988) Int J Pharm 43:119–129
26. Merino V, Freixas J, Val Bermejo MD, Garrigues TM, Moreno J, Plá-Delfina JM (1995) J Pharm Sci 84:777–782
27. Camenisch G, Folkers G, van de Waterbeemd H (1996) Pharm Acta Helv 71:309–327
28. Testa B, Carrupt P-A, Gaillard P, Billois F, Weber P (1996) Pharm Res 11:335–343
29. Camenisch G, Folkers G, van de Waterbeemd H (1998) Eur J Pharm Sci 6:325–333
30. Camenisch G, Alsenz J, van de Waterbeemd H, Folkers G (1998) Eur J Pharm Sci 6:313–319
31. Lipinski CA, Lombardo F, Dominy BW, Feeney PJ (1997) Adv Drug Deliv Rev 23:3–25
32. Clark DE (1999) J Pharm Sci 88:807–814
33. Sugawara M, Takekuma Y, Yamada H, Kobayashi M, Iseki K, Miyazaki K (1998) J Pharm Sci 87:960–966
34. Winiwarter S, Bonham NM, Ax F, Hallberg A, Lennernäs H, Karlén A (1998) J Med Chem 41:4939–4949
35. Bermejo M, Merino V, Garrigues TM, Plá-Delfina JM, Mulet A, Vizet P, Trouiller G, Mercier C (1999) J Pharm Sci 88:398–405
36. van de Waterbeemd H, Camenisch G (1996) Quant Struct Act Relat 15:480–490
37. Palm K, Stenberg P, Luthman K, Artursson P (1997) Pharm Res 14:568–571
38. Stenberg P, Luthman K, Artursson P (1999) Pharm Res 16:205–212
39. Goodwin JT, Mao B, Vidmar TJ, Conradi RA, Burton PS (1999) J Pept Res 53:355–369
40. Norinder U, Österberg T, Artursson P (1999) Eur J Pharm Sci 8:49–56
41. Ghuloum AM, Sage CR, Jain AN (1999) J Med Chem 42:1739–1748
42. Wessel MD, Jurs PC, Tolan JW, Muskal SM (1998) J Chem Inf Comput Sci 38:726–735
43. Irvine JD, Takahashi L, Lockhart K, Cheong J, Tolan JW, Selick HE, Grove JR (1999) J Pharm Sci 88:28–33
44. Kansy M, Senner F, Gubernator K (1998) J Med Chem 41:1007–1010
45. Pade V, Stavchansky S (1998) J Pharm Sci 87:1604–1607
46. Rappe AK, Casewit CJ, Colwell KS, Goddard WA, Skiff WM (1992) J Am Chem Soc 114:10024–10035
47. The PSA arising from the oxygen, nitrogen, and halogen atoms, and the –OH and –NH₂ groups, was calculated using the solvent-accessible surface area in Cerius2 with the van der Waals radii reported in reference 37.
48. Herman RA, Veng-Pedersen P (1994) J Pharm Sci 83:423–428
49. Rogers D, Hopfinger AJ (1994) J Chem Inf Comput Sci 34:854–866
50. Rogers D (1996) Genetic partial least squares in QSAR. In: Devillers J (ed) Genetic algorithms in molecular modeling. Academic, London
51. Stanton DT, Jurs PC (1990) Anal Chem 62:2323–2329
52. Hall LH, Kier LB, Brown BB (1995) J Chem Inf Comput Sci 35:1074–1080
53. Viswanadhan VN, Ghose AK, Revankar GR, Robins RK (1989) J Chem Inf Comput Sci 29:163–172
54. Norinder U, Österberg T, Artursson P (1997) Pharm Res 14:1786–1791
55. Stewart BH, Chung FY, Tait B, Blankley CJ, Chan OH, John C (1998) Pharm Res 15:1401–1406

Author information

Govindan Subramanian
Present address: Transtech Pharma., 4170 Mendenhall Oaks Parkway, High Point, NC, 27265, USA

Authors and Affiliations

Computer-Aided Drug Discovery Department, Albany Molecular Research, Inc., 21 Corporate Circle, P.O. Box 15098, Albany, NY, 12212-5098, USA
Govindan Subramanian & Douglas B. Kitchen

Corresponding author

Correspondence to Douglas B. Kitchen.

Additional information

Dedicated to Professor Dr. Paul von Ragué Schleyer on the occasion of his 75th birthday.

Electronic supplementary material

Below is the link to the electronic supplementary material.

(PDF 106 kb)

Rights and permissions

Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License ( https://creativecommons.org/licenses/by-nc/2.0 ), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

About this article

Cite this article

Subramanian, G., Kitchen, D.B. Computational approaches for modeling human intestinal absorption and permeability. J Mol Model 12, 577–589 (2006). https://doi.org/10.1007/s00894-005-0065-z

Received: 04 April 2005
Accepted: 28 September 2005
Published: 01 April 2006
Issue Date: July 2006
DOI: https://doi.org/10.1007/s00894-005-0065-z
