Introduction

The blood–brain barrier (BBB) is a complex system that tightly regulates transport to and from the central nervous system (CNS) [1]. It separates the systemic bloodstream from the CNS and is therefore important for drug diffusion and transport between them [2]. Drugs targeting the CNS need to pass the BBB to reach their target [3]. Conversely, low BBB permeability reduces the chance of undesirable CNS-related side effects [4, 5]. An early estimate of BBB permeability would therefore be highly valuable for drug design [6, 7]. The relevance of BBB permeability of therapeutic drugs has been reported in the context of numerous clinical dysfunctions, such as dementia [8] and other clinical disorders [9–11].

The most common numeric value describing permeability across the BBB is logBB [12]. It is defined as the logarithm of the ratio between the concentrations of a compound in brain and blood (Eq. 1).

$$ \log {\text{BB}} = \log \left( {\frac{{c_{\text{Brain}} }}{{c_{\text{Blood}} }}} \right) $$
(1)
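
As a minimal illustration of Eq. 1, the following Python sketch computes logBB from two concentrations. It assumes the customary base-10 logarithm and that both concentrations are given in the same unit; the function name is chosen here for illustration only.

```python
import math


def log_bb(c_brain: float, c_blood: float) -> float:
    """Return logBB = log10(c_brain / c_blood), following Eq. 1.

    Assumption: base-10 logarithm, both concentrations in the same unit.
    """
    if c_brain <= 0 or c_blood <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(c_brain / c_blood)
```

A compound that reaches twice the blood concentration in brain thus has a logBB of about 0.3, whereas a tenfold lower brain concentration gives a logBB of −1.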

Unfortunately, experiments to measure logBB are time-consuming, laborious and expensive in vitro [13–16] and even more so in vivo [17, 18]. It is therefore not surprising that the number of published experimental values is limited. Experimental methods to assess BBB permeability range from artificial membranes and complex cell culture systems to in vivo methods. The PAMPA assay uses artificial membranes to observe passive (effective) membrane permeability, quantified by Peff [19, 20]. Such experiments can only observe passive permeability and neglect the special characteristics of the BBB. Nevertheless, results from these studies support the validity of lipid bilayer systems as strongly simplified representations of the BBB. The main drawback of cell-free methods is that they neglect active transporters acting at the BBB and therefore give incorrect predictions for substrates of transport systems [21]. Numerous active transport systems and efflux transport systems play an important role at the BBB [22, 23]. One of the most commonly reported transport systems acting at the BBB is P-glycoprotein (P-Gp) [24–26]. In contrast, in vivo methods, such as in situ brain perfusion [17], are able to capture real BBB permeability as given by PS (permeability surface product) or logPS values [27].

Due to these experimental difficulties, it is not surprising that BBB permeability is frequently addressed via computational approaches. Computer-aided methods applied to this field include multiple linear regression [28–32], bagged regression [33], partial least squares analysis [34–37], support vector machines [38–40] and artificial neural networks [39, 41]. These methods are frequently combined with descriptor selection algorithms, for example those based on genetic algorithms [42, 43]. A comprehensive overview of previous models for BBB prediction has been published by Vastag and Keseru [44].

Depending on the size of the dataset, the number of descriptors, and the mathematical approach, published models range from rough guidelines to quantitative predictions. Complex methods like partial least squares analysis and artificial neural networks suffer from the drawback of being hard to interpret, whereas simple methods like multiple linear regression often yield less accurate results or even only rough guidelines [45]. Although different mathematical techniques make it hard to compare the results directly, the performance decreases strongly with larger datasets. High squared correlation coefficients above 0.85 are reported frequently for focused data sets with a size of approximately 50–90 compounds [31]. Predictions based on larger compound collections with a size of over 200 compounds resulted mainly in “rules of thumb” for good BBB permeability [45]. Altogether, these findings clearly show that there is still need for further research on BBB permeability [46].

Summarizing recent work, there is broad agreement on the importance of some molecular properties and descriptors which have been found in numerous publications to influence BBB permeability [45]:

  • The descriptor most frequently associated with BBB permeability is the polar surface area. The majority of publications report a correlation of logBB with the polar surface area [30, 47] or a property closely related to it [35]. The sum of oxygen and nitrogen atoms, for example, is extremely cheap to compute, but has still proven to be useful.

  • There is consensus that BBB permeability is also highly influenced by lipophilicity [48, 49]. One way to quantify lipophilicity is logP, the logarithmic partition coefficient between 1-octanol and water. However, the ability of logP to represent lipophilicity has recently come under discussion [50], as octanol is a good hydrogen-bond donor and therefore probably not a typical apolar solvent, even more so when logP is used as a calculated in silico descriptor [50, 51]. In addition, logP is defined for the neutral state of a compound. LogP values for ionized (e.g. protonated) compounds are basically not defined [52]. Liu et al. [47] introduced ‘lipoaffinity’ as an easily accessible descriptor. It is calculated by adding the contributions to the logP values of all atoms except nitrogen and oxygen.

  • Molecular flexibility has also been reported to influence BBB permeability. This is in agreement with the theory we used in this study (see below), since rigid molecules seem to fit less well to the membrane than more flexible ones (given that both molecules have approximately the same weight) [53]. A simple descriptor representing molecular flexibility would be the number of rotatable bonds, for example [29].

In this study we followed an approach based on physico-chemical properties to address permeation across the BBB, proposed by Fischer et al. [53]. According to this hypothesis, the process of integrating a compound into a membrane can be split into essential steps that can be added up to form the process of membrane permeation:

  • In the first step the compound needs to be desolvated from the aqueous environment. The process of desolvation is often addressed by molecular dynamics simulations [54]. Simultaneously, a cavity appropriate for embedding the compound within the membrane is created. The amount of energy required to create this cavity is correlated to the energy needed to insert a molecule into the membrane. Fischer et al. [53] assume that the size of this cavity is crucial for membrane permeation.

  • In the second step the compound is inserted into the cavity. It is stabilized by electrostatic interactions with the polar headgroup of lipids and hydrophobic interactions with the core region of the lipid bilayer [55].

  • Finally, the compound needs to resolvate behind the lipid bilayer. This process is similar to the reversion of the solvation process.

Figure 1 schematically illustrates how a molecule is inserted into a membrane according to this hypothesis. Based on this theory Gerebtzoff and Seelig [56] introduced the cross-sectional area (CSA) of a molecule as a novel descriptor to presumably represent BBB permeability. Because of its well-founded physico-chemical background it promised to achieve good predictability and interpretability, although this descriptor neglects all thermodynamic aspects of desolvation and resolvation. Descriptors based on valid mechanistic models have proven to contribute to the design and optimization of drug molecules [57]. Thus we reproduced this promising molecular descriptor and critically analysed its ability to predict BBB permeability. For this purpose, we compiled a large data set with experimental logBB values from numerous published datasets, instead of using single focused sets.

Fig. 1 The cross-sectional area (CSA) has been introduced as a measure for the area occupied by a compound after insertion into a lipid membrane. Local polarity of the membrane determines the orientation of the ligand

Methods

Calculation of the CSA

We calculated the amphiphilic axis and CSA, as described in detail in Gerebtzoff and Seelig [56]. Modifications were introduced wherever the description was not clear or the results did not match our expectations. The following section describes and explains these modifications.

The amphiphilic axis is defined by the hydrophobic and hydrophilic center of a molecule. The hydrophilic center was calculated by averaging oxygen and nitrogen atom positions weighted by their contribution to the topological polar surface area (TPSA). Assuming that hydrogen bonds mainly influence BBB permeability [32], we decided to consider only nitrogen and oxygen as hydrophilic atoms and neglect sulphur atoms. The weighting factors were based on TPSA provided by MOE [58]. To emphasize the increased polar character of charged atoms compared to polarized atoms, we assigned a factor of 100 to charged atoms according to Eq. 2, where wf is the weighting factor, z is the charge and w0 is the weighting factor according to the TPSA.

$$ {\text{wf}} = 100 \cdot {\text{z}} + {\text{w}}_{0} \cdot \left( {1 - {\text{z}}} \right) $$
(2)
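
The weighting scheme of Eq. 2 and the resulting weighted center can be sketched in Python as follows. This is an illustrative sketch, not the SVL script used in this study. It assumes that z is the magnitude of the formal charge (0 for uncharged, 1 for singly charged atoms), so that a charged atom receives the weight 100 and an uncharged atom its TPSA contribution w0; the function names are hypothetical.

```python
def weighting_factor(z: float, w0: float) -> float:
    """Eq. 2: wf = 100 * z + w0 * (1 - z).

    Assumption: z is the magnitude of the formal charge (0 or 1),
    w0 is the atom's contribution to the TPSA.
    """
    return 100.0 * z + w0 * (1.0 - z)


def weighted_center(coords, weights):
    """Weighted average of 3D atom positions (e.g. all N and O atoms)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("sum of weights must not be zero")
    return tuple(
        sum(w * c[axis] for w, c in zip(weights, coords)) / total
        for axis in range(3)
    )
```

With these weights a single charged nitrogen dominates the position of the hydrophilic center, even in the presence of several uncharged polar atoms.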

Halogen and carbon atoms were taken into account to place the hydrophobic center. Hydrophobic atom positions were weighted by their contribution to logP prediction by MOE (logP(o/w)) [59]. This fragment-based calculated logP suggests that halogen atoms have a large negative contribution to logP, which results in a displaced hydrophobic center for molecules containing halogen atoms. Thus we removed the logarithm before weighting to avoid negative contributions. Removal of the logarithm resulted in a more intuitive placement of the hydrophobic center (see Fig. 2).

Fig. 2 Comparison of two different strategies to calculate the hydrophobic center (red sphere) for compounds with halogen atoms (like perphenazine). On the left side, the hydrophobic center is calculated weighting atom positions by their contribution to logP prediction; on the right side the calculation is done with modifications presented in this study

According to the mechanism outlined by Fischer et al. [53], a molecule inserts into a membrane along the amphiphilic axis. The CSA reflects the area occupied by the molecule when projected to the plane perpendicular to the amphiphilic axis (see Figs. 1, 3). Projecting a molecule onto an area reduces computational efforts from 3D into 2D space, which dramatically increases the calculation speed for larger molecules in contrast to the published procedure [56].
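
The projection step can be illustrated by the following Python sketch. It is not the SVL implementation used here: the area estimate (union of atom discs counted on a regular grid), the choice of atomic radii and all function names are assumptions made for illustration only.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(v):
    norm = math.sqrt(_dot(v, v))
    if norm == 0.0:
        raise ValueError("zero-length vector")
    return (v[0] / norm, v[1] / norm, v[2] / norm)


def plane_basis(axis):
    """Two orthonormal vectors spanning the plane perpendicular to axis."""
    a = _unit(axis)
    # Pick a helper vector that is not (nearly) parallel to the axis.
    helper = (1.0, 0.0, 0.0) if abs(a[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = _unit(_cross(a, helper))
    v = _cross(a, u)
    return u, v


def projected_area(coords, radii, hydrophilic_center, hydrophobic_center,
                   step=0.05):
    """Area of the molecule projected along the amphiphilic axis.

    Each atom becomes a disc in the perpendicular plane; the union of all
    discs is estimated by counting grid cells whose midpoint is covered.
    """
    u, v = plane_basis(_sub(hydrophilic_center, hydrophobic_center))
    discs = [(_dot(c, u), _dot(c, v), r) for c, r in zip(coords, radii)]
    x_min = min(x - r for x, _, r in discs)
    x_max = max(x + r for x, _, r in discs)
    y_min = min(y - r for _, y, r in discs)
    y_max = max(y + r for _, y, r in discs)
    nx = math.ceil((x_max - x_min) / step)
    ny = math.ceil((y_max - y_min) / step)
    covered = 0
    for i in range(nx):
        px = x_min + (i + 0.5) * step
        for j in range(ny):
            py = y_min + (j + 0.5) * step
            if any((px - x) ** 2 + (py - y) ** 2 <= r * r
                   for x, y, r in discs):
                covered += 1
    return covered * step * step
```

Atoms stacked along the amphiphilic axis project onto the same disc and do not enlarge the area, whereas atoms lying side by side in the perpendicular plane do. The grid spacing trades accuracy against run time.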

Fig. 3 Amitriptyline with hydrophilic center (yellow sphere), hydrophobic center (red sphere), amphiphilic axis (green line) and CSA (green dotted area). This BBB-permeable compound illustrates the role of the amphiphilic axis and the CSA

Calculations of the amphiphilic axis and CSA were performed with MOE [60] using its scripting language SVL (the complete script is available as supplementary information). Partial charges were calculated using the MMFF94x force field. Protonation states were assigned according to the physiological pH of 7.4.

Experimental CSA values

To validate our CSA calculations, we compared our results with experimental CSA values [56]. We obtained all structures as SDF files from PubChem [61]. The reported dataset [56] consists of 32 compounds with experimental CSA values for pH 7.4 and 8. The experimental CSA at pH 7.4 was used, as it represents physiological pH. Carebastine was excluded from the set since the two CSA values for pH 8 and 7.4 differed significantly. We also removed beta-cyclodextrin from the dataset, since with a molecular weight over 1,000 Da it is not a typical drug-like molecule (see Lipinski’s rule-of-5 [62]). For each compound the most stable conformation according to its conformational energy calculated by Omega (version 2.0) was used to calculate the CSA.

Experimental logBB values

In contrast to the small number of experimental CSA values, various studies containing experimental logBB values have been published. To investigate the ability of CSA for logBB prediction we combined all available published datasets and generated a novel large logBB dataset. Our dataset consisted of 195 compounds from Vilar et al. [63], 119 compounds from Platts et al. [64], 38 compounds from Naranayan and Gunturi [43], 94 compounds from Mente and Lombardo [33], 147 compounds from Zhang et al. [40], 197 compounds from Abraham et al. [65], 168 compounds from Garg and Verma [66], 106 compounds from Guerra et al. [67], 95 compounds from Rose et al. [68], 36 compounds from Kelder et al. [31], 165 compounds from Konovalov et al. [69] and 36 compounds from Zerara et al. [70]. Many compounds were reported multiple times with similar or identical logBB values, especially drugs with CNS-related effects such as antidepressants or neuroleptics. The average of the logBB values was used for identical compounds reported more than once. After removing duplicate structures, we ended up with 362 unique compounds with experimental logBB values ranging from −2.2 to +1.6. 199 logBB values were positive, 163 were negative or zero.
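
The merging step can be sketched in Python as follows. The sketch assumes that each source dataset is available as a list of (compound identifier, logBB) pairs and that identical structures share the same identifier (for instance a canonical structure string); both the data layout and the function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def merge_logbb(datasets):
    """Combine several logBB datasets and average repeated compounds.

    datasets: iterable of lists of (compound_id, logbb) pairs.
    Assumption: identical structures carry the same compound_id.
    """
    values = defaultdict(list)
    for dataset in datasets:
        for compound_id, logbb in dataset:
            values[compound_id].append(logbb)
    return {cid: mean(vals) for cid, vals in values.items()}
```

A compound reported with logBB values of 0.5 and 0.7 in two sources enters the merged set once with a value of 0.6.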

From this set we also wanted to exclude actively transported compounds, since their mechanism of passing the BBB differs from that of compounds passively entering the CNS. Therefore we searched for substrates of P-Gp, one of the major transport systems acting at the BBB, in three previously published datasets [71–73]. Combining results from these sources we excluded 18 known substrates of P-Gp (bunitrolol, cimetidine, digoxin, domperidone, etoposide, fexofenadine, flunitrazepam, levodopa, loperamide, methotrexate, morphine, nevirapine, phenytoin, quinidine, risperidone, triflupromazine, vincristine, yamatetan). In addition to these 18 compounds, six compounds (chlorpromazine, doxorubicin, nelfinavir, saquinavir, verapamil, vinblastine) are reported ambiguously in the publications; thus we did not exclude them.

To the best of our knowledge, this is to date the largest set of quantitative logBB values, compiled from various resources. This dataset promises to be a very elaborate and refined selection of compounds. The complete dataset can be found in the supplementary material.

Descriptor calculation

We calculated all descriptors provided by MOE 2010.10 [59] and by ACD/Labs (version 10.0) [74] that could be calculated for all compounds. A complete list of the descriptors used is included in the supplementary information. In addition, we calculated descriptors reported to be useful in other publications addressing BBB permeability, as far as we were able to reproduce them. Table 1 lists all additional descriptors together with a reference to their original publication. We also implemented size-intensive descriptors using molecular weight as a normalizing factor [75]. Finally, our data set comprised over 880 descriptors, ranging from simple atom counts to computationally intensive quantum-mechanical properties.

Table 1 List of molecular descriptors developed or reproduced in addition to the standard descriptors by ACD/Labs 10.0 and MOE 2010.10

Quantitative models: beam search and multiple linear regression

A large number of potentially predictive descriptors prompted us to systematically reduce dimensionality (the number of descriptors) used to construct and validate the models. A beam search algorithm (width = number of descriptors = 79) was applied to preselect potentially predictive descriptors [78]. For each combination a bootstrapped multiple linear regression was calculated and the squared correlation coefficient was returned as fitness criterion. We limited the maximum number of generations and subsequently the number of descriptors simultaneously taken into account to 10 and selected the best multiple linear regression model per generation.
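
The following Python sketch illustrates the general idea of a beam search over descriptor subsets scored by multiple linear regression. It is not the RapidMiner workflow used in this study: the bootstrap step is omitted for brevity, the unvalidated squared correlation serves as fitness, and all names are chosen for illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        rest = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - rest) / m[i][i]
    return x


def r_squared(columns, y):
    """Squared correlation of a least-squares fit of y (with intercept)."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    beta = solve(xtx, xty)
    pred = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def beam_search(descriptors, y, width, max_size):
    """Return the best (score, subset) for each subset size up to max_size.

    descriptors: dict mapping descriptor name to its list of values.
    Only the `width` best subsets of each generation are expanded further.
    """
    beam = [()]
    best_per_size = []
    for _ in range(max_size):
        candidates = {}
        for subset in beam:
            for name in descriptors:
                if name in subset:
                    continue
                new = tuple(sorted(subset + (name,)))
                if new in candidates:
                    continue
                try:
                    cols = [descriptors[n] for n in new]
                    candidates[new] = r_squared(cols, y)
                except ValueError:
                    continue  # skip collinear descriptor combinations
        if not candidates:
            break
        ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
        beam = [subset for subset, _ in ranked[:width]]
        best_per_size.append((ranked[0][1], ranked[0][0]))
    return best_per_size
```

Each generation adds one descriptor to the subsets kept in the beam, so the number of generations bounds the number of descriptors per model. If the beam width equals the number of descriptors, every single-descriptor model is expanded in the second generation.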

Qualitative models: beam search and random forest

To generate qualitative models our dataset was split into BBB permeable (logBB ≥ 0.3, n = 126) and non-permeable (logBB ≤ −0.3, n = 76) compounds. The compounds between the two limits (n = 142) were excluded from the process, as they do not show strong characteristics of BBB permeable or non-permeable compounds, respectively. These limits were adapted from Abraham et al. [65], who assume an experimental error of about 0.3 log units (logBB values range from −2.2 to 1.6). We then performed a beam search from 1 to 5 descriptors (width = number of descriptors = 72). As qualitative model we constructed a random forest model for each combination (ntree = 5, depth = 5), validated by a bootstrapping procedure (sample ratio = 1.0, number of validations = 100). Accuracy was used as the main performance criterion. Again, we captured the best models per generation.
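
The class assignment described above can be written as a short Python sketch. The class labels and the function name are chosen here for illustration; the limit of 0.3 log units is the one stated in the text.

```python
def bbb_class(logbb: float, limit: float = 0.3):
    """Assign a qualitative BBB class from a logBB value.

    Returns "permeable" for logBB >= limit, "non-permeable" for
    logBB <= -limit, and None for compounds between the two limits,
    which are excluded from qualitative modelling.
    """
    if logbb >= limit:
        return "permeable"
    if logbb <= -limit:
        return "non-permeable"
    return None
```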

All models were calculated using RapidMiner (version 5.1.1) and Weka’s implementation of the random forest algorithm. Correlation coefficients were calculated according to Pearson.
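
For reference, the Pearson correlation coefficient can be computed as in the following plain-Python sketch (the function name is illustrative; the calculations in this study were carried out in RapidMiner).

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)
```

The squared correlation used as fitness criterion for a single-descriptor linear model is the square of this value.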

Results and discussion

We calculated the CSA for 32 compounds and compared it with experimental values taken from Gerebtzoff and Seelig [56]. Similar to the original work, we also achieved a good correlation (r = 0.898) with experimental CSA values.

Quantitative models to predict logBB

The main intention of the present study was to investigate a correlation between CSA and BBB permeability, as suggested by Gerebtzoff and Seelig [56]. We therefore constructed multiple linear regression models using a beam search algorithm for feature selection (up to 10 descriptors).

Table 2 shows that the squared correlation increases with the number of descriptors. The validated squared correlation, however, increases steadily only up to 5 descriptors and is constant or even decreases beyond that. Overall, statistical parameters improve only slightly from 5 to 10 descriptors, although the number of descriptors is doubled, so the influence of the additional descriptors must be questioned. We therefore consider 5 the maximum number of descriptors to avoid overfitting.

Table 2 Squared correlation coefficients (raw and bootstrap validated) of the best models with 1–10 descriptors constructed with beam search using multiple linear regression and squared correlation as performance criterion

In agreement with previous studies, TPSA is highly important for BBB permeability. The number of polar atoms (n_pol) and a descriptor taken from Feher et al. [30] (I3), followed by the number of positive ionisable groups (n_PI) and a descriptor developed for this study (PDist) were also found to influence BBB permeability, as well as the number of hydrogen bond acceptors (a_acc).

In contrast to our expectations, CSA never appeared in the most predictive models. This leads to the question why CSA does not contribute to BBB prediction as much as TPSA, for example. For BBB-permeable compounds Gerebtzoff and Seelig [56] suggest that there is an upper limit for CSA at 80 Å². To further analyse this hypothesis, Figure 4a shows a scatterplot of logBB versus calculated CSA values for our large dataset. For 11 compounds, both experimental and calculated values for CSA and logBB were available. Overall, this plot does not show a clear correlation between CSA and logBB. As suggested in the original publication, we also investigated our dataset with respect to logD (at pH 7.4) and CSA. In contrast to the original publication, Fig. 4b shows no significant separation by logD and CSA. A limit for BBB-permeable compounds reflected by the CSA could not be determined.

Fig. 4 a Experimental logBB plotted against 11 experimental and 362 calculated CSA values shows no correlation. Blue dots represent experimental CSA values, whereas grey dots are based on calculated CSA values. b Colour-coded scatterplot of CSA versus logD (at pH = 7.4), where green dots represent BBB permeable, red dots represent non-BBB permeable and grey dots represent unclassified compounds

Correlation between CSA and number of atoms

Searching for structural and chemical information covered by CSA, we tested its correlation with all other descriptors. Overall, various descriptors correlate remarkably well with the CSA. Table 3 lists the correlation with prominent other descriptors, including those from the models listed in Table 2. The majority of those are based on properties easily obtainable from the structure. Remarkably, CSA is highly correlated with numerous simple descriptors that are easier to calculate, such as the number of atoms (see Fig. 5). A good correlation (r = 0.959) between those two properties suggests that CSA can be seen as a derivative of the number of atoms. A high correlation of approximately 0.9 has also been found with the molecular weight. It was reported previously that molecular weight contributes to bioavailability in general [77]. Therefore we doubt that CSA provides more information with respect to BBB permeability than the number of atoms or molecular weight.

Table 3 Various commonly-known descriptors correlate well with the CSA
Fig. 5 CSA plotted against the number of atoms (a_count) reveals a remarkably high correlation (r = 0.959)

Number of descriptors, dataset size and accuracy

Whenever we tried to construct multiple linear regression models for logBB prediction on our large dataset, we failed to achieve results comparable to those reported by others using smaller data sets. To benchmark this relationship, linear regression models were built based on a single descriptor, but varying the composition of the training set. As descriptor TPSA was chosen, since its impact on BBB permeability has not only been demonstrated by the models presented here but also by other researchers, for example by Kelder et al. [31]. In their study a set of 45 compounds was used to construct a regression model. Similarly, we constructed subsets from our dataset consisting of 50–350 compounds and calculated the squared correlation coefficient for each model. Each subset size was tested 500 times using different random seeds to cover different selection of compounds. Figure 6 illustrates that small sets show a large variability with respect to the squared correlation. Although the number of possible subsets is much lower for the large subsets, those are less likely to suffer from arbitrary correlations. This underlines the need for large datasets like the one we present here.
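
The resampling experiment can be sketched in Python as follows. This is an illustration under stated assumptions, not the workflow used in this study: it draws random subsets without replacement, fits a single-descriptor linear model on each subset (whose squared correlation equals the squared Pearson coefficient) and reports the range of the resulting values. The function names and the default seed are hypothetical.

```python
import random


def r_squared(x, y):
    """Squared correlation of a single-descriptor linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def r2_spread(x, y, subset_size, repeats=500, seed=0):
    """Smallest and largest r^2 over random subsets of a given size.

    x: descriptor values (e.g. TPSA), y: logBB values.
    Each repeat draws subset_size compounds without replacement.
    """
    rng = random.Random(seed)
    indices = range(len(x))
    scores = []
    for _ in range(repeats):
        chosen = rng.sample(indices, subset_size)
        scores.append(r_squared([x[i] for i in chosen],
                                [y[i] for i in chosen]))
    return min(scores), max(scores)
```

Calling r2_spread for subset sizes from 50 to 350 yields the kind of spread per subset size that is summarized in Fig. 6. When the subset size equals the size of the full dataset, every repeat uses the same compounds and the spread collapses to a single value.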

Fig. 6 Different training sets with 50–350 examples all selected from our dataset (n = 362) show that the size of the training set highly influences the performance given by squared correlation, even when constructed with exactly the same descriptor (TPSA) and the same procedure

Qualitative models

Focusing on a small number of descriptors, we were not able to obtain simple models with high performance using quantitative techniques. Thus we also calculated qualitative models to predict BBB permeability using a random forest. Again, we could compare our results to various published studies [28, 38, 56].

For the qualitative models we used the same dataset as for the quantitative models, but converted logBB values into three bins. Compounds with a logBB ≥ 0.3 comprise the set of BBB permeable compounds, whereas compounds with a logBB ≤ −0.3 are considered as not BBB permeable. The remaining compounds are excluded from the qualitative modelling. This left us with a set of 202 compounds for the training set. From the initial set of 886 descriptors, only 72 descriptors remained after preselection. Similar to the quantitative approach we aimed to obtain simple and interpretable models with a maximum of 4 concurrent descriptors. The beam search returned one model without misclassification (accuracy = 1.00) using four descriptors. To evaluate the robustness of this model a bootstrap validation (n = 100) was applied. The complete results are shown in Table 4.
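
A bootstrap validation of a classifier can be sketched in Python as follows. The sketch assumes that each round trains on a sample drawn with replacement (same size as the dataset, i.e. sample ratio 1.0) and tests on the compounds not drawn; whether this matches the RapidMiner operator in every detail is not verified here. A trivial single-threshold learner on one descriptor stands in for the random forest, and all names are hypothetical.

```python
import random
from statistics import mean


def bootstrap_accuracy(x, labels, fit, predict, rounds=100, seed=0):
    """Mean accuracy on the compounds left out of each bootstrap sample."""
    rng = random.Random(seed)
    n = len(x)
    scores = []
    for _ in range(rounds):
        train_idx = [rng.randrange(n) for _ in range(n)]  # sample ratio 1.0
        drawn = set(train_idx)
        test_idx = [i for i in range(n) if i not in drawn]
        # Skip degenerate rounds (no test compounds or only one class).
        if not test_idx or len({labels[i] for i in train_idx}) < 2:
            continue
        model = fit([x[i] for i in train_idx], [labels[i] for i in train_idx])
        hits = sum(predict(model, x[i]) == labels[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return mean(scores)


def fit_threshold(values, labels):
    """Stand-in learner: midpoint between the two class means."""
    permeable = [v for v, lab in zip(values, labels) if lab]
    other = [v for v, lab in zip(values, labels) if not lab]
    return (mean(permeable) + mean(other)) / 2.0


def predict_threshold(threshold, value):
    """Stand-in rule: a low descriptor value (e.g. TPSA) means permeable."""
    return value < threshold
```

Because every model is scored only on compounds it has not seen during training, the validated accuracy is typically lower than the accuracy obtained on the full training set.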

Table 4 Accuracies (raw and validated) of the best models and prediction systems with up to 4 descriptors constructed using a beam feature search in combination with random forest learners

The random forest prediction system based on four descriptors also achieves a high validated accuracy. The selected descriptors are similar to those obtained by multiple linear regressions and therefore highlight the importance of the following basic molecular properties:

  • TPSA was selected in all models.

  • QSUMN is the sum of charges on nitrogen atoms. This classifies all compounds by their charge on nitrogen atoms and subsequently also discriminates compounds having no nitrogen atom at all.

  • QSUMN/Weight represents a size-intensive descriptor calculated from QSUMN and the molecular weight. For large compounds the molecular weight is the dominating factor for this descriptor.

  • I3: Is −1 for acid compounds, +1 for basic compounds and 0 for the remaining compounds.
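
The two charge-based descriptors can be illustrated by the following Python sketch. It assumes that each atom is given as a pair of element symbol and partial charge; this data layout and the function names are chosen for illustration and do not reflect the descriptor implementation used in this study.

```python
def qsumn(atoms):
    """Sum of partial charges on nitrogen atoms.

    atoms: iterable of (element_symbol, partial_charge) pairs.
    Compounds without nitrogen yield 0.
    """
    return sum(charge for element, charge in atoms if element == "N")


def size_intensive(value, molecular_weight):
    """Normalize a descriptor value by the molecular weight."""
    if molecular_weight <= 0:
        raise ValueError("molecular weight must be positive")
    return value / molecular_weight
```

For two compounds with the same QSUMN, the heavier one receives the smaller absolute QSUMN/Weight value, which is why the two descriptors diverge for large molecules.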

To analyse the dependence of the four descriptors we also calculated the intercorrelation matrix (Table 5). Although QSUMN and QSUMN/Weight are highly correlated, both seem to be predictive for logBB. The two descriptors differ considerably, especially for compounds with higher molecular weight.

Table 5 Intercorrelation matrix for the four descriptors used in the best random forest prediction model

Table 6 compares the results from our calculations with results from other publications. The results clearly show that random forest prediction systems are well-suited to classify BBB permeability. Altogether, we outperformed many other models trained on datasets with similar sizes in terms of validated accuracy, even using fewer molecular properties.

Table 6 Results of classification systems for BBB permeability taken from the literature

In contrast to results from quantitative models, qualitative classification models are able to predict BBB permeability with high accuracy, especially when aiming for simple models based on a small number of descriptors. To quantify BBB permeability a more sophisticated and complex model is needed. However, we have shown that the number of descriptors that can be used is limited when looking at validated performances. Using a high number of descriptors for small datasets bears the risk of overfitting and arbitrary correlations.

In contrast, we focused on a simple prediction system that links BBB permeability to easily understandable molecular properties. Focusing on a small number of descriptors it might be easier to construct a binary classifier than to quantitatively predict BBB permeability.

Strengths and limitations

In the present study there are several novel findings:

  • In addition to well-known descriptors, we added a significant number of descriptors that have never been evaluated and validated in the context of BBB prediction, for example size intensive descriptors (explained in [75]), and other novel descriptors listed in Table 1. Furthermore, we addressed the CSA which has been proposed as being predictive for BBB permeability. The qualitative models as shown in Table 4 include, in addition to TPSA, two of these novel descriptors.

  • All prediction systems are limited by the experimental error of the data they are based on. Therefore, our set consists of compounds with experimental logBB values only, compiled from various publications.

  • We developed an unparalleled compact and highly-predictive qualitative model validated by bootstrapping, that might act as general guideline for estimating BBB permeability.

Conclusion

In this work, we applied qualitative and quantitative in silico techniques to predict BBB permeability. For this purpose we created a reasonably large dataset (n = 362) of experimental logBB values. For each compound of the training set we calculated a broad set of descriptors ranging from simple atom count descriptors to computationally more expensive descriptors like the CSA perpendicular to the amphiphilic axis. For this special descriptor we were also able to validate calculated CSA with a set of experimentally measured values (n = 32).

The best quantitative prediction system based on multiple linear regression without overfitting yielded a bootstrapped squared correlation coefficient of 0.521. Qualitative models based on a random forest performed remarkably better. The best prediction system based on only four descriptors achieved a bootstrap validated accuracy of 88% (unvalidated 100%). Remarkably, the CSA was not chosen by the feature selection algorithm used to select the most predictive descriptors. In contrast, a combination of simple and well-known descriptors was found to be most useful to predict logBB.

Finally, we also showed that large and carefully compiled datasets, like the one presented here, reduce the risk of arbitrary correlations and result in more reproducible and robust models.

Support information

The SVL script to calculate and visualize the CSA perpendicular to the amphiphilic axis is available for free download, together with a spreadsheet file containing the whole set of compounds with their corresponding logBB values and a complete list of the descriptors calculated by ACD/Labs 10.0 and MOE 2010.10.