Abstract
This study applies the ‘Flexible-Acceptor’ variant of the General Solubility Equation, GSE(Φ,B), to the prediction of the aqueous intrinsic solubility, log10 S0, of ‘small-molecule’ new molecular entities (NMEs) recently approved by the FDA (2016–2020). The novel equation had been shown to predict the solubility of drugs beyond Lipinski’s ‘Rule of 5’ chemical space (bRo5) to a precision nearly matching that of the Random Forest Regression (RFR) machine learning method. Since then, it was found that the GSE(Φ,B) appears to work well not only for bRo5 NMEs, but also for Ro5 drugs. To place GSE(Φ,B) in context, Yalkowsky’s GSE(classic), Abraham’s ABSOLV, and Breiman’s RFR models were also applied to predict log10 S0 of 72 newly-approved NMEs, for which useable reported solubility values could be accessed (nearly 60% from FDA New Drug Application published reports). Except for GSE(classic), the prediction models were retrained with an enlarged version of the Wiki-pS0 database (nearly 400 log10 S0 entries added since our previous study). Thus, these four models were further validated by the additional independent solubility measurements which the newly-approved drugs introduced. The prediction methods ranked RFR ~ GSE(Φ,B) > ABSOLV > GSE(classic) in performance. It was further demonstrated that the biases generated in the four separate models could be nearly eliminated in a consensus model based on the average of just two of the methods: GSE(Φ,B) and ABSOLV. The resulting consensus prediction equation is simple in form and can be easily incorporated into spreadsheet calculations. Even more significantly, it slightly outperformed the RFR method.
1 Introduction
In the 5-year period 2016–2020, 228 drugs were approved by the FDA, mostly for the treatment of cancer, infectious/viral diseases, and neurological disorders [1,2,3,4,5,6,7]. Of these drugs, 74% are ‘small molecule’ new molecular entities (NMEs). Many of the NMEs are larger, more lipophilic, and possess more H-bond acceptors than older drugs in the Lipinski ‘Rule of 5’ (Ro5) chemical space [8, 9]. NMEs outside the Lipinski space are often dubbed ‘beyond the Rule of 5’ (bRo5) drugs [9,10,11,12,13,14,15,16,17,18]. Size inflation is not the only physicochemical characteristic of the NMEs, however: some new drugs are relatively small.
Generally, large molecules may increase pharmacokinetic (PK) risks due to low solubility, possibly low cell permeability, increased efflux, and elevated metabolism. During drug discovery/early development, strategies to mitigate some of the risks have included: (i) selecting molecules which can dynamically form intramolecular H-bonds (IMHB) to shield polar groups, (ii) shielding polar groups by bulky side chains or by N-methylation, and (iii) selecting molecules with flexible ring structures [14,15,16,17,18,19]. Flexible molecules with the potential to form IMHBs have been of particular interest, since these may possess enhanced solubility in water, by adopting hydrophilic ‘extended’ conformations, as well as facilitated permeability across cell membranes, by adopting hydrophobic ‘folded’ conformations [17,18,19].
Solubility plays a central role in the fuller understanding of the PK risks. Reliable and actionable in silico models to predict the solubility of NMEs, and of promising molecules not yet synthesized, could be a valuable contribution to risk assessment [13]. We started to address this topic in a series of in silico studies [20,21,22]; the present contribution is a continuation of that effort.
In a recent study to predict the intrinsic solubility, log10 S0, of four standardized external test sets of mostly druglike molecules [20], three methods were critically examined: (i) Yalkowsky General Solubility Equation (GSE) [23], (ii) Abraham Solvation Equation (ABSOLV) [24], and (iii) Breiman Random Forest regression (RFR) machine learning method [25]. RFR was found to be most accurate: for a highly-curated external test set of 100 druglike molecules with consistently-determined solubility values (average interlaboratory reproducibility, SDavg ~ 0.17 log10 unit), the strength of the prediction was indicated by the coefficient of determination, r2 = 0.64, and root-mean-square error, RMSE = 0.76 (log10 unit) [20]. However, the ‘black-box’ machine learning RFR method has some disadvantages: (i) it does not directly suggest how compounds could be altered to increase/decrease their solubility [26]; (ii) there is no obvious simple explicit equation to predict solubility which could be used in a spreadsheet calculation; (iii) the method ‘learns’ superlatively but ‘teaches’ tepidly. The linear ABSOLV model, based on Abraham’s five solvation descriptors [24, 27], yielded poorer statistics: r2 = 0.26 and RMSE = 1.10 for the same test set. The GSE, Eq. 1, was slightly less successful still (r2 = 0.20, RMSE = 1.13) [20]. Nevertheless, the simple classic GSE is particularly appealing since it requires no ‘training.’ Merely the melting point (mp, in °C) and the calculated (or measured) octanol–water partition coefficient, log10 P, are required to predict solubility (in log molar units):
\[\log_{10} S_{0} = 0.5 - \log_{10} P - 0.01\,(\text{mp} - 25) \qquad (1)\]
In a follow-up study [21], solubility prediction using the above three models was applied to large molecules (MW > 800 g·mol−1). The novel aim was to explore to what extent Ro5 molecules could be used to predict the log10 S0 of molecules from the bRo5 space. For an external test set of 31 large molecules, RFR predicted solubility (r2 = 0.37, RMSE = 1.07) better than the other two methods. The RFR results suggested that it was possible to develop a model trained on small Ro5 molecules to predict the solubility of large bRo5 molecules. Unfortunately, the ‘how’ was not explicitly obvious. Nevertheless, the RFR method could serve as a benchmark against which other more actionable models could be measured. Also, the study revealed that the traditional GSE systematically underpredicts solubility of poorly soluble (S0 < 50 µmol·L−1) large molecules and greatly overpredicts solubility of highly soluble large molecules. The regression analysis of the three coefficients in Eq. 1 (0.5, − 1.0, − 0.01), using data partitioned into small and large molecule sets, resulted in notable differences between the two sets of coefficients, particularly in the first two terms (solvation contributions): (i) the 0.5 intercept in Eq. 1 was found to be − 0.28 for small molecules and − 1.77 for large molecules, and (ii) the log10 P slope factor, − 1.0, changed from − 0.83 to − 0.40 in small to large molecules, respectively [21]. The ABSOLV equation (trained with small molecules) revealed a different pattern of large-molecule residuals from that of the GSE: the solvation equation underpredicted the solubility of every large molecule tested. This was especially evident for very flexible molecules (e.g., gramicidin A, bryamycin, and vancomycin). 
The principal components analysis of the solubility database used to train the models revealed an asymmetric distribution in the data, resembling the shape of a ‘comet’, with small molecules symmetrically occupying the ‘head’ and large molecules (MW > 800 g·mol–1) exclusively occupying the ‘tail.’ Two hallmarks of bRo5 chemical space reside in the tail [21]: large size and large number of H-bond acceptors (NHA).
The above study [21] and earlier investigations by Caron and coworkers [15,16,17,18, 28] suggested that the influence of flexibility of large molecules on their solubility and permeability characteristics could be substantial. The latter researchers recommended the use of the Kier Φ molecular flexibility index [29] in modeling the properties of bRo5 molecules.
In our most recent solubility prediction study of bRo5 drugs, we discovered a way to incorporate the Kier molecular flexibility index, Φ, plus the Abraham B descriptor (H-bond acceptor strength) into Yalkowsky’s classic GSE to improve its performance substantially [22]. The three coefficients in Eq. 1 were empirically determined as smooth functions of the sum descriptor, Φ + B. The modified equation was named the ‘Flexible-Acceptor’ model, GSE(Φ,B). It was trained with small (Ro5) molecules to predict the solubility of large (bRo5) molecules (not used in the training). With just three coefficients in Eq. 1, each defined as a three-parameter exponential function of Φ + B, the strength of prediction nearly matched that of the RFR machine learning method. The coefficient of log10 P (traditionally fixed at -1.0) changed smoothly from − 1.1 for rigid nonionizable molecules (Φ + B = 0) to − 0.39 for typically flexible (Φ ~ 20, B ~ 6) large molecules. The intercept (usually fixed at + 0.5) varied smoothly from + 1.9 for rigid small molecules to − 2.2 for flexible large molecules. The mp coefficient remained practically constant, slightly different from the traditional value (− 0.01) for most molecules. For a test set of 32 large molecules the GSE(Φ,B) predicted the intrinsic solubility with RMSE of 1.10 log unit, compared to 3.0 by GSE(classic), and 1.07 by RFR.
Since our last study, it was found that the GSE(Φ,B) appears to work well not only for large drugs, but across a wide range of molecular sizes. This prompted us to apply the new solubility prediction equation to recently-approved drugs (2016–2020), which comprise both bRo5 and Ro5 molecules. For comparison, the GSE(classic), ABSOLV, GSE(Φ,B), and RFR models were each applied to predict the intrinsic solubility of 72 new drugs, for which useable reported solubility values could be accessed, nearly 60% from FDA published New Drug Application (NDA) reports [1,2,3,4,5,6,7]. The method performances ranked: RFR ~ GSE(Φ,B) > ABSOLV > GSE(classic). The performance of the GSE(Φ,B) was almost as good as that of the RFR. However, when the GSE(Φ,B) and ABSOLV predictions were averaged, the resulting consensus model slightly outperformed the RFR method.
2 Computational Methods and Data Sources
2.1 Thermodynamic Basis of the General Solubility Equation (GSE)
Yalkowsky and coworkers developed the General Solubility Equation (GSE), Eq. 1, to predict the solubility of liquid/solid nonelectrolytes (mostly industrial organic chemicals) in water [23, 30,31,32,33,34,35]. The thermodynamic basis of the equation posits that the dissolution of a crystalline substance in water comprises two main contributions: (a) crystal lattice effect (XTL), related to the energy needed to break down the lattice to form a hypothetical ‘supercooled liquid’ (SCL), and (b) solvation effect, related to the energy released as the SCL dissolves in water. The total solubility of the compound in water is the product of the above two contributions, which in logarithmic terms can be stated as the sum [33, 34]:
\[\log_{10} S_{\text{W}} = \log_{10} S_{\text{W}}^{\text{XTL}} + \log_{10} S_{\text{W}}^{\text{SCL}} \qquad (2)\]
2.1.1 Crystal Lattice Effect
The lattice contribution, \(\log_{10} S_{\text{W}}^{\text{XTL}} = -\Delta S_{m}(T_{m} - T)/(2.303RT)\), arises from the application of the van’t Hoff equation, where ∆Sm (kJ·mol–1·K–1) is the standard molar entropy of phase transformation and Tm is the melting point (K). For many small organic compounds, ∆Sm ≈ 0.057 kJ·mol–1·K–1 [33, 34]. Since at 25 °C, 2.303 RT = 5.7 kJ·mol–1, the prefactor ∆Sm/(2.303 RT) is approximately 0.01 K–1, and Eq. 2 reduces to Eq. 3, where mp is the melting point in °C:
\[\log_{10} S_{\text{W}} = -0.01\,(\text{mp} - 25) + \log_{10} S_{\text{W}}^{\text{SCL}} \qquad (3)\]
2.1.2 Solvation Effect
Hansch and coworkers [36] demonstrated that log10 S of 156 simple liquid solutes correlated linearly with the octanol–water partition coefficients, \(\log_{10} P \approx \log_{10}(S_{\text{oct}}^{\text{liq}} / S_{\text{W}}^{\text{liq}})\). This led to the approximation:
\[\log_{10} S_{\text{W}}^{\text{liq}} = a_{0} + a_{1} \log_{10} P \qquad (4)\]
where \(a_{0} = \log_{10} S_{\text{oct}}^{\text{liq}}\) (solubility of a liquid solute in octanol) and a1 ≈ − 1. For small alcohol, aromatic, and alkane solutes, the series-dependent a0 intercepts were determined as: + 0.93, + 0.34, − 0.25, respectively. The a1 slope factors varied less: − 1.1 (alcohols), − 1.0 (aromatics), and − 1.2 (alkanes).
Yalkowsky and coworkers surmised that \({\text{a}}_{0} = \log_{10} S_{{{\text{oct}}}}^{{{\text{SCL}}}}\) = 0.5 [23]. The entropy of mixing favors complete miscibility of the two liquids (liquid solute and octanol); i.e., the mole fraction = 0.5. Since the concentration of pure octanol is 6.32 mol·L−1, then \(\log_{10} S_{{{\text{oct}}}}^{{{\text{liq}}}}\) = log10 (6.32 × 0.5) = 0.5. With this approximation (and with a1 = − 1), Eq. 4 substituted into Eq. 3 reduces to Eq. 1.
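The arithmetic behind the 0.5 intercept, and the resulting Eq. 1, can be checked with a few lines of Python. This is a minimal sketch: the function name `gse_classic` and the example inputs are illustrative assumptions, not values taken from Table 1, and the handling of solutes melting below 25 °C follows the usual GSE convention rather than anything stated in this section.

```python
import math

# Molar concentration of pure octanol quoted in the text (mol/L).
OCTANOL_MOLARITY = 6.32

# Intercept a0: complete miscibility is taken as a mole fraction of 0.5.
a0 = math.log10(OCTANOL_MOLARITY * 0.5)  # close to 0.5


def gse_classic(log_p: float, mp_celsius: float) -> float:
    """Classic GSE (Eq. 1): log10 S0 in log molar units.

    Uses the traditional coefficients 0.5, -1.0 and -0.01 quoted in the text.
    ASSUMPTION: for solutes that are liquid at 25 degC the lattice term is
    set to zero, which is the usual GSE convention.
    """
    lattice_term = 0.01 * max(mp_celsius - 25.0, 0.0)
    return 0.5 - 1.0 * log_p - lattice_term


# Hypothetical solute: log P = 3.0, melting point 175 degC.
example = gse_classic(3.0, 175.0)  # 0.5 - 3.0 - 1.5
```

Because no coefficient is fitted, the same function applies to any compound for which a melting point and a log10 P (measured or calculated) are available.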
These fundamental considerations suggest that the traditional Eq. 1 could be adapted to compounds from the bRo5 chemical space, since Hansch’s research hinted that the three coefficients in Eq. 1 could be optimized to different classes of compounds. If the ‘supercooled liquid’ form of a large polar solute is not fully miscible with octanol, then the \(\log_{10} S_{{{\text{oct}}}}^{{{\text{SCL}}}}\) contribution could very well be a negative number. Hence, a large molecule with a decreased \(S_{{{\text{oct}}}}^{{{\text{SCL}}}}\) (due to decreased miscibility) is expected to have an increased \(S_{{\text{W}}}^{{{\text{SCL}}}}\). This, in effect, would lessen the contribution of lipophilicity to the predicted solubility.
2.2 ‘Flexible-Acceptor’ General Solubility Equation, GSE(Φ,B)
In our earlier investigation [22] it was found that molecular flexibility (Φ) [29] could be incorporated into a nonlinear variant of the GSE to produce a promising trainable model with improved accuracy in predicting the solubility of large molecules (MW > 800 g·mol–1). Further incremental improvements were achieved with an augmented second descriptor: Abraham’s H-bond basicity (B), a measure of H-bond acceptor potential [24, 27]. The derived GSE(Φ,B) has the general form, with the three c-coefficients treated as three-parameter exponential functions of (Φ + B):
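Assuming that GSE(Φ,B) keeps the three-term structure of Eq. 1, with each fixed coefficient replaced by a function of Φ + B, the general form can be sketched in Python as follows. The exponential parameterization `b0 + b1 * exp(b2 * x)` is an assumption chosen for illustration (the text states only that each coefficient is a three-parameter exponential function), and the names `exp3` and `gse_flexible_acceptor` are hypothetical. No fitted b-values are reproduced here.

```python
import math
from typing import Tuple

Params = Tuple[float, float, float]


def exp3(x: float, b: Params) -> float:
    """Three-parameter exponential: b0 + b1 * exp(b2 * x).

    ASSUMPTION: this specific functional form is illustrative only.
    """
    b0, b1, b2 = b
    return b0 + b1 * math.exp(b2 * x)


def gse_flexible_acceptor(log_p: float, mp_celsius: float,
                          phi: float, abraham_b: float,
                          b_c0: Params, b_c1: Params, b_c2: Params) -> float:
    """log10 S0 = c0 + c1 * log10 P + c2 * (mp - 25), with c0, c1, c2
    evaluated at x = phi + B (ASSUMED to mirror the layout of Eq. 1)."""
    x = phi + abraham_b
    c0 = exp3(x, b_c0)
    c1 = exp3(x, b_c1)
    c2 = exp3(x, b_c2)
    return c0 + c1 * log_p + c2 * (mp_celsius - 25.0)


# Sanity check with placeholder parameters: switching off the exponential
# part (b1 = 0) fixes the coefficients at 0.5, -1.0, -0.01, i.e. Eq. 1.
classic_like = gse_flexible_acceptor(
    3.0, 175.0, phi=6.2, abraham_b=1.5,
    b_c0=(0.5, 0.0, 0.0), b_c1=(-1.0, 0.0, 0.0), b_c2=(-0.01, 0.0, 0.0))
```

Written this way, the classic GSE is a special case of the flexible model, which makes it easy to see how much of a prediction comes from the Φ + B dependence of the coefficients.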
The c-coefficients at aggregated values of Φ + B were determined by partial least squares (PLS open-source package from https://cran.r-project.org/web/packages/pls) analysis of solubility data sorted on values of Φ + B and uniformly binned into groupings of 209–1384 points. The details of the PLS procedure have already been described [22]. Since our database of solubility values has increased in size since our last study and since the focus now is on new drugs rather than specifically on big drugs, a new set of b-constants was determined in the current investigation, using druglike molecules as the training set, but excluding new drugs from the training.
Kier [29] constructed (considering structural attributes such as counts of chains, rings, branches, and heavy atoms) the molecular flexibility index, Φ, as the product of the first- and second-order ‘kappa’ shape indices, 1κ and 2κ, divided by the heavy atom count in the molecule. Here, values of Φ were calculated from the two kappa and the heavy atom count descriptors provided by Landrum’s RDKit open-source chemoinformatics library [37]. Table 1 lists these Φ values.
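The definition of Φ given above reduces to a single expression once the three descriptors are in hand. The sketch below takes them as plain numbers so that it runs without any chemistry toolkit; the function name `kier_flexibility` and the example values are hypothetical. In RDKit, the corresponding descriptors are available as `Kappa1`, `Kappa2`, and `HeavyAtomCount` in `rdkit.Chem.Descriptors`, although whether exactly these calls were used here is an assumption.

```python
def kier_flexibility(kappa1: float, kappa2: float, n_heavy: int) -> float:
    """Kier flexibility index: phi = kappa1 * kappa2 / (heavy atom count)."""
    if n_heavy <= 0:
        raise ValueError("heavy atom count must be positive")
    return kappa1 * kappa2 / n_heavy


# Hypothetical descriptor values for a mid-sized molecule.
phi_example = kier_flexibility(kappa1=20.0, kappa2=10.0, n_heavy=25)
```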
2.3 Abraham Descriptors and the ABSOLV Linear Model
To account for the thermodynamics of solute transfer from one phase to another, Abraham [24, 27] introduced five solvation descriptors: A, B, Sπ, E, and V. Two of these constitute hydrogen bonding potential: A is the sum of H-bond acidity (donor strength) and B is the sum of H-bond basicity (acceptor strength) in the molecule. Sπ is the dipolarity/polarizability (subscripted so as not to confuse it with solubility), E is an excess molar refraction in units of (cm3·mol−1)/10, and V is the McGowan characteristic volume in units of (cm3·mol−1)/100. Since large molecules have a greater number of H-bond acceptors, compared to small molecules [21], the Abraham B descriptor was selected to augment the Φ descriptor to further improve solubility prediction in the bRo5 chemical space [22]. Values of the Abraham descriptors were calculated from 2D structures using the ABSOLV algorithm [27] (cf., www.acdlabs.com) and are listed in Table 1 for the new drugs.
Abraham and Le [24] amended the ABSOLV model to predict intrinsic solubility (log molar), as expressed in Eq. 9.
The independent variables are the five solute descriptors, plus the cross product of the H-bond terms. The seven d-coefficients were determined by PLS regression, using the training set database, exclusive of the new drugs set.
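A linear model with these six terms and an intercept can be sketched as below. The ordering of the seven d-coefficients and the function name `absolv_log_s0` are assumptions made for illustration; the fitted coefficient values are not reproduced, so the example uses placeholder numbers only.

```python
from typing import Sequence


def absolv_log_s0(a: float, b: float, s_pi: float, e: float, v: float,
                  d: Sequence[float]) -> float:
    """Linear solvation model with an A*B cross term.

    ASSUMED coefficient order:
    d = (intercept, A, B, S_pi, E, V, A*B).
    """
    if len(d) != 7:
        raise ValueError("expected seven d-coefficients")
    d0, d_a, d_b, d_s, d_e, d_v, d_ab = d
    return (d0 + d_a * a + d_b * b + d_s * s_pi
            + d_e * e + d_v * v + d_ab * a * b)


# Placeholder coefficients (all 1.0 except a zero intercept), chosen only
# to make the arithmetic easy to follow: 1 + 2 + 3 + 4 + 5 + (1 * 2).
demo = absolv_log_s0(1.0, 2.0, 3.0, 4.0, 5.0,
                     d=(0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))
```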
2.4 Statistical Machine Learning Random Forest Regression (RFR) Model
The implementation of the RFR open-source ‘randomForest’ library for the R statistical software has been described in our earlier solubility prediction studies [20,21,22]. The version used was downloaded from https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm. The method works by constructing an ensemble of hundreds of decision trees employing about 200 RDKit-generated molecular descriptors. The same procedure was applied in the current study. The method was re-trained with the enlarged database, excluding the newly-approved drugs.
2.5 Sources of Solubility Data for the Test (New Drugs) and Training (Wiki-pS0 Database) Sets
The annual mini-reviews of FDA drug approvals by Mullard [1,2,3,4,5] were convenient starting points to identify the new drugs and to begin the search for their solubility values. The data for the newly-approved drugs were laborious to locate. Since the drugs are relatively new, there are not many journal publications reporting properties of the compounds. Most of the data were found in FDA documents. As part of the New Drug Application (NDA) process, the FDA Center for Drug Evaluation and Research (CDER, www.accessdata.fda.gov) publishes reports listing some properties of compounds under consideration (review documents: Product Quality, Quality Assessment, Multi-Discipline, Clinical Pharmacology and Biopharmaceutics, and Other). Unfortunately, the information about solubility is sometimes redacted in these reports. Other useful sources include Product Monographs, Highlights of Prescription Information, and Safety Data Sheets. Some solubility data were found in patents. The European Medicines Agency (EMA) publishes Assessment Reports. The Australian regulatory agency publishes Australian Public Assessment Reports (AUSPAR), as well as Australian Product Information documents. These potential sources of measured solubility data were searched with the keyword ‘solub’.
Generally, there was virtually no experimental detail about the measurements in the published regulatory reports. Most of the reported solubility values are of drugs in water (SW), without mention of the saturation pH. The temperature was assumed to be 23 °C when not stated or when reported as ‘room temperature’ (Table 1). Given the dearth of experimental detail, it is a challenge to assess the quality of the reported measurements in most of the FDA/EMA/AUSPAR reports. Still, there are high-quality data in some of the documents, where solubility measurements were published as a function of pH. Examples of some of these are presented below.
Of the 169 small-molecule NMEs approved in the 5-year period, 98 quantitative solubility measurements were found for only 72 NMEs [38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115]. The reported values were transformed into the intrinsic solubility scale, S0, using known (or predicted) pKa values, and adjusted to 25 °C [116] using the program pDISOL-X (in-ADME Research) [117,118,119,120,121,122]. Table 1 lists the normalized solubility data, along with the pKa values used in the data analysis.
The Wiki-pS0 (in-ADME Research) intrinsic aqueous solubility database of mostly druglike molecules (currently with 7190 deeply-curated entries) was used to train the ABSOLV, GSE(Φ,B) and RFR models. Several hundred values from the database have already been published [20,21,22, 116,117,118,119,120,121,122,123,124,125,126], and the entire database is currently being prepared for publication as a book. The newly-approved drugs were used as external test sets and were excluded from the training process.
The structures of the 72 new drugs considered here (along with the year of approval) are shown in the Appendix (Fig. 9). In dual-API drug products, each API was treated as a separate ‘drug’ in the data analysis.
2.6 Sources of Octanol–Water Partition Coefficients (log10 P) and Melting Points (mp)
Originally in Eq. 1, mp and log10 P were taken to be experimental values. However, it has become common practice to use calculated values, clogP, in place of measured log10 P. In this study, clogP values were in all cases calculated by the Wildman–Crippen sum of atomic contributions method in the open-source RDKit chemoinformatics library [37]. Experimental mp values were employed where available and were calculated otherwise [127]. Values of mp are difficult to predict accurately. Prediction studies suggest a root-mean-square error of about 35 °C. With the traditional mp coefficient of 0.01 per °C, this corresponds to 0.01 × 35 ≈ 0.35, so the mp contribution to calculated log10 S could be uncertain by ~ 0.4 log10 unit. Some uncertainty lingers even with tabulated experimental values, as it is sometimes unclear whether a particular mp value refers to a salt form or a free-acid/base form of the compound.
3 Results and Discussion
3.1 Data Reduction
For half of the new drugs, log10 S0 values in Table 1 were determined from reported SW values, using pDISOL-X. The program also calculated the pH of the saturated solution, as though the Henderson–Hasselbalch (HH) equation were valid. When aggregates/complexes form or when supersaturation persists in the suspension, the HH equation does not accurately predict the shape of the log10 S–pH curve [118,119,120,121,122]. There is no simple way to recognize such anomalies just from a single SW measurement.
The remaining log10 S data were sourced at two or more values of pH, which generally allowed for more confident determinations of log10 S0. These ‘raw’ log10 S–pH measurements required further data reduction and normalization. For ionizable molecules, the pKa values are required for such analysis. In cases where measured pKa values could not be found, they were calculated using the ChemAxon MarvinSketch v5.3.7 program (ChemAxon Ltd., https://www.chemaxon.com), as indicated by italic values in Table 1. In a few cases, it was possible to determine pKa values directly in the analysis of the log10 S–pH profiles (bold values in Table 1).
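For a monoprotic compound that obeys the Henderson–Hasselbalch equation, converting a solubility measured at a known pH into the intrinsic value is a short calculation. The sketch below is a simplified illustration, not the pDISOL-X procedure: it assumes a single pKa, a known saturation pH, no salt precipitation, and no aggregation or supersaturation. The function name and example numbers are hypothetical.

```python
import math


def log_s0_from_log_s(log_s: float, ph: float, pka: float, kind: str) -> float:
    """Intrinsic solubility from a solubility measured at a given pH.

    Henderson-Hasselbalch for a monoprotic solute:
      acid: log S = log S0 + log10(1 + 10**(pH - pKa))
      base: log S = log S0 + log10(1 + 10**(pKa - pH))
    """
    if kind == "acid":
        ionized_ratio = 10.0 ** (ph - pka)
    elif kind == "base":
        ionized_ratio = 10.0 ** (pka - ph)
    else:
        raise ValueError("kind must be 'acid' or 'base'")
    return log_s - math.log10(1.0 + ionized_ratio)


# Hypothetical base with pKa 7.0, measured at pH 5.0 with log10 S = -2.0:
# the ionized form dominates about 100-fold, so log10 S0 is about 2 units lower.
log_s0_base = log_s0_from_log_s(-2.0, ph=5.0, pka=7.0, kind="base")
```

The size of the correction grows by one log unit per pH unit away from the pKa on the ionized side, which is why an error in the assumed pKa or saturation pH propagates almost directly into log10 S0.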
Examples of high-quality experimental log10 S–pH profiles reported for some of the new drugs are shown in Fig. 1. Frames a–c are of bases (acalabrutinib, pexidartinib, upadacitinib); frame d is that of an acid (dolutegravir); frame e is that of an ampholyte (talazoparib). The data from these five drugs appeared to follow shapes predicted by the Henderson–Hasselbalch equation: it was possible to determine not only the best-fit log10 S0, but also the values of pKa (in five cases) and the pKsp (in two cases). When profiles deviate from expected shapes, it may be possible to assess (and to correct for) the degree to which the measurements may be supersaturated or whether aggregates/complexes are forming [118,119,120,121,122]. Figure 1f (safinamide) shows an example of such an anomaly, where at pH 4.5 the solubility is higher than that expected for a solution saturated in the free base. Since the solubility values at pH 1.2 and 4.5 are nearly the same, the suspension at pH 4.5 may have been supersaturated with respect to the charged form of the base during the measurement. Had only a single measurement been reported at pH 4.5, the intrinsic solubility might have been overestimated by an order of magnitude.
3.2 Comparing Properties of the Newly-Approved Drugs to Those in the Database Training Set
Figure 2 shows the distribution of intrinsic solubility values for the database training set and the NMEs test set. The new drugs on the average are nearly an order of magnitude less soluble. Figure 3 compares the properties used to evaluate whether a compound falls into Lipinski’s ‘Rule of 5’ chemical space. The lipophilicities (as indicated by clogP) of the new drugs on the average are nearly an order of magnitude higher than those of the older drugs (Fig. 3a). The mean molecular weight of the older drugs is just under 300 g·mol–1; it is 450 g·mol–1 for the new drugs (Fig. 3b). Whereas the distribution of H-bond donors is nearly the same in the two sets (Fig. 3c), the distribution of H-bond acceptors is quite different (Fig. 3d). On the average, the number of H-bond acceptors, NHA, is about 4 per molecule for older drugs and nearly 7 per molecule in the newly-approved drugs.
The new drugs violate the boundary conditions of Lipinski’s Ro5 more often than those in the training set. There are relatively more molecules with clogP > 5 in the new drugs set (15% of NMEs), compared to that of the training set (6%). For the new drugs, 28% of the substances have MW > 500 g·mol–1, compared to 7% in the training set. The relative number of NHD > 5 in the new drugs set (5%) is higher than in the training set molecules (2%). The relative number of NHA > 10 for the new drugs (9%) is greater than in the training set (3%).
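The four boundary conditions used in this comparison can be counted per molecule with a small helper. This is a minimal sketch; the function name `ro5_violations` and the example property values are hypothetical.

```python
def ro5_violations(mw: float, clogp: float, n_hd: int, n_ha: int) -> int:
    """Count how many of the four Lipinski 'Rule of 5' limits are exceeded:
    MW > 500 g/mol, clogP > 5, H-bond donors > 5, H-bond acceptors > 10."""
    checks = (mw > 500.0, clogp > 5.0, n_hd > 5, n_ha > 10)
    return sum(checks)


# Hypothetical molecule: exceeds the clogP and H-bond acceptor limits only.
n_viol = ro5_violations(mw=450.0, clogp=5.5, n_hd=2, n_ha=11)
```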
The distributions of Φ values for the training and test sets are shown in Fig. 4. On average, the new drugs are more flexible (mean Φ = 6.2) than the molecules in the training set (mean Φ = 4.3). The training set spans a wide range of Φ values, from 0.4 to 43. The newly-approved drugs fall within that range, with Φ values from 1.9 to 24.
3.3 Determination of the Three GSE Coefficients from Training Set iso-(Φ + B) Bins
The training set solubility data were sorted by Φ + B into ten bins of increasing values. For the narrow range of Φ + B values in each bin, the three GSE coefficients in Eq. 5 were determined by linear PLS regression, in the way that Hansch et al. [36] had trained the GSE for different chemical classes of compounds. Table 2 lists the set of determined c-constants for each of the bins. The resultant c-constants are depicted by the points on the three curves in Fig. 5, displayed as a function of the average value of Φ + B in each bin. With increasing values of Φ + B, the c0 coefficient decreases substantially, the c1 coefficient increases steadily, and the c2 coefficient increases only very slightly. Apparently, crystal lattice contributions are not appreciably affected by molecular flexibility and H-bond acceptor character, and the c2 coefficient stays near the traditional value (− 0.01) in Eq. 1. Evidently, the dependence of solubility on flexibility and H-bond acceptor strength is mediated by solution-phase interactions [26].
From the thermodynamics considerations, the c0 coefficient may be viewed as a measure of the solubility of the ‘supercooled’ liquid solute in octanol (c0 ≈ \(\log_{10} S_{{{\text{oct}}}}^{{{\text{sliq}}}}\)). Increasingly flexible molecules with strong H-bond acceptor character appear to be less miscible with octanol, as suggested by the decreasing c0 coefficients with increasing Φ + B (cf., Table 2 and Fig. 5). Between bins 1 and 10, \(S_{{{\text{oct}}}}^{{{\text{sliq}}}}\) decreases by four orders of magnitude. Given that the c1 coefficient also changes with Φ + B, the precise thermodynamic interpretation of the c0 coefficient is less clear than in the classical derivation [23, 33, 34] where c1 is constant.
The points in Fig. 5 were fitted to exponential forms as functions of Φ + B to determine the b-parameters (Eqs. 6–8), using standard nonlinear least-squares methods. The resultant best-fit curves in Fig. 5 define the aggregated form of the GSE(Φ,B), with the nine b-parameters given in Eqs. 6–8.
3.4 ABSOLV Training
The d-coefficients in Eq. 9 were determined by PLS regression using the log10 S0 values from the Wiki-pS0 database, excluding those of the NMEs: r2 = 0.65, RMSE = 1.16, n = 7092.
3.5 Solubility Prediction Results for the Newly-Approved Drugs
3.5.1 Model Training
Figure 6 shows the results of the training of the four models, as measured log10 S0 vs. calculated log10 S0 correlation plots. The solid diagonals are identity lines. The dashed diagonals are displaced ± 0.5 log10 unit from the identity lines. The measure of prediction performance (MPP) is indicated by the pie-charts as the percentage of predicted values that are within ± 0.5 log10 unit of the observed values [128]. In the first three frames, the symbols represent the predominant charge states of molecules at pH 7.4: black diamonds represent uncharged molecules, blue squares represent bases (positively charged), red circles represent acids (negatively charged), and yellow diamonds represent zwitterions. The zwitterions are less well predicted in the GSE model, compared to the ABSOLV model [20]. The Random Forest Regression (RFR) internal validation was applied to a randomly selected 30% of the database, based on training using the other 70% of the database (exclusive of new drugs). For molecules like those of the database, it is expected that their log10 S0 could be predicted with r2 = 0.90, RMSE = 0.62, with 76% of the molecules ‘correctly’ predicted (Fig. 6d). Generally, the other three methods are less precise, with MPP values ranging around half of the RFR value.
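The summary statistics used in these comparisons are straightforward to compute from paired observed and calculated values. The sketch below covers RMSE, bias, and MPP; taking the bias as the mean of (observed − calculated) is an assumption about the sign convention, and the function name and the four data points are hypothetical.

```python
import math
from typing import Dict, Sequence


def prediction_metrics(observed: Sequence[float],
                       calculated: Sequence[float],
                       window: float = 0.5) -> Dict[str, float]:
    """RMSE, bias and MPP for paired log10 S0 values.

    MPP is the percentage of absolute residuals <= window (0.5 log10 unit).
    ASSUMPTION: bias is the mean of (observed - calculated).
    """
    n = len(observed)
    if n == 0 or n != len(calculated):
        raise ValueError("need two non-empty sequences of equal length")
    residuals = [obs - calc for obs, calc in zip(observed, calculated)]
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    bias = sum(residuals) / n
    mpp = 100.0 * sum(1 for r in residuals if abs(r) <= window) / n
    return {"rmse": rmse, "bias": bias, "mpp": mpp}


# Hypothetical values: three of the four residuals fall inside the window.
observed = [-4.0, -5.0, -6.0, -3.0]
calculated = [-4.25, -5.0, -4.0, -3.5]
metrics = prediction_metrics(observed, calculated)
```

In this toy set a single 2-unit miss pushes the RMSE above 1 while the MPP stays at 75%, which illustrates why the two statistics are reported side by side.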
3.5.2 Model Testing
Figure 7 shows the results of the predictions of the solubility of the newly-approved drugs (external test sets) by the four models. Table 3 summarizes the results. Briefly, the four results look similar, as MPP values range from 28 to 39%. Note that the horizontal scale in Fig. 7a is quite different from those of the other frames. The GSE(classic) underperformed compared to the other methods. The Flexible-Acceptor model produced prediction metrics nearly equal to those of RFR. None of the methods produced RMSE < 1, which may be indicative of the uncertain quality of half of the new drugs solubility data reported as single-point values in water. On the other hand, RFR uncharacteristically overpredicted the solubility of drugs with log10 S0 < − 7, which may hint that those molecules possessed structural features not found in the Wiki-pS0 training database. The GSE(Φ,B) shows similar overpredictions.
The bias in the predicted results is near zero (− 0.06) in the RFR method. Both GSE methods show negative bias (− 0.33 and − 0.25), whereas the ABSOLV method produces a positive bias (+ 0.29). A consensus model was suggested by averaging the ABSOLV and GSE(Φ,B), to minimize the method bias. Figure 8 shows the results of the consensus model. Although the r2 (0.67) and RMSE (1.07) values in the consensus method match those of the RFR method, the MPP value (40%) and the bias (+ 0.02) in the consensus model are slight improvements.
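The consensus prediction is an unweighted mean of two model outputs, and because bias is itself a mean of residuals, the bias of the consensus is the mean of the two individual biases. The sketch below checks this against the figures quoted above. It assumes that the − 0.25 bias belongs to GSE(Φ,B), which the text does not state explicitly but which reproduces the reported consensus bias; the function name is hypothetical.

```python
def consensus_log_s0(absolv_pred: float, gse_phi_b_pred: float) -> float:
    """Unweighted average of the ABSOLV and GSE(phi,B) predictions."""
    return 0.5 * (absolv_pred + gse_phi_b_pred)


# Biases quoted in the text; assigning -0.25 to GSE(phi,B) is an ASSUMPTION.
bias_absolv = 0.29
bias_gse_phi_b = -0.25

# Averaging the predictions averages the biases.
bias_consensus = 0.5 * (bias_absolv + bias_gse_phi_b)  # about +0.02

# Hypothetical pair of predictions for one drug.
example_consensus = consensus_log_s0(-4.0, -5.0)
```

The near cancellation works only because the two methods err in opposite directions on this test set, so it need not carry over to other collections of compounds.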
3.5.3 More is Needed than Just Increasing the Size of the Training Set
The Wiki-pS0 database of druglike molecules has steadily grown over the last 10 years. Lately, it has been our observation that this growth alone has not proportionately improved the ability to predict the solubility of drugs. Metrics such as those in Fig. 6 have remained largely unchanged [20,21,22]. Solubility prediction depends on multi-dimensional factors (quality of measurements of both training and test sets, distribution of training set molecules in chemical space in relation to the tested drugs, sensitivity of descriptors used in prediction models, etc.), with some factors yet to be recognized. Simply increasing the size of the solubility training set may not lead to improved predictions. Lipinski has suggested that compiling a large physicochemical property database aimed at maximizing chemical diversity may be an inefficient strategy for predicting the properties of novel molecules, given the enormous size of the chemical space, and since drugs appear to exist there as small tight clusters [129]. However, small improvements in solubility prediction can be expected as the training set regularly acquires additional measurements of newly-approved molecules from regulatory sources, i.e., drawing from the “tight cluster” space. It would be helpful if the quality of such measurements were to improve with time. New descriptors which can better differentiate the factors affecting solubility may also be important for narrowing the gap between the accuracy of the prediction models and that of the experimental data.
4 Conclusion
Many of the new drugs are large and fall outside the Lipinski Ro5 chemical space, as depicted in Fig. 3. It would have been helpful to have access to more quantitative solubility measurements of the newly-approved drugs than are provided in the regulatory agency reports. The experimental uncertainty of nearly half of the new measurements could not be directly verified. If better practices in solubility measurement were adhered to, as detailed in the recent data-quality ‘white paper’ by experts from six countries [121], and the experimental details were more openly shared, newly-reported measurements could achieve an interlaboratory SD < 0.2 log10 unit. This apparently remains a work in progress. The data quality of the curated database (SD < 0.2 log unit) used here as the training set is not the limiting factor in prediction, given that the best root-mean-square error achieved in this study was above one log unit. The benchmark statistical machine learning approaches are probably up to the task of narrowing the gap between prediction and measurement. The Flexible-Acceptor GSE(Φ,B) performed nearly as well as the benchmark Random Forest regression method in predicting the aqueous intrinsic solubility of the newly-approved drugs (2016–2020). A similar near-match had been previously reported by us in the prediction of the solubility of large (bRo5) drugs, supporting the general applicability of the Flexible-Acceptor model. A consensus model based on the average predictions of the ABSOLV and GSE(Φ,B) methods was found to reduce the prediction biases in the separate methods and, perhaps more significantly, it slightly outperformed the Random Forest regression method overall. The relatively simple consensus model can be readily incorporated into spreadsheet calculations.
Abbreviations
- S0: Intrinsic aqueous solubility (i.e., the solubility of the uncharged form of the compound).
- MPP: Measure of prediction performance [128]. It refers to the percent of ‘correct’ predictions, defined as the count of absolute residuals $|\log_{10} S_0^{\mathrm{obs}} - \log_{10} S_0^{\mathrm{calc}}| \le 0.5$ divided by $n$. MPP is represented as a pie chart in the correlation plots.
- RMSE: Root-mean-square error, accounting for bias in the prediction of external test set solubility values: $\mathrm{RMSE} = \left[ \frac{1}{n-1} \sum_i \left( y_i^{\mathrm{obs}} - \mathrm{bias} - y_i^{\mathrm{calc}} \right)^2 \right]^{1/2}$, where $y = \log_{10} S_0$ and $n$ = number of measurements of $\log_{10} S_0$.
- r2: Coefficient of determination, accounting for bias in the prediction of external test set solubility values [130]: $r^2 = 1 - \sum_i \left( y_i^{\mathrm{obs}} - \mathrm{bias} - y_i^{\mathrm{calc}} \right)^2 / \sum_i \left( y_i^{\mathrm{obs}} - \langle y \rangle \right)^2$, where $y = \log_{10} S_0$ and $\langle y \rangle$ is the mean value of observed $\log_{10} S_0$.
- bias: Intercept ($a$) in the regression fit $y^{\mathrm{obs}} = a + b\, y^{\mathrm{calc}}$, where the slope factor ($b$) is fixed at unity.
- SD: Standard deviation: $\mathrm{SD} = \left[ \frac{1}{n} \sum_i \left( y_i^{\mathrm{obs}} - \langle y \rangle \right)^2 \right]^{1/2}$, where $n$ = number of measurements and $\langle y \rangle$ = mean value of $\log_{10} S_0$.
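The statistics defined in this list can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the study: the function name, the dictionary keys, and the example values are hypothetical, and the residual sums are taken as sums of squares, as is standard for RMSE and r2. The bias is computed as the mean of observed minus calculated values, which is the least-squares intercept when the slope is fixed at unity.

```python
import numpy as np

def prediction_metrics(y_obs, y_calc, tol=0.5):
    """Bias, bias-corrected RMSE and r2, and MPP for log10 S0 predictions."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_calc = np.asarray(y_calc, dtype=float)
    n = y_obs.size

    resid = y_obs - y_calc
    b = resid.mean()                        # intercept with slope fixed at 1

    ss_res = np.sum((resid - b) ** 2)       # bias-corrected residual sum of squares
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)

    rmse = np.sqrt(ss_res / (n - 1))
    r2 = 1.0 - ss_res / ss_tot

    # MPP: percent of predictions with |observed - calculated| <= tol (0.5 log unit)
    mpp = 100.0 * np.count_nonzero(np.abs(resid) <= tol) / n

    return {"bias": float(b), "RMSE": float(rmse), "r2": float(r2), "MPP": float(mpp)}

# Made-up values for illustration only (not data from this study).
example = prediction_metrics(
    y_obs=[-2.0, -3.0, -4.0, -5.0],
    y_calc=[-2.5, -3.0, -3.0, -5.5],
)
print(example)
```

Note that MPP is evaluated on the raw absolute residuals, whereas RMSE and r2 are evaluated after subtracting the bias, following the definitions in the list.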
References
Mullard, A.: 2016 FDA drug approvals. FDA approval count fell last year, despite a steady regulatory filing rate. Nat. Rev. Drug Discov. 16, 73–76 (2017)
Mullard, A.: 2017 FDA drug approvals. The FDA approved 46 new drugs last year, the highest total in more than two decades. Nat. Rev. Drug Discov. 17, 81–85 (2018)
Mullard, A.: 2018 FDA drug approvals. The FDA approved a record 59 drugs last year, but the commercial potential of these drugs is lackluster. Nat. Rev. Drug Discov. 18, 85–89 (2019)
Mullard, A.: 2019 FDA drug approvals. The FDA approved 48 new drugs last year, keeping up the momentum of recent years. Nat. Rev. Drug Discov. 19, 79–84 (2020)
Mullard, A.: 2020 FDA drug approvals. The FDA approved 53 novel drugs in 2020, the second highest count in over 20 years. Nat. Rev. Drug Discov. 20, 85–90 (2021)
Kinch, M.S., Griesenauer, R.H.: 2017 in review: FDA approvals of new molecular entities. Drug Discov. Today. 23, 1469–1473 (2018)
Roskoski, R., Jr.: Properties of FDA-approved small molecule protein kinase inhibitors. Pharmacol. Res. 144, 19–50 (2019)
Lipinski, C.A., Lombardo, F., Dominy, B.W., Feeney, P.J.: Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv. Drug Deliv. Rev. 23, 3–25 (1997)
Leeson, P.D.: Molecular inflation, attrition & the rule of five. Adv. Drug Deliv. Rev. 101, 22–33 (2016)
Doak, B.C., Over, B., Giordanetto, F., Kihlberg, J.: Oral druggable space beyond the rule of 5: insights from drugs and clinical candidates. Chem. Biol. 21, 1115–1142 (2014)
Matsson, P., Doak, B.C., Over, B., Kihlberg, J.: Cell permeability beyond the rule of 5. Adv. Drug Deliv. Rev. 101, 42–61 (2016)
DeGoey, D.A., Chen, H.-J., Cox, P.B., Wendt, M.D.: Beyond the rule of 5: lessons learned from AbbVie’s drugs and compound collection. J. Med. Chem. 61, 2636–2651 (2018)
Bergström, C.A.S., Charman, W.N., Porter, C.J.H.: Computational prediction of formulation strategies for beyond-rule-of-5 compounds. Adv. Drug Deliv. Rev. 101, 6–21 (2016)
Krämer, S.D., Aschmann, H.E., Hatibovic, M., Hermann, K.F., Neuhaus, C.S., Brunner, C., Belli, S.: When barriers ignore the rule-of-five. Adv. Drug Deliv. Rev. 101, 62–74 (2016)
Ermondi, G., Vallaro, M., Goetz, G., Shalaeva, M., Caron, G.: Experimental lipophilicity for beyond Rule of 5 compounds. Future Drug. Discov. (2019). https://doi.org/10.4155/fdd-2019-0002
Ermondi, G., Vallaro, M., Goetz, G., Shalaeva, M., Caron, G.: Updating the portfolio of physicochemical descriptors related to permeability in the beyond the rule of 5 chemical space. Eur. J. Pharm. Sci. 146, 105274 (2020). https://doi.org/10.1016/j.ejps.2020.105274
Caron, G., Kihlberg, J., Ermondi, G.: Intramolecular hydrogen bonding: an opportunity for improved design in medicinal chemistry. Med. Res. Rev. 39, 1707–1729 (2019). https://doi.org/10.1002/med.21562
Caron, G., Digiesi, V., Solaro, S., Ermondi, G.: Flexibility in early drug discovery: focus on the beyond-rule-of-5 chemical space. Drug Discov. Today 25, 621–627 (2020). https://doi.org/10.1016/j.drudis.2020.01.012
Carrupt, P.A., Testa, B., Bechalany, A., el Tayar, N., Descas, P., Perrissoud, D.: Morphine 6-glucuronide and morphine 3-glucuronide as molecular chameleons with unexpected lipophilicity. J. Med. Chem. 34, 1272–1275 (1991)
Avdeef, A.: Prediction of aqueous intrinsic solubility of druglike molecules using random forest regression trained with Wiki-pS0 database. ADMET & DMPK 8, 29–77 (2020). https://doi.org/10.5599/admet.766
Avdeef, A., Kansy, M.: Can small drugs predict the intrinsic aqueous solubility of ‘beyond rule of 5’ big drugs? ADMET & DMPK (2020). https://doi.org/10.5599/admet.794
Avdeef, A., Kansy, M.: Flexible-acceptor general solubility equation for beyond rule of 5 drugs. Mol. Pharm. 17, 3930–3940 (2020). https://doi.org/10.1021/acs.molpharmaceut.0c00689
Yalkowsky, S.H., Valvani, S.C.: Solubility and partitioning I: Solubility of nonelectrolytes in water. J. Pharm. Sci. 69, 912–922 (1980)
Abraham, M.H., Le, J.: The correlation and prediction of the solubility of compounds in water using an amended solvation energy relationship. J. Pharm. Sci. 88, 868–880 (1999)
Breiman, L.: Random forests. Mach. Learn. 45, 5–32 (2001)
Hughes, L.D., Palmer, D.S., Nigsch, F., Mitchell, J.B.O.: Why are some properties more difficult to predict than others? A study of QSPR models of solubility, melting point, and log P. J. Chem. Inf. Model. 48, 220–232 (2008)
Platts, J.A., Butina, D., Abraham, M.H., Hersey, A.: Estimation of molecular linear free energy relation descriptors using a group contribution approach. J. Chem. Inf. Comput. Sci. 39, 835–845 (1999)
Ermondi, G., Poongavanam, V., Vallaro, M., Kihlberg, J., Caron, G.: Solubility prediction in the bRo5 chemical space: where are we right now? ADMET & DMPK (2020). https://doi.org/10.5599/admet.834
Kier, L.B.: An index of molecular flexibility from kappa shape attributes. Quant. Struct.-Act. Relat. 8, 221–224 (1989)
Yalkowsky, S.H., Banerjee, S.: Aqueous Solubility: Methods of Estimation for Organic Compounds, p. 142. Marcel Dekker Inc, New York (1992)
Alantari, D., Yalkowsky, S.: Comments on prediction of the aqueous solubility using the general solubility equation (GSE) versus a genetic algorithm and a support vector machine model. J. Pharm. Dev. Technol. 23, 739–740 (2018)
Ran, Y., Yalkowsky, S.H.: Prediction of drug solubility by the general solubility equation. J. Chem. Inf. Comput. Sci. 41, 354–357 (2001)
Jain, N., Yalkowsky, S.H.: Estimation of the aqueous solubility I: application to organic nonelectrolytes. J. Pharm. Sci. 90, 234–252 (2001)
Ran, Y., Jain, N., Yalkowsky, S.H.: Prediction of aqueous solubility of organic compounds by the general solubility equation (GSE). J. Chem. Inf. Comput. Sci. 41, 1208–1217 (2001)
Jain, N., Yang, G., Machatha, S.G., Yalkowsky, S.H.: Estimation of the aqueous solubility of weak electrolytes. Int. J. Pharm. 319, 169–171 (2006)
Hansch, C., Quinlan, J.E., Lawrence, G.L.: Linear free-energy relationship between partition coefficients and the aqueous solubility of organic liquids. J. Org. Chem. 33, 347–350 (1968)
Landrum, G., Lewis, R., Palmer, A., Stiefl, N., Vulpetti, A.: Making sure there's a give associated with the take: producing and using open-source software in big pharma. J. Cheminformatics 3, 1–1 (2011). http://www.rdkit.org/. Accessed 18 Jan 2022
Eli Lilly and Company. Prod. Monogr. Incl. Patient Med. Info. VERZENIO® (Abemaciclib mesylate). http://pi.lilly.com/ca/verzenio-ca-pm.pdf. Accessed 31 Jul 2020
Blatter, F., Ingallinera, T., Barf, T., Aret, E., Krejsa, C., Evarts, J.: Crystal forms of (S)-4-(8-Amino-3-(1-(but-2-ynoyl)pyrrolidin-2-yl)imidazo[1,5-A]pyrazin-1-yl)-N-(pyridin-2-yl)benzamide. US 9,796,721 B2.
Pepin, X.J.H., Sanderson, N.J., Blanazs, A., Grover, S., Ingallinera, T.G., Mann, J.C.: Bridging in vitro dissolution and in vivo exposure for acalabrutinib. Part I. Mechanistic modelling of drug product dissolution to derive a P-PSD for PBPK model input. Eur. J. Pharm. Biopharm. 142, 421–434 (2019)
Food and Drug Administration (USA): Alpelisib (Piqray), Novartis pharmaceuticals NDA 212526Orig1s000. https://www.accessdata.fda.gov/drugsatfda_docs/nda/2019/212526Orig1s000MultidisciplineR.pdf. Accessed 18 Jan 2022
European Medicines Agency: Apalutamide (Erleada®), CHMP assessment report, Procedure No. EMEA/H/C/004452/0000. https://www.ema.europa.eu/en/documents/assessment-report/erleada-epar-public-assessment-report_en.pdf. Accessed 15 Nov 2018
Zhang, W.-P., Chen, D.-Y.: Crystal structures and physicochemical properties of amisulpride polymorphs. J. Pharm. Biomed. Anal. 140, 252–257 (2017). https://doi.org/10.1016/j.jpba.2017.03.030
Xu, R., Han, T., Shen, L., Zhao, J., Lu, X.: Solubility determination and modeling for artesunate in binary solvent mixtures of methanol, ethanol, isopropanol, and propylene glycol + water. J. Chem. Eng. Data 64, 755–762 (2019)
Food and Drug Administration (USA): Avapritinib (Ayvakit), Blueprint meds. Corp. 2019 NDA 212608Orig1s000. https://www.accessdata.fda.gov/drugsatfda_docs/nda/2020/212608Orig1s000ChemR.pdf. Accessed 18 Jan 2022
Genentech, Inc. Safety Data Sheet: Baloxavir marboxil (Xofluza). https://www.gene.com/download/pdf/XOFLUZATablets40mgSAPSDS2.pdf. Accessed 16 Oct 2018
Food and Drug Administration (USA): Baloxavir marboxil (Xofluza). NDA 210854Orig1s000, CDER Quality Assessment Review. Applicant: Shionogi. Accessed 18 Sep 2018.
Sobrinho, J.M.S., Soares, M.F.R., Labandeira, J.J.T., Alves, L.D.S., Neto, P.J.R.: Improving the solubility of the antichagasic drug benznidazole through formation of inclusion complexes with cyclodextrins. Quim. Nova 34, 1534–1538 (2011)
Australian Public Assessment Report for brivaracetam. Proprietary Product Name: Briviact. Sponsor: UCB Australia Pty Ltd. 2017. https://www.tga.gov.au/sites/default/files/auspar-brivaracetam-170307.pdf. Accessed 18 Jan 2022
Food and Drug Administration (USA): Capmatinib (Tabrecta), Novartis Ringaskiddy Pharma Ltd. 2017 NDA 213591Orig1s000. https://www.accessdata.fda.gov/drugsatfda_docs/nda/2020/213591Orig1s000ChemR.pdf. Accessed 18 Jan 2022
Otsuka Pharmaceutical Co., Ltd. Decitabine + Cedazuridine (Inqovi). Product monograph. 3 Jul 2020; https://www.taihopharma.ca/documents/31/INQOVI_Product_Monograph.pdf. Accessed 18 Jan 2022
Food and Drug Administration (USA): Cenobamate (Xcopri), SK Life Science, Inc. NDA 212839Orig1s000. https://www.accessdata.fda.gov/drugsatfda_docs/nda/2019/212839Orig1s000OtherR.pdf. Accessed 18 Jan 2022
Freundlieb, J., Jacobs, T.: Formulations of copanlisib. Bayer Pharma AG Patent Application Publication: US 2020/0281932 A1. 10 Sep 2020. https://uspto.report/patent/app/20200281932. Accessed 18 Jan 2022
Fantini, A., Demurtas, A., Nicoli, S., Padula, C., Pescina, S., Santi, P.: In vitro skin retention of crisaborole after topical application. Pharmaceutics 12, 491 (2020). https://doi.org/10.3390/pharmaceutics12060491
Pfizer Canada ULC. Product Monograph: Dacomitinib (Vizimpro). Accessed 22 Feb 2019.
Food and Drug Administration (USA): Darolutamide (Nubeqa), Bayer HealthCare Pharmaceuticals Inc. NDA 212099Orig1s000. https://www.accessdata.fda.gov/drugsatfda_docs/nda/2019/212099Orig1s000ChemR.pdf. Accessed 18 Jan 2022
Food and Drug Administration (USA): Dolutegravir <GSK1349572A>, GSK. 211994Orig1s000. https://www.accessdata.fda.gov/drugsatfda_docs/nda/2019/211994Orig1s000OEList.pdf. Accessed 18 Jan 2022
Gigante, V., Pauletti, G.M., Kopp, S., Xu, M., Gonzalez-Alvarez, I., Merino, V., McIntosh, M.P., Wessels, A., Lee, B.-J., Rezende, K.R., Scriba, G.K.E., Jadaun, G.P.S., Bermejo, M.: Global testing of a consensus solubility assessment to enhance robustness of the WHO biopharmaceutical classification system. ADMET & DMPK (2020). https://doi.org/10.5599/admet.850
Bleasby, K., Fillgrove, K.L., Houle, R., Lu, B., Palamanda, J., Newton, D.J., Lin, M., Chan, G.H., Sanchez, R.I.: In vitro evaluation of the drug interaction potential of doravirine. Antimicrob. Agents Chemother. 63, 1–12 (2019)
Rong, W.-T., Lu, Y.-P., Tao, Q., Guo, M., Lu, Y., Ren, Y., Yu, S.-Q.: Hydroxypropyl-sulfobutyl-β-cyclodextrin improves the oral bioavailability of edaravone by modulating drug efflux pump of enterocytes. J. Pharm. Sci. 103, 730–742 (2014)
Parikh, A., Kathawala, K., Tan, C.C., Garg, S., Zhou, X.-F.: Development of a novel oral delivery system of edaravone for enhancing bioavailability. Int. J. Pharm. 515, 490–500 (2016)
Zeng, J., Ren, Y., Zhou, C., Yu, S., Chen, W.-H.: Preparation and physicochemical characteristics of the complex of edaravone with hydroxypropyl-β-cyclodextrin. Carbohydr. Polym. 83, 1101–1105 (2011)
European Medicines Agency: Zepatier® (elbasvir / grazoprevir) CHMP assessment report. Procedure No. EMEA/H/C/004126/0000. https://www.ema.europa.eu/en/documents/assessment-report/zepatier-epar-public-assessment-report_en.pdf. Accessed 26 May 2016
Celgene Inc. (Canada). Product Monograph: Enasidenib mesylate (Idhifa). Accessed 5 Feb 2019.
Food and Drug Administration (USA): Entrectinib (Rozlytrek), Genentech NDA 212725Orig1s000 & 212726Orig1s000. https://www.accessdata.fda.gov/drugsatfda_docs/nda/2019/212725Orig1s000,%20212726Orig1s000MultidisciplineR.pdf
Food and Drug Administration (USA): Erdafitinib (Balversa), Janssen Biotech Inc. NDA 212018Orig1s000. https://www.accessdata.fda.gov/drugsatfda_docs/nda/2019/212018Orig1s000ChemR.pdf. Accessed 18 Jan 2022
Food and Drug Administration (USA): Fedratinib (Inrebic), Impact Biomedicines, Inc. NDA 212327Orig1s000. https://www.accessdata.fda.gov/drugsatfda_docs/nda/2019/212327Orig1s000ChemR.pdf. Accessed 18 Jan 2022
Helsinn Healthcare SA. Highlights of Prescribing Information: Fosnetupitant Dihydrochloride (in AKYNZEO). NDA 210–493 (2018). https://www.accessdata.fda.gov/drugsatfda_docs/label/2018/210493s000lbl.pdf. Accessed 18 Jan 2022
Food and Drug Administration (USA): Gilteritinib (Xospata). NDA 211349Orig1s000, CDER Quality Assessment Review. Applicant: Astellas Pharma US, Inc. 26 Nov 2018.
Pfizer, Inc. Highlights of prescribing information: Glasdegib (Daurismo). Nov 2018.
Food and Drug Administration (USA): Istradefylline (Nourianz), Kyowa Kirin, Inc. NDA 022075Orig1s000. https://www.accessdata.fda.gov/drugsatfda_docs/nda/2019/022075Orig1s000ChemR.pdf. Accessed 18 Jan 2022
O’Neil, M.J., Heckelman, P.E., Dobbelaar, P.H., Roman, K.J. (eds.): The Merck Index: An Encyclopedia of Chemicals, Drugs, and Biologicals, 15th edn. The Royal Society of Chemistry (2013)
Crasto, A.M.: Drug Approvals International. http://drugapprovalsint.com/lifitegrast/. Accessed 18 Jan 2022
Food and Drug Administration (USA): Lonafarnib (Zokinvy), Eiger Biopharmaceuticals Inc. NDA 213969Orig1s000. https://www.accessdata.fda.gov/drugsatfda_docs/nda/2020/213969Orig1s000ChemR.pdf. Accessed 18 Jan 2022
Food and Drug Administration (USA): Macimorelin acetate (Macrilen). NDA 205598Orig1s000, CDER Quality Assessment Review. Applicant: Aeterna Zentaris. 19 Oct 2017.
Garcia, J.M., Swerdloff, R., Wang, C., Kyle, M., Kipnes, M., Biller, B.M.K., Cook, D., Yuen, K.C.J., Bonert, V., Dobs, A., Molitch, M.E., Merriam, G.T.: Macimorelin (AEZS-130)-stimulated growth hormone (GH) test: validation of a novel oral stimulation test for the diagnosis of adult GH deficiency. J. Clin. Endocrinol. Metab. 98, 2422–2429 (2013)
Zhou, Z., Du, S., Wang, T., Wu, S., Guo, Z., Wang, Z., Zhou, L.: Measurement and correlation of solubility of meropenem trihydrate in binary (water + acetone/tetrahydrofuran) solvent mixtures. Chin. J. Chem. Eng. 25, 1461–1466 (2017)
Medicines Development for Global Health (Australia) (2018). Highlights of prescribing information: Moxidectin (Daurismo). https://www.accessdata.fda.gov/drugsatfda_docs/label/2018/210867lbl.pdf. Accessed 18 Jan 2022
Food and Drug Administration (USA): Naldemedine (Symproic). NDA 208854Orig1s000, CDER Quality Assessment Review. Applicant: Shionogi Inc. 13 Jan 2017.
Puma Biotechnology, Inc. Highlights of prescribing information: Neratinib (Nerlynx). Jul 2017.
Food and Drug Administration (USA): Nifurtimox (Lampit), Bayer Healthcare Pharmaceuticals, Inc. NDA 213464Orig1s000. https://www.accessdata.fda.gov/drugsatfda_docs/nda/2020/213464Orig1s000ChemR.pdf. Accessed 18 Jan 2022
Tesaro, Inc. Highlights of prescribing information: Niraparib (Zejula). Mar 2017.
Lancaster, R.G., Olmstead, K.K., Kagihiro, M., Matono, M., Taoka, I., Pruzanski, M., Shapiro, D., Hooshmand-Rad, R., Pencek, R., Sciacca, C., Eliot, L., Edwards, J., MacConell, L.A., Marmon, T.K.: Compositions of obeticholic acid and methods of use. PTO US 2020/0054650 A1. Feb. 20, 2020.
Food and Drug Administration (USA): Ozanimod (Zeposia), Celgene Corp. 2019 NDA 209899Orig1s000. https://www.accessdata.fda.gov/drugsatfda_docs/nda/2020/209899Orig1s000ChemR.pdf. Accessed 18 Jan 2022
Food and Drug Administration (USA): Pemigatinib (Pemazyre), Incyte Corp. 2019 NDA 213736Orig1s000. https://www.accessdata.fda.gov/drugsatfda_docs/nda/2020/213736Orig1s000ChemR.pdf. Accessed 18 Jan 2022
Food and Drug Administration (USA): Pexidartinib (Turalio), Daiichi Sankyo Inc. NDA 211810Orig1s000. https://www.accessdata.fda.gov/drugsatfda_docs/nda/2019/211810Orig1s000MultidisciplineR.pdf. Accessed 18 Jan 2022
Food and Drug Administration (USA): Pitolisant (Wakix), Bioprojet Pharma NDA 211150Orig1s000. https://www.accessdata.fda.gov/drugsatfda_docs/nda/2019/211150Orig1s000ChemR.pdf. Accessed 18 Jan 2022
Food and Drug Administration (USA): Pralsetinib (Gavreto). Blueprint Medicines Corp. NDA 213721. Product Quality Review(s). 5 Aug 2020. https://www.accessdata.fda.gov/drugsatfda_docs/nda/2020/213721Orig1s000ChemR.pdf. Accessed 18 Jan 2022
Food and Drug Administration (USA): Pretomanid, The Global Alliance for TB Drug Development. NDA 212862Orig1s000. https://www.accessdata.fda.gov/drugsatfda_docs/nda/2019/212862Orig1s000ChemR.pdf. Accessed 18 Jan 2022
Food and Drug Administration (FDA). Highlights of prescribing information. Relugolix (Orgovyx®) Myovant Sciences, Inc. label (Dec 2020). https://www.accessdata.fda.gov/drugsatfda_docs/label/2020/214621s000lbl.pdf. Accessed 18 Jan 2022
Yu, K., Chen, S., Amgoth, C., Tang, G., Bai, H., Hu, X.: Two polymorphs of remdesivir: crystal structure, solubility, and pharmacokinetic study. Cryst. Eng. Comm. (2021). https://www.rsc.org/suppdata/d1/ce/d1ce00175b/d1ce00175b1.pdf. Accessed 18 Jan 2022
Huang, H., Zhou, G., Shang, G., et al.: Hydrobromate of benzodiazepine derivative, preparation method and use thereof. Chengdu Brilliant Pharmaceutical Co., Ltd. Eur. Patent Applic. EP 3 553 059 A1 (2019).
Sirvent, J.A., Lücking, U.: Novel pieces for the emerging picture of sulfoximines in drug discovery: synthesis and evaluation of sulfoximine analogues of marketed drugs and advanced clinical candidates. Chem. Med. Chem. 12, 487–501 (2017). https://doi.org/10.1002/cmdc.201700044
Samant, T.S., Dhuria, S., Lu, Y., Laisney, M., Yang, S., Grandeury, A., Mueller-Zsigmondy, M., Umehara, K., Huth, F., Miller, M., Germa, C., Elmeliegy, M.: Ribociclib bioavailability is not affected by gastric pH changes or food intake: in silico and clinical evaluations. Clin. Pharmacol. Ther. 104, 374–383 (2018)
Food and Drug Administration (USA): Ripretinib (Qinlock), Deciphera Pharmaceuticals, LLC. 2020 NDA 213973Orig1s000. https://www.accessdata.fda.gov/drugsatfda_docs/nda/2020/213973Orig1s000RiskR.pdf
Genentech, Inc. Safety Data Sheet: Risdiplam (Evrysdi). https://www.gene.com/download/pdf/EVRYSDIRisdiplam0.75mgpermlSDS.pdf. Accessed 16 June 2020
Food and Drug Administration (USA): Safinamide (Xadago). NDA 207145Orig1s000, CDER Quality Assessment Review. Applicant: Newron Pharmaceuticals. 29 Dec 2014.
Food and Drug Administration (USA): Secnidazole (Solosec). NDA 209363Orig1s000, CDER Quality Assessment Review. Applicant: Symbiomix Therapeutics, LLC. 27 Jul 2017.
Rivera, A.B., Hernández, R.G., de Armas, H.N., Elizástegi, D.M.C., Losada, M.V.: Physico-chemical and solid-state characterization of secnidazole. Il Farmaco 55, 700–707 (2000)
Food and Drug Administration (USA): Selinexor (Xpovio), Karyopharm Therapeutics Inc. NDA 212306Orig1s000. https://www.accessdata.fda.gov/drugsatfda_docs/nda/2019/212020Orig1s000ChemR.pdf. Accessed 18 Jan 2022
Mogalian, E., German, P., Kearney, B.P., Yang, C.Y., Brainard, D., Link, J., McNally, J., Hane, L., Ling, J., Mathias, A.: Preclinical pharmacokinetics and first-in-human pharmacokinetics, safety, and tolerability of velpatasvir, a pangenotypic HCV NS5A inhibitor, in healthy subjects. Antimicrob. Agents Chemother. 61, e02084-e2116 (2017). https://doi.org/10.1128/AAC.02084-16
Food and Drug Administration (USA): Selumetinib (Koselugo), AstraZeneca NDA 213756Orig1s000. https://www.accessdata.fda.gov/drugsatfda_docs/nda/2020/213756Orig1s000ChemR.pdf. Accessed 18 Jan 2022
Leijen, S., Soetekouw, P.M.M.B., Evans, T.R.J., Nicolson, M., Schellens, J.H.M., Learoyd, M., Grinsted, L., Zazulina, V., Pwint, T., Middleton, M.: A phase I, open-label, randomized crossover study to assess the effect of dosing of the MEK 1/2 inhibitor Selumetinib (AZD6244; ARRY-142866) in the presence and absence of food in patients with advanced solid tumors. Cancer Chemother. Pharmacol. 68, 1619–1628 (2011). https://doi.org/10.1007/s00280-011-1732-7
European Medicines Agency: Epclusa® (sofosbuvir/velpatasvir) CHMP Assessment Report. EMA/399285/2016. Procedure No. EMEA/H/C/004210/0000. https://www.ema.europa.eu/en/documents/assessment-report/epclusa-epar-public-assessment-report_en.pdf. Accessed 18 Jan 2022
Target Molecule Corp.: Stiripentol (Diacomit). https://www.targetmol.com/compound/Stiripentol. Accessed 18 Jan 2022
Food and Drug Administration (USA): Talazoparib (Talzenna). NDA 211651Orig1s000, CDER Quality Assessment Review. Applicant: Pfizer. 3 Oct 2018.
Food and Drug Administration (USA): Tazemetostat (Tazverik), Epizyme 2019 NDA 211723Orig1s000. https://www.accessdata.fda.gov/drugsatfda_docs/nda/2020/211723Orig1s000ChemR.pdf. Accessed 18 Jan 2022
Drug Approvals International: Tecovirimat (TPOXX). http://drugapprovalsint.com/tecovirimat/. Accessed 14 Jul 2018
Food and Drug Administration (USA): Tenapanor (Ibsrela), Ardelyx, Inc. NDA 211801Orig1s000. https://www.accessdata.fda.gov/drugsatfda_docs/nda/2019/211801Orig1s000ChemR.pdf. Accessed 18 Jan 2022
Food and Drug Administration (USA): Tezacaftor (Trikafta), Vertex Pharmaceuticals, Inc. NDA 210491Orig1s000. https://www.accessdata.fda.gov/drugsatfda_docs/nda/2018/210491Orig1s000ChemR.pdf. Accessed 18 Jan 2022
Food and Drug Administration (USA): Tucatinib (Tukysa), Seattle Genetics. NDA 213411Orig1s000. https://www.accessdata.fda.gov/drugsatfda_docs/nda/2020/213411Orig1s000MultidisciplineR.pdf. Accessed 18 Jan 2022
Food and Drug Administration (USA): Upadacitinib (Rinvoq), AbbVie Inc. NDA 211675Orig1s000. https://www.accessdata.fda.gov/drugsatfda_docs/nda/2019/211675Orig1s000ChemR.pdf. Accessed 18 Jan 2022
Food and Drug Administration (USA): Venetoclax (Venclexta), AbbVie Inc. NDA 208573Orig1s000. https://www.accessdata.fda.gov/drugsatfda_docs/nda/2016/208573Orig1s000ClinPharmR.pdf. Accessed 18 Jan 2022
Food and Drug Administration (USA): Vibegron (Gemtesa). Urovant Sciences, Inc. NDA 213006. Product Quality Review(s). https://www.accessdata.fda.gov/drugsatfda_docs/nda/2020/213006Orig1s000ChemR.pdf. Accessed 24 Nov 2020
Food and Drug Administration (USA): Zanubrutinib (Brukinsa), BeiGene USA. NDA 213217Orig1s000. https://www.accessdata.fda.gov/drugsatfda_docs/nda/2019/213217Orig1s000MultidisciplineR.pdf
Avdeef, A.: Solubility temperature dependence predicted from 2D structure. ADMET & DMPK 3, 298–344 (2015)
Völgyi, G., Marosi, A., Takács-Novák, K., Avdeef, A.: Salt solubility products of diprenorphine hydrochloride, codeine and lidocaine hydrochlorides and phosphates—novel method of data analysis not dependent on explicit solubility equations. ADMET & DMPK 1, 48–62 (2013)
Avdeef, A.: Anomalous solubility behavior of several acidic drugs. ADMET & DMPK 2, 33–42 (2014)
Avdeef, A.: Phosphate precipitates and water-soluble aggregates in re-examined solubility–pH data of twenty-five basic drugs. ADMET & DMPK 2, 43–55 (2014)
Verbić, T.Z., Avdeef, A.: Solubility–pH profile of desipramine hydrochloride in saline phosphate buffer: enhanced solubility due to drug–buffer aggregates. Eur. J. Pharm. Sci. 133, 264–274 (2019)
Avdeef, A., Fuguet, E., Llinàs, A., Ràfols, C., Bosch, E., Völgyi, G., Verbić, T., Boldyreva, E., Takács-Novák, K.: Equilibrium solubility measurement of ionizable drugs—consensus recommendations for improving data quality. ADMET & DMPK 4, 117–178 (2016)
Bergström, C.A.S., Avdeef, A.: Perspectives in solubility measurement and interpretation. ADMET & DMPK 7, 88–105 (2019)
Avdeef, A.: Absorption and Drug Development, 2nd edn. Wiley-Interscience, Hoboken NJ (2012)
Avdeef, A.: Multi-lab intrinsic solubility measurement reproducibility in CheqSol and shake-flask methods. ADMET & DMPK 7, 210–219 (2019). https://doi.org/10.5599/admet.698
Llinàs, A., Avdeef, A.: Solubility challenge revisited after ten years, with multi-lab shake-flask data, using tight (SD ∼ 0.17 log) and loose (SD ∼ 0.62 log) test sets. J. Chem. Inf. Model. 59, 3036–3040 (2019). https://doi.org/10.1021/acs.jcim.9b00345
Llinàs, A., Oprisiu, I., Avdeef, A.: Findings of the second challenge to predict aqueous solubility. J. Chem. Inf. Model. 60, 4791–4803 (2020). https://doi.org/10.1021/acs.jcim.0c00701
Lang, A.S.I.D., Bradley, J.-C.: ONS Melting Point Model 010. QsarDB content. Property mpC. http://qsardb.org/repository/predictor/10967/104?model=rf. Accessed 18 Jan 2022
Hopfinger, A.J., Esposito, E.X., Llinàs, A., Glen, R.C., Goodman, J.M.: Findings of the challenge to predict aqueous solubility. J. Chem. Inf. Model. 49, 1–5 (2009)
Lipinski, C.A.: Drug-like properties and the causes of poor solubility and poor permeability. J. Pharmacol. Tox. Meth. 44, 235–249 (2000)
Avdeef, A.: Do you know your r2? ADMET & DMPK (2021). https://doi.org/10.5599/admet.888
Acknowledgements
This study is dedicated to the memory of Professor Michael Abraham, whose pioneering work in the critical role of hydrogen bonding in solvation has influenced the authors deeply. He is remembered as a teacher and a friend. The complete Wiki-pS0 database is planned to be released in book form: A. Avdeef. Intrinsic Aqueous Solubility—Curated Data for Pharmaceutical Research (under discussion with publisher).
Funding
This study was self-funded. The authors declare that they have no known competing financial interests that could have appeared to influence the work reported in this paper.
Appendix
The structures of the 72 newly-approved drugs, along with the year of approval, are shown in Fig. 9.
Cite this article
Avdeef, A., Kansy, M. Predicting Solubility of Newly-Approved Drugs (2016–2020) with a Simple ABSOLV and GSE(Flexible-Acceptor) Consensus Model Outperforming Random Forest Regression. J Solution Chem 51, 1020–1055 (2022). https://doi.org/10.1007/s10953-022-01141-7
DOI: https://doi.org/10.1007/s10953-022-01141-7