Direct QSPR: the most efficient way of predicting organic carbon/water partition coefficient (log K_{OC}) for polyhalogenated POPs
DOI: 10.1007/s11224-014-0419-1
- Cite this article as:
- Jagiello, K., Sosnowska, A., Walker, S. et al. Struct Chem (2014) 25: 997. doi:10.1007/s11224-014-0419-1
Abstract
The organic carbon/water partition coefficient (K_{OC}) is one of the most important parameters describing the partitioning of chemicals in the soil/water system and measuring their relative potential mobility in soils. Because of the large number of compounds that may enter the environment, experimental measurement of the soil sorption coefficient for all of them is virtually impossible. Alternative methods, such as quantitative structure–property relationship (QSPR) techniques, have been applied to predict this important physical/chemical parameter. Most available QSPR models have been based on correlations with the n-octanol/water partition coefficient (K_{OW}), which enforces the requirement to conduct experiments for obtaining the K_{OW} values. In our study, we have developed a QSPR model that allows predicting logarithmic values of the organic carbon/water partition coefficient (log K_{OC}) for 1,436 chlorinated and brominated congeners of persistent organic pollutants based on computationally calculated descriptors. Applying such an approach not only reduces time, cost, and the amount of waste but also allows obtaining more realistic results.
Keywords
Persistent organic pollutants, Organic carbon/water partition coefficient, QSPR, Quantum–mechanical descriptors
Introduction
The occurrence of polyhalogenated persistent organic pollutants (POPs), such as chloro- and bromo-substituted biphenyls, naphthalenes, dibenzo-p-dioxins, dibenzofurans, and diphenyl ethers, has been identified in almost all environmental compartments [1]. Due to their high lipophilicity and resistance to naturally occurring degradation processes, they are prone to bioaccumulation in human and animal tissues [2]. In the organism, they are capable of inducing various toxic effects, including carcinogenicity, reproductive disorders related to disruption of the hormonal system, immunotoxicity, and damage to the central and peripheral nervous systems. They are also suspected to be responsible for the increasing number of patients nowadays suffering from allergies and hypersensitivity [3, 4]. Therefore, efficient tools for comprehensive environmental risk assessment for polyhalogenated POPs are needed.
The procedure of comprehensive risk assessment requires information about the environmental transport and fate processes of a given substance. Among the various physical/chemical properties governing the environmental occurrence and transport of POPs, the most important are water solubility, vapor pressure, and partition coefficients, i.e., the n-octanol/water partition coefficient (K_{OW}), n-octanol/air partition coefficient (K_{OA}), air/water partition coefficient (K_{AW}), and organic carbon/water partition coefficient (K_{OC}) [2]. The last property (K_{OC}) is crucial for characterizing the distribution of pollutants between the solid and solution phases in soil, or between water and sediment in aquatic ecosystems [5]. Thus, the soil sorption coefficient indicates whether chemicals entering the soil will undergo leaching or run-off or will remain immobile [6].
Accurate values characterizing the mentioned properties can be obtained experimentally. However, because a large number of possible substitution isomers (congeners) may exist, empirical measurement of the properties for all of them is impractical. Therefore, the only way to acquire complete physicochemical characteristics of all polyhalogenated POPs is to employ computational techniques, such as quantitative structure–property relationship (QSPR) modeling [7].
Since the QSPR technique employing computationally calculated descriptors has already been successfully applied to predict the n-octanol/water partition coefficient (K_{OW}) [15], the question arose whether such descriptors can also be used to predict the organic carbon/water partition coefficient (K_{OC}). Consequently, one needs to investigate whether there is a much more efficient, direct way of obtaining the values of log K_{OC} than the scheme summarized by Gawlik et al. [14].
Therefore, our study was aimed at comparing the direct (based on computational descriptors) method of predicting log K_{OC} with the existing QSPR models utilizing the value of log K_{OW}. To perform this task, we have developed a QSPR model that predicts the organic carbon–water partition coefficients for a series of polyhalogenated POPs (polychlorinated and polybrominated benzenes, biphenyls, dibenzo-p-dioxins, dibenzofurans, diphenyl ethers, and naphthalenes) based on quantum–mechanical molecular descriptors. The descriptors could be obtained computationally, without performing additional experiments. The comparison resulted in practical recommendations toward the efficient environmental transport and fate modeling of polyhalogenated POPs that utilizes the values of log K_{OC} as model inputs.
Materials and methods
Predicting organic carbon/water partition coefficient (log K_{OC}) with the direct QSPR approach
At the first stage of our study, we have developed a novel QSPR model that allowed predicting the values of organic carbon/water partition coefficient directly from quantum–mechanical descriptors. The algorithm that we applied consisted of five main steps: (i) collecting experimental data and splitting them into training set (T) and validation set (V); (ii) calculating molecular descriptors; (iii) calibrating the model; (iv) internal and external validation of the model and the assessment of applicability domain; and (v) applying the model to predict the values of log K_{OC} for the compounds, for which the experimentally derived values of the coefficient have been unavailable.
The values of K_{OC} for all studied POP derivatives were taken from the Handbook of Physical–Chemical Properties and Environmental Fate for Organic Compounds [16]. Experimental data were available for 205 chlorinated or brominated POP congeners (for details please refer to the Supplementary Material). The values of log K_{OC} ranged from 2.19 to 8.09 [16]. The compounds for which experimental data were available were divided into two sets: a training set and a validation set. The compounds were ranked according to their endpoints (the experimentally determined values), and every fourth compound was labeled as a validation compound and removed from the training set; the first and second compounds were arbitrarily included in the training set. This commonly used method produces two sets that accurately represent the data [17, 18].
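The ranking-based split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `split_every_fourth`, the choice of ranked positions 4, 8, 12, ... for the validation set, and the placeholder endpoint values are hypothetical and are not taken from the study.

```python
def split_every_fourth(compounds):
    """Rank (name, log_koc) pairs by endpoint and send every fourth one to validation.

    Assumption: ranked positions 4, 8, 12, ... form the validation set, so the
    first, second, and third ranked compounds always stay in the training set.
    """
    ranked = sorted(compounds, key=lambda item: item[1])
    training, validation = [], []
    for position, compound in enumerate(ranked, start=1):
        if position % 4 == 0:
            validation.append(compound)
        else:
            training.append(compound)
    return training, validation


# Placeholder endpoints for 205 hypothetical compounds (NOT data from the study).
demo = [(f"compound_{i}", 2.0 + 0.03 * i) for i in range(205)]
training, validation = split_every_fourth(demo)
print(len(training), len(validation))  # 154 training and 51 validation compounds
```

Because the compounds are ranked before splitting, both sets span the whole range of log K_{OC} values, which is the property the text relies on.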
Symbols and definitions of all calculated molecular descriptors [25]
Symbol | Definitions of molecular descriptors | Units |
---|---|---|
nH | Number of hydrogen substituents | – |
nCl | Number of chlorine substituents | – |
nBr | Number of bromine substituents | – |
nA | Number of atoms in the molecule | – |
MW | Molecular weight | g/mol |
HOF | Standard heat of formation | kcal/mol |
EE | Electronic energy | eV |
Core | Core repulsion energy | eV |
TE | Total energy | eV |
HOF_{c} | Standard heat of formation in a solution represented by the conductor-like screening model (COSMO) | eV |
TE_{c} | Total energy in a solution represented by COSMO | eV |
HOMO | Energy of the highest occupied molecular orbital (HOMO) | eV |
LUMO | Energy of the lowest unoccupied molecular orbital (LUMO) | eV |
Dx | X vector of the dipole moment | Debye |
Dy | Y vector of the dipole moment | Debye |
Dz | Z vector of the dipole moment | Debye |
Dtot | Total dipole moment | Debye |
SAS | Solvent accessible surface | Å^{2} |
MV | Molecular volume | Å^{3} |
Q_{-} | Lowest negative Mulliken’s partial charge on the molecule | – |
Q_{+} | Highest positive partial charge on the molecule | – |
Ahof | Polarizability derived from the heat of formation | Å^{3} |
Ad | Polarizability derived from the dipole moment | Å^{3} |
En | Mulliken’s electronegativity | eV |
Hard | Parr and Pople’s absolute hardness | eV |
Shift | Schuurmann MO Shift alpha | eV |
In the final, fifth step, after thorough validation, the developed QSPR model was applied to predict the values of the organic carbon/water partition coefficient for the compounds for which experimentally measured data were unavailable. The reliability of the predictions (related to the applicability domain) was assessed based on the leverage value and the Insubria graph approach [29].
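To make the leverage criterion concrete, the following minimal sketch computes the leverage of a compound for a model with a single descriptor plus an intercept. The warning threshold h* = 3(p + 1)/n is a convention commonly used in QSPR practice and is an assumption here; the text above does not state which threshold was applied. The function names and the descriptor values are hypothetical.

```python
def leverage(x_new, x_train):
    """Leverage of one compound for a one-descriptor linear model with intercept.

    h = 1/n + (x_new - mean)^2 / sum((x_i - mean)^2), computed from training data.
    """
    n = len(x_train)
    mean_x = sum(x_train) / n
    sxx = sum((x - mean_x) ** 2 for x in x_train)
    return 1.0 / n + (x_new - mean_x) ** 2 / sxx


def warning_leverage(n_train, n_descriptors=1):
    """Commonly used threshold h* = 3(p + 1)/n (an assumption, not from the source)."""
    return 3.0 * (n_descriptors + 1) / n_train


# Hypothetical descriptor values for five training compounds (illustration only).
x_train = [150.0, 200.0, 250.0, 300.0, 350.0]
h_star = warning_leverage(len(x_train))
for x_new in (250.0, 350.0, 450.0):
    h = leverage(x_new, x_train)
    status = "inside" if h <= h_star else "outside"
    print(f"x = {x_new:.0f}: h = {h:.2f} ({status} the structural domain)")
```

A compound whose leverage exceeds the threshold lies far from the descriptor range of the training set, so its prediction is an extrapolation and should be treated with caution.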
Comparing the direct method of predicting organic carbon/water partition coefficient with other methods
As mentioned in the Introduction, in most published contributions the values of log K_{OC} have been derived from another physicochemical property, i.e., n-octanol/water partition coefficient (log K_{OW}). Thus, we performed a literature search for the best available models for predicting log K_{OC}. In the next step, a comparison of the prediction efficiency between such models and the direct QSPR model developed in this study has been carried out.
In this comparison, we have taken into account: (i) the time required to obtain log K_{OC}, (ii) the cost associated with the conducted investigations, (iii) the amount of waste arising during the investigations, and (iv) the predictive abilities of the selected approaches.
Results and discussion
Predicting organic carbon/water partition coefficient (log K_{OC}) with direct QSPR approach
Since the error values (RMSE_{C}, RMSE_{CV}, and RMSE_{P}) were identical and there were no significantly large residual values for the validation set displayed in Fig. 2, one can conclude that the model has not been overfitted. This means that the model predicts correctly not only for the training compounds but also for other (external) compounds.
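The three error measures compared above can be illustrated with a short sketch for a one-descriptor linear model. It assumes that RMSE_{C} is computed on the training set, RMSE_{CV} by leave-one-out cross-validation, and RMSE_{P} on the external validation set; the cross-validation scheme, the function names, and the numbers are assumptions made for illustration and are not taken from the study.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def rmse_calibration(x, y):
    """RMSE_C: error of the fitted model on its own training data."""
    slope, intercept = fit_line(x, y)
    return rmse(y, [slope * xi + intercept for xi in x])


def rmse_leave_one_out(x, y):
    """RMSE_CV: each compound is predicted by a model fitted without it."""
    predictions = []
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        predictions.append(slope * x[i] + intercept)
    return rmse(y, predictions)


def rmse_prediction(x_train, y_train, x_val, y_val):
    """RMSE_P: error on external compounds not used for fitting."""
    slope, intercept = fit_line(x_train, y_train)
    return rmse(y_val, [slope * xi + intercept for xi in x_val])


# Made-up descriptor/endpoint pairs (illustration only, NOT data from the study).
x_t, y_t = [150.0, 200.0, 250.0, 300.0, 350.0], [3.1, 4.0, 5.2, 5.9, 7.1]
x_v, y_v = [225.0, 325.0], [4.6, 6.4]
print(rmse_calibration(x_t, y_t), rmse_leave_one_out(x_t, y_t),
      rmse_prediction(x_t, y_t, x_v, y_v))
```

When the three values are of similar magnitude, the model performs about as well on unseen compounds as on the training compounds, which is the argument against overfitting made in the text.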
The mechanistic interpretation of the developed model, according to the physicochemical theory of dissolution, was intuitive: non-polar chemicals with a large solvent accessible surface area (SAS) are less soluble in water. The theory divides the dissolution process into six stages, namely: (i) breaking up solute–solute intermolecular bonds; (ii) breaking up solvent–solvent intermolecular bonds; (iii) formation of a cavity in the solvent phase large enough to accommodate the solute molecule; (iv) vaporization of the solute into the cavity; (v) forming solute–solvent intermolecular bonds; and (vi) reforming solvent–solvent bonds with solvent restructuring [31]. Thus, since formation of a cavity appropriate for highly halogenated, large molecules requires more energy, the solubility of larger congeners is lower than that of less halogenated and smaller congeners, and the larger congeners will consequently be sorbed mostly by the organic carbon layer. On the other hand, the adsorption of larger molecules on the surface of the organic carbon layer is more favored, because of the larger surface available for intermolecular interactions (attractions) between the target molecules and the organic carbon layer. SAS values increase with the increasing number of halogen atoms present in the molecule and with the radius of the halogen substituent. The last feature differentiates chlorinated and brominated derivatives having the same number of halogen substituents, because the atomic radius of bromine is larger than that of chlorine. For example, the values of log K_{OC} of pentachlorobiphenyls are higher than those of trichlorobiphenyls, but lower than the values of pentabromobiphenyls. Regarding environmental implications, higher values of the organic carbon/water partition coefficient for highly halogenated organic pollutants correspond to their lower tendency to leach or run off with ground water [32].
Comparing the direct method of predicting organic carbon/water partition coefficient (log K_{OC}) with other methods
Many other contributions related to the prediction of log K_{OC} have been published so far [5, 6, 9, 11, 12, 13]. The prediction methods proposed in the majority of them can be classified as “indirect” ones, because they are based on the correlation of log K_{OC} with another environmentally relevant parameter, the log K_{OW} partition coefficient, which has to be either determined experimentally or calculated first [10, 11, 12, 33]. In the following paragraph, we present a simple comparison between the predictions of our (direct) model and the predictions of the other available (indirect) models.
We selected indirect models, originally proposed by Gerstl and Mingelgrin [11] and by Karickhoff [12] to compare them with our (direct) QSPR model.
- log K_{OC}^{I} calculated according to the newly developed QSPR model (direct method presented in this work),
- log K_{OC}^{II} calculated according to the equations proposed by Gerstl and Mingelgrin [11] (Eq. 10) and by Karickhoff [12] (Eq. 11) with use of the experimentally derived values of the n-octanol/water partition coefficient (indirect method):
$$\log K_{\text{OC}}^{\text{IIA}} = 0.762 \log K_{\text{OW}}^{\exp} + 1.051,$$(10)
$$\log K_{\text{OC}}^{\text{IIB}} = 0.989 \log K_{\text{OW}}^{\exp} - 0.346,$$(11)
- log K_{OC}^{III} calculated according to the equations proposed by Gerstl and Mingelgrin [11] (Eq. 12) and by Karickhoff [12] (Eq. 13) with use of the predicted values of the n-octanol/water partition coefficient. The log K_{OW} values were predicted using one of our previously built QSPR models [15] (indirect method):
$$\log K_{\text{OC}}^{\text{IIIA}} = 0.762 \log K_{\text{OW}}^{\text{pred.}} + 1.051,$$(12)
$$\log K_{\text{OC}}^{\text{IIIB}} = 0.989 \log K_{\text{OW}}^{\text{pred.}} - 0.346.$$(13)
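The indirect schemes in Eqs. 10 to 13 use the same two linear relationships and differ only in whether log K_{OW} is experimental or predicted. A minimal sketch, with hypothetical function names and an arbitrary input value chosen only for illustration:

```python
def log_koc_gerstl_mingelgrin(log_kow):
    """Linear form used in Eqs. 10 and 12: 0.762 * log K_OW + 1.051."""
    return 0.762 * log_kow + 1.051


def log_koc_karickhoff(log_kow):
    """Linear form used in Eqs. 11 and 13: 0.989 * log K_OW - 0.346."""
    return 0.989 * log_kow - 0.346


# Arbitrary log K_OW value for illustration (not a measured value from the study).
example_log_kow = 5.0
print(round(log_koc_gerstl_mingelgrin(example_log_kow), 3))  # 4.861
print(round(log_koc_karickhoff(example_log_kow), 3))         # 4.599
```

Any error in the log K_{OW} input, whether experimental or predicted, is carried through these equations into the log K_{OC} estimate, which is why the indirect route depends on the quality of the K_{OW} data.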
A statistical comparison of the results (predicted values of log K_{OC}) obtained with the three methods was performed using a test set of 41 compounds, for which we were able to find experimental values of both partition coefficients, log K_{OC} and log K_{OW}. We investigated the differences between the experimental and predicted values of log K_{OC} with the paired Student’s t test for each of the three strategies.
Comparison between the residuals derived from different schemes of predicting log K_{OC} with the observed values of log K_{OC} (the pairwise Student’s t test)
Statistics (by model) | K_{OC}^{I} | K_{OC}^{IIA} | K_{OC}^{IIB} | K_{OC}^{IIIA} | K_{OC}^{IIIB} |
---|---|---|---|---|---|
Mean residual | 0.018 | 0.041 | 0.089 | 0.098 | 0.197 |
Standard deviation of residuals | 0.162 | 1.353 | 1.465 | 1.496 | 1.501 |
Test statistic (t_{kr} = 2.021) | 0.718 | 0.194 | 0.388 | 0.419 | 0.839 |
p value | 0.477 | 0.847 | 0.700 | 0.677 | 0.406 |
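The test statistics in the table can be approximately reproduced from the tabulated mean residuals and standard deviations. The sketch below assumes n = 41 compounds and the usual paired (one-sample on the differences) form t = mean / (SD / sqrt(n)); the function name is hypothetical, and small deviations from the tabulated values are expected because the inputs are rounded.

```python
import math


def paired_t_statistic(mean_residual, sd_residuals, n):
    """t = mean of the paired differences divided by its standard error."""
    return mean_residual / (sd_residuals / math.sqrt(n))


N_COMPOUNDS = 41    # size of the test set stated in the text
T_CRITICAL = 2.021  # critical value t_kr quoted in the table

# (mean residual, standard deviation of residuals) as listed in the table.
table = {
    "I": (0.018, 0.162),
    "IIA": (0.041, 1.353),
    "IIB": (0.089, 1.465),
    "IIIA": (0.098, 1.496),
    "IIIB": (0.197, 1.501),
}

for model, (mean_res, sd_res) in table.items():
    t = paired_t_statistic(mean_res, sd_res, N_COMPOUNDS)
    verdict = "not significant" if abs(t) < T_CRITICAL else "significant"
    print(f"log K_OC^{model}: t = {t:.3f} ({verdict})")
```

All five statistics fall below the critical value, so none of the mean residuals differs significantly from zero; the schemes differ mainly in the spread of their residuals, which is roughly an order of magnitude smaller for the direct model.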
Therefore, more generally, we recommend using direct QSPR models such as the one we have developed in this contribution. Another advantage is that the application of a model that predicts the log K_{OC} value of chloro- and bromo-analogs of POPs directly from a quantum–mechanical descriptor is independent of the availability of other experimental data (i.e., experimentally derived values of log K_{OW}). Since Baker et al. [34, 35, 36] observed that the log K_{OC}/log K_{OW} correlation tends to be specific only for chemicals with log K_{OW} < 6, searching for alternative ways of predicting K_{OC} is reasonable and justified. The authors demonstrated that, at least for 18 POP species having log K_{OW} values in the range 6–7, this correlation is very low, as measured by R^{2} = 0.294 [36]. Applying the indirect approach to such chemicals will lead to increased error in the prediction of the soil sorption coefficient. Thus, using the direct model not only prevents possible systematic errors and mistakes during the experiments and mathematical conversions but also reduces the time and cost associated with experimental research and the amount of waste arising during such studies. Furthermore, the advantage of using computationally obtained descriptors is that they can also be calculated for compounds that have not yet been synthesized. Thus, partition coefficients can be predicted for novel, unknown, and untested compounds.
It should be mentioned here that similar direct models have already been developed by other authors. Gramatica et al. [6, 9] reviewed the most recently published QSPR models for predicting log K_{OC}. These models differ not only in the descriptors used but also in the size and composition of the training set (and thus in applicability) and in predictive ability. Moreover, many of them, as the authors note, have been verified only with respect to goodness-of-fit, while their predictive power for compounds not previously used for training is not known [6]. Therefore, the application of such improperly validated models is disputable. Gramatica et al. [9] proposed a series of QSPR models of K_{OC} for a wide and highly heterogeneous data set of 643 non-ionic organic chemicals that fulfill all OECD recommendations [7]. The developed models have very good stability, robustness, and predictivity. Moreover, their applicability domains have been clearly described, according to the gold standards of QSPR. However, the advantage of the QSPR model presented in this study is that it includes only one descriptor. Moreover, the descriptor utilized in our model has a very intuitive mechanistic interpretation.
Conclusions
In our contribution, we have developed a QSPR model for predicting the organic carbon/water partition coefficient for 1,436 polychlorinated and polybrominated congeners of benzenes, biphenyls, dibenzo-p-dioxins, dibenzofurans, diphenyl ethers, and naphthalenes. The model is based on a single molecular descriptor (solvent accessible surface, SAS) that can be calculated exclusively from the chemical structure. We have observed that the values of log K_{OC} increase with increasing SAS, which is related to the increasing number of halogen substituents. In addition, since brominated congeners are characterized by a larger surface area compared with their chlorinated analogs, their log K_{OC} partition coefficients are also higher. This significantly differentiates the mobility of chlorinated and brominated POPs in the environment.
The QSPR model fulfills all five OECD recommendations related to the validation procedure: it has satisfactory statistics of goodness-of-fit, robustness, and predictive ability. The applicability domain of the model covers the majority of the studied chemicals.
Finally, we have compared the predictions of our direct QSPR model with the values of log K_{OC} predicted using other models based on the n-octanol/water partition coefficient. We have demonstrated that estimating log K_{OC} of chloro- and bromo-analogs of POPs with the direct QSPR model leads to more reliable results than applying the other available methods. In addition, our model can be applied even when the values of the other coefficient (log K_{OW}) are not known, without the necessity of performing additional time-consuming and expensive experiments.
Acknowledgments
This work was supported by Japan Society for the Promotion of Science (JSPS) and the Polish Academy of Science (PAN) under the Bilateral Joint Research Project, and by JSPS Grants-in-Aid for Young Scientists (B) No. 25871087. The authors (K. J., A. S., A. G. and T. P.) thank the Polish Ministry of Science and Higher Education (grant no. DS 530-8180-D202-3) and the Foundation for Polish Science (FOCUS 2010 Programme) for the financial support. This research was supported in part (to M. H.) by the U.S. Department of Energy under contract DE-AC02-05CH11231. This research used resources of the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.
Supplementary material
Copyright information
Open Access. This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.