Spectral density distribution moments as novel descriptors for QSAR/QSPR
We propose spectral density distribution moments as molecular descriptors. We apply the new descriptors for developing a QSPR model that predicts the logarithmic values of subcooled liquid vapor pressure. We consider the infrared spectra of chloronaphthalenes.
Keywords: Descriptors, Molecular similarity, Statistical spectroscopy
Nowadays, the majority of environmental studies focus on a group of chemicals called persistent organic pollutants (POPs), which pose a wide range of threats to human health and natural ecosystems. Due to their high lipophilicity and resistance to naturally occurring degradation processes, they are prone to bioaccumulate in human and animal tissues and to biomagnify in food chains. Moreover, after entering the organism, they can induce a variety of toxic effects, including cancer, allergies and hypersensitivity, damage to the central and peripheral nervous systems, reproductive disorders, and disruption of the immune system [2, 3]. Therefore, according to the Stockholm Convention, the emission of POPs to the environment needs to be eliminated or reduced.
Typical representatives of POPs are chloronaphthalenes (CNs). This group consists of 75 congeners: chemicals based on the same skeleton (naphthalene) that differ in the number of chlorine atoms or in the substitution pattern. Although the synthesis of CNs has formally been abandoned, some commercial products containing CNs (e.g., insulating materials, rubber belts) are still available. Moreover, chloronaphthalenes are formed and released to the environment during thermal processes (e.g., industrial waste incineration and domestic heating), which are assumed to be currently the main source of CNs in the environment. Since the emission of CNs to the atmosphere, estimated for European countries only, is still high, equal to 1.03 tons per year, there is an urgent need to perform comprehensive risk studies of these pollutants.
Among the main factors influencing the environmental behavior of CNs are their overall environmental persistence, mobility, and (eco)toxicity. The first two characteristics cannot be measured directly. They are usually determined with multimedia mass-balance (MM) models, and every MM model requires a set of physicochemical parameters (e.g., partition coefficients, half-lives, enthalpies of phase transfer, vapor pressure) as input data. These parameters can be obtained empirically. However, the high cost of experiments and the time required to perform them for large sets of chemicals motivate the scientific community to search for alternative, non-experimental ways of obtaining the missing parameters.
Nowadays, the significance of a computational technique known as quantitative structure–property relationship (QSPR) modeling, and of its various applications in chemical risk assessment, has been highlighted by many international organizations and regulations (e.g., REACH in Europe). This approach is based on the assumption that the physicochemical properties of chemical compounds are functions of so-called molecular descriptors, which represent structural features of the molecules. Thus, based on experimental data available for even a few compounds, it is possible to develop a mathematical equation describing the correlation(s) between their molecular structures and properties and, on this basis, to predict the missing information for other, structurally similar molecules.
There are many examples of successful applications of the QSPR approach to predicting environmentally relevant properties of CNs [7, 11, 12]. However, there is still a need to search for novel structural descriptors that would better express the molecular variance within particular groups of structurally similar POP congeners.
Intensity distribution moments, recently proposed by us as new molecular descriptors [13, 14, 15], proved to be an efficient tool for identifying specific groups of molecules. For example, using these descriptors, one can distinguish nitriles from amides. The general methodology used in this study, statistical spectroscopy, is known in many different areas of science. The basic quantities, the distribution moments, may be derived from atomic or molecular spectra. We have already applied similar methods of statistical spectroscopy in studies of stellar spectra [17, 18], in analyzing the properties of chaotic dynamical systems, and in bioinformatics. Here we continue to investigate the usefulness of different kinds of moments as molecular descriptors, this time examining the spectral density distribution moments. In the present study, the moments are obtained from the frequencies (rather than from the intensities, as was done before) of the infrared (IR) spectra of CNs. However, statistical distributions may also be created from any function describing the system under consideration. The new descriptors are applied to develop a QSPR model that predicts the logarithmic values of subcooled liquid vapor pressure at 25 °C.
Convenient characteristics of distributions are their moments.
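As a rough illustration of how moments can summarize a frequency distribution, the following Python sketch builds a normalized histogram from a list of vibrational frequencies and computes moments about zero and about the mean. The binning scheme, the bin width, the function names, and the input frequencies are all assumptions made for this example; they are not the definitions used in the study.

```python
from collections import Counter


def spectral_density(frequencies, bin_width=100.0):
    """Normalized histogram of frequencies (hypothetical binning scheme).

    Returns a dict mapping each bin center to the fraction of
    frequencies that fall into that bin, so the weights sum to 1.
    """
    counts = Counter(int(f // bin_width) for f in frequencies)
    total = sum(counts.values())
    return {(k + 0.5) * bin_width: c / total for k, c in sorted(counts.items())}


def moment(density, q):
    """q-th moment of the distribution about zero."""
    return sum(weight * x ** q for x, weight in density.items())


def central_moment(density, q):
    """q-th moment of the distribution about its mean."""
    mean = moment(density, 1)
    return sum(weight * (x - mean) ** q for x, weight in density.items())


# Made-up frequencies in cm^-1, for illustration only (not computed spectra).
freqs = [150.0, 450.0, 450.0, 1250.0]
rho = spectral_density(freqs, bin_width=100.0)

mean_frequency = moment(rho, 1)      # first moment (mean)
variance = central_moment(rho, 2)    # second central moment
```

In such a scheme the first moment locates the distribution and the higher central moments describe its width and shape, which is what makes them candidates for compact descriptors.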
Results and discussion
Experimental and predicted values of logPL, leverage values and Mρ,1 used as molecular descriptors (T-training set, V-validation set)
We study spectral density distributions of the frequencies of the IR spectra of the CNs.
The vibrational spectra were obtained from density functional theory (DFT) calculations. The hybrid B3LYP functional and the 6-311++G** basis set were used, as implemented in the Gaussian 03 code.
In the present study, the spectral density distribution moments are applied as molecular descriptors for developing a QSPR model of the logarithmic values of subcooled liquid vapor pressure (logPL) at 25 °C. Experimental data, available for 17 CNs (22 % of the investigated group), have been taken from the literature. The compounds for which experimentally derived logPL values were available were divided into two smaller sets: a training set (12 compounds) and a validation set (5 compounds). The splitting algorithm was as follows. The 17 compounds were sorted in order of decreasing logPL, and then every third compound was assigned to the validation set, whereas the remaining ones formed the training set. This method produces two representative sets of compounds, since the compounds in each set are evenly distributed over the range of logPL. The training set was then used for model development and calibration, whereas the validation set, in accordance with the gold standards and the OECD recommendations for QSAR, was used for external validation of the model.
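The splitting procedure described above can be sketched in a few lines of Python. This is a minimal sketch that assumes "every third compound" means positions 3, 6, 9, and so on in the sorted list, which reproduces the reported 12/5 split for 17 compounds. The function name and the compound names and logPL values below are placeholders, not the experimental data.

```python
def split_every_third(compounds):
    """Split (name, logPL) pairs into training and validation sets.

    Compounds are sorted by decreasing logPL; every third one
    (positions 3, 6, 9, ...) goes to the validation set.
    """
    ordered = sorted(compounds, key=lambda c: c[1], reverse=True)
    training, validation = [], []
    for position, compound in enumerate(ordered, start=1):
        if position % 3 == 0:
            validation.append(compound)
        else:
            training.append(compound)
    return training, validation


# Placeholder data: 17 hypothetical compounds with made-up logPL values.
data = [(f"CN-{i}", -0.5 * i) for i in range(1, 18)]
training, validation = split_every_third(data)
print(len(training), len(validation))  # 12 5
```

Because the selection is done on the sorted list, both sets span the full range of logPL rather than clustering at one end.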
It is worth noting that the subcooled liquid vapor pressure (logPL) at 25 °C has already been successfully predicted for CNs with QSPR models. Those models used other popular quantum-mechanical descriptors (averaged polarizability, dipole moment, etc.) calculated at the same level of theory (B3LYP/6-311++G**), together with statistical modeling methods of different complexity, including MLR, principal component regression (PCR), partial least squares (PLS) regression, and two of its modifications: PLS regression with uninformative variable elimination (UVE-PLS) and PLS regression with variable selection by genetic algorithm (GA-PLS). However, by employing the spectral density distribution moments as novel molecular descriptors, the current study was able to develop a model characterized by both lower complexity and better predictive ability than the best model obtained in the previous study. The best model in that study was developed with GA-PLS, used eight descriptors, and had a prediction error of RMSEP = 0.108.
In summary, the model presented in the current study has been developed with the much simpler MLR algorithm, using only one descriptor, and has an RMSEP of 0.06. This confirms the usefulness of the proposed spectral density distribution moments in QSPR.
The proposed descriptors characterize statistical properties of the distributions of the frequencies (not of the intensities) used for their computation. Therefore, the descriptors defined in this study are useful for describing properties that are mainly determined by the frequency distributions, such as the logPL of chloronaphthalenes. The frequency distributions differ if the molecules contain different numbers of chlorine atoms, but all isomers of CNs with a fixed number of substituents have nearly the same frequency distributions. Therefore, the moments shown in Figs. 1 and 2 are nearly constant for compounds with the same number of substituents. In order to distinguish different isomers, one has to use the intensity distribution moments. For CNs such descriptors are different for each compound, and they will be considered in a subsequent paper.
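To make the comparison concrete, the sketch below shows how a one-descriptor linear model can be fitted by least squares and how the root mean square error of prediction (RMSEP) is evaluated on an external validation set. With a single descriptor, MLR reduces to simple linear regression. The function names and all numbers are made up for illustration; they are not the descriptor values or logPL data from this study.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmsep(observed, predicted):
    """Root mean square error of prediction."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up training data: descriptor values and target property.
x_train = [1.0, 2.0, 3.0, 4.0]
y_train = [-1.0, -3.0, -5.0, -7.0]
slope, intercept = fit_line(x_train, y_train)

# Made-up external validation data, not used in the fit.
x_val = [2.5, 3.5]
y_val = [-4.1, -5.9]
y_hat = [slope * x + intercept for x in x_val]
error = rmsep(y_val, y_hat)
```

The key point is that the validation compounds never enter the fit, so the RMSEP measures predictive ability rather than goodness of fit.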
The contribution of TP was supported by the Polish Ministry of Science and Higher Education (Grant No. DS/8430-4-0171-1).
- 1. UNEP (2001) Stockholm Convention on persistent organic pollutants, Stockholm, 2001
- 8. Weem AP (2007) Exploration of management options for polychlorinated naphthalenes (PCN). In: Paper for the Sixth Meeting of the UNECE CLRTAP Task Force on persistent organic pollutants, Vienna, 4–6 June 2007
- 9. REACH (2006) Regulation (EC) No 1907/2006 of the European Parliament and of the Council of 18 December 2006 concerning the Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH), establishing a European Chemicals Agency, amending Directive 1999/45/EC and repealing Council Regulation (EEC) No 793/93 and Commission Regulation (EC) No 1488/94 as well as Council Directive 76/769/EEC and Commission Directives 91/155/EEC, 93/67/EEC, 93/105/EC and 2000/21/EC
- 17. Wąż P, Bielińska-Wąż D, Pleskacz A, Strobel A (2008) Acta Phys Pol B 39:1993
- 18. Wąż P, Bielińska-Wąż D, Strobel A, Pleskacz A (2010) Acta Astron 60:283
- 19. Wąż P, Bielińska-Wąż D (2009) Acta Phys Pol A 116:987
- 21. Frisch MJ et al. (2004) Gaussian 03. Gaussian, Inc., Wallingford, CT
- 24. OECD (2007) Guidance document on the validation of (quantitative) structure–activity relationships [(Q)SAR] models. Organisation for Economic Co-operation and Development, Paris
- 25. Puzyn T, Leszczynski J, Cronin MTD (eds) (2010) Recent advances in QSAR studies: methods and applications. In: Challenges and advances in computational chemistry and physics. Springer, Dordrecht. ISBN: 978-1-4020-9782-9
Open Access: This article is distributed under the terms of the Creative Commons Attribution License, which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.