Predicting the Temperature Dependence of the Octanol–Air Partition Ratio: A New Model for Estimating $$\Delta {U^{ \circ}_{\text{OA}}}$$

Baskaran, Sivani; Podagatlapalli, Akshay; Sangion, Alessandro; Wania, Frank

doi:10.1007/s10953-022-01214-7

Predicting the Temperature Dependence of the Octanol–Air Partition Ratio: A New Model for Estimating $\Delta {U^{ \circ}_{\text{OA}}}$

Open access
Published: 08 November 2022

Volume 52, pages 51–69, (2023)
Cite this article

Download PDF

You have full access to this open access article

Journal of Solution Chemistry Aims and scope Submit manuscript

Predicting the Temperature Dependence of the Octanol–Air Partition Ratio: A New Model for Estimating $\Delta {U^{ \circ}_{\text{OA}}}$

Download PDF

Sivani Baskaran^1,2^na1,
Akshay Podagatlapalli¹^na1,
Alessandro Sangion^1,3 &
…
Frank Wania ORCID: orcid.org/0000-0003-3836-0901^1,2

2083 Accesses
3 Citations
1 Altmetric
Explore all metrics

Abstract

The octanol–air partition ratio (K_OA) describes the partitioning of a chemical between air and octanol and is often used to approximate other partitioning phenomena in environmental chemistry (e.g., blood–air, atmospheric particulate matter–air, polyurethane foam-air). Such partitioning processes often occur at environmental temperatures other than 25 °C. Enthalpies $\Delta {H^{ \circ}_{\text{OA}}}$ or internal energies $\Delta {U^{ \circ}_{\text{OA}}}$ of phase transfer are used to express the temperature dependence of the K_OA. Existing poly-parameter linear free energy relationships (ppLFERs) for predicting $\Delta {H^{ \circ}_{\text{OA}}}$ were developed using a relatively small dataset. In this work we utilize a recently developed comprehensive K_OA database to create and curate a $\Delta {U^{ \circ}_{\text{OA}}}$ dataset containing 195 chemicals and use this dataset in the development of new predictive equations. Using the QSAR development platform QSARINS we evaluate the use of Abraham descriptors, other molecular descriptors, and the log₁₀ K_OA at 25 °C as variables in different multilinear regression equations for $\Delta {U^{ \circ}_{\text{OA}}}$. The $\Delta {U^{ \circ}_{\text{OA}}}$ of neutral organic chemicals can be reliably predicted using only the log₁₀ K_OA (RMSE_EXT = 6.86 kJ·mol⁻¹, ${\text{R}^{2} _{\text{adj}}}$ = 0.94), only the solute’s hydrogen acidity A and the logarithm of the hexadecane–air partition ratio L (RMSE_EXT = 7.23 kJ·mol⁻¹, ${\text{R}^{2} _{\text{adj}}}$ = 0.93), or A and log₁₀ K_OA (RMSE_EXT = 6.76 kJ·mol⁻¹, ${\text{R}^{2} _{\text{adj}}}$ = 0.95).

QSPRs for Predicting Equilibrium Partitioning in Solvent–Air Systems from the Chemical Structures of Solutes and Solvents

Article Open access 27 May 2022

Novel approach for predicting partition coefficients of linear perfluorinated compounds

Article 31 December 2015

Equations for the Correlation and Prediction of Partition Coefficients of Neutral Molecules and Ionic Species in the Water–Isopropanol Solvent System

Article 01 April 2021

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

The octanol–air equilibrium partitioning ratio (K_OA) describes the partitioning of a chemical between octanol and air and is commonly used to approximate the partitioning between various organic phases and the gas phase, including soil organic matter [1, 2] plant foliage [3, 4], atmospheric particulate matter [5, 6], and milk and blood [7]. Previous work on contaminants in outdoor environments (e.g., [8,9,10]) and in various terrestrial organisms (e.g., [11]) has shown the importance of K_OA in understanding the distribution, fate, and bioaccumulation potential of volatile and semi-volatile organic chemicals in the environment. However, these processes often occur at temperatures other than standard temperatures (i.e., 25 °C) and accurate understanding of the partitioning of a chemical in these systems may require a temperature correction.

The temperature dependence of log₁₀ K_OA can be described by the internal energy of phase transfer from octanol to air ($\Delta {U^{ \circ}_{\text{OA}}}$, kJ·mol⁻¹) using the van’t Hoff equation:

$${\mathrm{log}}_{10}\frac{{K}_{\mathrm{OA}} \mathrm{at} {T}_{2}}{{K}_{\mathrm{OA}} \mathrm{at} {T}_{1}}=- \frac{U^{ \circ}_{\mathrm{OA}}}{R}\cdot \left( \frac{1}{{273.15+T}_{2}}-\frac{1}{{273.15+T}_{1}} \right)\cdot {\mathrm{log}}_{10}e$$

(1)

where R is the ideal gas constant (8.314 ·10^–3 kJ·K⁻¹·mol⁻¹), T₁ and T₂ are two temperatures (°C), and log₁₀ e (i.e., 0.43) is applied for the logarithm base change. While $\Delta {U^{ \circ}_{\text{OA}}}$ is itself temperature dependent, the derivation of Eq. 1 requires it to be constant over small temperature ranges [12] and therefore a $\Delta {U^{ \circ}_{\text{OA}}}$ can be derived by regressing log₁₀ K_OA against reciprocal temperature (273.15 + T)⁻¹.

The concentration of a solute in the gas phase can be expressed volumetrically (e.g., in units of mol·m⁻³) or as a partial pressure (in Pa). If the K_OA is calculated using volumetric gas phase concentrations, a regression of log₁₀ K_OA against reciprocal temperature yields $\Delta {U^{ \circ}_{\text{OA}}}$. However, when K_OA is defined in terms of partial pressure, the enthalpy of phase transfer ($\Delta {H^{ \circ}_{\text{OA}}}$, kJ·mol⁻¹) is obtained. $\Delta {U^{ \circ}_{\text{OA}}}$ or $\Delta {H^{ \circ}_{\text{OA}}}$ can be converted into each other using [12, 13]:

$$\Delta {U^{\circ }_{\mathrm{OA}}}=\Delta {H^{\circ }_{\mathrm{OA}}}+{RT}$$

(2)

The direct determination of the K_OA and its temperature dependence using various experimental approaches can be challenging and time consuming. Recent work by Baskaran et al. [14] found that, compared to other K_OA prediction techniques, poly-parameter linear free energy relationships (ppLFERs) using Abraham descriptors are a reliable, fast and easy-to-use method for estimating K_OA. Traditionally ppLFERs using Abraham descriptors combine six system constants and five solute descriptors to describe how a chemical interacts and partitions between two phases [15]. The system constants, represented by lower-case letters, are determined using a multiple linear regression of the property against the solute descriptors, expressed with upper-case letters. These solute descriptors include: E (excess molar refraction), S (polarizability/dipolarity), A (hydrogen bond acidity), B (hydrogen bond basicity), V (McGowan molar volume), and L (log₁₀ of the hexadecane–air partition ratio). The product of a solute descriptor and a system constant, such as sS or bB, describes the energetic contribution of one particular type of intermolecular interaction to the property [15].

Properties describing the interaction of a solute between the gas phase and a condensed phase can be calculated using two different ppLFERs which differ in the use of either the V or E parameter.

$${\mathrm{log}}_{10}\;K=c+eE+sS+ aA+bB+lL$$

(3)

$${\mathrm{log}}_{10}\;K=c+sS+aA+bB+vV+lL$$

(4)

Equation 3 is intended to exclusively describe partitioning between the gas phase and a condensed phase, while Eq. 4 can be applied to any two phases, including two condensed phases [15].

In previous work [14] we presented a ppLFER without either V or E to estimate the K_OA at 25 °C, that performed as well as the 5-parameter equations by Abraham and Acree [16] and Endo and Goss [17]. Jin et al. [18] also developed a ppLFER equation for log₁₀ K_OA which directly incorporates temperature in the multiple linear regression but used a small and limited data set [14]. In order to predict K_OA values at temperatures other than 25 °C, Baskaran et al. [14] recommend using the 4-parameter ppLFER equation for K_OA and a ppLFER equation for $\Delta {H^{ \circ}_{\text{OA}}}$ by Mintz et al. [19]. The $\Delta {H^{ \circ}_{\text{OA}}}$ and the log₁₀ K_OA at 25 °C estimated this way were found to be highly correlated (R² > 0.98) [14], in analogy to previous work that has shown enthalpies of absorption ($\Delta {H^{ \circ }_{\text{ads}}}$) and the logarithm of the adsorption constants (log₁₀ K_ads) to be linearly correlated (R² > 0.91) [20]. Likewise, strong correlations between $\Delta {H^{ \circ }_{\text{vap}}}$ and P_L [20, 21] have been observed and Goss and Schwarzenbach [20] note that previous work (e.g., [22]) indicated strong relationships between enthalpies and partitioning ratios. The high correlation between log₁₀ K_OA and $\Delta {H^{ \circ }_{\text{OA}}}$ suggests that the temperature dependence of K_OA can be estimated from K_OA directly.

Other quantitative structure property relationships (QSPRs) for log₁₀ K_OA use different molecular, topographic, geometric, and quantum-chemical descriptors [12], which require commercial software or intensive computational power. The Open (Quantitative) Structure-activity/property Relationship App (OPERA) model by Mansouri et al. [23], uses two molecular descriptors computed with the PaDEL descriptor software [24] to estimate log₁₀ K_OA at 25 °C. PaDEL calculates 1, 2, and 3 dimensional (1D, 2D, and 3D) molecular descriptors, whereby an increase in the dimensionality corresponds to the complexity of the encoded information [24]; we collectively refer to these descriptors as PaDEL descriptors. The OPERA model utilizes the PaDEL-predicted log₁₀ hexadecane-air partition ratio (L_PaDEL) and the number of hydrogen-bond donor atoms (nHBDon), both 2D descriptors [25]. The L_PaDEL is identical to the L Abraham solute descriptor, while the nHBDon is somewhat similar to the A Abraham solute descriptor, because they both describe the capacity of the chemical to donate protons.

Both the Abraham and PaDEL descriptors are easily acquired. Two types of Abraham descriptors, experimental and estimated, are available from the UFZ-LSER website [26]. Experimental descriptors are measured directly or through chromatographic retention time techniques [15]. Solute descriptors can also be predicted from a chemical’s Simplified Molecular Input Line Entry System (SMILES) notation using the IFS-QSAR models [27] integrated into the UFZ-LSER website and the EAS-E Suite platform [28] or directly from the standalone python package available on GitHub [29]. All PaDEL descriptors can be estimated from the SMILES strings using either a stand-alone software [24] or through QSARINS [30].

Since the development of the $\Delta {H^{ \circ }_{\text{OA}}}$ ppLFER by Mintz et al. [19, 31], a large database of log₁₀ K_OA has been assembled [12]. This database can be used to create a comprehensive dataset of $\Delta {U^{ \circ }_{\text{OA}}}$ for chemicals with measured log₁₀ K_OA data at multiple temperatures. In this work we utilize a newly assembled and curated $\Delta {U^{ \circ }_{\text{OA}}}$ database to develop linear regression models for $\Delta {U^{ \circ }_{\text{OA}}}$ using Abraham solute descriptors, PaDEL descriptors, and/or the log₁₀ K_OA at 25 °C.

2 Methods

2.1 Data Curation

The development of a reliable model for $\Delta {U^{ \circ }_{\text{OA}}}$ relies on acquiring empirical data and processing the data through several data curation steps to reduce errors. We use measured log₁₀ K_OA values from the K_OA database [12] to derive experimental $\Delta {U^{ \circ }_{\text{OA}}}$ values. We also searched for directly measured $\Delta {H^{ \circ }_{\text{OA}}}$ and $\Delta {U^{ \circ }_{\text{OA}}}$ values, obtained using calorimetric techniques, to build our training and external validation datasets.

2.1.1 Data from the K _OA Database

All measured values of log₁₀ K_OA were extracted from the K_OA database and filtered to remove any measurements made (i) using indirect techniques (data from gas chromatographic retention times in octanol-filled columns were considered directly obtained), (ii) using water-saturated octanol, (iii) for a mixture or a chemical with an ambiguous structure, (iv) for inorganic and labelled compounds. We also removed duplicate values and values considered unreliable within the database. Finally, chemicals with measured K_OA values at less than four different temperatures were removed, which left 149 chemicals. Log₁₀ K_OA values were linearly regressed against inverse absolute temperature to obtain $\Delta {U^{ \circ }_{\text{OA}}}$ from the slope. We then considered the strength of the correlation based on R², the standard error of $\Delta {U^{ \circ }_{\text{OA}}}$ as derived from the error of the slope, and how the $\Delta {U^{ \circ }_{\text{OA}}}$ calculated from this regression compared with published data.

We eliminated 10 chemicals from the dataset because the log₁₀ K_OA values used to calculate $\Delta {U^{ \circ }_{\text{OA}}}$ values were obtained from a single reference and the $\Delta {U^{ \circ }_{\text{OA}}}$ values calculated in our regression disagreed with the published value from that reference. In many instances, the published $\Delta {U^{ \circ }_{\text{OA}}}$ was calculated from a regression of three log₁₀ K_OA values at different temperatures. Including a fourth log₁₀ K_OA measurement (obtained via personal communication with the authors, see [12]) caused the calculated $\Delta {U^{ \circ }_{\text{OA}}}$ to deviate from the reported value. We assume that the authors of these original measurement did not hold strong confidence in these unpublished log₁₀ K_OA measurements. Thus, data for chlorinated dibenzo-p-dioxin (CDD) 48, 50, 54, 66 and 73, and brominated diphenyl ether (BDE) 183 were removed from the $\Delta {U^{ \circ }_{\text{OA}}}$ dataset. For o,pʹ-dichlorodiphenyltrichloroethane, p,pʹ-dichlorodiphenyldichloroethane, cis-nonachlor and endrin, removing individual log₁₀ K_OA values did not lead to an agreement between previously published and calculated $\Delta {U^{ \circ }_{\text{OA}}}$ values, which suggests that there may be some disparity in the log₁₀ K_OA values used in the regressions (also obtained via direct communication with the authors) and the published data. Four chemicals (isopropyl ether, methane, BDE 99, and BDE 153) in the dataset were removed because the R² was below 0.95 and removing any outlier left only three datapoints in the regression. The regressions of the log₁₀ K_OA against inverse temperature for these chemicals are shown in Figs. SI 1–SI 3.

Although there have been multiple measurements of log₁₀ K_OA for perfluoroalkyl substances (PFAS), agreement between data from different studies is poor. As such, all PFAS compounds (perfluorooctane sulfonamido ethanols, perfluorooctane sulfonamide, fluorotelomer alcohols, and fluorotelomer acrylates) were excluded from the $\Delta {U^{ \circ }_{\text{OA}}}$ dataset. The discrepancies between these values is further discussed in Sect. 3.1.1.

By assessing the temperature regression plots for a few chemicals, it was clear that some measured log₁₀ K_OA data deviated from others. In many cases these outliers were from the same paper and often used older and/or objectively less precise measurement techniques (e.g., [32,33,34,35,36,37,38,39,40]). In these instances, we removed the outliers and recalculated the $\Delta {U^{ \circ }_{\text{OA}}}$ using the remaining log₁₀ K_OA data. In the case of propanol, after removing the obvious outliers from Eger et al. [34], we took the average $\Delta {U^{ \circ }_{\text{OA}}}$ value calculated from regressing temperature dependent log₁₀ K_OA data from Lei et al. [41] and Gruber et al. [42], which subsequently caused the elimination of a single datapoint published by Abraham et al. [43]. Plots of these regressions, including the outliers, are available in Figs. SI 5 and SI 6 in the supporting information.

For some compounds the R² for individual sets of temperature dependent log₁₀ K_OA values was high, however, individual log₁₀ K_OA values at specific temperatures deviated by 0.3 to 1 log₁₀ units between papers. Therefore, we took the average $\Delta {U^{ \circ }_{\text{OA}}}$ value calculated from log₁₀ K_OA values from each paper, in other words, we essentially took the average of the two slopes. This applied to hexachlorobenzene (HCB), and the polychlorinated biphenyls (PCBs) 153 and 180 (see Fig. SI 7).

After this step, we were left with 123 chemicals with reliable $\Delta {U^{ \circ }_{\text{OA}}}$ values from the K_OA database.

2.1.2 Direct Measurements of $\Delta {H^{ \circ }_{\text{OA}}}$ and $\Delta {U^{ \circ }_{\text{OA}}}$

Mintz et al. [19, 31] used $\Delta {H^{ \circ }_{\text{OA}}}$ values for 138 chemicals to develop ppLFER equations for the enthalpy of phase change between wet and dry octanol and air. $\Delta {H^{ \circ }_{\text{OA}}}$ values used to calibrate those models were cross-referenced against the original sources cited by Mintz et al. [31]. During this process, we found more directly measured $\Delta {H^{ \circ }_{\text{OA}}}$ values in the literature. The $\Delta {H^{ \circ }_{\text{OA}}}$ values were subsequently converted to $\Delta {U^{ \circ }_{\text{OA}}}$ using Eq. 2. We took the average of the $\Delta {U^{ \circ }_{\text{OA}}}$ value when more than one literature value was available.

Any chemical from the Mintz et al. dataset of $\Delta {H^{ \circ }_{\text{OA}}}$ values that already had a $\Delta {U^{ \circ }_{\text{OA}}}$ value calculated from the K_OA database was excluded, because these $\Delta {H^{ \circ }_{\text{OA}}}$ values were direct measurements made using calorimetric measurement or were based on log₁₀ K_OA data already included in the K_OA database [12]. This way preference was given to values obtained from multiple direct measurements of log₁₀ K_OA over individual calorimetric measurements. 72 chemicals with directly measured $\Delta {H^{ \circ }_{\text{OA}}}$ values (converted to $\Delta {U^{ \circ }_{\text{OA}}}$) were included in our dataset.

2.2 Chemical Identifiers, Descriptors, and Data Splitting

The chemical names, acronym, CAS number and SMILES notation for all 195 chemicals in the $\Delta {U^{ \circ }_{\text{OA}}}$ dataset were obtained from either the K_OA database or from the CompTox Dashboard. We considered three types of descriptors for estimating $\Delta {U^{ \circ }_{\text{OA}}}$: (i) the log₁₀ K_OA at 25 °C, (ii) Abraham descriptors, and (iii) PaDEL descriptors.

Abraham solute descriptors (E, S, A, B, V, and L) were obtained from the UFZ-LSER database [26]. For experimental solute descriptors, we gave preference to UFZ pre-selected values over ABSOLV values. Estimated solute descriptors for all chemicals were calculated from their SMILES using the IFS-QSARs built into EAS-E Suite [28, 29]. 178 chemicals had a full set of experimental solute descriptors, 4 chemicals were only missing experimental E values, and 13 chemicals had only estimated solute descriptors.

1444 1D and 2D PaDEL descriptors and 881 PubChem fingerprints (v2.21) were obtained from the chemicals’ SMILES notation using QSARINS [30]. 3D molecular descriptors were not included, because while the chemicals in the $\Delta {U^{ \circ }_{\text{OA}}}$ dataset are structurally simple, requiring 3D geometrically optimized chemical structures (e.g., SDF or MOL) and 3D optimization cannot be efficiently scaled up for high throughput applications and as such is not ideal for property estimations in the context of chemical risk assessment. Filters were applied to the PaDEL descriptors, wherein descriptors that had pair-wise correlation greater than 95% or where more than 80% of the descriptor was constant for the whole dataset were excluded. We also removed all descriptors that had missing values. In the end 531 PaDEL descriptors and fingerprints were considered, which included PaDEL calculated Abraham descriptors and 32 PubChem fingerprints. PubChem Fingerprints were excluded from model development and only used in a cluster analysis for data splitting as described later in this section.

Directly measured log₁₀ K_OA values were extracted from the K_OA database [12] and the average used when more than one experimental value existed. Estimated log₁₀ K_OA values were calculated using the experimental and calculated Abraham solute descriptors and the 4-parameter ppLFER for K_OA [14].

The curated $\Delta {U^{ \circ }_{\text{OA}}}$ dataset was split into a model development and an external validation dataset based on the availability of descriptors for each chemical. 54 chemicals that had either some missing experimental Abraham solute descriptors or an estimated log₁₀ K_OA values were set aside to be used for external validation. The remaining 141 chemicals were used to develop multiple linear regression (MLR) models using the descriptors.

In order to reduce bias in descriptor selection, we split the development dataset using four different splitting techniques, with a ratio of 3:1 (i.e., 75% of chemicals were used in the training set and 25% in the validation set) [44]. In the first and second splits, chemicals were ordered by $\Delta {U^{ \circ }_{\text{OA}}}$ and log₁₀ K_OA at 25 °C, respectively, and the chemicals with the highest and lowest value were included in the training dataset. In the random split, 75% of chemicals were randomly selected to be a part of the training dataset. We used cluster analysis (Ward’s method and Tanimoto distance) using PubChem fingerprints and Principal Component Analysis (PCA) of experimental Abraham solute descriptors to group chemicals in the model development dataset into 4 structural clusters (see Figs. SI 11 and SI 12). From each cluster we randomly selected 75% of the chemicals to be used as a training dataset. Figure 1 shows how the chemical datasets were split and used to develop $\Delta {U^{ \circ }_{\text{OA}}}$ models. Table SI 2 contains $\Delta {U^{ \circ }_{\text{OA}}}$ values, descriptors, and splits for all chemicals in the $\Delta {U^{ \circ }_{\text{OA}}}$ dataset.

2.3 Model Calculation and Selection

Model calculation and selection were completed using QSARINS [30] on the four different splits of the model development dataset. For each split we developed different models based on the different kind of molecular descriptors. First, we explored the traditional ppLFER equations assessing all combinations (all-subsets with 1 to 5 descriptors) using experimental Abraham descriptors. Then, we substituted the Abraham L with the experimental log₁₀ K_OA at 25 °C, as both variables describe partitioning between the gas phase and a hydrophobic condensed phase. Given the high number of PaDEL descriptors, we used the all-subset (up to 2 variables) and the Genetic Algorithm (up to 4 variables, parameters are included in the SI) built into QSARINS to identify the best PaDEL descriptors combination. Finally, we assessed the performance of the model using the two PaDEL descriptors used by the OPERA model for estimating log₁₀ K_OA, namely the number of hydrogen bond donors (nHBDon) and log₁₀ hexadecane-air partition ratio (L) [25].

During model development and selection, we filtered out models with p-values for the regression coefficients higher than 0.05, as we could not be confident that the values of these coefficients were different from 0. We restricted the maximum number of variables (i.e., descriptors) to be included in a regression equation to 4 to ensure models remained relatively simple and to avoid overfitting issues. Model performance across all four splits was assessed using different variations of the determination coefficient (${\text{R}^{2} _{\text{adj}}}$, ${Q^{2} _{\text{Loo}}}$, ${Q^{2} _{\text{LMO}}}$, ${Q^{2} _{\text{F1}}}$, ${Q^{2} _{\text{F2}}}$, ${Q^{2} _{\text{F3}}}$), the concordance correlation coefficients (CCC_EXT), root mean squared errors (RMSE_TR, RMSE_EXT) and mean absolute errors (MAE_TR, MAE_EXT) on the training set and on the prediction/external set [45,46,47,48]. Table SI 1 summarises these statistics for the models within each split.

After identifying the best models for different types of descriptors, we used all 141 $\Delta {U^{ \circ}_{\text{OA}}}$ values as the training set to develop new equations using the same descriptors (i.e., the so-called full model). These models were assessed using the 54 chemicals in the external validation dataset. It is important to note that in the external validation set, not all chemicals have empirical descriptors namely for A, L, and log₁₀ K_OA at 25 °C. When experimental A and L values were not available, IFS-QSAR estimated solute descriptors were used. Log₁₀ K_OA values at 25 °C were estimated using the 4-parameter ppLFER equation for K_OA [14] using experimental solute descriptors. All PaDEL descriptors are calculated from the SMILES notation of a chemical and so all descriptors are considered estimates.

3 Results and Discussion

3.1 Data Availability and the General Applicability Domain

Analysis of the log₁₀ K_OA database showed a bimodal distribution in the experimental data [12]. Given the high correlation between log₁₀ K_OA and $\Delta {U^{ \circ }_{\text{OA}}}$ [14], it is not surprising that the same trend is observed for the curated $\Delta {U^{ \circ }_{\text{OA}}}$ dataset (Fig. 2). This pattern arises because most $\Delta {U^{ \circ }_{\text{OA}}}$ values in this dataset are derived from temperature dependent measurements of the log₁₀ K_OA and thus limitations on the experimental techniques for log₁₀ K_OA are extended to the available $\Delta {U^{ \circ }_{\text{OA}}}$ measurements. Direct measurements of $\Delta {U^{ \circ }_{\text{OA}}}$ using the calorimetric techniques are applied to small volatile compounds and the $\Delta {U^{ \circ }_{\text{OA}}}$ of these compounds are less negative and fall within a limited range. The most common class of chemicals in the $\Delta {U^{ \circ }_{\text{OA}}}$ dataset are PCBs (Fig. SI 13), followed by polychlorinated naphthalenes (PCNs), alcohols, and ketones.

Most chemicals in the $\Delta {U^{ \circ }_{\text{OA}}}$ dataset have A values of 0 and B values less than 0.6 (Fig. SI 14), however only 36 (~ 18%) chemicals have both A and B values equal to 0. The polarity of a chemical (defined as aA + bB, where the system descriptors are from the four parameter ppLFER equation for K_OA [12, 14]), shows there is some variability in the hydrogen bonding ability with octanol of the chemicals in the $\Delta {U^{ \circ }_{\text{OA}}}$ dataset. However, there are few very polar compounds. Descriptor values for E and S are more widely distributed, while the distributions of the L and V descriptors resembles the distribution of $\Delta {U^{ \circ }_{\text{OA}}}$ values.

3.1.1 Perfluoroalkyl Substances (PFAS)

The partitioning properties of PFAS compounds are extremely difficult to measure because they can act as surfactants and likely have a very low solubility in octanol. Data from different studies regularly display divergent results and it is not possible to establish which ones are correct.

Most fluorotelomer alcohols (FTOHs) have log₁₀ K_OA values around 5, which is at the lower limits of the generator column technique. They are also too polar to apply the gas-chromatographic retention time technique [49]. In combination, this means obtaining reliable measurements for the K_OA of FTOHs is quite difficult. Figure SI 15 compares the $\Delta {U^{ \circ}_{\text{OA}}}$ values of FTOHs calculated from K_OA measurements reported by Goss et al. [50] and Thuens et al. [51] (temperature regression plots in Fig. SI 10). We would expect that $\Delta {U^{ \circ}_{\text{OA}}}$ values for FTOHs become more negative with increasing chain length, because the K_OA values increase with chain length. While this occurs for $\Delta {U^{ \circ}_{\text{OA}}}$ values from Goss et al. [50], $\Delta {U^{ \circ}_{\text{OA}}}$ declines with increasing chain length when using the data from Thuens et al. [51]. As the $\Delta {U^{ \circ}_{\text{OA}}}$ values from Goss et al. [50] have been calculated from measurements at only two (4:2 FTOH) and three (6:2 FTOH and 8:2 FTOH) temperatures, there is insufficient data to include $\Delta {U^{ \circ}_{\text{OA}}}$ values for FTOHs.

The temperature dependence of the K_OA for methyl and ethyl perfluorooctane sulfonamido ethanols (Me-FOSE and Et-FOSE) reported by two studies [52, 53] and therefore also the $\Delta {U^{ \circ}_{\text{OA}}}$ derived from those data are quite different (Fig. SI 8). Dreyer et al. [52] also published K_OA data on methyl and ethyl perfluorooctane sulfonamides (Me-FOSA and Et-FOSA), whereby the plot suggests that the values obtained at very low and high temperatures may be outliers. Given the uncertainty of the K_OA values for Me- FOSA, Et-FOSA, and Et-FOSE, we have also excluded those for Me-FOSE as there is no way for us to tell whether they are valid. Similar to Et-FOSE, the log₁₀ K_OA reported by Dreyer et al. [52] for 6:2 and 8:2 fluorotelomer acrylates (6:2 FTAc and 8:2 FTAc) also display poor linearity with reciprocal temperature (R² < 0.9) and therefore we chose to exclude also the K_OA data for 10:2 FTAc from the $\Delta {U^{ \circ}_{\text{OA}}}$ dataset (Fig. SI 9). Ultimately, we decided to exclude all PFAS compounds from the data set.

3.2 Model Selection

3.2.1 Model Prioritization

As not all chemicals with $\Delta {U^{ \circ}_{\text{OA}}}$ values had all available descriptors, a subset of 141 chemicals was used to test and identify the best performing models within each split. We selected five models to examine in detail. Those are the regressions using (i) A and L, (ii) only the log₁₀ K_OA at 25 °C, (iii) A and log₁₀ K_OA at 25 °C, (iv) nHBDon and L_PaDEL, and (v) nHBDon, number of carbons, nC, and number of halogens, nX. As initially hypothesized, the log₁₀ K_OA at 25 °C alone proved to be a very good predictor of $\Delta {U^{ \circ}_{\text{OA}}}$ (Table SI 1). Meanwhile the model using the Abraham solute descriptors A and L performed almost as well. The log₁₀ hexadecane-air partition ratio L can describe the non-polar interactions between the compound and the solvent while the hydrogen bond acidity A describes the potential for a chemical to donate a hydrogen. These two parameters describe similar chemical interactions as the two PaDEL descriptors L_PaDEL and nHBDon, used to predict log₁₀ K_OA in the OPERA model.

The model selected with the Genetic Algorithm using only PaDEL descriptors uses nHBDon, nC, and nX. These descriptors are not able to describe chemicals to the same extent Abraham descriptors do and their selection is a consequence of the data used to train the models. This is discussed in more detail in the following section.

All five models performed well during internal and external validation processes across all four splits (Table SI 1). The ${\text{R}^{2} _{\text{adj}}}$ describes the R² while correcting for the number of descriptors in a model. The RMSE_TR and MAE_TR was in almost all cases slightly smaller than the RMSE_EXT and MAE_EXT, however given the range of the $\Delta {U^{ \circ}_{\text{OA}}}$ values used in model development, the difference is likely unremarkable. Internal cross validation using leave-one-out (${Q^{2} _{\text{LOO}}}$) and leave-more-out (${Q^{2} _{\text{LMO}}}$) was very similar to the R² and ${\text{R}^{2} _{\text{adj}}}$ for all models (> 0.90). However, the external predictive ability of the models (${Q^{2} _{\text{F1}}}$, ${Q^{2} _{\text{F2}}}$, ${Q^{2} _{\text{F3}}}$) were slightly lower than the R². The CCC_EXT, describing both the precision and accuracy of all models was almost always greater than 0.90.

3.2.2 External Validation

Of the 54 chemicals in the external validation dataset, 17 chemicals had experimental log₁₀ K_OA values and 41 chemicals had experimental A and L solute descriptors that were retrieved from the UFZ-LSER database. All chemicals without experimental log₁₀ K_OA values had all the experimental solute descriptors necessary to estimate log₁₀ K_OA using the four parameter ppLFER equation [14]. As the PaDEL descriptors for all chemicals were estimated, we did not differentiate between the source of A, L, and log₁₀ K_OA at 25 °C used to externally validate the model, however wherever possible we used experimental log₁₀ K_OA values and solute descriptors. Full model equations are listed in Table 1.

Table 1 Proposed equations for calculating the $\Delta {U^{ \circ}_{\text{OA}}}$ (kJ·mol⁻¹) of an organic compound

Full size table

By using all 141 chemicals to train the models and 54 chemicals to externally validate the models we see that Model 3 using log₁₀ K_OA and A had the best overall performance (Tables 2, 3). Model 3 has the highest overall R², ${\text{R}^{2} _{\text{adj}}}$, ${Q^{2} _{\text{F1}}}$, ${Q^{2} _{\text{F2}}}$, ${Q^{2} _{\text{F3}}}$, and CCC_EXT. Model 2, using log₁₀ K_OA, performs almost as well as Model 3, followed by Model 1. Models 4 and 5 had the poorest performance.

Table 2 Internal validation results for the five selected models with all 141 chemicals in the training dataset

Full size table

Table 3 External validation results for the five selected models with all 141 chemicals in the training dataset and 54 chemicals used for external validation

Full size table

In Fig. 3, we can see that all models generally perform better at higher $\Delta {U^{ \circ}_{\text{OA}}}$ (i.e., less negative) values. Models 4 and 5 appear to have a larger number of outliers (> 10 kJ·mol⁻¹) at lower $\Delta {U^{ \circ}_{\text{OA}}}$ values relative to Models 1, 2, and 3; however, it is important to recognize that this relative difference between predicted and experimental values is smaller for increasingly negative $\Delta {U^{ \circ}_{\text{OA}}}$ values. An error of − 10 kJ·mol⁻¹ for a $\Delta {U^{ \circ}_{\text{OA}}}$ of − 90 kJ·mol⁻¹ corresponds to ~ 11% whereas the same error for a $\Delta {U^{ \circ}_{\text{OA}}}$ of − 30 kJ·mol⁻¹ is ~ 33%.

In Fig. SI 16 we present Williams plots of the standardized residual plotted against the leverage. The standardized residual (sr_i) of a chemical i is obtained by correcting residuals (r_i) using the standard deviation (sd) of the model and the leverage of a prediction (h_ii):

$${sr}_{i}=\frac{{r}_{i}}{\mathrm{sd}\cdot \sqrt{1-{h}_{ii}}}$$

(5)

Standardized residual values are considered high when the absolute value is greater than 2.5 [44]. The leverage of a training chemical indicates how much influence it has on the regression; chemicals with high leverages have a large influence on the model [44]. A chemical has a high leverage value if the leverage is greater than the warning leverage (h^*) which is a function of the number of chemicals in the training dataset (n) and the number of parameters in the model (k) [44]:

$${h}^{*}=3\cdot \frac{k+1}{n}$$

(6)

Standardized residuals and leverage values were obtained from QSARINS.

Considering only the number of chemicals with high leverage and high standardized residuals (Table 3), Model 2 has no chemical in the training set with a large leverage value and it has only 7 chemicals with high standardized residuals. The largest standardized residual was for 1,2,3,4-tetramethylbenzene in the external validation dataset; however, the log₁₀ K_OA (4.4) measured using the variable headspace ratio technique is known to be at the upper limit of the technique [41]. Thus, it likely that this high standardized residual value is due to the high uncertainty of the reported K_OA, $\Delta {U^{ \circ}_{\text{OA}}}$ or both, rather than an error in the model.

Models 1 and 3 had the same chemicals (external validation and training chemicals) with high leverage values. This high leverage is driven entirely by the A parameter used in both models. Chemicals with A greater than 0.35 have high leverage values. This is because most chemicals in the $\Delta {U^{ \circ}_{\text{OA}}}$ dataset have A values equal to 0. For example, methanoic acid (in the external validation dataset) has both a high leverage values and high standardized residuals in Models 1 and 3 and has the highest A value of 0.76.

Many of the same chemicals with high leverage values in Models 1 and 3 also had high leverage values in Model 4, driven by the nHBDon values. While nHBDon is a countable parameter, the $\Delta {U^{ \circ}_{\text{OA}}}$ dataset only includes chemicals with either nHBDon values equal to 0 or 1. Thus, all chemicals with an nHBDon value equal to 1 had a high leverage value. Nonetheless, these chemicals had low standardized residuals. The three chemicals with high leverage in Model 5 (2 training and 1 external validation compounds) all have long carbon chains. However, they are not the only chemicals of that size.

There was also an overlap in chemicals with high standardized residuals between models. Fluorene had a high standardized residual in all models. PCB 155 and 1,2,3,4-tetramethylbenzene had high standardized residuals in all models except Model 1 and 4, respectively. The standardized residuals for BDE 154 and trans-nonachlor was high in Models 1, 2, and 3, while β-HCH and δ-HCH had high standardized residuals in Models 1, 4, and 5. Heptachlor had high standardized residuals in model 1 and 5, and PCBs 61 and 77 had high standardized residuals in models 2 and 3.

The internal validation results presented in Table 2 show Models 1, 2, and 3 perform better than Models 4 and 5 based on R², ${\text{R}^{2} _{\text{adj}}}$, RMSE_TR, MAE_TR, RMSE_CV, MAE_CV, ${Q^{2} _{\text{LOO}}}$, and ${Q^{2} _{\text{LMO}}}$. Model 5 has the highest RMSE_EXT followed by Model 1. Model 5 also has the highest MAE_EXT and lowest ${Q^{2} _{\text{F1-F3}}}$ and CCC_EXT. Despite Model 4’s comparatively poor internal validation results, the external validation results (RMSE_EXT, MAE_EXT, ${Q^{2} _{\text{F1-F3}}}$, CCC_EXT) for this model are similar if not better than results for Models 1, 2, and 3.

While Models 1 and 3 have excellent internal and external validation results, the applicability domain of these models is limited by the A descriptor. Model 4’s internal validation results are poor compared to the other models, and are limited to chemicals with nHBDon values of 0. Model 5 was the worst performing model with high internal and external errors (RMSE and MAE values). Model 2 is consistently one of the best models when considering the internal and external validations, the number of outliers (high standardized residuals) and the number of chemicals with high leverage. Thus, the log₁₀ K_OA at 25 °C (Model 2) is generally sufficient to make a quick and reliable estimation of the $\Delta {U^{ \circ}_{\text{OA}}}$.

3.3 Comparison with Other $\Delta {U^{ \circ}_{\text{OA}}}$ Models

The $\Delta {H^{ \circ}_{\text{OA}}}$ ppLFER model by Mintz et al. [19] has been shown to reliably predict the temperature dependence of K_OA [14]. We compared the performance of this ppLFER for $\Delta {H^{ \circ}_{\text{OA}}}$ [19], converted to $\Delta {U^{ \circ}_{\text{OA}}}$, with Model 2. Since both models are built on similar datasets and an evaluation with chemicals that had been used to train and develop the models would bias the results, we rely only on chemicals not used in the training dataset for either model. This limits the comparison to 28 chemicals including several BDEs and CDDs. The training data for the Mintz et al. [19] model includes 138 chemicals and experimental solute descriptors and experimental $\Delta {H^{ \circ}_{\text{OA}}}$ values for all chemicals are provided in a previous publication [31]. During our data curation process we found small errors in the reported $\Delta {H^{ \circ}_{\text{OA}}}$ and misidentification of a few chemicals which means some of the descriptors used are erroneous. The Mintz et al. [19] model also includes inorganic chemicals (e.g., nitrogen and xenon) which were excluded from our model development. Due to these differences, we compare the statistics of the model residuals (Table 4) and the residual of each prediction (Fig. 4).

Table 4 Statistics on the residual (kJ·mol⁻¹) for $\Delta {U^{ \circ}_{\text{OA}}}$ predictions made using Model 2 (log₁₀ K_OA) and the Mintz et al. [19] model for chemicals not included in the training set of either model (n = 28) and all chemicals in the $\Delta {U^{ \circ}_{\text{OA}}}$ dataset (n = 195)

Full size table

For both models we use the best available descriptors (i.e., experimental values wherever possible and otherwise IFS-QSAR estimated values). In this combined external validation process (Table 4), Model 2 has a smaller bias (average residual), MAE, standard deviation (SD), and RMSE. The Mintz et al. [19] model has a larger number of chemicals with errors exceeding 10 kJ·mol⁻¹ and Model 2 has more $\Delta {U^{ \circ}_{\text{OA}}}$ estimates with absolute errors less than 5 kJ·mol⁻¹ (Fig. 4).

If we compare the performance of the Mintz et al. [19] model and Model 2 when using all 195 chemicals in the $\Delta {U^{ \circ}_{\text{OA}}}$ dataset, we see that Model 2 performs better than the Mintz et al. [19] model (Table 4). While there is little difference between the statistics on the residuals, fewer chemicals have residuals greater than 10 kJ·mol⁻¹ when using Model 2. Both models overestimated the $\Delta {U^{ \circ}_{\text{OA}}}$ by at least 10 kJ·mol⁻¹ for fluorene, PCN 57, cis-chlordane, trans-chlordane, and trans-nonachlor—all chemicals within the training data set of Model 2, apart from trans-nonachlor. The models also underestimated the $\Delta {U^{ \circ}_{\text{OA}}}$ for CDD 1, 1,2,3,4-tetramethylbenzene, BDE 17, BDE 28, BDE 154, PCB 61, PCB 155, of which CDD 1 and BDEs 17, 28, and 154 were in the external validation dataset for Model 2.

In addition to the model by Mintz et al. [19], there is a ppLFER for wet-octanol air $\Delta {H^{ \circ}}$ values [31] and models for $\Delta {H^{ \circ}_{\text{OA}}}$ using a support vector machine method, artificial neural networks, and MLRs based on various molecular descriptors [54]. The wet-octanol air $\Delta {H^{ \circ}_{\text{OA}}}$ model is based on the same dataset used to develop the Mintz et al. [19] model above [31]. The latter three models also use the same dataset as a basis for model development but randomly split their dataset of 127 chemicals into a training dataset of 89 chemicals and validation dataset of 38 chemicals [54]. The reported correlation coefficients for these models [31, 54] are similar to, or greater than what we report for Model 2 and the standard errors of the models developed using a support vector machine method and artificial neural networks is smaller than the RMSE of Model 2. However, these models cannot be used easily in high throughput applications because acquiring the molecular descriptors requires multiple software applications [54], which can be time-consuming and lends itself to increased chances of human error.

3.4 Sources for Descriptors

While we used only experimentally derived solute descriptors to develop and train the models in this work, the source of solute descriptors may impact model performance. Sources for the descriptors include experimental values from the UFZ-LSER database [26], IFS-QSAR estimates from the standalone IFS-QSAR software [27, 29] or EAS-E Suite [28], or PaDEL estimates made using QSARINS [30]. The first two are easily obtained from non-commercial web-based software. In Fig. 5 we compare Model 2’s estimated $\Delta {U^{ \circ}_{\text{OA}}}$ if only experimental, IFS-QSAR predicted, or PaDEL predicted solute descriptors are used. The log₁₀ K_OA at 25 °C used by Model 2 is calculated using the ppLFER equation with the 4 parameters S, A, B, and L [14]. Because not all chemicals have experimental solute descriptors, fewer estimations are available for comparison. Table 5 shows that the IFS-QSAR-estimated solute descriptors perform almost as well as experimental solute descriptors. Regardless of which set of descriptors is used, $\Delta {U^{ \circ}_{\text{OA}}}$ values of 1-decanol, β-HCH, δ-HCH, fluorene, dieldrin, BDE 29, PCB 77, and PCN 57 are overestimated and $\Delta {U^{ \circ}_{\text{OA}}}$ values of 1,2,3,4-tetramethylbenzene, BDE 154, PCB 61, PCN 180, and PCB 155 are underestimated. Clearly, the IFS-QSARs are preferable over PaDEL for predicting solute descriptors, considering the smaller associated MAE, standard deviation, and RMSE for the former.

Table 5 Statistics on the residuals for the performance of Model 2 using estimated K_OA values using experimental, IFS-QSAR-predicted, and PaDEL-predicted Abraham descriptors

Full size table

3.5 The Relationship Between log₁₀ K and ΔU°

The strong relationship between log₁₀ K_OA and $\Delta {U^{ \circ}_{\text{OA}}}$ in Model 2 mirrors similar relationships between vapour pressure (log₁₀ P_L) and the enthalpy of vaporization ($\Delta {H^{ \circ}_{\text{vap}}}$) and between adsorption constants (log₁₀ K_ads) and the enthalpies of adsorption ($\Delta {H^{ \circ}_{\text{ads}}}$) [20]. Goss and Schwarzenbach cite other such relationships and provide a thermodynamic explanation for their occurrence [20]. The equation relating $\Delta {H^{ \circ}_{\text{vap}}}$ and P_L [20] can be combined with a linear equation relating a chemical’s P_L with its log₁₀ K_OA [55] and a conversion to $\Delta {U^{ \circ}_{\text{vap}}}$ ($\Delta {U^{ \circ}_{\text{vap}}}$ = $\Delta {H^{ \circ}_{\text{vap}}}$ − RT) [13]:

$$\Delta {U^{\circ }_{\mathrm{vap}}}=8.90\cdot {\mathrm{log}}_{10}{K}_{\mathrm{OA}}+10.42 - {RT}$$

(7)

The slope in this equation is very similar to the absolute value of the slope in Model 2 (− 8.75). This is apparent in a vertical shift when estimated $\Delta {U^{ \circ}_{\text{OA}}}$ and $- \Delta {U^{ \circ}_{\text{vap}}}$ values are plotted against experimental $\Delta {U^{ \circ}_{\text{OA}}}$ values (Fig. SI 18). The difference in Eq. 2 in Table 1 and Eq. 7 corresponds to the internal energy of dissolution of the chemical in octanol (i.e., $\Delta {U^{ \circ}_{\text{dissol}}}$ = $\Delta {U^{ \circ}_{\text{OA}}}$ + $\Delta {U^{ \circ}_{\text{vap}}}$). This analysis suggests that the $\Delta {U^{ \circ}_{\text{dissol}}}$ is very small and is relatively constant for compounds with a wide range in K_OA. The observed relationship between K_OA and $\Delta {U^{ \circ}_{\text{OA}}}$ can perhaps be extrapolated to other partitioning systems such as K_OW and $\Delta {U^{ \circ}_{\text{OW}}}$. Indeed, if $\Delta {U^{ \circ}_{\text{dissol}}}$ is a small and constant for a wide range of compounds it may be possible to estimate $\Delta {U^{ \circ}_{\text{OW}}}$ from the enthalpy of dissolution of a chemical in water or even the water solubility of a compound. However large datasets with experimental values of each property are necessary before such a relationship can be verified.

4 Conclusion

We developed new models for predicting the $\Delta {U^{ \circ}_{\text{OA}}}$ which use fewer descriptors than earlier models. The best performing model (Model 2) relies only on the log₁₀ K_OA at 25 °C and additional parameters do not notably improve model performance. This model has similar or slightly improved performance relative to previous estimation techniques for $\Delta {H^{ \circ}_{\text{OA}}}$ and $\Delta {U^{ \circ}_{\text{OA}}}$, which relied on more or more complex descriptors. Our work parallels previous findings that that $\Delta {H^{ \circ}_{\text{vap}}}$ can be predicted quite well just from P_L [20]. Further measurements of log₁₀ K_OA and $\Delta {U^{ \circ}_{\text{OA}}}$ values could improve the applicability domain of these models particularly for chemicals with log₁₀ K_OA values between 4 and 6 ($\Delta {U^{ \circ}_{\text{OA}}}$ values between − 70 and − 50 kJ·mol⁻¹), and for more polar compounds with more complex hydrogen bonding abilities.

Data Availability

All data generated or analysed during this study are included in this published article, its supplementary information files, and in previously published cited works.

References

Harner, T., Green, N.J.L., Jones, K.C.: Measurements of octanol−air partition coefficients for PCDD/Fs: a tool in assessing air−soil equilibrium status. Environ. Sci. Technol. 34, 3109–3114 (2000)
Article CAS Google Scholar
Hippelein, M., McLachlan, M.S.: Soil/air partitioning of semivolatile organic compounds. 2. Influence of temperature and relative humidity. Environ. Sci. Technol. 34, 3521–3526 (2000)
Article CAS Google Scholar
Hiatt, M.H.: Leaves as an indicator of exposure to airborne volatile organic compounds. Environ. Sci. Technol. 33, 4126–4133 (1999)
Article CAS Google Scholar
Paterson, S., Mackay, D., Bacci, E., Calamari, D.: Correlation of the equilibrium and kinetics of leaf-air exchange of hydrophobic organic chemicals. Environ. Sci. Technol. 25, 866–871 (1991)
Article CAS Google Scholar
Harner, T., Bidleman, T.F.: Octanol−air partition coefficient for describing particle/gas partitioning of aromatic compounds in urban air. Environ. Sci. Technol. 32, 1494–1502 (1998)
Article CAS Google Scholar
Kaupp, H., McLachlan, M.S.: Gas/particle partitioning of PCDD/Fs, PCBs, PCNs and PAHs. Chemosphere 38, 3411–3421 (1999)
Article CAS Google Scholar
Batterman, S., Zhang, L., Wang, S., Franzblau, A.: Partition coefficients for the trihalomethanes among blood, urine, water, milk and air. Sci. Total Environ. 284, 237–247 (2002)
Article CAS Google Scholar
Davie-Martin, C.L., Hageman, K.J., Chin, Y.-P., Rougé, V., Fujita, Y.: Influence of temperature, relative humidity, and soil properties on the soil–air partitioning of semivolatile pesticides: laboratory measurements and predictive models. Environ. Sci. Technol. 49, 10431–10439 (2015)
Article CAS Google Scholar
Shoeib, M., Harner, T.: Using measured octanol-air partition coefficients to explain environmental partitioning of organochlorine pesticides. Environ. Toxicol. Chem. 21, 984–990 (2002)
Article CAS Google Scholar
Qiao, L.-N., Hu, P.-T., Macdonald, R., Kannan, K., Nikolaev, A., Li, Y.-F.: Modeling gas/particle partitioning of polybrominated diphenyl ethers (PBDEs) in the atmosphere: a review. Sci. Total Environ. 729, 138962 (2020)
Article CAS Google Scholar
Kelly, B.C., Gobas, F.A.P.C.: An arctic terrestrial food-chain bioaccumulation model for persistent organic pollutants. Environ. Sci. Technol. 37, 2966–2974 (2003)
Article CAS Google Scholar
Baskaran, S., Lei, Y.D., Wania, F.: A database of experimentally derived and estimated octanol–air partition ratios (K_OA). J. Phys. Chem. Ref. Data. 50, 043101 (2021)
Article CAS Google Scholar
Goss, K.-U., Eisenreich, S.J.: Adsorption of VOCs from the gas phase to different minerals and a mineral mixture. Environ. Sci. Technol. 30, 2135–2142 (1996)
Article CAS Google Scholar
Baskaran, S., Lei, Y.D., Wania, F.: Reliable prediction of the octanol–air partition ratio. Environ. Toxicol. Chem. 40, 3166–3180 (2021)
Article CAS Google Scholar
Endo, S., Goss, K.-U.: Applications of polyparameter linear free energy relationships in environmental chemistry. Environ. Sci. Technol. 48, 12477–12491 (2014)
Article CAS Google Scholar
Abraham, M.H., Acree, W.E.: Comparison of solubility of gases and vapours in wet and dry alcohols, especially octan-1-ol. J. Phys. Org. Chem. 21, 823–832 (2008)
Article CAS Google Scholar
Endo, S., Goss, K.-U.: Predicting partition coefficients of polyfluorinated and organosilicon compounds using polyparameter linear free energy relationships (PP-LFERs). Environ. Sci. Technol. 48, 2776–2784 (2014)
Article CAS Google Scholar
Jin, X., Fu, Z., Li, X., Chen, J.: Development of polyparameter linear free energy relationship models for octanol–air partition coefficients of diverse chemicals. Environ. Sci. Process. Impacts 19, 300–306 (2017)
Article CAS Google Scholar
Mintz, C., Burton, K., Ladlie, T., Clark, M., Acree, W.E., Abraham, M.H.: Enthalpy of solvation correlations for gaseous solutes dissolved in dibutyl ether and ethyl acetate. Thermochim. Acta 470, 67–76 (2008)
Article CAS Google Scholar
Goss, K.-U., Schwarzenbach, R.P.: Empirical prediction of heats of vaporization and heats of adsorption of organic compounds. Environ. Sci. Technol. 33, 3390–3393 (1999)
Article CAS Google Scholar
MacLeod, M., Scheringer, M., Hungerbühler, K.: Estimating enthalpy of vaporization from vapor pressure using Trouton’s rule. Environ. Sci. Technol. 41, 2827–2832 (2007)
Article CAS Google Scholar
Abraham, M.H., Duce, P.P., Morris, J.J., Taylor, P.J.: Hydrogen bonding. Part 2.–Equilibrium constants and enthalpies of complexation for 72 monomeric hydrogen-bond acids with N-methylpyrrolidinone in 1,1,1-trichloroethane. J. Chem. Soc. Faraday Trans. I Phys. Chem. Condens. Phases 83, 2867 (1987)
CAS Google Scholar
Mansouri, K., Grulke, C.M., Judson, R.S., Williams, A.J.: OPERA models for predicting physicochemical properties and environmental fate endpoints. J. Cheminform. 10, 1–19 (2018)
Article Google Scholar
Yap, C.W.: PaDEL-descriptor: an open source software to calculate molecular descriptors and fingerprints. J. Comput. Chem. 32, 1466–1474 (2011)
Article CAS Google Scholar
Mansouri, K., Williams, A.: QMRF - Title: K_OA model for the octanol/air partition coefficient prediction from OPERA models (2017)
Ulrich, N., Endo, S., Brown, T., Bronner, G., Abraham, M.H., Goss, K.-U.: UFZ-LSER database v 3.2. http://www.ufz.de/lserd (2017)
Brown, T.N.: QSPRs for predicting equilibrium partitioning in solvent–air systems from the chemical structures of solutes and solvents. J. Solution. Chem. 51, 1101–1132 (2022)
Article CAS Google Scholar
ARC Arnot Research and Consulting Inc.: Exposure and Safety Estimation (EAS-E) Suite. ARC Arnot Research and Consulting Inc, Toronto (2022)
Brown, T.N.: IFSQSAR - A python package for applying QSARs. ver. 1.0. https://github.com/tnbrowncontam/ifsqsar (2022)
Gramatica, P., Chirico, N., Papa, E., Cassani, S., Kovarich, S.: QSARINS: a new software for the development, analysis, and validation of QSAR MLR models. J. Comput. Chem. 34, 2121–2132 (2013)
Article CAS Google Scholar
Mintz, C., Clark, M., Acree, W.E., Abraham, M.H.: Enthalpy of solvation correlations for gaseous solutes dissolved in water and in 1-octanol based on the Abraham model. J. Chem. Inf. Model. 47, 115–121 (2007)
Article CAS Google Scholar
Berti, P., Cabani, S., Conti, G., Mollica, V.: Thermodynamic study of organic compounds in octan-1-ol. Processes of transfer from gas and from dilute aqueous solution. J. Chem. Soc. Faraday Trans. I Phys. Chem. Condens. Phases 82, 2547–2556 (1986)
CAS Google Scholar
Cheong, W.J.: Measurements of limiting activity coefficients of homologous series of solutes and their application to the study of retention mechanism in reversed phase liquid chromatography. University of Minnesota, PhD Thesis (1989)
Eger, E.I.I., Ionescu, P., Laster, M.J., Gong, D., Hudlicky, T., Kendig, J.J., Harris, R.A., Trudell, J.R., Pohorille, A.: Minimum alveolar anesthetic concentration of fluorinated alkanols in rats: relevance to theories of narcosis. Anesth. Analg. 88, 867–876 (1999)
Article CAS Google Scholar
Fang, Z., Ionescu, P., Chortkoff, B.S., Kandel, L., Sonner, J., Laster, M.J., Eger, E.I.I.: Anesthetic potencies of n-alkanols: results of additivity and solubility studies suggest a mechanism of action similar to that for conventional inhaled anesthetics. Anesth. Analg. 84, 1042–1048 (1997)
Article CAS Google Scholar
Fang, Z., Sonner, J., Laster, M.J., Ionescu, P., Kandel, L., Koblin, D.D., Eger, E.I.I., Halsey, M.J.: Anesthetic and convulsant properties of aromatic compounds and cycloalkanes: implications for mechanisms of narcosis. Anesth. Analg. 83, 1097–1104 (1996)
Article CAS Google Scholar
Hiatt, M.H.: Analyses of fish tissue by vacuum distillation/gas chromatography/mass spectrometry. Anal. Chem. 69, 1127–1134 (1997)
Article CAS Google Scholar
Rohrschneider, L.: Solvent characterization by gas-liquid partition coefficients of selected solutes. Anal. Chem. 45, 1241–1247 (1973)
Article CAS Google Scholar
Taheri, S., Laster, M.J., Liu, J., Eger, E.I.I., Halsey, M.J., Koblin, D.D.: Anesthesia by n-alkanes not consistent with the Meyer–Overton hypothesis: determinations of the solubilities of alkanes in saline and various lipids. Anesth. Analg. 77, 7–11 (1993)
Article CAS Google Scholar
Treves, K., Shragina, L., Rudich, Y.: Measurement of octanol–air partition coefficients using solid-phase microextraction (SPME)–application to hydroxy alkyl nitrates. Atmos. Environ. 35, 5843–5854 (2001)
Article CAS Google Scholar
Lei, Y.D., Baskaran, S., Wania, F.: Measuring the octan-1-ol air partition coefficient of volatile organic chemicals with the variable phase ratio headspace technique. J. Chem. Eng. Data 64, 4793–4800 (2019)
Article CAS Google Scholar
Gruber, D., Langenheim, D., Gmehling, J., Moollan, W.: Measurement of activity coefficients at infinite dilution using gas−liquid chromatography. 6. Results for systems exhibiting gas−liquid interface adsorption with 1-octanol. J. Chem. Eng. Data 42, 882–885 (1997)
Article CAS Google Scholar
Abraham, M.H., Le, J., Acree, W.E., Carr, P.W., Dallas, A.J.: The solubility of gases and vapours in dry octan-1-ol at 298 K. Chemosphere 44, 855–863 (2001)
Article CAS Google Scholar
Gramatica, P.: Principles of QSAR modeling: comments and suggestions from personal experience. Int. J. Quant. Struct.-Prop. Relatsh. 5, 61–97 (2020)
Google Scholar
Chirico, N., Gramatica, P.: Real external predictivity of QSAR models. Part 2. New intercomparable thresholds for different validation criteria and the need for scatter plot inspection. J. Chem. Inf. Model. 52, 2044–2058 (2012)
Article CAS Google Scholar
Chirico, N., Gramatica, P.: Real external predictivity of QSAR models: how to evaluate it? Comparison of different validation criteria and proposal of using the concordance correlation coefficient. J. Chem. Inf. Model. 51, 2320–2335 (2011)
Article CAS Google Scholar
Gramatica, P., Sangion, A.: A historical excursus on the statistical validation parameters for QSAR models: a clarification concerning metrics and terminology. J. Chem. Inf. Model. 56, 1127–1131 (2016)
Article CAS Google Scholar
Lin, L.I.-K.: A concordance correlation coefficient to evaluate reproducibility. Biometrics 45, 255–268 (1989)
Article CAS Google Scholar
Baskaran, S., Lei, Y.D., Wania, F.: Response to comment on “A database of experimentally derived and estimated octanol–air partition ratios (K_OA)”. J. Phys. Chem. Ref. Data 51, 026102 (2022)
Article CAS Google Scholar
Goss, K.-U., Bronner, G., Harner, T., Hertel, M., Schmidt, T.C.: The partition behavior of fluorotelomer alcohols and olefins. Environ. Sci. Technol. 40, 3572–3577 (2006)
Article CAS Google Scholar
Thuens, S., Dreyer, A., Sturm, R., Temme, C., Ebinghaus, R.: Determination of the octanol−air partition coefficients (K_OA) of fluorotelomer alcohols. J. Chem. Eng. Data 53, 223–227 (2008)
Article CAS Google Scholar
Dreyer, A., Langer, V., Ebinghaus, R.: Determination of octanol−air partition coefficients (K_OA) of fluorotelomer acrylates, perfluoroalkyl sulfonamids, and perfluoroalkylsulfonamido ethanols. J. Chem. Eng. Data 54, 3022–3025 (2009)
Article CAS Google Scholar
Shoeib, M., Harner, T., Ikonomou, M., Kannan, K.: Indoor and outdoor air concentrations and phase partitioning of perfluoroalkyl sulfonamides and polybrominated diphenyl ethers. Environ. Sci. Technol. 38, 1313–1320 (2004)
Article CAS Google Scholar
Golmohammadi, H., Dashtbozorgi, Z., Jr.: Application of QSPR for the prediction of gas to 1-octanol solvation enthalpy using support vector regression. Phys. Chem. Liq. 51, 182–202 (2013)
Article CAS Google Scholar
Xiao, H., Wania, F.: Is vapor pressure or the octanol–air partition coefficient a better descriptor of the partitioning between gas phase and organic matter? Atmos. Environ. 37, 2867–2878 (2003)
Article CAS Google Scholar

Download references

Funding

This work was funded by the European Chemical Industry Council (CEFIC) through project ECO 41 of the Long-range Research Initiative (LRI).

Author information

Sivani Baskaran and Akshay Podagatlapalli are co-first authors.

Authors and Affiliations

Department of Physical and Environmental Sciences, University of Toronto Scarborough, Toronto, Canada
Sivani Baskaran, Akshay Podagatlapalli, Alessandro Sangion & Frank Wania
Department of Chemistry, University of Toronto, Toronto, Canada
Sivani Baskaran & Frank Wania
Arnot Research and Consulting (ARC), Toronto, Canada
Alessandro Sangion

Authors

Sivani Baskaran
View author publications
You can also search for this author in PubMed Google Scholar
Akshay Podagatlapalli
View author publications
You can also search for this author in PubMed Google Scholar
Alessandro Sangion
View author publications
You can also search for this author in PubMed Google Scholar
Frank Wania
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

Conceptualization: SB, FW; Data Curation: SB, AP; Formal analysis and investigation: SB, AP, AS; Visualization: SB; Writing—original draft preparation: SB, AP; Writing—review and editing: SB, AS, FW; Funding acquisition: FW; Supervision: FW.

Corresponding author

Correspondence to Frank Wania.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Ethical Approval

This work was funded by the European Chemical Industry Council (CEFIC) through project ECO 41 of the Long-range Research Initiative (LRI).

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (PDF 3664 kb)

Supplementary file2 (XLSX 1097 kb)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Baskaran, S., Podagatlapalli, A., Sangion, A. et al. Predicting the Temperature Dependence of the Octanol–Air Partition Ratio: A New Model for Estimating $\Delta {U^{ \circ}_{\text{OA}}}$. J Solution Chem 52, 51–69 (2023). https://doi.org/10.1007/s10953-022-01214-7

Download citation

Received: 11 May 2022
Accepted: 11 September 2022
Published: 08 November 2022
Issue Date: January 2023
DOI: https://doi.org/10.1007/s10953-022-01214-7

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Predicting the Temperature Dependence of the Octanol–Air Partition Ratio: A New Model for Estimating \(\Delta {U^{ \circ}_{\text{OA}}}\)

Abstract

Similar content being viewed by others

QSPRs for Predicting Equilibrium Partitioning in Solvent–Air Systems from the Chemical Structures of Solutes and Solvents

Novel approach for predicting partition coefficients of linear perfluorinated compounds

Equations for the Correlation and Prediction of Partition Coefficients of Neutral Molecules and Ionic Species in the Water–Isopropanol Solvent System

1 Introduction

2 Methods

2.1 Data Curation

2.1.1 Data from the K OA Database

2.1.2 Direct Measurements of \(\Delta {H^{ \circ }_{\text{OA}}}\) and \(\Delta {U^{ \circ }_{\text{OA}}}\)

2.2 Chemical Identifiers, Descriptors, and Data Splitting

2.3 Model Calculation and Selection

3 Results and Discussion

3.1 Data Availability and the General Applicability Domain

3.1.1 Perfluoroalkyl Substances (PFAS)

3.2 Model Selection

3.2.1 Model Prioritization

3.2.2 External Validation

3.3 Comparison with Other \(\Delta {U^{ \circ}_{\text{OA}}}\) Models

3.4 Sources for Descriptors

3.5 The Relationship Between log10 K and ΔU°

4 Conclusion

Data Availability

References

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Conflict of interest

Ethical Approval

Additional information

Publisher's Note

Supplementary Information

Supplementary file1 (PDF 3664 kb)

Supplementary file2 (XLSX 1097 kb)

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation

2.1.1 Data from the K _OA Database

3.5 The Relationship Between log₁₀ K and ΔU°