Background

Surfactants are widely used across the globe in both industrial and consumer products, so the properties that determine their distribution in the environment are of particular importance. The n-octanol/water partition ratio or partition coefficient (log Kow) and the n-octanol/water distribution coefficient (log D) are key parameters in the environmental risk assessment of chemicals, as they are often used to estimate the environmental fate and bioavailability, and thus the exposure and toxicity, of a compound. The partition coefficient (log Kow) is a constant for the molecule in its neutral form. The distribution coefficient (log D) takes into account all neutral and charged forms of the molecule. In the pH region where the molecule is predominantly unionised, log D = log Kow. Log D values at pH 7 are considered more relevant for understanding the environmental fate and bioavailability of ionisable compounds with a low or high pKa than log Kow values generated at a pH unrepresentative of typical aquatic environmental conditions. Due to their amphiphilic properties, surfactants form aggregates in solution and tend to accumulate at the interface of hydrophobic and hydrophilic phases. Surfactants can even emulsify the n-octanol–water system, making the measurement of log Kow a technical challenge. For this reason, the traditional ‘shake-flask’ method (OECD 107 Test Guideline) [1] is no longer considered appropriate for log Kow determination of surfactants.

Currently, several existing experimental methods of the Organisation for Economic Co-operation and Development (OECD) and Quantitative Structure–Property Relationship (QSPR) models are available for log Kow measurement or prediction. The experimental methods include: the ‘slow-stirring’ method (OECD 123) [2], the high-performance liquid chromatography (HPLC) method (OECD 117) [3], and a solubility ratio method (referred to in OECD 107) [1] which uses the ratio of the chemical solubility in n-octanol and in water. All these methods are listed in EU Technical Guidance Document (TGD) guidance [4] and have been used for regulatory notification purposes by different lead registrants in REACH Phases 1 and 2 (i.e. chemicals manufactured or imported in Europe > 1000 and > 100 tonnes per annum, respectively). However, there are concerns that these methods have not been fully validated for surfactants and may not be applicable due to the specific phase behaviour of surfactants. This is complicated by the fact that aqueous ‘solubility’ is not properly defined for surfactants and also is difficult to measure. Surfactants dissolve not only as single molecules (mono-molecular solution), but at higher concentrations also form different types of soluble aggregates, e.g. spherical micelles, vesicles (depending on their chemical structure, concentration, temperature). The maximum mono-molecular solubility of a surfactant is defined as the critical micelle concentration (CMC). However, the CMC is not a good descriptor of water solubility, as micelles themselves are also a perfectly water-soluble state of surfactants [5]. A working approach for surfactants might be the comparison of measured solubilities in n-octanol and water. However, it is then prudent to take the CMC in water as the solubility limit, in order to avoid the artefact of unrealistically low log Kow values [4].

The Environmental Risk Assessment of Surfactants Management (ERASM) ‘Hydrophobicity of Surfactants’ Task Force was established in 2011 with the objective to evaluate the most appropriate log Kow/D method for surfactants. The Task Force coordinated a laboratory study at the Fraunhofer IME Institute in Schmallenberg, Germany to measure log Kow/D values using three different recognised experimental methods side-by-side for a set of 12 surfactants from the four main surfactant categories (non-ionics, anionics, cationics and amphoterics). This study was conducted consistently in one experienced laboratory, with the aim of reducing uncertainties and to identify whether any of the existing methods predominate over the others in providing consistency and reliability of results across all surfactant classes. In addition, the Task Force applied several QSPR methods and predicted log Kow property data for the same set of test compounds for comparison with the experimental data generated by Fraunhofer IME.

Methods

Test compounds

The 12 test compounds: tetraethylene glycol monooctyl ether (C8EO4), tetraethylene glycol monododecyl ether (C12EO4), octaethylene glycol monododecyl ether (C12EO8), sodium dodecyl sulphate (C12AS), tetraethylene glycol monododecyl ether sulphate (C12E4S), sodium dodecanoate (C12 carboxylate), C12 trimethyl ammonium chloride (C12TMAC), C16 trimethyl ammonium chloride (C16TMAC), C18 benzalkonium chloride (C18BAC), C12–16 alkyldimethyl betaine (C12–16ADB), (3-lauramidopropyl) dimethylbetaine (C12AAP), C12 alkyldimethyl, N-oxide (C12DAO) and 2 reference compounds [atrazine (ATR) and pentachlorophenol (PCP)] used in this study are listed in Table 1 (full structures are shown elsewhere [6]). It is well known that commercially used surfactants are generally mixtures of homologues (e.g. a distribution of alkyl chain lengths). To reduce complexity in the data interpretation, high purity single chain length test items were obtained either from commercial sources or were synthesized and provided by the ERASM Task Force member companies. The activity and purity information were confirmed either by Certificates of Analysis (CoA) documents or from analytical data shared by the suppliers. The exception was C12–16 alkyl dimethyl betaine (approximately 70% C12, 20% C14, 10% C16). Additional information on the source, purity and appearance of all test and reference substances is detailed in Additional file 1: Section S1.

Table 1 Test and reference compounds

Log Kow determination approaches

Full details of the experimental test conditions for all the log Kow methods plus supporting analytical methodology are provided in Additional file 2: Section S2.

HPLC method

The OECD Test Guideline 117 was followed, with adjustments made to the mobile phase to accommodate high log Kow values (> 6 may be expected when analysing the more hydrophobic surfactants). This HPLC method is currently only validated for neutral compounds and subsequent work of Eadsforth et al. has validated the chemical domain of applicability of the method to neutral non-ionic surfactants [3, 7]. However, the method has not been validated for the other ionisable surfactant classes (anionics, cationics and amphoterics). Log Kow values of three alcohol ethoxylates, C8EO4, C12EO4 and C12EO8, together with the reference compound ATR, as a neutral reference compound, were determined by the HPLC method. A calibration graph was generated to facilitate the determination of log Kow (Additional file 2: Table S6) of the test compounds according to their retention time within the HPLC column. As per OECD 117 guidelines, a minimum of 6 reference compounds which cover (and exceed) the range of the log Kow values of the test compounds were chosen with known log Kow values to generate a calibration line.
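The calibration step can be sketched as follows. OECD 117 correlates log Kow with the logarithm of the capacity factor k = (tR − t0)/t0, where tR is the retention time and t0 the dead time. The snippet is a minimal illustration only, not the laboratory's actual procedure: the function names, the dead time and the reference retention times and log Kow values are hypothetical placeholders, not data from this study.

```python
import math

def capacity_factor(t_r, t_0):
    """Capacity factor k = (t_R - t_0) / t_0 for retention time t_r and dead time t_0."""
    if t_r <= t_0:
        raise ValueError("retention time must exceed the dead time")
    return (t_r - t_0) / t_0

def fit_calibration(references, t_0):
    """Least-squares fit of log Kow = intercept + slope * log10(k).

    references: list of (retention_time, known_log_kow) pairs.
    Returns (slope, intercept).
    """
    xs = [math.log10(capacity_factor(t_r, t_0)) for t_r, _ in references]
    ys = [log_kow for _, log_kow in references]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x

def estimate_log_kow(t_r, t_0, slope, intercept):
    """Read a test compound's log Kow off the calibration line."""
    return intercept + slope * math.log10(capacity_factor(t_r, t_0))

# Hypothetical placeholder data: (retention time in min, literature log Kow).
# Six reference compounds, the minimum number used for the calibration line.
T0 = 1.0
REFERENCES = [(2.0, 2.1), (3.0, 2.7), (5.0, 3.4), (9.0, 4.0), (17.0, 4.6), (33.0, 5.2)]

slope, intercept = fit_calibration(REFERENCES, T0)
print(round(estimate_log_kow(6.0, T0, slope, intercept), 2))
```

A test compound whose retention time falls outside the range spanned by the reference compounds would be an extrapolation, which is why the reference set is chosen to cover (and exceed) the expected log Kow range.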

Slow-stirring method

The slow-stirring method (OECD Test Guideline 123) was followed which minimises turbulence and thereby enhances the exchange between n-octanol and water without microdroplets being formed. The water phase (lower phase) is sampled from a stopcock at the bottom of the vessel, whereas the n-octanol phase (upper phase) is sampled using a microsyringe, taking care not to disturb the boundary layer. This method has been successfully applied to the determination of log Kow values of highly hydrophobic compounds up to 8.2 [8]. For surfactants, the method should operate below the CMC to ensure no micelles are present during the equilibration study.

Water, n-octanol and the test compound are equilibrated in a thermostated stirred reactor at constant temperature. Exchange between the phases is improved by carefully controlled stirring (150 rpm), which limits turbulence and thus increases the accuracy of the determination of the Kow value. In practice, for each test compound log Kow values were generated at a range of volume ratios of n-octanol and water (i.e. 0.5:1, 1:1 and 2:1) for each of two (normally 48 h and a longer period, either 148 h or 168 h) or more stir periods. In this experiment, the majority of the test compounds (# 1–6, 10–12) were added to the water phase, while the cationics and reference compounds (# 7–9, 13, 14) were added to the n-octanol phase. It is considered that either application mode should give the correct log Kow value; the main justifications for adding the cationics in n-octanol are that (a) they were soluble in this solvent and (b) this application mode would reduce any losses resulting from their strong adsorption to glass surfaces. Further studies were carried out for test compounds # 2, 3 and 12 using both improved, more sensitive analytical methods and longer stir periods (48 h, 168 h, 240 h and 336 h) to ensure equilibration had been reached. These three compounds were applied in both the water and n-octanol phases. For these three compounds, reasonably consistent data, as demonstrated by the means and standard errors (Additional file 2: Table S2), were generated at time points 168 h, 240 h and 336 h and for both phases, so a mean value was calculated from these time points and under both dosing methods. For the other test compounds (# 1 and 4–11), reasonably consistent data, as demonstrated by the means and standard errors (Additional file 2: Table S2), were generated at two or more time points, and a mean value was calculated from these data.
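The averaging described above can be illustrated with a short sketch. It assumes that each vessel yields one log Kow value as the log of the ratio of the measured n-octanol and water phase concentrations, and that the reported standard error is the sample standard deviation divided by the square root of the number of values. The function names and the concentrations are hypothetical, not data from this study.

```python
import math
import statistics

def log_kow(c_octanol, c_water):
    """log10 of the concentration ratio; both concentrations in the same units."""
    if c_octanol <= 0 or c_water <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(c_octanol / c_water)

def summarise(log_kow_values):
    """Mean and standard error of a set of log Kow determinations (needs >= 2 values)."""
    mean = statistics.mean(log_kow_values)
    std_error = statistics.stdev(log_kow_values) / math.sqrt(len(log_kow_values))
    return mean, std_error

# Hypothetical (n-octanol, water) concentrations in mg/L, one pair per
# vessel, time point or dosing mode.
measurements = [(950.0, 1.1), (1010.0, 0.9), (980.0, 1.0)]
values = [log_kow(c_oct, c_w) for c_oct, c_w in measurements]
mean, std_error = summarise(values)
print(f"log Kow = {mean:.2f} +/- {std_error:.2f}")
```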

In this study, the slow-stirring method should be taken as the benchmark for comparison with all other methods as it is the most consistently applicable method across all the surfactant classes and provides a complete dataset.

Solubility ratio method

The solubility ratio method (referred to in OECD 107) is based on the log of the ratio of the n-octanol solubility and the water solubility, determined experimentally. However, as the water solubility of surfactants is neither properly defined nor easy to measure, it is recommended in the EU TGD [4] to take the CMC in water as a working approach for determining the water solubility of a surfactant.

$$ K_{\text{ow}} = \frac{C_{n\text{-octanol}}}{C_{\text{water}}} $$
(1)
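As a worked illustration of Eq. 1, the sketch below takes the CMC as the surrogate for water solubility, as recommended in the EU TGD. The function name and the numerical values are hypothetical, and both concentrations are assumed to be expressed in the same units.

```python
import math

def log_kow_solubility_ratio(octanol_solubility, water_solubility):
    """log Kow estimated as log10(C_n-octanol / C_water), following Eq. 1.

    For surfactants, water_solubility would be the CMC used as a surrogate.
    Both arguments must be in the same units (e.g. g/L).
    """
    if octanol_solubility <= 0 or water_solubility <= 0:
        raise ValueError("solubilities must be positive")
    return math.log10(octanol_solubility / water_solubility)

# Hypothetical values: 50 g/L in n-octanol and a CMC of 0.5 g/L in water.
print(log_kow_solubility_ratio(50.0, 0.5))
```

A compound that is fully miscible with n-octanol has no finite numerator, so the ratio cannot be evaluated for it.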

Determination of the solubility in n-octanol

The solubilities of the test compounds in n-octanol were determined by adapting the procedure described in OECD Test Guideline 105 (water solubility) [9]. Solubility determinations for each test compound were carried out at three stir times (24 h, 48 h and 144 h) and a mean value was calculated from these time points.

Determination of critical micelle concentrations (CMC)

The standard definition as given in OECD 105 (‘the water solubility of a compound is the saturation mass concentration of the compound in water at a given temperature’) does not apply to surfactants. At low concentrations, there may be true homogeneous solutions, whereas at higher concentrations lyotropic phase separation can occur [10]. The creation and characterisation of ‘saturated’ solutions are usually not possible; one exception is anionic surfactants below the Krafft point [5]. Therefore, the term ‘water solubility’ is not easy to define nor determine for surface-active compounds. As explained previously, the CMC, for which there are defined methods, was used as a ‘surrogate’ for water solubility for the 12 test surfactants.

In this study, CMC determinations were performed by two methods. In the first, the surface-active compound was added stepwise to a buffered aqueous solution (pH 7) at 25 °C and the surface tension of the solution was measured by the ring method (OECD 115) [11]. The determination of the CMC values by this method was performed by IMETER®/MSB Augsburg, Germany (http://www.imeter.de). For the determination of accurate surface tension values, calibration factors were applied as described in OECD 115. Several algorithms are available for the correction of systematic deviations; appropriate calculations [12] were used. In addition, a calibration factor was applied to adjust the system by measurement of a reference liquid such as water.
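The text does not state how the CMC was read off the surface tension data. One common approach, sketched here purely as an assumption, is to fit one straight line to the descending branch of the surface tension versus log concentration curve and another to the plateau above the CMC, and to take their intersection. The function names, the `n_pre` split point and the example data are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x

def cmc_from_isotherm(concentrations, surface_tensions, n_pre):
    """Estimate the CMC as the intersection of two fitted lines.

    concentrations must be sorted in increasing order; the first n_pre points
    are treated as the descending (pre-CMC) branch and the rest as the plateau.
    Each branch needs at least two points.
    """
    logs = [math.log10(c) for c in concentrations]
    slope_pre, icpt_pre = fit_line(logs[:n_pre], surface_tensions[:n_pre])
    slope_post, icpt_post = fit_line(logs[n_pre:], surface_tensions[n_pre:])
    if slope_pre == slope_post:
        raise ValueError("branches are parallel; no intersection")
    log_cmc = (icpt_post - icpt_pre) / (slope_pre - slope_post)
    return 10 ** log_cmc

# Hypothetical isotherm: concentration in mol/L, surface tension in mN/m.
conc = [1e-5, 10 ** -4.5, 1e-4, 10 ** -3.5, 10 ** -2.5, 1e-2, 10 ** -1.5]
gamma = [70.0, 60.0, 50.0, 40.0, 30.0, 30.0, 30.0]
print(cmc_from_isotherm(conc, gamma, n_pre=4))
```

The result depends on which points are assigned to each branch, which is one reason CMC values from different laboratories and methods can differ.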

The second approach involved using Solid-Phase Micro-extraction (SPME). SPME fibres coated with polyacrylate have been shown to be applicable for the measurement of freely dissolved concentrations of non-ionic, anionic, and cationic surfactants [6, 13,14,15].

The ratio of the n-octanol solubility to each of the two water solubility surrogates (i.e. the CMC values determined by the two methods) was used to generate log Kow/D values for the surfactants. Literature CMC values were also available and were compared in the same way.

Testing strategy

As the current methods were not appropriate for all the surfactant classes, the following approach was devised for the test (and reference) compounds shown in Table 1.

  • Log Kow values for all compounds (# 1–14) at the selected pH values were determined using both the slow-stirring (OECD 123) and solubility ratio estimation methods.

  • Log Kow values for the three non-ionics and the neutral reference (ATR) (i.e. # 1–3, 13) were determined using the HPLC method (OECD 117).

All test and reference compounds were tested at pH 7; in addition, the C12 carboxylate was also tested at pH 2 and pH 9. Standard aqueous buffer solutions of pH 2, 7 and 9 were prepared. Saturated aqueous and n-octanol phases for log Kow slow-stirring studies were prepared by stirring overnight at 150 rpm and 25 °C in the following proportions.

  • Aqueous buffer solution: 900 mL buffer solution (pH 2, 7 or 9) and 100 mL n-octanol

  • n-Octanol buffer solution: 900 mL n-octanol and 100 mL buffer solution (pH 2, 7 or 9).

Log Kow predictive methods

The number of publicly available QSPR methods for calculating log Kow values has increased significantly over the last few years and there are now multiple methods published and/or commercially available as software. Few reviews are available which make a side-by-side comparison of log Kow predictive methods. A review of ten commonly used commercial software packages was conducted by Dearden [16] and a further review of Mannhold et al. [17] considered a larger selection of both substructure and property-based methods. None of the publicly available methods or commercially available software packages have been developed to specifically accommodate prediction of surfactants nor considered as part of these reviews. As part of this review, we focussed on those methods which were considered in these previous reviews, but which have been used commonly in calculating log Kow of surfactants for regulatory submissions due to either reasons of availability, a clear understanding of the underlying method and/or history of use. These are CLOGP version 5.0 [18], KOWWIN [19], Pipeline Pilot [20], ACD Labs [21] and SPARC [22].

Most methods for calculating log Kow values assume a neutral state of the compound. For most of these methods, the exact algorithm is confidential or not published which makes it difficult to determine the accuracy or applicability of the method to surfactants. There are a few QSPRs, however, for which the background calculations are easier to understand and for this reason make themselves more appealing for use with surfactants since the result can be investigated and modified to account for charge. Such methods include those of Meylan and Howard [23] as incorporated into the KOWWIN software and Hansch and Leo (H&L) [24] which forms the basis of the CLOGP method. The H&L prediction method has been applied successfully to a number of surfactant classes when combined with modification factors which have been developed to specifically address the difficulties in calculating log Kow values for surfactants [25,26,27,28,29].

In addition to the above methods, we have also included some additional methods to predict log Kow/D values for the test and reference compounds: Molinspiration [30], Crippen Fragmentation in Chemdraw [31], Viswanadhan’s Fragmentation in Chemdraw [32] and Broto in Chemdraw [33]. These were again selected for ease of availability for practical application to regulatory submissions. Each individual software package/model will produce different predicted log Kow values depending on the approach used. These commercially available programmes are generally designed for the prediction of log Kow values for neutral organic compounds. Significant differences in predicted values can arise as a result of the way in which ‘charged’ moieties are handled, not only between the programmes but also within the programmes, depending on how the Simplified Molecular Input Line Entry System (SMILES) notation is entered. The format of the SMILES notation has been found to be of particular importance when using KOWWIN and CLOGP (see Table 1 for SMILES used in model calculation).

To demonstrate some of the potential inaccuracies, log Kow predictions for different structural SMILES notations were run for each test compound (# 1–12). In addition, the average predicted log Kow values computed by different QSPRs were also calculated to compare with experimental measurements. In this case, only the ionised forms have been included for anionic, cationic, and amphoteric compounds since they are all fully ionised under test conditions. A prediction for neutral C12 carboxylate was also made for comparison with the experimental value log Kow determined under conditions in which the compound was fully protonated. For non-ionic compounds, only the neutral forms were used to calculate the averages. Further details for the methods and how they have been applied to the test compounds in this study are provided in Additional file 3: Section S3.

In order to enable some comparison and judgement to be made as to the predictivity of the different QSPR methods, regression coefficients (R2) were calculated between predicted values for each method for each surfactant class and observed values determined using the most appropriate method. However, R2 is not sufficient by itself to enable comparison of such data since it only provides a relative pattern of differences between observed and predicted values and as such can still provide acceptable values for a constant magnitude of error even when this magnitude is very high [34]. Mean Absolute Error (MAE) values thus were also calculated to provide a better indication of the magnitude of the differences between predicted values and observed values for each method and for each surfactant class. The MAE values enable the magnitude of differences between observed and predicted values to be assessed. An additional threshold approach based on MAE was conducted following a modified method of Roy et al. [34] to help further discriminate between predictive methods. The details of this approach are presented in Additional file 4: Section S4.
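The limitation of R2 noted above can be shown with a small example. Here R2 is taken to be the squared Pearson correlation between observed and predicted values, which is an assumption about how it was computed; the data are invented so that every prediction is exactly 2 log units too high.

```python
def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    s_op = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    s_oo = sum((o - mean_o) ** 2 for o in observed)
    s_pp = sum((p - mean_p) ** 2 for p in predicted)
    return s_op ** 2 / (s_oo * s_pp)

def mean_absolute_error(observed, predicted):
    """Average magnitude of the difference between predictions and observations."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

# Invented data: each prediction overshoots the observation by a constant 2.0.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [3.0, 4.0, 5.0, 6.0]

print(r_squared(observed, predicted))            # 1.0
print(mean_absolute_error(observed, predicted))  # 2.0
```

A perfect R2 coexists with an error of two orders of magnitude in Kow, which is why MAE is reported alongside it.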

Log Kow versus log D

Measured values have been corrected to account for ionisation where appropriate for relevant comparisons between values. A derivation of the Henderson–Hasselbalch equation [35] was used to achieve this:

$$ {\text{log }}D_{\text{acids}} = {\text{log }}P + \log \left[ {\frac{1}{{1 + 10^{{{\text{pH}} - {\text{p}}K{\text{a}}}} }}} \right] $$
(2)
$$ {\text{log }}D_{\text{bases}} = {\text{log }}P + \log \left[ {\frac{1}{{1 + 10^{{{\text{p}}K{\text{a}} - {\text{pH}}}} }}} \right] $$
(3)

The % ionisation of an acid at any given pH can also be estimated from (for a base, the sign of the exponent is reversed):

$$ \% {\text{ ionised }} = \frac{100}{{1 + 10^{{{\text{p}}K{\text{a}} - {\text{pH}}}} }} $$
(4)

where log P (also referred to as log K) refers to the partition coefficient of the unionised compound in an aqueous-organic phase system. For this study, we thus consider the organic phase as n-octanol. The % ionised was calculated for each compound under test conditions as described in Additional file 2: Section S2, Table S5.
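Equations 2 to 4 can be written out as a short sketch. The function names and the example pKa, pH and log P values are hypothetical. Equation 4 as written gives the ionised fraction of an acid; the base counterpart with the exponent reversed, and the back-calculation of log P from a measured log D (a rearrangement of Eq. 2), are included here as additions that are not spelled out in the text.

```python
import math

def log_d_acid(log_p, pka, ph):
    """Eq. 2: log D of an acid from the log P of its neutral form."""
    return log_p + math.log10(1.0 / (1.0 + 10 ** (ph - pka)))

def log_d_base(log_p, pka, ph):
    """Eq. 3: log D of a base from the log P of its neutral form."""
    return log_p + math.log10(1.0 / (1.0 + 10 ** (pka - ph)))

def percent_ionised_acid(pka, ph):
    """Eq. 4: percentage of an acid present in its ionised form."""
    return 100.0 / (1.0 + 10 ** (pka - ph))

def percent_ionised_base(pka, ph):
    """Counterpart of Eq. 4 for a base (exponent reversed); not given in the text."""
    return 100.0 / (1.0 + 10 ** (ph - pka))

def log_p_from_log_d_acid(log_d, pka, ph):
    """Eq. 2 rearranged: neutral-species log P from a measured log D."""
    return log_d + math.log10(1.0 + 10 ** (ph - pka))

# Hypothetical acid: pKa 4.8, neutral-form log P 4.5, evaluated at pH 7.
log_d = log_d_acid(4.5, 4.8, 7.0)
print(round(log_d, 2), round(percent_ionised_acid(4.8, 7.0), 1))
print(round(log_p_from_log_d_acid(log_d, 4.8, 7.0), 2))
```

Because the correction term grows by one log unit per pH unit away from the pKa, an error in the pKa translates directly into an error of the same size in the back-calculated log P.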

pKa calculation

Inherent to the ability to correct for ionisation are the pH of the system and the pKa of the compound. Literature values for pKa were selected where available. The remaining pKa values for the test compounds were calculated by ACD Labs from Chemsketch [21], Chemicalize [36] and Pipeline Pilot [20] (the latter two available from ChemAxon and Accelrys, respectively). Other tools [37] are available which have been more widely assessed for variability of results. However, these three tools are easily accessible and provide readily available values for users.

Results and discussion

Experimental log Kow values

Details of test and reference substances are shown in Table 1. Calculations of % ionisation under test conditions suggested that all ionisable test compounds except for C12 carboxylate and C12DAO should be 100% ionised at pH 7 based on predicted pKa values (Additional file 2: Section S2, Table S5). Calculations for C12DAO suggested that this compound should be 99% ionised at pH 7. Calculations for C12 carboxylate suggested that this compound should be 100% protonated at pH 2 and 100% ionised at pH 9. Therefore, all measured log Kow values at pH 7 should be considered as log D values except for C12DAO, C12 carboxylate and non-ionic surfactants.

All the experimentally measured log Kow values, based on the slow-stirring, HPLC and solubility ratio methods, are reported in Table 2. Additionally, for several ionisable compounds, log D values at pH 7 are extrapolated to log Kow for the neutral species equivalent for comparison purposes and listed in brackets in Table 2.

Table 2 Experimental log Kow/D values for 12 surfactant test compounds and 2 reference compounds

Two reference compounds (ATR and PCP) were included in this study to check consistency of results with previously recorded log Kow/D values (Table 2). Both the HPLC and slow-stirring methods generated values close to that reported in the literature for ATR, though the solubility ratio method generated a higher value (approx. 0.5 log unit). For PCP, when values were corrected for ionisation using Eq. 2, the slow-stirring method provided a value for the neutral species which was consistent with the literature value when this was also corrected for ionisation and reported as the neutral species (< 0.4 log unit difference between the 5.55 value reported in this study and the corrected 5.87 value reported from the literature). The small observed difference is likely due to the inaccuracy of the predicted pKa value. The solubility ratio method using either method for measuring water solubility generated log Kow/D values between 2.03 and 2.25 log units difference from literature values, thus reflecting the difficulties associated with using the solubility ratio method even for compounds which are not surfactants.

CMC values determined for the 12 surfactants and 2 reference compounds using two different methods (surface tension [11] and SPME [6]) are compared against literature values in Table 3. For all non-ionic compounds (alcohol ethoxylates and atrazine), the determined CMC or solubility values were in reasonable agreement with values sourced from the literature. For the remaining compounds, there is some variability between the literature values and those measured by both methods, ranging from a factor of 2 (PCP) to over 100 (C18BAC), highlighting the variable nature of the measurements. These will be influenced by experimental conditions (e.g. pH, equilibrium time).

Table 3 CMC values at pH 7 for use as surrogate water solubility values for solubility ratio method calculations

Experimental log Kow values generated for surfactant compounds (Table 2) are highly varied for the different methods. Although there is a reasonable correlation (R2 = 0.8639) between log Kow values for non-ionic surfactants generated by HPLC and slow-stirring methods, HPLC derived log Kow values are consistently higher than those generated by the slow-stirring method (Fig. 1). Similar slow-stirring values for C8EO4 (2.68) and C12EO8 (4.25) [40] were found by other researchers. No comparison could be made between log Kow values generated using the HPLC and solubility ratio methods since there are insufficient solubility ratio data for non-ionic surfactants.

Fig. 1 Comparison of log Kow results using HPLC [this study (blue) and previous [6] (red)] against slow-stirring values

Cationic surfactants demonstrate good correlation (R2 = 1) between log Kow values generated using the slow-stirring and solubility ratio methods (Fig. 2). This correlation should be taken with some caution given the size of the dataset; in addition, the slope of the regression line (1.2) and the y-intercept (1.4) indicate systematic over-estimation by the solubility ratio method. However, the values determined using the slow-stirring method seem lower than would be expected, particularly given the size of the longer alkyl chain molecules. The differences in log Kow values between the slow-stirring and solubility ratio methods perhaps reflect the added complexity of analysing cationic compounds, which are known to adsorb strongly to surfaces such as glassware. For both anionic and amphoteric surfactants, there is little correlation between log Kow values generated using the slow-stirring and solubility ratio methods (Fig. 2), as seen in the R2 values, although available data suggest that the solubility ratio approach may underestimate log Kow/D values compared to the slow-stirring method. No correlation could be made between log Kow values generated using the slow-stirring and solubility ratio methods for non-ionics, since two out of the three test compounds were totally miscible in n-octanol, so a value for their n-octanol solubility could not be provided.

Fig. 2 Slow-stirring vs solubility ratio log Kow values for all surfactant classes: non-ionic (blue cross), anionic (green triangle), cationic (blue diamond) and amphoteric (red square) [R2 for the amphoterics is omitted since there are only 2 points] [CMC determined using the surface tension method (OECD 115)]

There is reasonable consistency between the solubility ratio log Kow values (Table 2) as demonstrated by the mean and standard deviations of CMC values for the majority of compounds (Table 3). However, where observed differences occur (e.g. C12AAPB, C12DAO and C18BAC), it suggests difficulties in measuring solubility for these compounds. When calculating log Kow from log D (using Eq. 2) for C12 carboxylate, a predicted value consistent with 4.49 measured at pH 2 (under fully protonated conditions) would be expected. However, the predicted values of 5.23 and 5.53 (for values measured at pH 7 and pH 9, respectively) do not correspond exactly, suggesting either problems with the experimental method or in the calculated pKa value, or both.

When determining solubility in n-octanol, data for some test compounds at different time points are reasonably consistent, whereas others are less so. In addition, some compounds (2 non-ionics and the amine oxide) were infinitely soluble (fully miscible) in n-octanol. In conclusion, it was not possible to produce reliable solubility data for all test compounds in both n-octanol and water. Even where it has been possible to get realistic solubility data in this study, the correlation between log Kow values using the solubility ratio method and other approaches is generally low as observed from the R2 values. C12EO8 is the only surfactant with comparable values generated using both the HPLC and solubility ratio methods and these show between 2.02 and 3.31 log units difference between values generated by both methods. When comparing with slow-stirring log Kow values, the datasets generated using both methods show good correlation for cationic compounds (R2 = 1 and a slope of 1.2) but either no correlation (for anionics, R2 = 0.0004) or too few data to make any firm conclusions on the remaining two surfactant categories (Fig. 2). Despite good correlation observed with the cationics, the solubility ratio method cannot be applied to all surfactants when solubility cannot be determined in either or both of the solvent phases. Given that the EU TGD also recommends treating the method with caution for reasons of poor correlation typically observed between octanol solubility and Kow [4], the solubility ratio method is not recommended as a robust or accurate method for the determination of log Kow values for the four classes of surfactants assessed in this study.

Predicted log Kow values

Predicted log Kow values for the twelve surfactants and two reference chemicals are given in Table 4. It can be concluded that QSPR predictions for the ionised reference PCP show good agreement between all the software packages, though less for the neutral reference ATR. The situation for the surfactants is, perhaps not surprisingly, more complex.

Table 4 Predicted log Kow values for test and reference compounds

All QSPR predicted log Kow/D values have been compared with the log Kow data from the slow-stirring experiments. Several stir times were evaluated for each test substance during the slow-stirring study to ensure that the log Kow values were generated at optimum stirring times (i.e. when the analytical data confirmed that there was equilibrium between the n-octanol and water phases). A comparison of QSPR predicted log Kow/D values with experimental slow-stirring log Kow/D values is provided in Table 5. Broad comparisons of the mean predicted values across all methods compared with mean experimental values derived from values generated in this study [HPLC, slow-stirring and solubility ratio (based on CMC values derived using the surface tension method)] are presented by surfactant class in Fig. 3. These comparisons provide an indication of which class of surfactants is best predicted using the QSPR methods. Non-ionic surfactants with an R2 = 0.980 demonstrate the highest correlation between experimental and predictive methods and although the regression slope is approximately 1, the intercept demonstrates a systematic difference between predicted and experimental values. Anionics have lower correlation with R2 = 0.698 whereas cationics can be considered to have no correlation with an R2 of 0.251. The negative slope of the regression line for amphoterics suggests a complete inability of the predictive methods to calculate representative log Kow/D values for these structures. A more detailed analysis of each surfactant class was conducted to identify and discriminate predictivity of individual QSPR methods.

Table 5 Summary of QSPR-predicted values used in comparison with slow-stirring log Kow/D values for each surfactant tested

Fig. 3 Mean experimental vs. mean predicted log Kow/D values for the four surfactant groups (error bars show standard deviation). Experimental log Kow data shown include slow-stirring, solubility ratio, HPLC (generated in this study only)

All the software programmes used were able to predict a log Kow value for neutral (non-ionic) surfactants. This class of surfactants posed no issue with regard to SMILES notation and there are no reasons to discount any individual values. CLOGP [18], modified Hansch and Leo (H&L) [24], ALOGP [41] and the Broto atomic fragment [33] all demonstrate R2 values of > 0.98 for correlation between predicted and observed values (Additional file 4: Section S4, Table S7). R2 values for all methods are above the threshold for acceptability as defined in ECHA guidance [42]. However, when considering MAE values as a better indicator of absolute predictivity, Broto, CLOGP, Molinspiration, ALOGP and modified Hansch and Leo have the lowest values (0.06, 0.18, 0.21, 0.33 and 0.43, respectively) indicating that these are the best ranked of the considered QSPR prediction methods for predictivity (Additional file 4: Section S4, Table S8). Whilst MAE values provide only a ranking of predictivity between methods, when considering the threshold approach (Additional file 4: Section S4, Table S9) CLOGP, Molinspiration, ALOGP and Broto all classify as good methods and would, therefore, be the most recommended for predicting log Kow of non-ionic surfactants based on the small dataset considered.

For anionic surfactants, SPARC [22], Crippen Fragmentation [31] and Viswanadhan’s Fragmentation [32] are all unable to generate a prediction due to their inability to handle charged compounds. All the remaining programmes are able to generate predicted log Kow/D values, although ACD Labs [21] requires removal of the counter ion from the SMILES notation and KOWWIN [19] will always ‘force’ the structure to its neutral form, either by adding an ‘H’ atom or by bonding the counter ion to the negative charge when it has been included in the SMILES notation. This can lead to significantly different predicted values of log Kow for what is apparently the ‘neutral’ form (see test compounds #4 and #5; Table 4). At neutral pH, most anionics will exist in their ionised form; therefore, it is recommended that the SMILES notation reflect this (i.e. do not include the counter ion). The remaining predictive methods appear able to discount the counter ion.

Calculation of log Kow/D for the majority of anionic surfactants, e.g. alkylbenzene sulphonates and alkyl sulphates, using the H&L approach with a variety of surfactant-specific modifications has been widely researched and validated. When compared to this approach, CLOGP appears to give consistently higher log Kow values for the sulphate-containing surfactants. This is due to the less negative fragment value that CLOGP uses for the sulphate fragment (− 2.17 cf. − 5.87 in the H&L method). KOWWIN and Broto both scored highly when considering R2 alone, with values of 0.999 for both methods (Additional file 4: Table S7). Whilst ALOGP predictions also appear consistently high for selected compounds in this class, ALOGP ranked by far the best on the MAE measure of predictivity, with an MAE value of 0.16 (Additional file 4: Table S8), followed by Molinspiration, KOWWIN and H&L with modifications (with MAE values of 0.46, 0.83 and 1.06, respectively). Under the threshold approach, only the ALOGP method scores as a moderate predictor, compared with poor/bad scoring for all other methods (Additional file 4: Table S9). ALOGP is therefore consistently better for predicting log Kow/D for anionic surfactants based on this small dataset. Molinspiration, KOWWIN and H&L with modifications would be the next recommended methods for anionics based on MAE scores (Table 5 and Additional file 4: Table S8).

For cationic surfactants, SPARC, Crippen Fragmentation, Broto and Viswanadhan’s Fragmentation are all unable to generate a prediction due to their inability to handle charged compounds or missing fragment values for N+. As for anionic surfactants, ALOGP predictions appear consistently high for the compounds in the cationic class of surfactants. Care should be taken when entering the SMILES notation for quaternary nitrogen in both CLOGP and KOWWIN, since significantly different values are obtained depending on the format used. It is recommended that the [N+] format should always be used. In contrast to the anionics, where it is recommended that the counter ion is not included in the SMILES notation, for cationic surfactants such as alkyl ammonium quaternary structures the counter ion should be included. The fragment value for the quaternary nitrogen in such compounds, as determined by H&L, included the relevant halide ion, i.e. Cl, Br, I. Whilst KOWWIN will again create a neutral structure by bonding the ion to the nitrogen, this does appear to result in a log Kow/D value more comparable with the others. Recent publications [25, 43] suggest modifications to the original H&L bond factors used for cationic surfactants. The MAE threshold approach (Additional file 4: Table S9) indicates that all methods considered are poor, and R2 values demonstrate no correlation between predicted and observed values (ranging from 0.177 for CLOGP to 0.413 for H&L with modifications) (Additional file 4: Table S7). With an MAE score of 1.18, ACD Labs ranks highest for predictivity (Additional file 4: Table S8). No method can be classified as providing good predictivity for cationic surfactants (Additional file 4: Table S9). However, based on MAE and R2 measures, preference is given to the ACD Labs and CLOGP methods, provided that CLOGP values are derived using the [N+] SMILES notation with the counter ion included.
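The SMILES recommendations for anionic and cationic surfactants can be sketched as a small input-preparation helper. This is only an illustration under stated assumptions: the function name is hypothetical, the counter ion is assumed to be written as a separate dot-separated fragment, the surfactant ion is assumed to be the longest fragment, and the two example structures (sodium dodecyl sulphate and cetyltrimethylammonium bromide) are generic examples rather than the test compounds of this study. Individual programmes may still treat the resulting string differently, as described above.

```python
def smiles_for_prediction(smiles, surfactant_class):
    """Return a SMILES string following the counter-ion recommendations.

    Anionic: drop the counter ion (keep the longest dot-separated fragment,
    a simple heuristic assumed here). Cationic: keep the counter ion and
    require the quaternary nitrogen to be written as [N+].
    """
    fragments = smiles.split(".")
    if surfactant_class == "anionic":
        return max(fragments, key=len)
    if surfactant_class == "cationic":
        if "[N+]" not in smiles:
            raise ValueError("quaternary nitrogen should be written as [N+]")
        return smiles
    raise ValueError("surfactant_class must be 'anionic' or 'cationic'")


# Illustrative structures only (not the study's test compounds).
sds = "CCCCCCCCCCCCOS(=O)(=O)[O-].[Na+]"
ctab = "CCCCCCCCCCCCCCCC[N+](C)(C)C.[Br-]"

anionic_input = smiles_for_prediction(sds, "anionic")
cationic_input = smiles_for_prediction(ctab, "cationic")
```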

For amphoterics, there are considerable uncertainties surrounding the appropriate approach to be taken where N+ is present in conjunction with other polar groups. SPARC, Crippen Fragmentation, Broto and Viswanadhan’s Fragmentation are all unable to predict log Kow for amphoterics due to their inability to handle charged compounds or missing fragment values for N+. The same is true for the standard H&L method since there is no published value for an N+ fragment (without an associated halide ion). As with cationic surfactants, it is recommended that the quaternary nitrogen is entered in the SMILES string as [N+] for amphoterics to avoid miscalculation. None of Molinspiration, ALOGP or ACD Labs is able to calculate a value in the absence of the ‘+’ charge. KOWWIN will protonate any negatively charged groups and treat the N+ as a pentavalent nitrogen. When working with sulphobetaines, it is suggested [44] that when using KOWWIN the [Na+] should be included in the SMILES notation to avoid protonation of the N+ which leads to an underestimation of log Kow. The value of the Na+ can then be subtracted. This approach was validated against KIAM values taken from experiments using immobilised artificial membranes (IAM). Using the same approach here, subtraction of the Na+ value appears to prevent over-estimation of the log Kow for the carboxybetaines and brings the values closer to those predicted by CLOGP and Molinspiration, although the differences in predictions using these methods are still large (Table 4). Overall, when comparing methods for predictivity, no method stands out and all methods score as poor/bad based on the MAE threshold approach (Additional file 4: Table S9). ALOGP generates the best MAE value compared to the other methods (MAE value of 2.25, Additional file 4: Table S8) but also the lowest R2 value of 0.529 (Additional file 4: Table S7).

Discussion

It should be borne in mind that, for simplicity, the experimental data generated in this study involved the deliberate use of single chain constituents. In reality, commercial surfactants are often complex mixtures containing several components with a range of different water solubilities and hence n-octanol/water partition coefficient values. Evaluation of the experimental methods investigated in this study for application to multi-component surfactant products still needs to be undertaken.

Log Kow/D data calculated using the HPLC method, slow-stirring method, solubility ratio approach or predictive software are generally not in agreement when assessing the 12 test compounds, though non-ionic log Kow values were rather more consistent than the other three classes of surfactants. Of the experimental techniques, the slow-stirring method is considered to be the most widely applicable method for generating log Kow data for all the surface-active test compounds, provided it can be demonstrated that the ‘surfactant’–‘water’–‘n-octanol’ system was allowed to reach equilibrium. This is supported by good agreement in slow-stirring log Kow data for C8EO4 and C12EO8 generated in the current study and earlier work [40]. It is possible by minimising micelle formation, emulsification and adsorption effects [45, 46] to obtain reasonably reliable log Kow values for surface-active molecules using a slow-stirring method. Corrections to apparent log Kow data can be made if the concentration in the aqueous phase at equilibrium is above the CMC. The main limitation of the slow-stirring method from the current study is that it requires sensitive analytical methods (e.g. liquid chromatography coupled with mass spectrometry; LC–MS) for analysis of the water phase for the more hydrophobic test compounds.

Predicted log Kow/D values do not show a great degree of correlation with experimental values, with the exception of slow-stirring derived log Kow/D values for non-ionics. It is recognised that conclusions drawn from this study are based on a relatively small dataset and so further studies would be recommended to confirm findings. However, this conclusion is also not restricted to surfactants. It has been shown that log Kow values derived by different methods for a range of organics were not comparable [47]. It has been advised [48] that log Kow data for organics derived from software packages should be used cautiously as they cannot always cope with the complex and/or ionisable compounds. A more recent study [49] used a combination of molecular dynamics simulations and the quantum chemical conductor-like screening model for realistic solvents (COSMO-RS). A weight of evidence (WoE) approach is a reasonable approach to take for non-ionic surfactants using experimental and predicted values, given the greater degree of correlation and lower incidence of prediction errors between slow-stirring log Kow/D values and log Kow/D predictions using various methods. Figure 3 also demonstrates the good correlation achieved when taking this approach for non-ionics. However, a WoE or averaging approach is difficult to justify for the other classes of surfactants given that the correlations as determined by R2 are lower and the incidence of prediction errors as determined by MAE scores are higher (Additional file 4: Tables S7–S9). Figure 3 also demonstrates the reduced correlation for anionics when taking this approach and the lack of correlation when considering cationics and amphoterics. Recommendations of currently available prediction models are provided for those methods which seem to provide the most robust predictions for surfactants at pH 7 (Table 6). 
In dealing with complex multi-component surfactant products, the recommended approach is to calculate a weighted average from the predictions of each individual chain length.
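The weighted-average approach for multi-component products can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the function name is hypothetical, the weights are assumed to be the fractions of each chain length in the product (the text does not specify mass or mole fractions), the average is assumed to be taken over the log values themselves, and the numbers are invented.

```python
def weighted_mean_log_kow(components):
    """Weighted average of predicted log Kow values.

    components: iterable of (weight, predicted_log_kow) pairs, one per
    chain length. Weights need not sum to 1; they are normalised here.
    """
    pairs = list(components)
    total_weight = sum(weight for weight, _ in pairs)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(weight * log_kow for weight, log_kow in pairs)
    return weighted_sum / total_weight


# Hypothetical product: 70% of a shorter chain and 30% of a longer chain.
example_mixture = [(0.7, 3.0), (0.3, 5.0)]
example_mean = weighted_mean_log_kow(example_mixture)  # about 3.6
```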

Table 6 Recommendations of log Kow/D calculation methods for fully ionised surfactants

Given the intrinsic difficulties with phase separation, emulsification, limits of detection, ionisation state in the environment and lack of a clear definition of solubility for surfactants, all current experimental methods have limitations for determining accurate log Kow values. Therefore, it is recommended [50] that promising alternative experimental log Kow methods and alternative methods to log Kow, which may be more biologically relevant, should be evaluated and validated for surfactants.

The alternative experimental log Kow methods which have the potential for overcoming some of the experimental difficulties associated with current methods with surfactants include:

  • pH metric (potentiometric) method for ionisable compounds [48, 51, 52].

  • Proton nuclear magnetic resonance (H-NMR): a recent study has demonstrated how H-NMR spectra can be used as a predictive method to determine log Kow values [53].

  • Centrifugal partition chromatography (CPC), also known as counter-current chromatography (CCC) [54, 55].

It is beyond the scope of this study to assess these methods. These relatively unused approaches require evaluation against existing methods for application for all compounds including surfactants.

Alternative experimental methods to log Kow aimed at determining surfactant partitioning behaviour include:

  • Immobilised artificial membranes (IAM), which have potential for high-throughput determination of K/Dmembrane-water [50, 56–60].

  • Liposome-water partitioning to determine K/Dlipid-water for soluble fractions [57, 58].

  • Solid-phase micro-extraction (SPME) for determining Kfibre-water [13–15, 61–63].

  • Solid-supported phospholipid membranes (SSLM) to determine K/Dmembrane-water [64].

Conclusions

All current experimental methods have limitations for determining accurate log Kow values given the intrinsic difficulties with phase separation, emulsification, limits of detection, lack of defined solubility, etc. Given these limitations, and on the basis of the current study, the slow-stirring method is preferred among the currently available experimental methods for generating log Kow/D data for all the surface-active test compounds, provided (a) sufficient time has been allowed to ensure equilibration of the test substance and the n-octanol and water phases, (b) a low stir rate is used to minimise any emulsion formation and (c) care is taken when sampling the aqueous and n-octanol phases to minimise any contamination from the n-octanol/water interface.

For the experimental methods outlined above, it is important that log Kow/D data are generated for test compounds in both their neutral and fully ionised forms. Where the pKa approximates to the environmental pH (range 5–9), it is recommended that log Kow/D is measured under both sets of conditions, i.e. those under which the surfactant is fully neutral and those under which it is fully ionised (two values should be determined, one at high and one at low pH). If the pKa is < 5 or > 9, then testing at pH 7 is recommended to represent relevant environmental conditions. Measured values can be corrected using a derivation of the Henderson–Hasselbalch equation (Eq. 2) for any ionisation state to generate a log Kow or log D under relevant environmental conditions. Thus, for any determination of partitioning, the pKa and the pH of the test system should be reported.
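A pH correction of this kind can be sketched with the commonly used Henderson–Hasselbalch-derived relationship for a monoprotic compound, log D = log Kow − log10(1 + 10^(pH − pKa)) for an acid and log D = log Kow − log10(1 + 10^(pKa − pH)) for a base. The sketch assumes that only the neutral species partitions into n-octanol, which may not hold for surfactants, and it is not necessarily identical to Eq. 2. The function name and the example values are hypothetical.

```python
import math


def log_d_from_log_kow(log_kow, pka, ph, kind="acid"):
    """Estimate log D at a given pH for a monoprotic acid or base.

    Assumes the ionised species does not partition into n-octanol
    (a simplifying assumption, labelled as such).
    """
    if kind == "acid":
        exponent = ph - pka   # acids ionise as pH rises above the pKa
    elif kind == "base":
        exponent = pka - ph   # bases ionise as pH falls below the pKa
    else:
        raise ValueError("kind must be 'acid' or 'base'")
    return log_kow - math.log10(1.0 + 10.0 ** exponent)


# Hypothetical acid with log Kow = 3.0 and pKa = 5.0, evaluated at pH 7.
example_log_d = log_d_from_log_kow(log_kow=3.0, pka=5.0, ph=7.0, kind="acid")
```

For these invented inputs the estimated log D at pH 7 is about 1.0, i.e. roughly two log units below log Kow, because the compound is about 99% ionised two pH units above its pKa.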

Although there is a reasonable correlation between log Kow values for non-ionics generated by the slow-stirring and HPLC methods, it is apparent from this work that HPLC generates consistently higher log Kow values. As with other indirect methods, HPLC suffers from the lack of reference surfactants with accurately determined log Kow values. If slow-stirring derived log Kow values for non-ionic reference standards were developed further and applied in an OECD 117 HPLC method, this positive bias would be removed, making the HPLC approach a more rapid and attractive approach to determining log Kow values for non-ionic surfactants. Whilst some recent work has been carried out using the OECD 117 HPLC method for other classes of ionisable surfactants [65], this method would need further validation, preferably using reference compounds for these classes with accurate log Kow values determined using the slow-stirring method.

The solubility ratio method is based on the log of the ratio of the n-octanol solubility and the water solubility. Experience in this study has shown that it will not always be possible to produce realistically accurate solubility data in n-octanol and water for surfactants. Where log Kow values have been generated with such data as part of the solubility ratio method, with the exception of the cationic surfactants, the correlation of these values with those generated using other approaches was low (as observed with the R2 values). It is, therefore, not recommended as a robust or accurate method for the determination of log Kow values for the four classes of surfactants.
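For reference, the solubility ratio calculation itself is simple, as the following sketch shows; the difficulty lies in obtaining meaningful solubility inputs. The function name and values are hypothetical. Capping the water solubility at the CMC follows the working approach mentioned in the Background, and both solubilities and the CMC are assumed to be expressed in the same units.

```python
import math


def log_kow_solubility_ratio(s_octanol, s_water, cmc=None):
    """log10 of the n-octanol to water solubility ratio.

    If a CMC is supplied, the water solubility is capped at the CMC
    (assumed here to share the units of the solubility values).
    """
    if cmc is not None:
        s_water = min(s_water, cmc)
    if s_octanol <= 0 or s_water <= 0:
        raise ValueError("solubilities must be positive")
    return math.log10(s_octanol / s_water)


# Invented values in g/L: apparent water 'solubility' far above the CMC.
uncapped = log_kow_solubility_ratio(100.0, 50.0)           # about 0.3
capped = log_kow_solubility_ratio(100.0, 50.0, cmc=0.1)    # about 3.0
```

The invented numbers show how strongly the result depends on what is taken as the water solubility: using the apparent solubility gives a value near 0.3, whereas capping at the CMC gives a value near 3.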

For deriving QSPR predictions of log Kow/D for surfactants, recommendations for the current most appropriate methods and approaches are provided in Table 6. There is some agreement between experimental and predicted log Kow values for non-ionics, and the MAE threshold approach identifies several methods (including Broto and CLOGP) with good predictivity (Table 5 and Additional file 4: Table S9). For anionics, there are fewer QSPR methods with good predictivity (including ALOGP) based on R2 and MAE values. Cationic log Kow values were largely over-predicted, whereas amphoterics were often under-predicted by the various QSPR models used in this assessment. Whilst there are some QSPR methods available which can be applied for non-ionic and anionic surfactants, there is a dearth of methods for prediction of either cationic or amphoteric surfactants. Approaches are required to address this. These may include improvements to existing log Kow methods, development of new log Kow/D predictors or indeed development of predictors of more relevant partitioning parameters as described above.

Given the inherent difficulties in deriving robust log Kow/D values for surfactants using currently available and validated experimental and QSPR predictive methods, it is recommended to investigate the application of newer alternative experimental log Kow methods as well as more biologically relevant and methodologically defensible alternative methods for describing partitioning of surfactants such as Kmembrane-water or Klipid-water.