Introduction

One of the most important parameters characterising a surfactant is its critical micelle concentration (cmc). Among the factors known to affect the cmc value in aqueous solution is the structure of the surfactant [1]. Therefore, as shown in many articles [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18], the critical micelle concentration can be predicted directly from the structure of the molecule by quantitative structure–property relationship (QSPR) analysis. The first correlation was given by Klevens [19], who found empirically that the logarithm of cmc decreases linearly as the number of carbon atoms in the surfactant alkyl chain increases.

All surfactants have an amphiphilic structure: they contain hydrophobic groups, called tails, and hydrophilic groups, called heads. Conventional monomeric surfactants consist of one tail and one head, whereas gemini surfactants consist of two hydrophobic tails and two hydrophilic heads connected by a spacer group. For conventional surfactants, the logarithm of cmc is a linear function of the alkyl chain carbon number up to at least sixteen carbon atoms, whereas for gemini surfactants it deviates increasingly from this linear relationship when the number of carbon atoms exceeds 14 [1]. For some cationic gemini surfactants, this deviation from linearity turns into an increase in the logarithm of cmc. In aqueous solution at room temperature, a minimum has been observed for several series of cationic gemini surfactants with spacer groups or tails of different nature and flexibility [20,21,22]. For cationic gemini surfactants with only straight hydrocarbon chains (tails and spacer), to the best of the author’s knowledge, no minimum in the logarithm of cmc has been observed. However, the critical micelle concentrations in aqueous solution at room temperature have been measured only for surfactants with alkyl chains of at most 18 carbon atoms.

Therefore, the author decided to analyse theoretically the effect of the structure of the hydrophobic tail chains of cationic gemini surfactants on cmc using, as in previous papers [4, 5, 7, 10, 11], only the molecular connectivity indices [23]. A simple theoretical model that allows the effect of modifications of the hydrocarbon chain structure on the critical micelle concentration to be studied has been derived for compounds with fixed heads and a fixed spacer of medium length but with different hydrophobic tail chains.

Results and discussion

To find and study the dependence of the critical micelle concentration on the structure of the hydrophobic tail chains of gemini surfactants, surfactants with fixed heads and spacer group and with only hydrocarbon chains were chosen. The structures of all considered molecules are shown in the “Methods” section.

Based on literature cmc data [24,25,26,27] and the values of the molecular connectivity indices [7] of training set compounds 1–10 (see “Methods”), two second-order regression models were obtained using polynomial regression analysis and the stepwise method. For each model, the search for the best equation consisted of two steps. The first step is shown in Table 1.

Table 1 Correlation coefficient values in first step

As shown in Table 1, the highest correlation coefficient values are for the relationships containing the second-order valence molecular connectivity index (\({}^{2}\chi^{\nu }\)) and also the first-order valence molecular connectivity index (\({}^{1}\chi^{\nu }\)). These indices define Model 1 and Model 2, respectively. The second step for each model is shown in Table 2.

Table 2 Correlation coefficient values in second step

As shown in Table 2, the best equations were obtained by additionally including the square of the respective index: the second-order valence molecular connectivity index [\(({}^{2}\chi^{\nu } )^{2}\)] in Model 1 and the first-order valence molecular connectivity index [\(({}^{1}\chi^{\nu } )^{2}\)] in Model 2. The correlation formulas (Models 1–2) are as follows:

Model 1:

$$ Log_{10} cmc = (10.14849 \pm 1.94143) - (1.39052 \pm 0.28227)\, \cdot \,{}^{2}\chi^{\nu } + (0.03060 \pm 0.01008)\, \cdot \,({}^{2}\chi^{\nu } )^{2} . \tag{1} $$

Model 2:

$$ Log_{10} cmc = (8.38205 \pm 1.37961) - (0.95624 \pm 0.15916)\, \cdot \,{}^{1}\chi^{\nu } + (0.01656 \pm 0.00449)\, \cdot \,({}^{1}\chi^{\nu } )^{2} . \tag{2} $$
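As a minimal sketch, Eqs. (1) and (2) can be evaluated from their central coefficient values alone (the ± uncertainties are dropped). The function names are hypothetical, and the index value in the usage lines is arbitrary; it is not taken from the data set.

```python
def log_cmc_model1(chi2v: float) -> float:
    """Eq. (1): Log10 cmc from the second-order valence connectivity index."""
    return 10.14849 - 1.39052 * chi2v + 0.03060 * chi2v ** 2


def log_cmc_model2(chi1v: float) -> float:
    """Eq. (2): Log10 cmc from the first-order valence connectivity index."""
    return 8.38205 - 0.95624 * chi1v + 0.01656 * chi1v ** 2


# Usage with an arbitrary, illustrative index value (not a compound from the paper).
example_index = 12.0
print("Model 1:", log_cmc_model1(example_index))
print("Model 2:", log_cmc_model2(example_index))
```

Keeping the two models as separate functions of a single index mirrors the way the equations are stated: each model depends on one descriptor and its square only.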

The statistical characteristics of the descriptors included in Models 1 and 2 are given in Table 3.

Table 3 Statistical characteristic of descriptors included in Models 1 and 2

The models obtained [Eqs. (1) and (2)] were used to predict \(Log_{10} cmc\) values of compounds that were not used in the correlation analysis. The calculated and experimental \(Log_{10} cmc\) values of training set compounds 1–10, along with the \(Log_{10} cmc\) values of test compound 11, are listed in Table 4.

Table 4 Literature \(Log_{10} cmc\) values and values calculated using Models 1 and 2 for training set compounds 1–10 and test compound 11

The calculated \(Log_{10} cmc\) values of the training set compounds have also been plotted against the experimental values. The corresponding scatter plots of the experimental versus the calculated \(Log_{10} cmc\) values for Models 1 and 2 are shown in Fig. 1. The plots of residuals versus the experimental values of \(Log_{10} cmc\) are shown in Fig. 2. The \(Log_{10} cmc\) values of additional test compounds 12–14, calculated using both models, are included in Table 5.

Fig. 1 Plots of the experimental \(Log_{10} cmc\) vs. the values calculated using Eq. (1) (r = 0.996, F = 989.912, s = 0.107) and Eq. (2) (r = 0.9964, F = 1105.768, s = 0.101) for the training set (filled diamond) and test compound 11 (unfilled triangle)

Fig. 2 Plots of the residuals vs. the experimental \(Log_{10} cmc\) for training set (filled diamond) and test compound 11 (unfilled triangle)

Table 5 Calculated \(Log_{10} cmc\) values of compounds 12–14 and compound 3

Model 2 [Eq. (2)] has been used to examine the variation of \(Log_{10} cmc\) with the alkyl chain carbon number (n) for the surfactant series n-6-n. The corresponding plot of the calculated \(Log_{10} cmc\) versus the alkyl chain carbon number of n-6-n compounds for n = 8–28 is shown in Fig. 3.

Fig. 3 \(Log_{10} cmc\) of n-6-n compounds calculated using Eq. (2) versus alkyl chain carbon number (n) (filled circle)

The second-order valence molecular connectivity index (\({}^{2}\chi^{\nu }\)), appearing in Model 1, is a path-type index that represents the two-bond terms in the molecule; its value differs between isomers. The value of the \({}^{2}\chi^{\nu }\) index increases with increasing length and branching of the hydrocarbon chains. The first-order valence molecular connectivity index (\({}^{1}\chi^{\nu }\)), appearing in Model 2, is a path-type index that represents the one-bond terms in the molecule. Its value increases with increasing hydrocarbon chain length and, like that of the \({}^{2}\chi^{\nu }\) index, differs between isomers, but in this case it decreases with increasing branching [23, 29]. Of course, when the number of atoms in the molecule is increased by adding carbon atoms to the chains as branches, the value of the \({}^{1}\chi^{\nu }\) index increases too. Both are valence connectivity indices and therefore distinguish multiple bonds.
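The opposite response of the two indices to branching can be illustrated on the smallest possible case. The sketch below computes first- and second-order connectivity indices for a hydrogen-suppressed carbon skeleton, under the loudly stated assumption that the molecule is a saturated hydrocarbon, so that the valence delta of each carbon equals its number of carbon neighbours. Heteroatoms, charges and multiple bonds need the full Kier and Hall valence delta and are not handled. The function name and the two test skeletons (n-butane and isobutane) are illustrative choices, not compounds from this work.

```python
from itertools import combinations
from math import sqrt


def connectivity_indices(edges):
    """Return (chi1, chi2) for a saturated hydrocarbon skeleton.

    `edges` lists each C-C bond once as a pair of atom labels.
    Assumption: delta of an atom = number of bonded carbon neighbours.
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    delta = {atom: len(nbrs) for atom, nbrs in neighbours.items()}

    # First order: one term per bond.
    chi1 = sum(1.0 / sqrt(delta[a] * delta[b]) for a, b in edges)

    # Second order: one term per two-bond path i-j-k, enumerated by its centre j.
    chi2 = 0.0
    for j, nbrs in neighbours.items():
        for i, k in combinations(sorted(nbrs), 2):
            chi2 += 1.0 / sqrt(delta[i] * delta[j] * delta[k])
    return chi1, chi2


N_BUTANE = [(0, 1), (1, 2), (2, 3)]    # C-C-C-C
ISOBUTANE = [(0, 1), (0, 2), (0, 3)]   # central carbon bonded to three methyls

print("n-butane :", connectivity_indices(N_BUTANE))
print("isobutane:", connectivity_indices(ISOBUTANE))
```

For these two isomers the first-order index is lower and the second-order index is higher for the branched skeleton, which is the behaviour described above.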

Thus, both the second-order and the first-order valence molecular connectivity indices increase linearly with increasing length of the alkyl chains. The equations obtained for surfactants having only hydrocarbon chains [Eqs. (1) and (2)] show that the relationship between \(Log_{10} cmc\) and the valence molecular connectivity indices of first or second order is parabolic. This suggests that the relationship between \(Log_{10} cmc\) and the alkyl chain length is also parabolic.
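The step from a linear dependence of the index on chain length to a parabolic dependence of \(Log_{10} cmc\) on chain length can be written out explicitly. In the sketch below, a, b and c stand for the fitted coefficients of either model, χ for the corresponding index, and α and β for the intercept and slope of the assumed linear dependence of χ on the tail carbon number n. The symbols α and β are introduced here for illustration only and do not appear in the original equations.

$$ \chi = \alpha + \beta n \quad \Rightarrow \quad Log_{10} cmc = a - b\chi + c\chi^{2} = (a - b\alpha + c\alpha^{2}) - \beta (b - 2c\alpha )\,n + c\beta^{2} n^{2} . $$

Since c > 0 in both models, the parabola opens upward, with its minimum at \(\chi^{*} = b/(2c)\), that is, at \(n^{*} = (\chi^{*} - \alpha )/\beta\).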

The experimental data obtained for gemini surfactants with other spacer groups confirm that the variation of \(Log_{10} cmc\) with the number of carbon atoms in the straight hydrocarbon chains is not linear. A departure from linearity (in water, at 25 °C) has been found by Zana [30] for n-8-n gemini surfactants and also by Rosen et al. [19] for gemini surfactants having a diethyl ether spacer group. In the latter case, it was observed that \(Log_{10} cmc\) increases at an alkyl chain length of 20 carbon atoms. In the case of the monohydroxypropyl spacer group (in 0.1 N NaCl, at 25 °C) [31], an increase in the \(Log_{10} cmc\) value has already been observed at an alkyl chain length of 16 carbon atoms. In these last two cases, it seems that the nonlinear dependence can be described by a quadratic function.

Models 1 and 2 [Eqs. (1) and (2)] were used to predict the cmc of test compound 11. The value obtained using Eq. (1) is 0.847 mmol and that obtained using Eq. (2) is 0.843 mmol, whereas the experimental value is 0.83 mmol [28]. These values, together with the results presented in Tables 3 and 4 and Figs. 1 and 2, suggest that both models predict well the cmc of compounds with only hydrocarbon tail chains, and that Model 2 is slightly better than Model 1.
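The comparison for test compound 11 can be expressed as a relative error. The short sketch below uses only the three cmc values quoted in the text; the helper name is hypothetical, and relative error is chosen here merely as one convenient measure, not as the criterion used by the author.

```python
def relative_error(predicted: float, experimental: float) -> float:
    """Signed relative deviation of a prediction from the experimental value."""
    return (predicted - experimental) / experimental


EXPERIMENTAL_CMC = 0.83                              # test compound 11, from [28]
PREDICTED_CMC = {"Model 1": 0.847, "Model 2": 0.843}  # values quoted in the text

for model, value in PREDICTED_CMC.items():
    error = relative_error(value, EXPERIMENTAL_CMC)
    print(f"{model}: {100 * error:.1f} % relative error")
```

Both predictions lie within a few percent of the measured value, with Model 2 the closer of the two.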

Both models (Models 1 and 2) have been used to analyse the effect of double bonds, branches and phenyl groups on the cmc value (Table 5). To study these effects, compounds with 12 carbon atoms in the tail chains but with different hydrocarbon tail structures were considered (compounds 12–14 in Fig. 4). Comparison of the \(Log_{10} cmc\) values of gemini surfactants with straight and branched hydrocarbon tail chains (compounds 3 and 12, see Table 5) shows that branching decreases the cmc value calculated using Model 1 but increases the cmc value calculated using Model 2. The result obtained with Model 2 is consistent with the results for monomeric surfactants [1], for which branching gives larger cmc values than those of the straight-chain analogues. As shown in Table 5, replacing a single bond with a double bond (compounds 3 and 13) increases the cmc value calculated using both models. These results are consistent with those obtained for conventional surfactants [1]. When a phenyl group is present in the hydrophobic tails of the gemini surfactant, the cmc value is higher than for the molecule with straight tail chains and the same number of carbon atoms (compounds 3 and 14). These results are also consistent with those obtained for conventional monomeric surfactants [1]. As reported in Rosen’s book [1], the phenyl group is equivalent to about three and one-half methylene groups. The result obtained using Model 2 is approximately consistent with this finding: the cmc value of compound 14 calculated using Eq. (2) is 5.344 mmol, which is slightly lower than the calculated value of 6.03 mmol for the gemini surfactant with straight tail chains of ten carbon atoms.

Fig. 4 Structures of compounds. R1 = CnH2n+1 (n = 8, 10, 12, 14, 16). R2 = CnH2n+1 (n = 12, 14, 16, 18)

The above results show that Model 2 [Eq. (2)] better predicts and describes changes in the value of the critical micelle concentration. Thus, Model 2 can be considered the better of the two models.

The best model (Model 2) has been used to examine the variation of \(Log_{10} cmc\) with the number of alkyl chain carbon atoms (n) of n-6-n compounds and also to predict the minimum cmc for those compounds. The plot of the calculated \(Log_{10} cmc\) versus carbon number from n = 8 to n = 28 is presented in Fig. 3. As can be seen in Fig. 3, a straight line can be drawn through the first four points; from n = 16, \(Log_{10} cmc\) begins to deviate from that line with further increase of n, reaching a minimum at n = 24. Thus, as shown in Fig. 3, the compound with hydrocarbon chains of 24 carbon atoms has the smallest cmc value, equal to 0.0038 mmol. This result seems to be at least approximately consistent with reality.
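The position of the minimum can be cross-checked analytically from the vertex of the parabola in Eq. (2). The sketch below uses only the central coefficient values; the helper name is hypothetical. It also assumes that the fitted cmc is expressed in moles per litre, which the text does not state explicitly. The vertex is the continuous minimum over the index, so the value at any integer chain length n can only be equal to or slightly above it.

```python
def parabola_minimum(a: float, b: float, c: float):
    """Vertex (x_min, y_min) of y = a - b*x + c*x**2, valid for c > 0."""
    x_min = b / (2.0 * c)
    y_min = a - b ** 2 / (4.0 * c)
    return x_min, y_min


# Central coefficient values of Eq. (2); uncertainties are ignored.
chi_min, log_cmc_min = parabola_minimum(8.38205, 0.95624, 0.01656)

# Assumption: cmc in mol per litre, converted here to mmol per litre.
cmc_min_mmol = 1000.0 * 10.0 ** log_cmc_min

print(f"index at minimum : {chi_min:.3f}")
print(f"Log10 cmc minimum: {log_cmc_min:.3f}")
print(f"cmc minimum      : {cmc_min_mmol:.4f} mmol per litre")
```

Under that unit assumption, the vertex value rounds to the 0.0038 mmol quoted above for n = 24.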

Conclusion

The results obtained in this work using Eq. (2) are in good agreement with the experimental ones, including the results for compounds not used in the correlation analysis. Model 2 [Eq. (2)] describes and predicts very well the critical micelle concentration values of compounds having quaternary ammonium head groups, a spacer group of medium length and hydrocarbon tail chains. The present theoretical study of gemini surfactant hydrophobic tail chains suggests that the dependence of \(Log_{10} cmc\) on the alkyl chain length is nonlinear, and the obtained equation [Eq. (2)] shows that it is a parabolic relationship. This suggests that beyond a certain tail chain length, the cmc values will start to increase even when chains consisting of hydrocarbon only are lengthened.

Methods

The entire data set contains gemini surfactants with only hydrocarbon tail chains and with quaternary ammonium head groups connected by a spacer group six carbon atoms long. All investigated gemini surfactants have bromide counterions. The structures of all considered compounds are shown in Fig. 4.

The training set contains gemini surfactants with tail chains of varying length and flexibility. The test set contains one compound having tail chains of different lengths and also surfactants whose tails have a fixed number of 12 carbon atoms but differ in structure. Thus, the data set includes ten training set compounds (compounds 1–10) and four test set compounds (compound 11 and additional compounds 12–14). The chemical structures of the training set compounds and of test compound 11, and the experimental cmc values, were taken from the literature [24,25,26,27,28].

The Kier and Hall molecular connectivity indices [23] were calculated based on the graphic structural formula of the molecule [7]. The values of molecular connectivity indices and the cmc values of compounds from training set are listed in Table 6.

Table 6 Experimental cmc values and molecular connectivity indices values of training set compounds

Each formula expressing the relationship between \(Log_{10} cmc\) and the molecular connectivity indices was generated using the least-squares method. The final equation was obtained using the stepwise method. The quality of the derived models was tested using the following statistical parameters: the correlation coefficient (r), the Fisher ratio (F), and the standard deviation of the fit (s). The closer the correlation coefficient is to 1, the better the fit of the model. A larger Fisher ratio and a smaller standard deviation indicate that the model is statistically significant.
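The three quality parameters can be illustrated with a minimal NumPy sketch of a second-order least-squares fit. This is not the STATISTICA procedure used in this work: the function name is hypothetical, the data are synthetic, and the degrees of freedom (k = 2 predictors, n − k − 1 for the residual) follow the usual textbook convention, which is an assumption and may differ from the convention behind the reported F and s values.

```python
import numpy as np


def quadratic_fit_stats(x, y):
    """Fit y = c0 + c1*x + c2*x**2 by least squares; return (coeffs, r, F, s)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    coeffs = np.polyfit(x, y, 2)            # ordered [c2, c1, c0]
    y_hat = np.polyval(coeffs, x)

    n, k = len(y), 2                        # observations, predictors (x and x**2)
    sse = float(np.sum((y - y_hat) ** 2))   # residual sum of squares
    sst = float(np.sum((y - y.mean()) ** 2))
    ssr = sst - sse                         # sum of squares explained by the model

    r = float(np.sqrt(ssr / sst))           # multiple correlation coefficient
    f = (ssr / k) / (sse / (n - k - 1))     # Fisher ratio (assumed convention)
    s = float(np.sqrt(sse / (n - k - 1)))   # standard deviation of the fit
    return coeffs, r, f, s


# Synthetic data only: a parabola with small alternating deviations.
x_demo = [1, 2, 3, 4, 5, 6]
y_demo = [1.1, 3.9, 9.1, 15.9, 25.1, 35.9]
coeffs_demo, r_demo, f_demo, s_demo = quadratic_fit_stats(x_demo, y_demo)
print("coefficients [c2, c1, c0]:", coeffs_demo)
print("r, F, s:", r_demo, f_demo, s_demo)
```

A fit of good quality shows r close to 1 together with a large F and a small s, as described above.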

High absolute Student t values of the descriptors in their statistical characteristics (Table 3) indicate that the regression coefficients of the descriptors are significantly larger than their standard errors. Descriptors with p values below 0.05 are considered statistically significant.
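The t criterion can be sketched from the numbers printed in Eqs. (1) and (2). Two assumptions are made here and should be read with care: the ± values in the equations are taken to be standard errors, and the residual degrees of freedom are taken to be 7 (ten training compounds minus three fitted parameters). The values actually reported in Table 3 may therefore differ. The dictionary keys and the helper name are illustrative.

```python
# (estimate, reported ± value) pairs copied from Eqs. (1) and (2).
# Assumption: the ± values are standard errors.
MODELS = {
    "Model 1": {
        "intercept": (10.14849, 1.94143),
        "chi2v": (-1.39052, 0.28227),
        "chi2v squared": (0.03060, 0.01008),
    },
    "Model 2": {
        "intercept": (8.38205, 1.37961),
        "chi1v": (-0.95624, 0.15916),
        "chi1v squared": (0.01656, 0.00449),
    },
}

# Two-sided 5 % critical value of Student's t for 7 degrees of freedom
# (assumed: 10 compounds minus 3 fitted parameters).
T_CRITICAL = 2.365


def t_ratio(estimate: float, std_error: float) -> float:
    """Ratio of a regression coefficient to its standard error."""
    return estimate / std_error


for model, terms in MODELS.items():
    for term, (estimate, std_error) in terms.items():
        t = t_ratio(estimate, std_error)
        verdict = "significant" if abs(t) > T_CRITICAL else "not significant"
        print(f"{model}, {term}: t = {t:.2f} ({verdict} at the 0.05 level)")
```

An absolute t ratio above the critical value corresponds to a p value below 0.05 under these assumptions.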

All statistical calculations were performed using the program STATISTICA 12 [32].