Introduction

It is now common practice to search for correlations between measured physicochemical features of molecules or molecular systems and their predicted or measured structural or physicochemical characteristics [1,2,3,4,5,6,7]. Such correlations can generally be found using chemometric methods of analysis [8,9,10,11]. Following this trend, we applied the Quantitative Property-Retention Relationship (QPRR) approach [12] to reveal possible relations between the retention parameters of 40 ampholytic substances and the physicochemical (predicted) and spectral (experimental) characteristics of these compounds. The retention parameters were obtained using HPLC with six polar and semipolar columns. The substances owe their ampholytic properties to the presence of carboxylic, sulphonyl, sulphonamide, hydroxy, thio, amino, and imino groups or heterocyclic nitrogen atom(s) in their structures (Table 1S in the Supplementary materials).

Since the analyzed compounds share one common property, ampholyticity, but are structurally highly diverse, we selected the abovementioned QPRR method instead of the more frequently applied QSRR (Quantitative Structure-Retention Relationship) approach [12,13,14,15,16]. We did so bearing in mind that many drugs with similar structures exhibit different therapeutic features [17, 18]. On the other hand, similarities in the physicochemical properties of the compounds investigated should be reflected in their biological behavior.

The compounds investigated are natural and synthetic amino acids, pseudo-amino acids (containing an amino group in the chain or in a ring), sulphanilic acid and its derivatives (sulphonamides), and derivatives of fluoroquinone and purine [17, 19, 20]. Some of these compounds are precursors of biologically interesting peptides or proteins; others are important as vitamins or valuable pharmaceuticals. Choosing compounds with a wide spectrum of biological activity and one common property (ampholyticity) offers a unique opportunity to reveal possible relations between retention parameters and physicochemical (predicted/experimental) descriptors, and subsequently to predict chromatographic parameters of untested substances, to gain valuable information on the biological activity/pharmacological relevance of ampholytic molecules and, to some extent, to model conditions of chromatographic separations.

Materials and methods

Chemicals

Data concerning the investigated compounds—their names, structure and properties—are compiled in the Supplementary materials (Table 1S). Detailed information on the sources of the target compounds is provided in our recent works [17, 19, 20].

Chromatographic investigations

Chromatographic analyses were carried out on a Waters SM 2690 Alliance HPLC system equipped with a PDA 996 photodiode array detector and a Compaq Deskpro computer running the Millennium 3.2 program for data collection and process control. Six stainless steel HPLC columns were used: Discovery HS PEG, 150 × 4 mm i.d. (Supelco), packed with silica gel chemically bonded with polyethylene glycol, particle size 5 μm (hereafter LKD); Hypersil HSA, 50 × 4.6 mm i.d. (Thermo-Hypersil-Keystone, Cheshire, UK), packed with silica gel bonded with human blood serum albumin, particle size 7 μm (hereafter LKH); IAM PC C10/C13, 150 × 4.6 mm i.d. (Regis Chemical Company, Morton Grove, IL, USA), packed with silica gel chemically bonded through propylamine with phosphatidylcholine and subsequently endcapped through unreacted propylamine with methyl glycolate, particle size 12 μm (hereafter LKIAM); IC Pack Cation M/D, 150 × 3.9 mm i.d. (Waters), packed with silica gel chemically bonded with a copolymer of butadiene and maleic acid, particle size 5 μm (hereafter LKIC); Nucleosil 100-5 OH (Diol), 250 × 4 mm i.d. (Macherey-Nagel, Düren, Germany), packed with silica gel chemically bonded with propylene glycol (propanediol), particle size 5 μm (hereafter LKN); and Spheri-10 Anion AX-MP, 100 × 4.6 mm i.d. with a 30 × 4.6 mm i.d. precolumn (Brownlee Laboratories, Santa Clara, CA, USA), packed with silica gel chemically bonded with polyethyleneimine, particle size 10 μm (hereafter LKS).

The compounds investigated were analyzed under isocratic conditions at ambient temperature (20 °C). The mobile phases were as follows: 0.025 M phosphate buffers of pH 2.5 and 7.0 (buffers of the required pH were obtained by adding an appropriate portion of H3PO4 to a 0.025 M aqueous solution of NaH2PO4) for analyses carried out on the LKD column; acetonitrile:0.025 M phosphate buffers of pH 2.5 and 7.0, mixed in the proportion 10:90 for analyses carried out on the LKS, LKIC, and LKN columns or 20:80 for analyses carried out on the LKIAM column; and 1-propanol:0.05 M phosphate buffer of pH 7.0, mixed in the proportion 5:95, for analyses carried out on the LKH column. All mobile phases were filtered through a Whatman GF/F glass microfiber filter and degassed by ultrasonication immediately before use. The compounds investigated were dissolved in the solvents present in the mobile phases. The detection wavelength was 220 nm.

The HPLC retention factors (k) of the target compounds, equal to (tR − t0)/t0, were determined from measured retention times (tR) and dead times (t0) by the Knox and Kaliszan method [21]. Since the content of organic solvent in the mobile phases never exceeded 20%, it was assumed that the decadic logarithms of these factors, hereafter called retention parameters (logkw), correspond to pure buffers (the subscript w indicates water). The retention parameters thus determined, summarized in Table 2S (Supplementary materials), can be identified with logarithms of n-octanol/water partition coefficients [15]. The symbol of each retention parameter combines the symbol of the column and the rounded value of the buffer pH (e.g., logkwLKD2).
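The definition of the retention parameter given above can be illustrated with a minimal sketch. The function names and the example times are hypothetical and are not taken from Table 2S; the dead time is treated here as a known input rather than being estimated by the Knox and Kaliszan method, and the pH is assumed to be truncated to an integer when the label is built, which reproduces labels such as logkwLKD2.

```python
import math


def retention_factor(t_r: float, t_0: float) -> float:
    """Retention factor k = (tR - t0) / t0 from retention and dead time."""
    if t_0 <= 0:
        raise ValueError("dead time t0 must be positive")
    return (t_r - t_0) / t_0


def log_k(t_r: float, t_0: float) -> float:
    """Decadic logarithm of the retention factor (the retention parameter)."""
    k = retention_factor(t_r, t_0)
    if k <= 0:
        raise ValueError("log k is undefined unless tR exceeds t0")
    return math.log10(k)


def parameter_name(column: str, ph: float) -> str:
    """Build a label such as 'logkwLKD2' from a column symbol and buffer pH.

    Assumption: the pH is truncated to an integer (2.5 -> 2, 7.0 -> 7),
    which reproduces the labels used in the text.
    """
    return f"logkw{column}{int(ph)}"


# Hypothetical times in minutes, not values from Table 2S.
print(parameter_name("LKD", 2.5), round(log_k(t_r=11.0, t_0=1.0), 3))
```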

Physicochemical and spectral characteristics

Detailed information concerning the physicochemical and spectral characteristics of the investigated compounds used in chemometric analysis is given in Table 3S (Supplementary materials) [17, 19, 20]. The values of these characteristics are compiled in Table 4S (Supplementary materials).

Chemometric analysis

To find relations between the determined retention parameters and physicochemical/spectral (predicted/measured) characteristics, the Quantitative Property-Retention Relationship (QPRR) approach [17] was applied to each of the 11 experimental data sets corresponding to the six polar and semipolar columns with the selected mobile phases and pH conditions. The GA-MLR method (multiple linear regression with descriptor selection by a genetic algorithm) proved to be a satisfactory tool for finding functional relations (models) between the values of logkw and the values of physicochemical/spectral characteristics (descriptors). As a result, three models relating logkwLKD7, logkwLKIC2, and logkwLKN7 to the abovementioned descriptors were developed. Descriptor values were autoscaled in each case. A Genetic Algorithm (GA) was employed to select the optimal set of descriptors [22]. The GA was controlled by two parameters: a population size of 100 and a mutation rate of 45%. All chemometric calculations were done using QSARINS software [23]. The model fitting, robustness, and predictive abilities were assessed by recommended procedures described elsewhere [24,25,26,27,28]. The applicability domain was evaluated by the Williams plot approach [29,30,31]. The Y-scrambling procedure was applied to confirm the robustness of the models.
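The idea behind GA-MLR can be sketched as follows. This is an illustrative toy implementation, not the algorithm implemented in QSARINS: the fitness function (R² of the fit), the truncation selection, the crossover scheme, and the number of generations are all assumptions made for the example. Only the population size (100) and the mutation rate (45%) come from the text. The sketch requires more candidate descriptors than are selected for the model.

```python
import numpy as np


def autoscale(X):
    """Centre every descriptor column and scale it to unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def r2_mlr(X, y, subset):
    """R^2 of an ordinary least-squares fit of y on the chosen columns."""
    A = np.column_stack([np.ones(len(y)), X[:, list(subset)]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    centred = y - y.mean()
    return 1.0 - (resid @ resid) / (centred @ centred)


def mutate(child, n_feat, rng):
    """Swap one selected descriptor for one that is not yet selected."""
    child = set(child)
    unused = [j for j in range(n_feat) if j not in child]
    child.remove(int(rng.choice(sorted(child))))
    child.add(int(rng.choice(unused)))
    return child


def ga_select(X, y, n_desc, pop_size=100, mutation_rate=0.45,
              n_gen=30, seed=0):
    """Return the descriptor subset (tuple of column indices) with the best R^2."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]

    def fitness(subset):
        return r2_mlr(X, y, subset)

    pop = [tuple(sorted(rng.choice(n_feat, n_desc, replace=False).tolist()))
           for _ in range(pop_size)]
    for _ in range(n_gen):
        # Keep the better half unchanged (elitism) and refill with offspring.
        parents = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.choice(len(parents), 2, replace=False)
            # Crossover: draw the child's descriptors from both parents.
            pool = sorted(set(parents[a]) | set(parents[b]))
            child = set(rng.choice(pool, n_desc, replace=False).tolist())
            if rng.random() < mutation_rate:
                child = mutate(child, n_feat, rng)
            children.append(tuple(sorted(child)))
        pop = parents + children
    return max(pop, key=fitness)
```

With an autoscaled descriptor matrix `X` (one row per compound) and a vector `y` of logkw values, `ga_select(X, y, n_desc=3)` would return the indices of three descriptors. In practice a cross-validated criterion is usually a safer fitness function than R², which always favours overfitting.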

Results and discussion

Resolving ability of HPLC columns

The unique collection of retention parameters (Table 2S in the Supplementary materials) provides an opportunity to evaluate the separating ability of the columns used in HPLC analyses of ampholytic substances. The gap between logkw values for a given column and set of conditions can serve as a measure of this ability. The values of this quantity, compiled in Table 5S (Supplementary materials), reveal that the LKD, LKH, LKIC, and LKS columns show the largest gap and thus the highest separating ability, the LKIAM column shows a lower separating ability, and the LKN column the lowest. This information provides a useful framework for planning chromatographic analyses and modeling conditions for the separation of analyte components.
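A minimal sketch of such a comparison is given below. It assumes that the gap means the difference between the largest and the smallest logkw value in a data set; the text does not define it more precisely. The numbers are hypothetical and are not taken from Tables 2S or 5S.

```python
def logkw_gap(values):
    """Difference between the largest and smallest logkw in one data set."""
    values = list(values)
    if not values:
        raise ValueError("at least one logkw value is required")
    return max(values) - min(values)


def rank_by_gap(data):
    """Sort data-set labels from the widest to the narrowest logkw gap."""
    return sorted(data, key=lambda name: logkw_gap(data[name]), reverse=True)


# Hypothetical logkw values for two column/pH combinations.
example = {
    "LKD7": [-0.5, 0.25, 1.0],
    "LKN7": [-0.25, 0.0, 0.25],
}
print(rank_by_gap(example))
```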

Relevance of retention parameters in the context of lipophilicity

The results of numerous investigations, supported by OECD recommendations, indicate that retention parameters (logkw) satisfactorily approximate logarithms of n-octanol/water partition coefficients [32], which reflect the lipophilicity of molecular systems, including biologically active substances of pharmacological relevance. This important property determines the features and behavior of biomolecules, e.g., their ability to interact with proteins [32] and their pharmacological and toxicological potency [33]. The retention parameters obtained in this work (Table 2S in the Supplementary materials) are a unique collection of data carrying information on the lipophilicity of the ampholytic compounds investigated. This information refers to specific pH values, 2.5 and 7.0, which correspond roughly to the acidity of the medium in various parts of the alimentary canal, namely the stomach (pH ~ 1–2) and the intestine (pH ~ 6.8–7.4) [17]. The data collected can thus be helpful in explaining the pharmacological potency of the target compounds.

QPRR results

The GA-MLR methodology was employed to develop QPRR models that predict three chromatographic parameters (logkwLKD7, logkwLKIC2, and logkwLKN7) of ampholytic compounds from their physicochemical characteristics.

The models (equations) obtained are presented in Table 1 together with their statistical characteristics. Other details related to the models are summarized in Tables 6S–8S (Supplementary materials). The optimal model developed for the LKD7 column comprises three descriptors: BE, the binding energy (obtained with the AM1 method); TEAI, the total energy (obtained at the ab initio level of theory); and Eabs (pH 7.0), the energy of long wavelength absorption in buffer of pH 7.0. In the case of the LKIC2 column, the model is based on two descriptors: POL, the polarizability (calculated with the AM1 method), and HE, the hydration energy (obtained by the QSAR approach). In the case of the third column, the lipophilicity (logkwLKN7) is defined as a function of three descriptors: OSMAX, the oscillator strength corresponding to the first long wavelength electronic transition (calculated by AM1/CI); POLAI, the polarizability (calculated with the HF method); and Eabs (pH 7.0), the energy of long wavelength absorption in buffer of pH 7.0.

Table 1 Information on models relating logkw with physicochemical descriptors. Models were applied to predict the logkw for 6 arbitrary compounds (predicted values are provided in Tables 6S–8S; the structures of these chemicals are provided in Table 1S)

High values of R2, Q2CV, and Q2Ext and relatively low values of the errors represented by RMSEC, RMSECV, and RMSEP confirm the satisfactory goodness-of-fit, robustness, and predictive ability of the developed QPRR models. The quality of the models is additionally supported by the agreement between the observed and predicted values of the retention parameters for the training (T) and validation (V) sets (Fig. 1). The Y-scrambling procedure confirms that the models do not result from chance correlation (Fig. 1S in the Supplementary materials).
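How such statistics are typically computed can be sketched as follows. The definitions used here are common conventions and are assumptions with respect to this work: RMSE is taken as the square root of the mean squared error, Q2CV is computed by leave-one-out cross-validation, and Q2Ext is referenced to the mean of the training responses. The function names are hypothetical, and the exact formulas applied by QSARINS may differ.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients; the first element is the intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    """Predicted responses for descriptor matrix X."""
    return coef[0] + X @ coef[1:]


def fit_quality(y, y_hat, reference_mean):
    """Return (1 - PRESS/SS, RMSE), with SS taken about reference_mean.

    Fitted values give R2 and RMSEC, cross-validated predictions give
    Q2CV and RMSECV, external predictions give Q2Ext and RMSEP.
    """
    press = np.sum((y - y_hat) ** 2)
    ss = np.sum((y - reference_mean) ** 2)
    return 1.0 - press / ss, np.sqrt(press / len(y))


def loo_predictions(X, y):
    """Leave-one-out predictions: each compound is predicted by a model
    fitted to all the other compounds."""
    n = len(y)
    y_cv = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        y_cv[i] = predict(fit_ols(X[keep], y[keep]), X[i:i + 1])[0]
    return y_cv


def y_scramble_r2(X, y, n_iter=100, seed=0):
    """R2 values of models refitted to randomly permuted responses."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_iter):
        y_perm = rng.permutation(y)
        r2, _ = fit_quality(y_perm, predict(fit_ols(X, y_perm), X),
                            y_perm.mean())
        scores.append(r2)
    return np.array(scores)
```

For a training set `(X, y)` and a validation set `(X_ext, y_ext)`, the external statistics would follow from `fit_quality(y_ext, predict(fit_ols(X, y), X_ext), y.mean())`. A model that is not a chance correlation should have an R2 well above the values returned by `y_scramble_r2`.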

Fig. 1 Calculated versus observed values for models summarized in Table 1

Williams plots were generated to verify the applicability domain of the developed models (Fig. 2). According to this approach, if the residuals lie within ± 3 standard deviations of the mean value and the training and validation compounds are similar (this similarity is expressed by the so-called leverages), then the models (equations) can be used to predict logkw of untested compounds. Since no outlying predictions are observed in Fig. 2, one can conclude that the developed models have satisfactory predictive capability.

Fig. 2 Williams plot: standardized residuals versus leverages. Solid lines indicate ± 3 standard deviation units, dashed line indicates the threshold value for models summarized in Table 1
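The quantities shown in a Williams plot can be computed as in the sketch below. Two conventions are assumed here because the text does not state them: the leverage threshold is taken as h* = 3(p + 1)/n, with p descriptors and n training compounds, and residuals are standardized by dividing them by the residual standard deviation of the training fit. The function names are hypothetical.

```python
import numpy as np


def design(X):
    """Descriptor matrix with a leading column of ones for the intercept."""
    return np.column_stack([np.ones(len(X)), X])


def williams(X_train, y_train, X_query, y_query):
    """Return leverages, standardized residuals, and the threshold h*.

    The model is fitted to the training set; leverages and residuals are
    evaluated for the query compounds (training or validation).
    """
    A = design(X_train)
    n, n_coef = A.shape
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    # Residual standard deviation of the training fit.
    train_resid = y_train - A @ coef
    s = np.sqrt(train_resid @ train_resid / (n - n_coef))
    # Leverage h_i = x_i^T (A^T A)^-1 x_i for every query compound.
    B = design(X_query)
    xtx_inv = np.linalg.inv(A.T @ A)
    leverage = np.einsum("ij,jk,ik->i", B, xtx_inv, B)
    std_resid = (y_query - B @ coef) / s
    h_star = 3.0 * n_coef / n  # assumed threshold convention
    return leverage, std_resid, h_star


def inside_domain(leverage, std_resid, h_star, limit=3.0):
    """True for compounds within both the residual and leverage limits."""
    return (np.abs(std_resid) <= limit) & (leverage <= h_star)
```

Compounds for which `inside_domain` returns `False` would appear outside the solid or dashed lines of the plot and would call for closer inspection before the model is used for prediction.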

Descriptors relevance

All the presented models reflect the influence of physicochemical/spectral characteristics on lipophilicity defined as logkw. The models for the LKD7 and LKN7 columns relate the respective logkw values negatively to spectral characteristics: the pH-dependent Eabs (pH 7.0) in both cases and OSMAX in the latter case. In other words, the values of logkw increase as the values of these descriptors decrease. The physicochemical descriptors POL and POLAI, which represent the polarizability of molecules and positively influence the logkw values for the LKIC and LKN columns, respectively, are known to affect the overall hydrophobic properties of molecules [34]. According to the developed models, an increase in the polarizability of molecules causes an increase in their lipophilicity, which is well reflected in the data, e.g., for flumequine, theobromine, and L-levodopa. The model for LKIC2 relates the relevant logkw values positively to the hydration energy (HE), which reflects the energy released in a solvation process [35]. Thus, compounds exhibiting a higher tendency to hydration, such as L-methionine, piroxicam, or p-aminophenol, should be more lipophilic. Total energy (TEAI) and binding energy (BE), which influence the logkwLKD7 values, are a measure of the stability of molecules and, to some extent, of their susceptibility to interact with molecules in their surroundings, e.g., water molecules. According to the developed equation, more stable molecules are more lipophilic (and more hydrophobic), e.g., hydrochlorothiazide, sulfathiazole, and sulfanilamide. The above discussion demonstrates that the dependencies found between logkw and physicochemical/spectral descriptors can be a useful tool for predicting the features and behavior of unknown ampholytic compounds.

Concluding remarks

On the basis of the QPRR approach, three models relating the logkw values to physicochemical (predicted/spectral) parameters of ampholytic biologically active substances of pharmacological relevance were developed using the GA-MLR method. These models concern three of the six chromatographic columns selected for investigation and three of the 11 variants of experimental conditions considered. This indicates that a correlation between the abovementioned quantities is not a common occurrence.

The QPRR models form a useful platform for predicting retention parameters of untested compounds that share common features (e.g., ampholyticity) with the tested compounds. Because this approach uses physicochemical characteristics of the compounds as chemometric descriptors, instead of the structural ones used in QSRR models, it extends the applicability of the developed models to predicting retention parameters of structurally diverse substances.

The relationships found can be used to gain pharmacologically interesting information on biologically active ampholytic substances. The logkw values determined in this work are a unique collection of chromatographic parameters which, when used as a measure of lipophilicity, can be very helpful in assessing the pharmacological potency of the compounds investigated.