Introduction

Chromatography provides qualitative and quantitative information about the analyzed sample. Qualitative information is derived from the position of each peak in the chromatogram, which makes it possible to recognize the type of separated substances, the chemical structure of the molecules and their physicochemical properties, and from the number of peaks, which indicates the number of components in the mixture (Malviya et al. 2010).

Data obtained from chromatographic analysis, like data from other complex analyses, are often processed with the aid of chemometric methods. Chemometrics is the field of science and technology concerned with extracting useful information from multidimensional data matrices using methods from statistics and mathematics. It is considered an essential and indispensable tool in many areas of pharmacy, such as drug synthesis, development and design, structure–activity relationships, and the study of drug mechanisms. Chemometrics is a breakthrough in the search for new drugs because it combines mathematical accuracy, statistical indices, and biological and/or chemical information. With the help of diverse software and chemometric methods, researchers have obtained faster and more accurate results, which represents rapid progress in drug discovery (Kaliszan 1998; Marini 2010; Mocak 2012; Scotti and Scotti 2015).

The combination of chromatography and chemometrics was introduced to provide information about drug analytes, biological macromolecules and the stationary phase, whose properties are related to molecular pharmacology and rational drug design (Booth and Wainer 1996; Norinder and Hermansson 1991; Wolf et al. 1988). Proper processing of chromatographic data can increase the speed and efficiency of the search for new drugs and, at the same time, reduce the number of tests involving living organisms. The whole process is based on the assumption that a quantitative relationship exists between the chemical structure of xenobiotics and their ability to take part in intermolecular interactions. In addition, chromatographic and activity data can reveal information about the drug and allow the compound’s affinity for the stationary phase to be investigated (Booth and Wainer 1996; Kaliszan 1998; Wolf et al. 1988). Moreover, chemometric methods facilitate the processing of data for further analyses. Principal component analysis (PCA) is the most commonly used chemometric technique. Norinder et al. applied PCA to extract the most important factors, which were then used to establish a regression equation for predicting the enantioselectivity α, whereas Kumar et al. applied PCA to classify aqueous herbal drugs and to support the diagnosis and therapeutic prognosis of oral submucous fibrosis (Bober et al. 2011; Koba and Bączek 2010, 2012a, b; Koba et al. 2010a, b; Kumar 2017; Norinder and Hermansson 1991; Stasiak et al. 2010).

Chemometric methods are also useful for increasing the signal-to-noise ratio, removing undesired effects from data, and aligning peaks. With projection methods and clustering, multidimensional chemical data can be explored effectively. Various chemometric strategies are invaluable when calibration and classification models are to be developed (Stasiak et al. 2010). In addition, chemometric methods facilitate the interpretation of data by identifying relevant variables, e.g. potential biomarkers or organic and inorganic impurities in active substances and ready-to-use drug forms (Cserháti and Szőgyi 2012).

Materials and methods

HPLC analysis

In this study, lipophilicity data were used for 15 selected anti-cancer compounds, namely (1) Nimustine, (2) Actinomycin D, (3) Irinotecan, (4) Daunorubicin, (5) Doxorubicin, (6) Idarubicin, (7) Melphalan, (8) Mitomycin C, (9) Vinorelbine, (10) Pirarubicin, (11) Docetaxel, (12) Vincristine, (13) Vindesine, (14) Vinblastine and (15) Etoposide (Fig. 1a–n). These drugs are derived from many taxonomic groups, and many differences in their mechanisms of action and therapeutic effects have been reported. All cytostatics used in the experiment were purchased from Sigma-Aldrich (St. Louis, USA) and were used as 0.1 mg/ml solutions in DMSO.

Fig. 1 a–n The chemical structures of the tested compounds (a nimustine, b actinomycin D, c irinotecan, d daunorubicin, e doxorubicin, f idarubicin, g melphalan, h mitomycin C, i vinorelbine, j pirarubicin, k docetaxel, l vincristine, m vinblastine, n etoposide)

The solvents used in the HPLC analysis were acetonitrile, water and formic acid, obtained from POCH (Gliwice, Poland). Chromatographic measurements were performed on a Shimadzu HPLC system (Kyoto, Japan) equipped with a degasser (DGU-20A5), an eluent feed pump (LC-20AD), an autosampler (SIL-20AD) with a syringe, a detector (SPD-M20A), a column oven (CTO-20AC) and a recorder (CBM-20A). Data were collected using Solution LC software (version 1.0.0.1; Kyoto, Japan). The compounds were analyzed on the following chromatographic columns: Luna 5 µm C18(2) 100 Å, 150 × 4.6 mm (Phenomenex, Torrance, CA, USA); Cadenza CD-C18, 150 × 4.6 mm (Imtakt Corporation, Kyoto, Japan); TSK-gel ODS-80Ts, 150 × 4.6 mm (Tosoh Corporation, Tokyo, Japan); Ascentis C18, 150 × 4.6 mm (Supelco, Bellefonte, PA, USA); and Unison UK-C18, 150 × 4.6 mm (Imtakt Corporation, Kyoto, Japan).

The compounds were analyzed in an RP-HPLC system with gradient elution using solvent A (water + 0.1% formic acid) and solvent B (acetonitrile + 0.1% formic acid). Retention times (Rt) were measured with a linear gradient from 5 to 100% solvent B at two gradient times (tG of 10 or 30 min). The retention times obtained were used as input data to determine the values of log kw with the DryLab 6.0 program (Molnár-Institute, Berlin, Germany). The log kw, log k0 and S-value parameters determined for the test drugs on the five stationary phases are shown in Tables 1, 2 and 3. All measurements were performed at a flow rate of 1 ml/min and a detection wavelength of 254 nm. The dwell volume, i.e. the volume of eluent contained in the system between the point where the gradient is formed and the point at which the mobile phase enters the column, was 0.745 ml. The dead time of each HPLC column was determined by injecting solvent B. The dead times for Luna C18, Cadenza CD-C18, TSK-gel ODS-80Ts, Ascentis C18 and Unison UK-C18 were estimated at 2.123, 1.837, 2.621, 1.918 and 1.754 min, respectively.
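The log kw and S values reported here were produced by DryLab, whose internal procedure is not described in this text. As a rough illustration of what these two parameters mean, the sketch below assumes the widely used linear solvent strength relationship, log k = log kw − S·φ, where φ is the volume fraction of the organic modifier (solvent B). The function names and all numerical inputs are hypothetical and are not taken from Tables 1, 2 and 3.

```python
def retention_factor(t_r: float, t_0: float) -> float:
    """Isocratic retention factor k = (tR - t0) / t0.

    t_r is the retention time and t_0 the column dead time (same units).
    Note: this definition applies to isocratic runs; gradient retention
    times need a model such as the one DryLab applies.
    """
    if t_0 <= 0:
        raise ValueError("dead time must be positive")
    return (t_r - t_0) / t_0


def lss_log_k(log_kw: float, s_value: float, phi: float) -> float:
    """log k at organic fraction phi, assuming log k = log kw - S * phi."""
    if not 0.0 <= phi <= 1.0:
        raise ValueError("phi must be a volume fraction between 0 and 1")
    return log_kw - s_value * phi


# Hypothetical compound: log kw = 3.0, S = 4.0, evaluated at 50% solvent B.
print(lss_log_k(3.0, 4.0, 0.5))  # 1.0
```

Under this assumed model, log kw is the retention extrapolated to pure water (φ = 0), and S describes how steeply retention falls as the organic content rises.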

Table 1 Comparison of the lipophilicity parameter log kw for the HPLC columns used in the chromatographic analysis
Table 2 Comparison of the lipophilicity parameter log k0 for the HPLC columns used in the chromatographic analysis
Table 3 Comparison of the lipophilicity parameter S-value for the HPLC columns used in the chromatographic analysis

Lipophilicity parameters

Parameters characterizing the lipophilicity (log P) of the test compounds were calculated using the Virtual Computational Chemistry Laboratory (VCCLAB, https://www.vcclab.org/). For the interactive prediction of log P values, algorithms such as Pharma (parameter AC_log P), Actelion, Molinspiration and the accompanying Dragon X software (parameters miLOGP, KOWWIN), as well as the XLOGP2 and XLOGP3 programs (parameters XLOGP2, XLOGP3), were used (Table 4). The chemical structures of the tested cytostatics were established by molecular modeling using HyperChem 8.0.8 software (Hypercube, Gainesville, USA) and Dragon (Talete, Milan, Italy). First, a preliminary optimization of the structures of the analyzed compounds was carried out with a procedure based on the Molecular Mechanics Force Field (MM+). The results were then refined with the semi-empirical AM1 method using the Polak–Ribière algorithm, with the limit for the decrease in energy set at 0.01 kcal Å−1.

Table 4 Comparison of log P parameters for tested compounds

Structural descriptors

Structural parameters such as total energy (TE), binding energy (BE), isolated atomic energy (IAE), electron energy (EE), core–core interaction energy (CCI energy), heat of formation (HF) and the energies of the highest occupied molecular orbital (HOMO) and the lowest unoccupied molecular orbital (LUMO) were calculated with HyperChem. In addition, the highest positive (Cm+) and negative (Cm−) atomic charges, the dipole moment components along the X, Y and Z axes (ox, oy, oz), and the total dipole moment (TDM) were calculated. These descriptors were supplemented with parameters obtained from the “QSAR Properties” module of HyperChem 7.5 software (HyperCube Inc.), namely the solvent-accessible surface area (SAS), the van der Waals surface area (VDWS), molecular volume (V), hydration energy (HE), refractivity (R), polarizability (P), molecular mass (MA) and the lipophilicity parameter (log P). The data are summarized in Tables 5 and 6.

Statistical analysis was performed with R® software (https://www.r-project.org/). The input log kw data were mean-centered prior to PCA. Because these data share the same units of measure, the covariance matrix was computed from the mean-centered data matrix in R®. All data were standardized before the statistical analysis, and the analyses were performed at a significance level of 5% (α = 0.05).

Table 5 Values of descriptors obtained from HyperChem QSAR properties module
Table 6 Values of descriptors obtained from the HyperChem Software
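The mean-centering and covariance step described above was carried out in R® in this study; the exact calls are not given in the text. The following is a minimal standard-library Python sketch of the same idea, assuming rows are compounds, columns are variables, and the sample covariance uses an n − 1 denominator. The function names and the small data matrix are hypothetical.

```python
from statistics import mean


def center_columns(data):
    """Subtract each column mean (rows = compounds, columns = variables)."""
    col_means = [mean(col) for col in zip(*data)]
    return [[x - m for x, m in zip(row, col_means)] for row in data]


def covariance_matrix(data):
    """Sample covariance matrix (n - 1 denominator) of the columns of data."""
    n = len(data)
    if n < 2:
        raise ValueError("at least two rows are needed")
    columns = list(zip(*center_columns(data)))
    return [
        [sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in columns]
        for ci in columns
    ]


# Hypothetical 3 x 2 matrix, not values from the tables of this paper.
example = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
print(covariance_matrix(example))
```

PCA then amounts to an eigendecomposition of this matrix. If the variables are standardized first, the same computation yields the correlation matrix instead.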

Results

Statistical analysis

Classification of compounds using principal component analysis (PCA) based on experimentally determined lipophilicity parameters

Principal component analysis was performed on the data from Tables 1, 2 and 3 using R® software. To investigate how many factors are statistically significant, an auxiliary scree plot (Fig. 2) was prepared. The PCA indicated a set of four factors for interpretation. According to the Kaiser criterion (which assumes that a factor should account for at least as much variance as one original variable, i.e. its eigenvalue should be greater than 1), only the first two factors are statistically significant, so the next two should be rejected. The first factor explains 74% of the total variance and the second 20%; together they account for more than 94% of the total variance. The second criterion is the scree plot (Fig. 2), on which one looks for the point at which the decrease in eigenvalues levels off. In contrast to the Kaiser criterion, the scree plot suggests that as many as three factors are statistically significant. However, only Factors 1 and 2 were selected for further analysis.

Fig. 2 Scree plot showing which factors should be selected for further analysis (based on experimentally determined lipophilicity parameters)
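The factor-selection logic used in this section (Kaiser criterion plus the share of variance explained) can be sketched as follows. This is an illustrative standard-library Python example, not the R® code used in the study. The eigenvalues are hypothetical, chosen so that they sum to the number of variables, as they would for standardized data, and are not the eigenvalues behind Fig. 2.

```python
def explained_variance_ratio(eigenvalues):
    """Fraction of the total variance carried by each factor."""
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive number")
    return [ev / total for ev in eigenvalues]


def kaiser_selection(eigenvalues, threshold=1.0):
    """Indices of the factors whose eigenvalue exceeds the threshold."""
    return [i for i, ev in enumerate(eigenvalues) if ev > threshold]


# Hypothetical eigenvalues for five standardized variables (sum = 5).
eigenvalues = [3.0, 1.5, 0.3, 0.15, 0.05]
kept = kaiser_selection(eigenvalues)
ratios = explained_variance_ratio(eigenvalues)
cumulative = sum(ratios[i] for i in kept)
print(kept, round(cumulative, 3))
```

The threshold of 1 is meaningful when the variables are standardized, because each original variable then contributes exactly one unit of variance.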

PCA analysis

The loadings plot obtained (Fig. 3a) illustrates the relationship between the input variables and the interpreted factors. Analysis of these loadings shows that Factor 1 is more strongly correlated with S-value and its derivatives (which are clustered together), while Factor 2 is correlated with the log kw and log k0 variables. The highlighted values are taken into account when interpreting the results.

Fig. 3 a Two-dimensional scatter plot of the scores presenting the classification of individual compounds using principal component analysis (PCA) based on experimentally determined lipophilicity parameters. b Two-dimensional scatter plot of the scores presenting the classification of individual compounds using factor analysis (FA) based on experimentally determined lipophilicity parameters
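For readers unfamiliar with loadings, the sketch below shows one common way they are defined: for standardized variables, the loading of a variable on a factor is the corresponding eigenvector entry scaled by the square root of the eigenvalue, which equals the correlation between the variable and the factor. The example uses plain power iteration in standard-library Python rather than the R® routines used in the study; the function names and the 2 × 2 correlation matrix are hypothetical.

```python
import math


def leading_eigenpair(matrix, iterations=200):
    """Largest eigenvalue and unit eigenvector of a symmetric
    positive semi-definite matrix, found by power iteration."""
    n = len(matrix)
    # Unequal start entries reduce the chance of starting orthogonal
    # to the leading eigenvector (still possible in rare cases).
    vec = [1.0 / (i + 1) for i in range(n)]
    for _ in range(iterations):
        product = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            raise ValueError("start vector lies in the null space")
        vec = [x / norm for x in product]
    # Rayleigh quotient v^T M v gives the eigenvalue for the unit vector v.
    mv = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
    eigenvalue = sum(v * x for v, x in zip(vec, mv))
    return eigenvalue, vec


def first_factor_loadings(correlation_matrix):
    """Loadings of each variable on the first factor."""
    eigenvalue, vec = leading_eigenpair(correlation_matrix)
    scale = math.sqrt(max(eigenvalue, 0.0))
    return [scale * v for v in vec]


# Hypothetical correlation matrix of two strongly correlated variables.
print(first_factor_loadings([[1.0, 0.8], [0.8, 1.0]]))
```

The sign of an eigenvector is arbitrary, so loadings may come out with all signs flipped in other software; only their relative signs and magnitudes are interpretable.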

Figure 3b presents the classification of individual groups of drugs. The main criteria were the PCA results together with the mechanism of action and the clinical application of the drugs. PCA allowed 9 compounds to be grouped: (10) Pirarubicin, (6) Idarubicin, (4) Daunorubicin, (9) Vinorelbine, (14) Vinblastine, (12) Vincristine, (5) Doxorubicin, (3) Irinotecan. This group contains compounds whose mechanism of action is inhibition of topoisomerase I and II or intercalation into DNA. The resulting group contains compounds that belong to the anthracyclines or the alkaloids.

Second-generation anthracyclines (also called anthracycline antibiotics, derived from Streptomyces bacteria) were classified into one cluster because of their uniform mode of action, together with the alkaloids derived from the periwinkle plant Catharanthus roseus, namely Vinblastine, Vinorelbine and Vincristine. All of these cytostatics are used to treat leukemias, lymphoproliferative diseases, testicular cancer, breast cancer, lung cancer and bladder cancer. The main cluster also contained Irinotecan, a semisynthetic analogue of the natural alkaloid camptothecin. Its mechanism of action relies on preventing DNA relaxation through inhibition of topoisomerase I. It is also used in the treatment of breast and lung tumors.

All of these clustered compounds, shown on the score plot (Fig. 3b), were characterized by low values of both Factor 1 and Factor 2.

Classification of compounds based on experimentally determined lipophilicity parameters, together with lipophilicity obtained by computational methods

Principal component analysis of the data in Tables 1, 2, 3 and 4 was performed in R® software. To investigate how many factors are statistically significant, a scree plot was prepared (Fig. 4), which indicated five factors for further analysis. The Kaiser criterion was then applied to extract the most statistically significant factors: three factors had an eigenvalue close to 1. Factor 3 explained only 7.7% of the total data variance, whereas Factor 1 and Factor 2 together accounted for more than 86% of the information contained in the input data; therefore, the analysis was conducted for Factor 1 and Factor 2.

Fig. 4 Scree plot showing which factors should be selected for further analysis (based on experimentally determined lipophilicity parameters, together with lipophilicity obtained by computational methods)

PCA analysis

Analysis of the loadings (Fig. 5a) showed that the highest correlation occurs between Factor 1 and the S-value variables. The new interpretable variables, XLOGP2 and Average log P, are also correlated with Factor 1. Moreover, the presence of so many variables derived from the partition coefficient may indicate that Factor 1 depends on increasing lipophilicity and thus reflects increased solubility of the drug. Factor 2 mainly comprised the log kw parameter for all HPLC columns used, which suggests that the columns are sufficiently selective to separate all analytes.

Fig. 5 a Two-dimensional scatter plot of the scores presenting the classification of individual compounds using principal component analysis (PCA) based on experimentally determined lipophilicity parameters, together with lipophilicity obtained by computational methods; Factor 2 relative to Factor 1. b Two-dimensional scatter plot of the scores presenting the classification of individual compounds using factor analysis (FA) based on experimentally determined lipophilicity parameters, together with lipophilicity obtained by computational methods; Factor 2 relative to Factor 1

Factor 1 is associated with parameters that allow lipophilicity to be predicted, and it will have a decisive influence on the permeation of the drug into the cerebrospinal fluid. Classification is more difficult in this case; however, the main criteria, namely the PCA results, the mechanism of action and the clinical application, remain unchanged.

The proposed classification is presented in Fig. 5b. Cluster “A” consists of (15) Etoposide, (7) Melphalan and (5) Doxorubicin, assigned according to their clinical use, mainly in cancers of the hematopoietic tissues. The drugs in this cluster have different mechanisms of action. Cluster “B” consists of (4) Daunorubicin, (6) Idarubicin, (10) Pirarubicin and (11) Docetaxel. The drugs classified into cluster “B” have similar medical uses: these compounds are used in the therapy of head and neck cancer or breast cancer. Cluster “C” includes (3) Irinotecan, (12) Vincristine, (14) Vinblastine, (9) Vinorelbine and (13) Vindesine. Cluster “C” drugs are used in the treatment of lung cancer.

Classification based on all lipophilicity parameters in combination with molecular descriptors

In this case, principal component analysis was performed in R® software on the data from Tables 1, 2, 3 and 6. The eigenvalues and the scree plot (Fig. 6) obtained for the test data set are presented in this section. Lipophilicity parameters were analyzed in combination with the data obtained from HyperChem 7.5 software (HyperCube Inc.). This analysis isolated six statistically significant factors. However, after application of the Kaiser criterion, the number of factors was reduced to two main factors, which together explained about 70% of the total variance.

Fig. 6 Scree plot showing which factors should be selected for further analysis (based on experimentally determined lipophilicity parameters in combination with molecular descriptors)

PCA analysis

As observed previously, Factor 1 was related to lipophilicity parameters such as log kw and log k0, whereas Factor 2 was correlated with S-value. The proposed classification is presented in Fig. 7a. Hydration energy (HE) was strongly correlated with Factor 2. It is responsible for overcoming the energy of the crystal structure and therefore facilitates dissolution of the compound. Log P and hydration energy influence the ability of substances to dissolve and to penetrate biological membranes. Parameters such as refractivity (R), polarizability (P), molecular mass (in amu), volume (V), solvent-accessible surface area and van der Waals surface area depended on Factor 1. These values describe (1) the ability of a molecule to deform under the influence of an electric field, (2) the susceptibility to a change in the angle of rotation, (3) the amount of van der Waals bonds and (4) the solvent-accessible surface area. This surface is relatively large; thus we can infer that dissolution of the compounds should be easy. The total dipole moment (TDM), which is the vector sum of the bond dipoles, describes the polarity of a compound. The parameter describing the negative atomic charge depended on Factor 1. The highest occupied molecular orbital (HOMO), correlated with Factor 2, is completely filled with electrons. It plays a significant role in the formation of new chemical bonds and the breaking of old ones.

The total energy (TE), binding energy (BE), electron energy (EE) and isolated atomic energy (IAE) values are positive, so every bond in the molecule is stable and strong; these values also depend on Factor 1. Additionally, the core–core interactions are negative, which may indicate that the weak bonds between cores are easily broken. Total energy, binding energy, electron energy and core–core interactions depend on Factor 1. For Actinomycin D, Vinorelbine, Pirarubicin, Docetaxel, Vincristine and Vinblastine the energies are negative, which means that these substances are present in bound form. In Daunorubicin, Doxorubicin, Idarubicin, Melphalan, Pirarubicin, Docetaxel and Etoposide, the hydration energy, which is correlated with Factor 2, is sufficient to disrupt the bonds in the crystal structure and increase solubility. Solubility is also characterized by the total dipole moment, which divides the drugs into less polar and more polar ones. This division depends on the increasing difference in electronegativity between the elements that form the compound.

Fig. 7 a Two-dimensional scatter plot of the scores presenting the classification of individual compounds using factor analysis (FA) based on all descriptors used; Factor 2 relative to Factor 1. b Two-dimensional scatter plot of the scores presenting the classification of individual compounds using factor analysis (FA) based on all descriptors used; Factor 2 relative to Factor 1

PCA revealed two groups of compounds (Fig. 7b): Cluster “A” [(4) Daunorubicin, (5) Doxorubicin, (6) Idarubicin, (15) Etoposide, (10) Pirarubicin] and Cluster “B” [(3) Irinotecan, (9) Vinorelbine, (12) Vincristine, (13) Vindesine, (14) Vinblastine]. All drugs from cluster “A” are used in the treatment of acute leukemia, and drugs from cluster “B” in the treatment of lymphoma.

Discussion

In this study, the antitumor activity of 15 test drugs was evaluated. These cytostatic drugs were classified according to their mechanism of action and clinical application by principal component analysis. The principle of PCA is to reduce a large amount of data and present it in the form of several main factors that explain the total variance. For this purpose, data on retention parameters, lipophilicity parameters and structural descriptors (describing the chemical structure) were calculated. In the next step, chemical structure, anti-tumor activity and other properties were discussed on the basis of the PCA results.

Figures 3a and 5a show lipophilicity parameters such as log kw, log k0 and S-value. After analyzing both, we assumed that the log kw and log k0 values are correlated with Factor 1. This may indicate that this factor is responsible for drug dissolution as well as better absorption into the tissues. By contrast, Factor 2 is related to S-value. This parameter is responsible for solubility in aqueous or oily solution.

Afterwards, the retention parameters were considered together with the values characterizing lipophilicity (log P), which led to different results. Without rotation, Factor 1 was associated with the XLOGP3 and Average log P variables. After Varimax rotation, parameters such as AClogP, miLogP, XLOGP2 and Average log P were assigned to Factor 3. We may assume that the division of these values between two factors results from antitumor activity, the strength of which depends primarily on Factor 3 and less on the other factors. The activity of the tested substances depends strictly on lipophilicity and is likely to be related to distribution and accumulation in cells. In pharmacy, log P is a very important parameter that can help predict the rate of a drug’s distribution in the human body, as well as its dose and toxicity.

When several retention parameters are juxtaposed with molecular descriptors, five statistically significant factors are obtained both without and with Varimax rotation. Descriptors express the features of the investigated molecules as numerical values. In this paper, descriptors belonging to the constitutional (calculated on the basis of the molecular formula), geometric (describing the size and shape of the molecule), quantum-mechanical (obtained by semi-empirical methods, e.g. molecular orbital energies) and electrostatic (showing the charge distribution in the molecule) classes were used. The geometric configuration of a drug’s chemical structure may play an important role in determining the properties of the tested anticancer drugs.

In each case, a classification of these pharmaceuticals based on their values of charges was proposed. One can postulate that the clusters formed allow better rationalization of therapy and contribute to more effective treatment.

Conclusions

The present study confirms that lipophilicity is one of the key parameters characterizing the usefulness of active substances, which is consistent with the state of the art in drug design. Moreover, the study indicates that PCA can be used effectively to classify chemical compounds on the basis of computational data (including lipophilicity), not only according to spatial structure but also according to the mechanism of action and potential therapeutic application. A description of drugs by two or even five factors (obtained from a data set consisting of lipophilicity parameters and molecular descriptors) is sufficient to support the design of new drugs in a way that takes account of both chemical structure and antitumor activity. It was shown that the therapeutic effect of the analyzed compounds is closely related to their lipophilicity, and a relationship between lipophilicity and the transport of the drugs and their eventual accumulation in cells cannot be excluded. PCA and FA may be used for the preliminary selection of drugs for therapy according to therapeutic use or mechanism of action. In addition, this method is comparable to QSAR analysis, which gives favorable results in the design of new drugs and is helpful in exploring strategies for predicting their activity.