Pharmacological classification of anticancer drugs applying chromatographic retention data and chemometric analysis

The combination of chromatography and chemometrics has been introduced to provide information about drug analytes, biological macromolecules and the stationary phase, whose properties are related to molecular pharmacology and rational drug design. In this work, a pharmacological classification was made in order to find a strategy for predicting the activity of anticancer drugs. Principal Component Analysis (PCA) was employed to build relationship models between the following: lipophilicity parameters of selected antitumor drugs obtained by chromatographic analysis, molecular modeling descriptors from HyperChem software, lipophilicity parameters calculated with dedicated software, biological activity, as well as the mechanism of action and therapeutic properties of the selected anticancer drugs. The parameters describing the compounds' lipophilicity had the most significant influence on the factor values. Other molecular descriptors considered in this paper (e.g. topological, physico-chemical and energy ones) also had an important impact on antitumor properties. In addition, the distribution of individual drugs on plots defined by two principal components was in good accordance with their chemical structures and cytostatic activity, as well as with their mechanism of action and therapeutic properties.


Introduction
Chromatography provides qualitative and quantitative information about the analyzed sample. Qualitative information is given by the location of the peak in the chromatogram, which makes it possible to recognize the type of separated substances, the chemical structure of the molecules and their physicochemical properties, and by the number of peaks, which indicates the number of components in the mixture (Malviya et al. 2010).
The data obtained during chromatographic analysis, as well as data from other complex analyses, are often processed with chemometric methods. Chemometrics is the field of science and technology based on the extraction of useful information from a multi-dimensional data matrix, employing methods from statistics and mathematics. It is considered an essential and indispensable tool in many areas of pharmacy, such as drug synthesis, development and design, structure-activity relationships, and drug mechanism. Chemometrics is a breakthrough in the search for new drugs, because it incorporates mathematical accuracy, statistical indices, and biological and/or chemical information. With the help of diverse software and chemometric methods, researchers have obtained faster and more accurate results, which represent an exponential progression in drug discovery (Kaliszan 1998; Marini 2010; Mocak 2012; Scotti and Scotti 2015). The combination of chromatography and chemometrics has been introduced to provide information about drug analytes, biological macromolecules and the stationary phase, whose properties are related to molecular pharmacology and rational drug design (Booth and Wainer 1996; Norinder and Hermansson 1991; Wolf et al. 1988). Proper processing of chromatographic data can increase the speed and efficiency of the search for new drugs. The whole process is based on the assumption of a quantitative relationship between the chemical structure of xenobiotics and their ability to undergo intermolecular interactions. At the same time, it reduces the number of tests involving living organisms. In addition, chromatographic and activity data can reveal information about the drug and about the compound's affinity to the stationary phase (Booth and Wainer 1996; Kaliszan 1998; Wolf et al. 1988). Moreover, chemometric methods facilitate processing of the data for further analyses.
Principal Component Analysis (PCA) is the most commonly used chemometric technique. Norinder et al. applied PCA to extract the most important factors, which were then used to establish a regression equation for predicting the enantioselectivity α, whereas Kumar et al. applied PCA to classify aqueous herbal drugs and to support the diagnosis and therapeutic prognosis of oral sub-mucous fibrosis (Bober et al. 2011; Koba and Bączek 2012a, b; Koba et al. 2010b; Norinder and Hermansson 1991; Stasiak et al. 2010; Koba et al. 2010a; Kumar 2017). Chemometric methods are also useful for increasing the signal-to-noise ratio, removing undesired effects from data, or aligning peaks. With the use of projection methods and clustering, multidimensional chemical data can be effectively explored. Various chemometric strategies are invaluable when calibration and classification models are to be developed (Stasiak et al. 2010). In addition, chemometric methods facilitate the interpretation of the data by identifying relevant variables, e.g. potential biomarkers, or organic and inorganic impurities in active substances and ready-to-use drug forms (Cserháti and Szőgyi 2012).
The compounds were analyzed using an RP-HPLC system with gradient elution utilizing solvent A (water + 0.1% formic acid) and solvent B (acetonitrile + 0.1% formic acid). The retention times (R t ) were measured with a linear gradient from 5 to 100% solvent B at two different gradient times (t g of 10 or 30 min). The retention times obtained were used as input data to determine the values of log k w with the DryLab 6.0 program (Molnár-Institute, Berlin, Germany). The log k w , log k 0 and S-value parameters determined for the test drugs on five different stationary phases are shown in Tables 1, 2 and 3. All measurements were performed at a flow rate of 1 ml/min and a detection wavelength of 254 nm. The volume of eluent contained in the system between the point where the gradient is formed and the point at which the mobile phase is introduced into the column (the dwell volume) was 0.745 ml. The dead time for all HPLC columns was determined by injecting solvent B. The dead times for Luna C18, Cadenza CD-C18, TSK-gel ODS-80Ts, Ascentis C18 and Unison UK-C18 were estimated at 2.123, 1.837, 2.621, 1.918 and 1.754 min, respectively.
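The parameters log k w and S are conventionally linked by the linear solvent strength relation, log k = log k w - S·φ, where φ is the volume fraction of the organic modifier. The sketch below illustrates only that relation by fitting a straight line to made-up isocratic points. It does not reproduce the gradient-based calculation carried out in DryLab, and the function name `fit_lss` and all numbers are hypothetical.

```python
import numpy as np

def fit_lss(phi, log_k):
    """Least-squares fit of log k = log_kw - S * phi.

    phi   : volume fractions of organic modifier (0..1)
    log_k : logarithms of the retention factor measured at each phi
    Returns (log_kw, S): the intercept at phi = 0 and the negated slope.
    """
    phi = np.asarray(phi, dtype=float)
    log_k = np.asarray(log_k, dtype=float)
    # np.polyfit returns coefficients from the highest degree down: [slope, intercept]
    slope, intercept = np.polyfit(phi, log_k, deg=1)
    return intercept, -slope

# Synthetic, hypothetical data generated from log_kw = 3.0 and S = 4.0
phi = [0.3, 0.4, 0.5, 0.6]
log_k = [3.0 - 4.0 * p for p in phi]

log_kw, S = fit_lss(phi, log_k)
print(f"log kw = {log_kw:.3f}, S = {S:.3f}")
```

With real data the points scatter around the line, so the extrapolated intercept log k w depends on the range of φ covered by the measurements.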

Lipophilicity parameters
The parameters characterizing the lipophilicity (log P) of the test compounds were calculated using the Virtual Computational Chemistry Application Laboratory (VCCLab, https://www.vcclab.org/). For interactive prediction of log P values, algorithms such as Pharma (parameter AC_log P), Actelion, Molinspiration, the accompanying software Dragon X (parameters miLOGP, KOW-WIN), and the XLOGP2 and XLOGP3 programs (parameters XLOGP2, XLOGP3) were used (Table 4). The chemical structures of the tested cytostatics were established by molecular modeling using HyperChem 8.0.8 software (Hypercube, Gainesville, USA) and Dragon (Talete, Milan, Italy). First, a preliminary optimization of the structures of the analyzed compounds was carried out with a procedure based on the Molecular Mechanics force field (MM+). The results were then refined using the semi-empirical AM1 method with the Polak-Ribiere algorithm and the limit of the energy decrease set at 0.01 kcal Å⁻¹.

Structural descriptors
Structural parameters such as total energy (TE), binding energy (BE), isolated atomic energy (IAE), electron energy (EE), core-core interaction energy (CCI energy), heat of formation (HF) and the energies of the highest occupied molecular orbital (HOMO) and lowest unoccupied molecular orbital (LUMO), were  , oz), as well as the total dipole moment (TDM) were calculated. The above-described descriptors were supplemented with parameters obtained from the "QSAR Properties" module of HyperChem 7.5 Software (HyperCube Inc.), in particular the solvent-accessible surface area (SAS), the van der Waals surface area (VDWS), molecular volume (V), hydration energy (HE), refractivity (R), polarizability (P), molecular mass (MA) and the lipophilicity parameter (log P). The data are summarized in Tables 5 and 6. Statistical analysis was performed with R ® Software (https://www.r-project.org/). The input log k w data were centered prior to PCA. Because the presented data have the same units of measure, the covariance matrix was computed from the mean-centered data matrix in R ® software. All data were standardized before the statistical analysis, and the analyses were performed at a significance level of 5% (α = 0.05).
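The centering and covariance steps described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the R code used in the study. The data matrix is random and merely stands in for a table of 15 compounds by 5 variables, and the function name `pca_covariance` is hypothetical.

```python
import numpy as np

def pca_covariance(X):
    """PCA via eigendecomposition of the covariance matrix.

    X : array of shape (n_samples, n_variables)
    Returns eigenvalues (descending), eigenvectors (as columns) and scores.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # mean-center each variable
    cov = np.cov(Xc, rowvar=False)           # covariance matrix of the variables
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: symmetric input, ascending order
    order = np.argsort(eigvals)[::-1]        # reorder to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs                    # coordinates of samples on the components
    return eigvals, eigvecs, scores

# Hypothetical stand-in data: 15 compounds x 5 retention-type variables
rng = np.random.default_rng(0)
X = rng.normal(size=(15, 5))

eigvals, eigvecs, scores = pca_covariance(X)
print(eigvals / eigvals.sum())  # share of total variance per component
```

Plotting the first two columns of `scores` against each other gives a score plot of the kind used for the classifications below. Standardizing each column before the call would correspond to PCA on the correlation matrix instead.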

Classification of compounds using principal component analysis (PCA) based on experimentally determined lipophilicity parameters
Principal Component Analysis was performed on the data from Tables 1, 2 and 3 using R ® Software. To investigate how many factors are statistically significant, an auxiliary scree plot (Fig. 2) was prepared. PCA indicated a set of four factors for interpretation. According to the Kaiser criterion (which assumes that a factor should account for at least as much variance as one original variable, i.e. its eigenvalue should be greater than 1), only the first two factors would be statistically significant, and the next two should therefore be rejected. The first factor explains 74% of the total variance and the second one 20%. Together they account for more than 94% of the total variance of the analyzed variables. The second criterion is the scree plot (Fig. 2), in which one should find the point beyond which the decrease in eigenvalue levels off. In contrast to the Kaiser criterion, the scree plot suggests that there may be as many as three statistically significant factors. However, only Factors 1 and 2 were selected for further analysis.
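The Kaiser criterion and the explained-variance shares discussed above can be illustrated with a short Python sketch. The eigenvalues below are hypothetical. They were chosen only so that the first two shares come out at 74% and 20%, and they are not the eigenvalues obtained in the study. The function name `kaiser_selection` is likewise an assumption.

```python
import numpy as np

def kaiser_selection(eigvals):
    """Apply the Kaiser criterion to eigenvalues of standardized data.

    Returns the number of factors with eigenvalue > 1, the share of
    variance explained by each factor, and the cumulative share.
    """
    eigvals = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    explained = eigvals / eigvals.sum()
    cumulative = np.cumsum(explained)
    n_keep = int(np.sum(eigvals > 1.0))
    return n_keep, explained, cumulative

# Hypothetical eigenvalues for 15 standardized variables (they sum to 15)
eigvals = [11.1, 3.0, 0.6, 0.3]

n_keep, explained, cumulative = kaiser_selection(eigvals)
print(n_keep)                    # number of factors retained
print(np.round(explained, 2))    # variance share per factor
print(np.round(cumulative, 2))   # cumulative variance share
```

The threshold of 1 is meaningful only when the variables are standardized, because each variable then contributes exactly one unit of variance.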

PCA analysis
The loadings plot obtained (Fig. 3a) illustrates the relationship between the input variables and the interpreted factors. Analysis of these loadings shows that Factor 1 is more strongly correlated with the S-value and its derivatives (which are clustered together), while Factor 2 is correlated with the log k w and log k 0 variables. The highlighted values will be taken into account when interpreting the results. The classification of individual groups of drugs is presented in Fig. 3b. The main criteria were the PCA results together with the mechanism of action and clinical application of the drugs. PCA made it possible to group 9 compounds: (10) Pirarubicin, (6) Idarubicin, (4) Daunorubicin, (9) Vinorelbine, (14) Vinblastine, (12) Vincristine, (5) Doxorubicin, (3) Irinotecan. This group contains compounds whose mechanism of action is the inhibition of topoisomerase I and II or intercalation into DNA. The resulting group contains compounds that belong to the anthracyclines or alkaloids.

Second-generation anthracyclines (also called anthracycline antibiotics, derived from Streptomyces bacteria) were classified into one cluster due to their uniform mode of action, together with the alkaloids derived from the periwinkle plant Catharanthus roseus: Vinblastine, Vinorelbine and Vincristine. All of these cytostatics are used to treat leukemias, lymphoproliferative disorders, testicular cancer, breast cancer, lung cancer and bladder cancer. The main cluster also contained Irinotecan, a semisynthetic analogue of the natural alkaloid camptothecin. Its mechanism of action relies on preventing DNA from relaxing through inhibition of topoisomerase I. It is also used in the treatment of breast and lung tumors.
All the clustered compounds presented on the score plot (Fig. 3b) were characterized by low values of both Factor 1 and Factor 2.

Classification of compounds based on experimentally determined lipophilicity parameters, together with lipophilicity obtained by computational methods
Principal Component Analysis of the data in Tables 1, 2, 3 and 4 was performed in R ® Software. To investigate how many factors are statistically significant, a scree plot was prepared (Fig. 4). Five factors were indicated for further analysis. To extract the most statistically significant factors, the Kaiser criterion was applied. There are three factors for which the eigenvalue is close to 1. Factor 3 explained only 7.7% of the total data variance, whereas Factor 1 and Factor 2 together covered more than 86% of the information contained in the input data; therefore, the analysis was conducted for Factor 1 and Factor 2.

Fig. 3 a Two-dimensional scatter plot of the scores presenting the classification of individual compounds using principal component analysis (PCA) based on experimentally determined lipophilicity parameters. b Two-dimensional scatter plot of the scores presenting the classification of individual compounds using factor analysis (FA) based on experimentally determined lipophilicity parameters

PCA analysis
After analysis of the loadings (Fig. 5a), it was observed that the higher correlation occurs between Factor 1 and the S-value variables. The newly interpretable variables are XLOGP2 and Average log P, which are also correlated with Factor 1. Moreover, the occurrence of so many derivatives of the partition coefficient may indicate that Factor 1 depends on increasing lipophilicity and thus on the increasing solubility of the drug. Factor 2 included mainly the log k w parameter for all HPLC columns used (which suggests that the columns are sufficiently selective to separate all analytes).
Factor 1 is associated with parameters that allow lipophilicity to be predicted, and it will have a decisive influence on the permeability of the drug into the cerebrospinal fluid. The classification is more difficult in this case; however, the main criteria, namely the PCA results, the mechanism of action and the clinical application, remain unchanged.
The proposed classification is presented in Fig. 5b. Cluster "A" consists of (15) Etoposide, (7) Melphalan and (5) Doxorubicin, assigned according to their clinical use, mainly in cancers of the hematopoietic tissues. The drugs in this cluster have different mechanisms of action. Cluster "B" consists of (4) Daunorubicin, (6) Idarubicin, (10) Pirarubicin and (11) Docetaxel. Drugs classified into cluster "B" have similar medical uses: these compounds are used in the therapy of head and neck cancer or breast cancer. Cluster "C" includes (3) Irinotecan, (12) Vincristine, (14) Vinblastine, (9) Vinorelbine and (13) Vindesine. Cluster "C" drugs are used in the treatment of lung cancer.

Classification based on all lipophilicity parameters in combination with molecular descriptors
In this case, Principal Component Analysis was performed in R ® software on the data from Tables 1, 2, 3 and 6. The eigenvalues obtained and the scree plot (Fig. 6) of the test data set are presented in this section. Lipophilicity parameters were analyzed in combination with the data obtained with HyperChem 7.5 Software (HyperCube Inc.). This analysis isolated six statistically significant factors. However, after application of the Kaiser criterion, the number of factors was reduced to two main factors, and the total variance explained by these factors was about 70%.

PCA analysis
As observed previously, Factor 1 was related to lipophilicity parameters such as log k w and log k 0 , and Factor 2 was correlated with the S-value. The proposed classification is presented in Fig. 7a. Hydration energy (HE) was strongly correlated. Moreover, it is responsible for overcoming the energy of the crystal structure and therefore facilitates the dissolution of the compound. Log P and hydration energy influence the ability of substances to dissolve and to penetrate biological membranes. Parameters such as refractivity (R), polarizability (P), molecular mass (amu), volume (V), surface bonding and van der Waals surface depended on Factor 1; these values describe (1) the ability of a molecule to deform under the influence of an electric field, (2) susceptibility to change the angle of rotation, (3) the amount of van der Waals bonds and (4) the solvent-accessible surface area. This surface is relatively large; thus we can infer that dissolution of the compound should be easy. The total dipole moment (TDM), which is the vector sum of the bond dipoles, describes the compound's polarity. The parameter describing the atom's negative charge depended on Factor 1. The HOMO, correlated with Factor 2, is completely filled with electrons. It plays a significant role in the formation of new chemical bonds and the breaking of old ones. The total energy (TE), binding energy (BE), electron energy (EE) and isolated atomic energy (IAE) values are positive, so every bond in the molecule is stable and strong, and they also depend on the value of Factor 1. Additionally, the core-core (C-C) interactions are negative, which may indicate that the weak bonds between cores could be broken easily. Total energy, binding energy, electron energy and C-C interactions depend on Factor 1. For Actinomycin D, Vinorelbine, Pirarubicin, Docetaxel, Vincristine and Vinblastine the energies are negative, which means that these substances are present in bound form.
Hydration energy, correlated with Factor 2, in Daunorubicin, Doxorubicin, Idarubicin, Melphalan, Pirarubicin, Docetaxel and Etoposide is sufficient to disrupt the bonds in the crystal structure and increase solubility, which is defined by the total dipole moment that divides the drugs into less and more polar ones. This division depends on the increasing electronegativity difference between the elements that form the compound. PCA revealed two groups of compounds (Fig. 7b): Cluster "A" [(4) Daunorubicin, (5) Doxorubicin, (6) Idarubicin, (15) Etoposide, (10) Pirarubicin] and Cluster "B" [(3) Irinotecan, (9) Vinorelbine, (12) Vincristine, (13) Vindesine, (14) Vinblastine]. All drugs from cluster "A" are used in the treatment of acute leukemia and drugs from cluster "B" in lymphoma.

Discussion
In this study, the antitumor activity of 15 test drugs was evaluated. These cytostatic drugs were classified according to their mechanism of action and clinical application by principal component analysis. The principle of the PCA method is to reduce a large amount of data and present it in the form of several main factors explaining the total variance. For this purpose, data on retention parameters, lipophilicity parameters and structural descriptors (describing the chemical structure) were calculated. In the next step, chemical structure, antitumor activity and other properties were discussed on the basis of the PCA results. Figures 3a and 5a showed lipophilicity parameters such as log k w , log k 0 and S-value. After analysis of both, we assumed that the log k w and log k 0 values were correlated with Factor 1. This may indicate that this factor is responsible for drug dissolution as well as better absorption into the tissues. By contrast, Factor 2 is related to the S-value. This parameter is responsible for solubility in aqueous or oily solution.

Fig. 5 a Two-dimensional scatter plot of the scores presenting the classification of individual compounds using principal component analysis (PCA) based on experimentally determined lipophilicity parameters, together with lipophilicity obtained by computational methods; Factor 2 relative to Factor 1. b Two-dimensional scatter plot of the scores presenting the classification of individual compounds using factor analysis (FA) based on experimentally determined lipophilicity parameters, together with lipophilicity obtained by computational methods; Factor 2 relative to Factor 1
Afterwards, the retention parameters were considered together with the values characterizing the lipophilicity (log P), which led to different results. Without rotation, Factor 1 was associated with the XLOGP3 and Average log P variables. After application of the Varimax rotation, parameters such as AClogP, miLogP, XLOGP2 and Average log P were assigned to Factor 3. We might assume that the division of these values between two factors results from antitumor activity, whose strength depends primarily on Factor 3 and less on the other factors. The activity of the tested substances strictly depends on lipophilicity and is likely to be related to distribution and accumulation in the cells. In pharmacy, log P is a very important parameter, which can predict the rate of a drug's distribution in the human body, as well as the dose and toxicity.
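For readers unfamiliar with the Varimax rotation mentioned above, the following Python sketch shows a commonly used iterative formulation based on singular value decomposition. It is an illustration under stated assumptions, not the routine used in the study: the loading matrix is hypothetical and the function name `varimax` is chosen here for convenience.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal Varimax rotation of a loading matrix (variables x factors).

    Returns the rotated loadings and the orthogonal rotation matrix.
    """
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    R = np.eye(k)            # start from the unrotated solution
    criterion = 0.0
    for _ in range(max_iter):
        rotated = L @ R
        # Gradient of the varimax criterion with respect to the rotation
        col_ss = np.diag(np.sum(rotated**2, axis=0))
        grad = L.T @ (rotated**3 - rotated @ col_ss / p)
        u, s, vt = np.linalg.svd(grad)
        R = u @ vt           # nearest orthogonal matrix to the gradient
        new_criterion = s.sum()
        if criterion != 0.0 and new_criterion < criterion * (1 + tol):
            break            # no meaningful improvement, stop iterating
        criterion = new_criterion
    return L @ R, R

# Hypothetical loadings of six variables on two factors
L0 = np.array([[0.7, 0.5], [0.8, 0.4], [0.6, 0.6],
               [0.5, -0.6], [0.4, -0.7], [0.6, -0.5]])

rotated, R = varimax(L0)
print(np.round(rotated, 2))
```

Because the rotation is orthogonal, the communality of each variable (the row sum of squared loadings) is unchanged; only the way variance is shared among the factors changes, which is why variables can move from one factor to another after rotation.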
Considering the juxtaposition of several retention parameters with molecular descriptors, it can be observed that five statistically significant factors were obtained both with and without the Varimax rotation. Descriptors express the features of the investigated molecules as numerical values. In this paper, descriptors belonging to the constitutional (calculated on the basis of the molecular formula), geometric (describing the size and shape of the molecule), quantum-mechanical (obtained by semi-empirical methods; molecular orbital energies) and electrostatic (showing the charge distribution in the molecule) classes were used. The geometric configuration of a drug's chemical structure may play an important role in the properties of the tested anticancer drugs.
In each case, a classification of these pharmaceuticals based on their values of charges was proposed. One can postulate that the clusters formed allow better rationalization of therapy and may contribute to more effective treatment.

Conclusions
The present study confirms that lipophilicity is one of the key parameters characterizing the usefulness of active substances, in agreement with the state of the art in the field of drug design. Moreover, the study indicates that PCA can effectively classify chemical compounds on the basis of computational data (including lipophilicity), not only according to spatial structure, but also according to the mechanism of action and potential therapeutic application. Drug distribution defined by two or even five factors (obtained from a data set consisting of lipophilicity parameters and molecular descriptors) is sufficient to help design new drugs in full compliance with both chemical structure and antitumor activity. The therapeutic effect of the analyzed compounds was shown to be closely related to their lipophilicity, which does not preclude a relationship between lipophilicity and the transport of the drugs and their eventual accumulation in cells. PCA and FA may be used for the pre-selection of drugs for therapy according to their therapeutic use or mechanism of action. In addition, this method can be compared with QSAR analysis, which gives favorable results in the design of new drugs and is helpful in exploring strategies for predicting their activity.
Funding None.

Compliance with ethical standards
Conflict of interest All authors declare that they have no conflict of interest.
Ethical approval This article does not contain any studies with human participants or animals performed by any of the authors.

Fig. 7 a Two-dimensional scatter plot of the scores presented classification of individual compounds using factor analysis (FA) based on all used descriptors; Factor 2 relative to Factor 1. b Two-dimensional scatter plot of the scores presented classification of individual compounds using factor analysis (FA) based on all used descriptors; Factor 2 in relative to Factor 1

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.