Discovery of potential ALK inhibitors by virtual screening approach

Crizotinib is an anticancer drug used for the treatment of non-small cell lung cancer. Evidences available suggest that there is a development of an acquired resistance against crizotinib action due to the emergence of several mutations in the ALK gene. It is therefore necessary to develop potent anti-cancer drugs for the treatment of crizotinib resistance non-small cell lung cancer types. In the present study, a novel class of lead molecule was identified using virtual screening, molecular docking and molecular dynamic approach. The virtual screening analysis was done using PubChem database by employing crizotinib as query and the data reduction was carried out by using molecular docking techniques. The bioavailability of the lead compounds was examined with the help of Lipinski rule of five. The screened lead molecules were analyzed for toxicity profiles, drug-likeness and other physico-chemical properties of drugs by OSIRIS program. Finally, molecular dynamics simulation was also performed to validate the binding property of the lead compound. Our analysis clearly indicates that CID 11562217, a nitrile containing compound (pyrazole-substituted aminoheteroaryl), could be the potential ALK inhibitor certainly helpful to overcome the drug resistance in non-small cell lung cancer.


Introduction
Lung cancer is the prominent cause of cancer deaths in the world and a global issue to be addressed (Siegel et al. 2012). Lung cancer is broadly classified into two main types based upon their histology, which are non-small cell lung cancer (NSCLC) and small cell lung cancer (SCLC). The most common forms of NSCLC are adenocarcinoma (ADC) and squamous cell carcinoma (SCC) (Skarda et al. 2008). Chromosomal rearrangements in the anaplastic lymphoma kinase (ALK) gene that codes for anaplastic lymphoma kinase has been identified as one of the causes of NSCLC. There are two types of tyrosine kinase, receptor and cytoplasmic tyrosine kinase. The ALK is a cytoplasmic tyrosine kinase where crizotinib (a potential anticancer drug used in the treatment of NSCLC) is bound. Chromosomal rearrangements involving the ALK gene occur in different malignant conditions, including anaplastic large cell lymphoma (ALCL) and inflammatory myofibroblastic tumor (IMT) (Chiarle et al. 2008). These rearrangements lead to the expression of ALK fusion genes. ALK fusion gene possesses different properties from the two genes that it was originally derived from, can then code for the new ALK fusion protein, which is abnormally and constitutively activated. The new protein contains the tyrosine kinase domain of ALK and the coiled coil domain of EML4. The coiled coil domain of EML4 allows this protein to bind with other ALK fusion proteins and form dimerised and activated fusion proteins (Katayama et al. 2012). The most prevalent ALK fusion oncogene in NSCLC is the echinoderm microtubule-associated proteinlike 4 (EML4)-ALK fusion gene and is present in 4-5 % of cases of NSCLC (Young et al. 2010). An inversion in the chromosome 2 brings together the 5 0 end of the EML4 gene and the 3 0 end of the ALK gene resulting in the formation of the EML4-ALK fusion gene (Shaw and Solomon 2011). The affected person tend to have typical clinical features like early age of onset, little or absence of any smoking history (Shaw et al. 2009). Some of the drugs commonly used for the chemotherapeutic treatment of lung cancer are Bevacizumab, Carboplatin, Cisplatin, Crizotinib, Docetaxel, Erlotinib, Etoposide, Gemcitabine, Irinotecan, Paclitaxel, Pemetrexed, and Vinorelbine. Targeted drug therapy is used against NSCLC of which tyrosine kinase inhibitors are amongst the best method in treatment methodology. In particular, crizotinib is one such tyrosine kinase inhibitor which is the first drug to have gained FDA approval for the treatment of NSCLC in 2011 (Ou, 2011). Although crizotinib has proved itself as an efficient counter to ALK type NSCLC, acquired resistance has made its beneficial effects temporary and has emerged as a major roadblock for crizotinib. The literature evidences available indicates that L1196M (the ''gatekeeper'' mutation) and G1269A are the two most commonly found secondary mutations in the ALK kinase domain. In a few cases, patient harbored with both mutation (Kim et al. 2013). Of note, the available evidence indicates that ALK double mutation (L1196M, G1269A) is one of the main causes for crizotinib resistance (Doebele et al. 2012;Molina et al. 2008). The prevalence of ALK double mutation (L1196M, G1269A) is also significantly higher than other mutation. These situations urge the development of new and more effective ALK inhibitors especially for the treatment of drug resistance NSCLC. For years, computational techniques in particular virtual screening (VS) have proven to be of great use to make the drug development process faster and less expensive. The available literature evidences also suggested that VS techniques proved to be efficacious in making qualitative predictions that discriminated active from inactive compounds (Oprea 2000;Chen 2008). Therefore, in the present investigation, we have employed VS technique to address the crizotinib resistance in NSCLC. We hope that this approach certainly helpful for the experimental biologist to figure out the potent candidates for NSCLC.

Data set
The three-dimensional (3D) structure of native and mutant (L1196M, G1269A) ALK structures were retrieved from the crystal structures of the Brookhaven Protein Data Bank (PDB) for the analysis (Berman et al. 2000). The corresponding PDB codes were 2XP2 and 4ANS for the native and mutant structures, respectively (Cui et al. 2011).
Crizotinib was used as the small molecule for our study. The SMILES strings of the crizotinib and the lead molecules were collected from PubChem (Feldman et al. 2006) and submitted to CORINA for constructing the 3D structure of molecule (Gasteiger et al. 1990). The 3D structure of target proteins (2XP2 and 4ANS) drug molecule and lead compounds was energy-minimized using GROMACS package 4.5.3 adopting the GROMOS43a1 force field parameters before performing the computational analysis (Hess et al. 2008;Spoel et al. 2005).

Virtual screening
Virtual Screening (Shoichet 2004) is an important technique in computer-assisted drug discovery for screening of potential molecule from the database. This approach becomes popular in the pharmaceutical research for lead identification. Diminution of the massive virtual chemical space of small organic molecules and to screen against a specific target protein is the basic goal of the virtual screening (Tondi et al. 1999). In the present study, virtual screening technique performed with the help of PubChem database by employing crizotinib as a query (Bolton et al. 2008). It is worth stressing that PubChem database holds over 27 million records of unique chemical structures of compounds (CID) derived from nearly 70 million substance depositions (SID). The publicly available PubChem database provides great opportunities for scientists to perform VS process (Xie 2010). Several hits were obtained from the PubChem database, which were further analyzed using molecular docking studies.

ADME and toxicity
The bioavailability of the lead compounds was examined with the help of Lipinski's rule of five (Lipinski et al. 1997). The molecular properties such as logP (partition coefficient), molecular weight (MW), or counts of hydrogen bond acceptors and donors in a molecule were utilized in formulating ''rule of five'' (Ertl et al. 2000). The rule states that most molecules with good membrane permeability should have molecular weight B500, calculated octanol-water partition coefficient, log P B 5, hydrogen bond donors B5, acceptors B10 and van der Waals bumps polar surface area (PSA) \120 Å 2 (Muegge 2003). In the present study, all the molecular properties for all the lead compounds were estimated by using Molinspiration program (http://www.molinspiration.com/cgi-bin/properties) (Buntrock 2002). Toxicity is the second important parameter need to be considered in the analysis of lead compounds. Infact, toxicity will account the failure of majority of the lead cases. In the present study, toxicity of the lead compound examined with the help of OSIRIS program (http://www.organic-chemistry.org/prog/peo/). The program was also helpful to evaluate the drug likeliness and drug score of the lead compounds. Nearly 5300 distinct substructure fragments created by 3300 traded drugs as well as 15,000 commercially available chemicals yielding a complete list of all available fragments with the associated drug likeliness. The drug score consolidates druglikeliness, cLogP, logS, molecular weight, and toxicity risks. It is a total value which may be used to judge the compound's overall potential to qualify for a drug.

Molecular docking
The docking study is immensely important to understand the bioactivity of the screened lead compounds. Initially, SMILES strings were used for constructing three dimensional structures of all the lead compounds. Subsequently, docking algorithm was performed with the help of Patch dock server (Schneidman et al. 2005). It is a molecular docking algorithm based on geometry. The energy minimized PDB coordinate file corresponds to the protein and the ligand molecule is the input parameters for the docking. This algorithm has three major stages (1) molecular shape representation (2) surface patch matching and (3) filtering and scoring. The Patch Dock services were available at http:// bioinfo3d.cs.tau.ac.il/PatchDock/. The docked complexes were ranked based on the geometric matching score with target proteins. The geometric matching score of crizotinib with target proteins (native and mutant structures) were used as reference for filtering the lead compounds.

Molecular dynamics simulation
GROMACS Package 4.5.3 implemented with Gromos 43a1 force field was utilized to perform molecular dynamics (MD) of docked complexes such as native-type ALK-crizotinib complex, mutant-type ALK-crizotinib complex, native-type ALK-CID11562217 complex and mutant-type ALK-CID11562217 complex (Hess et al. 2008;Spoel et al. 2005). The protein was solvated in cubic 0.9 nm with the help of periodic boundary conditions and the SPC water model (Meagher and Carlson 2005).This resulted in the addition of 22,269 and 23,506 water molecules to the native and mutant complex structures, respectively. PRODRG server was used to generate topology of the ligand (Schuttelkopf and Van Aalten 2004). This server uses the GROMOS force field for generating topology file and assigning atom types. Six sodium (6 Na? ions) counter ions were added to neutralize the total charge of the system and one thousand steps of steepest descent energy minimization were carried out for the proteins. After the energy minimization step, the system was equilibrated at constant temperature and pressure. Using an atom-based cutoff of 8 Å , the non bonded list was generated. Constrains bond lengths at their equilibrium values were handled by SHAKE algorithm and the long range electrostatic interactions were handled by particle-mesh Ewald algorithm (Darden et al. 1999;Van Gunsteren and Berendsen 1977). The total simulation time was set to 20,000 ps with integration time step of 2 fs. Structural analysis was done at every picosecond and trajectories were stored in traj.trr file. For instance, root mean square deviation (RMSD) was analyzed with the help of Gromacs utilities g_rms.

Results and discussion
Virtual screening and bioavailability analysis The present study initiated by extracting structurally similar compounds to crizotinib from the Pubchem database. The crizotinib was used was used as a query molecule. About 99 % similarity cutoff was maintained in the analysis. The results yield a total of 63 compounds. These compounds were utilized for our further study. Molinspiration program was used to predict the bioavailability of crizotinib and the lead compounds. Initially, crizotinib properties were calculated with the help of Molinspiration program (Fig. 1) and used as a control for screening the other lead compounds. The result is shown in Table 1. It is clear from the table that 3 compound such as CID: 11656144, CID: 11502981 and CID: 58659185 showed violations for the rule of five. The remaining 60 compounds have zero violations for the rule of five. This brings to the conclusion that bioavailability of these 60 compounds was significantly better in our dataset.
It is bare that for passing oral bioavailability criteria, number of rotatable bond should be \10 (Oprea 2000). Therefore, we have made the further refinement of these hits by restricting the number of rotatable bonds to 10. The result is presented in Table 2. It is clear from the Table 2 that almost all the 60 compounds screened from the ADME analysis possess reasonable number of rotatable bonds (\10). This result indicates that these compounds may have the potential to become a lead compound. However, toxicity is also one of the important issue could be addressed for all the lead compounds before its selection.

Toxicity analysis
The primary objective behind the failure of the majority of compounds in drug discovery process is the issues related to pharmacokinetics and toxicity. In the present investigation, these issues were addressed with the help of OSIRIS property explorer program. The pharmacokinetic property of a lead compound can be investigated by utilizing the parameters such as clogP and logS. The result is shown in Table 3. clogP is an entrenched measure of the compound's hydrophilicity. The high log P values may cause poor retention because of the compound's low hydrophilicity. It has been demonstrated that for compounds to have a reasonable probability of being well absorbed, their log P value must not be greater than 5.0. It is clear from the table that log P values of all the 60 compounds found to be in the acceptable criteria.
Drug solubility normally affects the absorption and distribution characteristics of a compound. Infact, insufficient solubility of drug can lead to poor absorption (Lipinski et al. 1997). Our evaluated log S worth is a unit stripped logarithm (base 10) of a compound's dissolvability measured in mol/liter. There are more than 80 % of the drugs available in the market have an (expected) log S value greater than -4. It is clear from the Table 3 that the solubility of the 60 lead compounds was found in the comparable zone with that of standard drugs to fulfill the requirements of solubility and this could be regarded as a candidate drug for oral absorption.

Drug likeness
The drug likeliness is imperative parameter because drug like molecules exhibit favorable absorption, distribution, metabolism, excretion, toxicological (ADMET) parameters (Tetko 2005). In this study, Osiris program was utilized to calculate the drug-likeness of crizotinib and other virtually screened compounds (Sander 2001). It is worth stressing that the drug likeness value of 60 lead compounds was found to be in acceptable criteria.

Drug score and toxicity
The information assessed in Table 3 shows that the 57 lead compounds should be non-mutagenic and non-tumorigenic impacts when run through the mutagenicity assessment system comparable with standard drugs used. The compounds such as CID: 11662380, CID: 58659189, CID: 58659191, and CID: 58659192 failed to pass through the Osiris program and showed mutagenic and tumorigenic effects. We have also analyzed the overall drug score (DS) for all the lead compounds and compared with that of crizotinib. The score consolidates drug-likeness, miLogP, logS, molecular weight, and toxicity risks. The DS score could also be an important parameter to judge the compound's potential to meet all requirements to qualify for a drug. The result is demonstrated in Table 3. The reported lead compounds demonstrated moderate to good DS as compared with standard drug crizotinib. In our dataset, 17 lead compounds showed similar drug score as that of crizotinib. About five compounds such as CID: 11690598, CID: 68563708, CID: 58659130, CID: 11676204 and CID: 11575401 showed a drug score of 0.6 and above. Therefore, further examination was carried out with 57 compounds.

Molecular docking
Molecular docking program was employed to find out the binding affinity of lead compounds with the target protein.
Docking analysis was performed twice to eliminate the false positive. The docking results are shown in Table 4. The docking score of native-type ALK-crizotinib complex was found to be 5312 and for the mutant-type ALKcrizotinib complex was found to be 4602. The lesser docking score of mutant complex clearly indicates that double mutation (L1196M and G1269A) significantly affects the binding of crizotinib with the ALK structures. It is believed that a potential lead compound is the one should have higher docking scoring than the existing drug molecule, crizotinib. Therefore, we have examined docking score for all the 57 hits both with the native type and with mutant type ALK systems. 16 hits showed higher docking score only with mutant type ALK than native type ALK and 17 more hits from our dataset showed similar dock score to that of crizotinib. Most importantly, 10 hits from our dataset showed higher score both in the native type as well as with mutant type. For instance, CID 11562217 molecule showed the highest docking score among the 10 hits in our data set. The docking score of native-type ALK-CID 11562217 complex was found to be 5662 and for the mutant-type ALK-CID 11562217 complex was found to be 5908. This result indicates that CID 11562217 has a better binding affinity not only with the native type but also with mutant ALK as compared to the crizotinib. It is also to be noted that the pharmacokinetic and pharmacodynamic investigation of CID 11562217 indicated better results than the other lead compounds explored in our study (Fig. 2). The two dimensional structure of crizotinib was compared with CID 11562217 to get the structural attributes and the result is demonstrated in Fig. 3. It demonstrates that CID11562217 is a nitrile enhanced crizotinib. It is worth stressing that nitrile compounds with cyanide functional group could possess potential anti-tumor effects (US Patent 20060128724). The literature evidence also highlights that our lead molecule has kinase inhibiting effects. Further, the cyano-containing analogues were able to produce DNA-DNA cross-linking. The reduced DNA cross-linking was paralleled by a similar reduction in cytotoxicity indicating a relationship between cross-linking and anti-tumor effect (Jesson et al. 1987). Therefore, further validation of CID 11562217 compound was done with the help of molecular dynamics simulation study.

Molecular dynamics simulation
Molecular dynamics simulation study was carried out with the help of GROMACS package 4.5.3 to explore the stability of the complex structures. In particular, the parameter, RMSD, was examined from the trajectory file and used for analyzing the complex stability. RMSD investigation can give a thought of how much the three-dimensional structure has deviated over the time. The result is shown in Fig. 4. Native type ALK-crizotinib complex structure acquired *0.34 nm at 1000 ps during the simulations, while mutant type ALK-crizotinib complex structure acquired *0.28 nm of backbone RMSD at 1000 ps. On the other hand, native- Bold indicates ADME screened compounds based on Lipinsiki rule of 5 type ALK-CID11562217 structure acquired *0.18 nm of backbone RMSD while mutant-type ALK-CID11562217 complex structure acquired *0.22 nm of backbone RMSD at 1000 ps. Between a period of 2000-5000 ps, native type ALK-crizotinib complex structure maintains a RMSD value of *0.30 nm whereas mutant type ALK-crizotinib complex structure showed a deviation from *0.25 to *0.36 nm. In the virtual complex, native-type ALK-CID11562217 structure showed a RMSD value between *0.18 and *0.20 nm and mutant type ALK-CID11562217 complex structure maintains a RMSD value of *0.24 nm. From the period of 5000-10,000 ps, native-type ALK-crizotinib complex structure maintains a RMSD value of *0.34 nm while, mutant type ALK-crizotinib complex has deviated from *0.32 to *0.36 nm. On the contrary, native-type ALK-CID11562217 complex structure maintains a RMSD value of *0.25 nm while mutant type ALK-CID11562217 complex structure maintains a RMSD value of *0.20 to *0.24 nm. From the beginning of 11,000 ps to the end of 15,000 ps, mutant type ALK-crizotinib complex structure showed higher deviation and attains a RMSD value of *0.44 nm while native-type ALK-crizotinib complex structure maintains a RMSD value of *0.23 nm. Mutant type ALK-CID11562217 complex structure maintains a RMSD value of *0.25 nm in this simulation period. Between a period of 16,000-19,000 ps, native type ALKcrizotinib complex structure maintains a RMSD value of *0.35 nm whereas mutant type ALK-crizotinib complex structure showed a deviation from *0.43 to *0.45 nm. For instance, native-type ALK-CID11562217 structure showed a RMSD value of *0.25 nm and mutant type ALK-CID11562217 complex structure maintains a RMSD value of *0.22 nm. At the end of 20,000 ps the mutant type ALKcrizotinib complex structure attained RMSD of *0.40 nm    and native type ALK-crizotinib complex structure attained RMSD of *0.35 nm. This clearly indicates that ALK double mutation disturb the structural stability and also its function. It is worth stressing that native and mutant type ALK-CID 11562217 able to maintain a RMSD of *0.24 nm. Overall, significant difference in RMSD value observed between the crizotinib and CID 11562217 complex system. The lesser RMSD value of CID 11562217 complex demonstrates the stable binding of CID 11562217 with both native and mutant type ALK structures.

Conclusion
In the present investigation, we have addressed the crizotinib resistance in NSCLC with the help of virtual screening approach. CID 11562217 was discovered to be Root mean square deviations correspond to native-type ALKcrizotinib complex (black), mutant-type ALK-crizotinib complex (red), native-type ALK-CID11562217 complex (green) and mutanttype ALK-CID11562217 complex (blue) along the MD simulation at 300 K more drug like as it productively passed through the parameters of pharmacokinetics and toxicity. Docking study demonstrated that CID 11562217 has the highest binding affinity not only with native type ALK but also with the mutant type ALK system among the lead compounds screened from the Pubchem database. RMSD data obtained from molecular dynamic simulation revealed structural stability of the ALK-CID11562217 complex structure. It is worth stressing that our results correlate well with available experimental evidences. Of note, the available data suggests that pyrazole-substituted aminoheteroaryl compounds have potential anti-tumor effects. We hope that the findings reported here might give helpful signs to design powerful drugs against drug resistant lung cancer types.