Background

The discovery and development of antibiotics was a milestone in medical science that prevented deaths from simple infections. Unfortunately, the emergence of antibiotic-resistant strains among pathogens appears to be inevitable under selective pressure for survival [1]. Most alarming is the prevalence of resistance even to the last-resort antibiotic colistin [2], which adds a serious challenge to the current antibiotic crisis.

The cost of developing a prescription drug, as estimated by the Tufts Center for the Study of Drug Development and published in the Journal of Health Economics in March 2016, is 2.558 billion dollars [3]. Huge research costs and numerous failures at various stages of drug development have lowered the interest of commercial pharmaceutical companies in drug discovery research. The rapid increase in drug resistance among pathogens, together with the excessive time and cost required to develop a drug, demands a robust and faster method of drug discovery. This is where computational strategies come into play, efficiently assisting drug discovery and development alongside the available in vitro techniques [4].

Computer-aided drug design (CADD) approaches were applied in this study to find probable drug targets and to discover potential lead candidates against them. S-adenosylmethionine (SAM) is a critical metabolite involved in several biochemical reactions in bacteria. MetK, the SAM producer, and several SAM utilizers, including DNA adenine methylase (Dam), uroporphyrinogen-III methyltransferase (CobA), and tRNA (guanine-N(1)-)-methyltransferase (TrmD), were taken as drug targets in this study. Dam methylates adenine in bacterial DNA, in contrast to the cytosine methylation found in humans, and is involved in DNA replication and mRNA transcription. TrmD ensures proper reading of codons by preventing +1 frameshift errors and is thus involved in proper peptide elongation. CobA is responsible for corrin ring contraction in the synthesis of vitamin B12, an important cofactor for membrane biosynthesis. Thus, genes/proteins involved in processes from DNA replication to peptide elongation, and even membrane biosynthesis, were targeted to discover new lead candidates while hindering easy resistance buildup in these targets.

Methods

Selection of MDR strains and obtaining their genomic sequences

Nine pathogens prioritized by the WHO [5] as ‘critical’ and ‘high’, against which new antimicrobials are sought, were taken as the reference organisms. The whole-genome sequences of these organisms published in NCBI were used for whole-genome alignment. The genomic sequences of the 9 selected pathogens were downloaded from the NCBI FTP site in the annotated gbk format.

ftp://ftp.ncbi.nlm.nih.gov/genomes/archive/old_genbank/Bacteria/

Multiple sequence alignment (MSA)

MSA was performed using the progressive Mauve algorithm in MAUVE, a multiple sequence alignment software. The genomic regions common to all the aligned sequences were identified in MAUVE by visual inspection of the color-coded locally collinear blocks (LCBs). LCBs represent homologous regions of sequence shared by two or more of the genomes under study without rearrangement [6].
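The search for regions shared by every genome amounts to a set-membership check. The following is a minimal Python sketch of that idea, assuming block membership has already been written out as a mapping from block ID to genome names. The mapping, the block IDs, the genome names, and the function name are all hypothetical and are not MAUVE output; the study itself relied on visual inspection.

```python
def blocks_shared_by_all(lcb_members, genomes):
    """Return the IDs of blocks whose member list covers every genome."""
    required = set(genomes)
    return sorted(
        block_id
        for block_id, members in lcb_members.items()
        if required <= set(members)  # subset test: all genomes present
    )


# Hypothetical toy data: three genomes and three blocks.
genomes = ["genome_A", "genome_B", "genome_C"]
lcb_members = {
    "LCB1": ["genome_A", "genome_B", "genome_C"],
    "LCB2": ["genome_A", "genome_C"],
    "LCB3": ["genome_B", "genome_C", "genome_A"],
}

shared = blocks_shared_by_all(lcb_members, genomes)
print(shared)
```

Only blocks present in all listed genomes are returned; a block missing from even one genome (LCB2 in the toy data) is dropped.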

Alignment of amino acid sequences of the lead proteins

Clustal Omega was used to align the amino acid sequences of S-adenosylmethionine synthase (MetK) from all 9 selected pathogens: Acinetobacter baumannii strain 1656–2 [7], Campylobacter coli 15-537360 [8], Enterococcus faecium Aus0004 [9], Helicobacter pylori 2017 [10], Klebsiella pneumoniae subsp. pneumoniae 1084 [11], Neisseria gonorrhoeae FA 1090 (https://www.ncbi.nlm.nih.gov/nuccore/AE004969), Pseudomonas aeruginosa B136–33 (https://www.ncbi.nlm.nih.gov/nuccore/NC_020912.1), Staphylococcus aureus 04–02981 (https://www.ncbi.nlm.nih.gov/nuccore/NC_017340.1), and Salmonella typhimurium [12].
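Once an alignment is available, conserved positions can be read off column by column. The sketch below assumes the aligned sequences have been loaded into a dictionary of equal-length strings with "-" marking gaps. The toy sequences and the function name are hypothetical and are not MetK data.

```python
def conserved_columns(aligned):
    """Return 0-based column indices where every sequence has the same residue.

    Columns containing a gap ('-') are never reported as conserved.
    """
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    columns = zip(*aligned.values())  # iterate over alignment columns
    return [
        i
        for i, col in enumerate(columns)
        if len(set(col)) == 1 and col[0] != "-"
    ]


# Hypothetical toy alignment (not real MetK sequences).
toy_alignment = {
    "species_1": "MAK-HLFTSE",
    "species_2": "MAKQHLFTSE",
    "species_3": "MSK-HLYTSE",
}

conserved = conserved_columns(toy_alignment)
print(conserved)
```

Comparing the reported indices with the known binding-site positions of a reference structure would then show whether the site is conserved across species.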

Gene essentiality analysis

The common genes obtained from the MAUVE alignment were checked for essentiality in DEG and OGEE, two databases of essential genes.

Obtaining the dockable crystal structures of the target proteins

The X-ray diffraction structures of S-adenosylmethionine synthase (MetK) from N. gonorrhoeae (PDB id: 5T8S) [13], CobA from P. aeruginosa (PDB id: 2YBQ) [14], and TrmD from P. aeruginosa (PDB id: 5WYQ) [15] were obtained from the Protein Data Bank. For targets whose crystal structures were not available in RCSB PDB, homology modeling tools including Phyre2, RaptorX, ps2v2, Swiss-model, and CPHmodel were used to predict the tertiary structures, and the best models were selected based on their completeness and on Z-scores computed with the ProSA server.

Preparation of ligand database

In the present work, both ligand-based (LBVS) and structure-based virtual screening (SBVS) were performed. LBVS was done because similar compounds exhibit similar physicochemical and biological properties, so a broad chemical database with structural diversity offers a good starting point for effective lead discovery. In this study, a ligand database containing 715 indole derivatives, including marine indoles [16], was prepared from the ZINC database [17].

Protein and ligand preparation

SBVS was performed based on MetK, the gene common to all nine pathogens, and on the metabolite it produces, SAM, which is further utilized in methylation reactions. Prior to molecular docking, the proteins and ligands were prepared to obtain more efficient and accurate docking results. Protein preparation was done in AutoDockTools (http://mgltools.scripps.edu/) by deleting water molecules, adding hydrogen atoms, merging non-polar hydrogens, and computing Gasteiger charges. Similarly, ligand preparation was done in the Open Babel GUI [18] available in the PyRx interface by adding hydrogens, minimizing energy, and converting the structures to pdbqt, the file format required for docking.
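The ligand preparation steps can also be scripted. The sketch below only builds an Open Babel command line that adds hydrogens, minimizes energy, and writes pdbqt. It is an approximation of what the PyRx interface does, not a record of the settings used here: the force field, the step count, and the file names are assumptions.

```python
import shlex


def obabel_prepare_command(ligand_in, ligand_out, forcefield="UFF", steps=200):
    """Build an obabel command that adds hydrogens, minimizes, and converts.

    forcefield and steps are assumed values, not taken from the study.
    """
    return [
        "obabel", ligand_in,
        "-O", ligand_out,       # output format is inferred from the extension
        "-h",                   # add hydrogens
        "--minimize",           # energy minimization
        "--ff", forcefield,
        "--steps", str(steps),
    ]


# Hypothetical file names.
cmd = obabel_prepare_command("ligand.sdf", "ligand.pdbqt")
print(shlex.join(cmd))
```

With Open Babel installed, the list could be passed to `subprocess.run(cmd, check=True)`; the input file needs 3D coordinates for the minimization to be meaningful.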

Setting reference values for docking

The native ligands were removed from each target protein in Discovery Studio Visualizer 2017 and docked back into their binding sites, a process called re-docking. The binding energy thus calculated was taken as the reference value for identifying potential leads in the respective binding pocket of each target protein.

Binding sites prediction

The ligand-binding sites of the target proteins were identified from the RCSB Protein Data Bank. For the homology-modeled 3D structures, ligand-binding sites were predicted with 3DLigandSite, a web server that superimposes ligands bound to structures similar to the query [19]. All the amino acid residues around the binding site (Supplementary Table 2) were marked to define the site for molecular docking.

Molecular docking, rescoring and clustering analysis of docked poses

Docking was carried out against the target proteins with the selected ligand database using AutoDock Vina in PyRx, a virtual screening software. For each ligand, the conformation with the lowest docked energy was chosen, since the more negative the binding energy, the stronger the binding of the ligand to the target [20].
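Picking the lowest-energy conformation can be illustrated with a small parser. AutoDock Vina writes one `REMARK VINA RESULT:` line per pose in its output pdbqt file, with the binding energy (kcal/mol) as the first number. The sketch below assumes that layout; the example text and its values are illustrative only, and the function name is hypothetical.

```python
def best_vina_energy(pdbqt_text):
    """Return the lowest (most negative) energy among the poses in a Vina output."""
    energies = []
    for line in pdbqt_text.splitlines():
        if line.startswith("REMARK VINA RESULT:"):
            # First field after the colon is the binding energy in kcal/mol.
            energies.append(float(line.split(":", 1)[1].split()[0]))
    if not energies:
        raise ValueError("no 'REMARK VINA RESULT' lines found")
    return min(energies)


# Illustrative two-pose output fragment (values are made up).
example = """MODEL 1
REMARK VINA RESULT:     -11.0      0.000      0.000
ENDMDL
MODEL 2
REMARK VINA RESULT:     -10.4      1.873      2.515
ENDMDL
"""

best = best_vina_energy(example)
print(best)
```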

The docked poses were rescored using the Python implementation of NNScore 1.0 [21] with a consensus of the top 24 scoring networks.

AuPosSOM (Automatic analysis of Poses using SOM) [22] was used for the clustering analysis and to differentiate active compounds from inactive ones. AuPosSOM default parameters were used. The tree was visualized using PhyloWidget [23].

Protein-ligand interaction visualization

The 2D and 3D protein-ligand interactions of the lead compounds were observed and analyzed using Discovery Studio Visualizer 2017 and LigPlot+.

ADME/Tox screening

The toxicity profiles and drug-likeness based on the binding energies were predicted using the OSIRIS program [24]. OSIRIS calculates various drug-relevant properties such as molecular weight, cLogP, cLogS, and drug-likeness, and predicts toxicity risks such as mutagenicity, tumorigenicity, reproductive effects, and irritant effects of the lead molecules based on the functional groups present in their structures [25].

Results

Drug target identification

The MAUVE result showed MetK as one of the probable therapeutic targets, and it was taken as a reference in our search for other therapeutic targets (Fig. 1).

Fig. 1

Multiple sequence alignment in MAUVE, showing MetK as a common gene in the aligned genomes of all 9 pathogens (data for remaining 5 not shown)

Manual curation of the alignment revealed twenty-four common genes with diverse roles, found in most of the strains. Most of these genes (Supplementary Table 1) encoded ribosomal proteins (14 proteins); the others were involved in ATP synthesis (3 proteins), DNA-directed RNA polymerase (2 proteins), chaperones (2 proteins), elongation factor (1 protein), protein translocation (1 protein), and thiol assimilation (1 protein). These were not pursued further because the required computational resources were lacking.

Amino acid alignment

Amino acid sequence alignment using Clustal Omega showed that the active binding sites in MetK were the same (conserved) in all the pathogens under study (Fig. 2), making MetK a common target in all these pathogens. Further analysis of gene essentiality for pathogen survival could reveal whether this is the lead target protein.

Fig. 2

Clustal Omega alignment of MetK protein from eight pathogens

Gene essentiality

A search in DEG and OGEE for the essentiality of the genes under study in the target organisms revealed metK as essential in H. pylori, P. aeruginosa, S. aureus and S. typhi. Similarly, dam was found to be essential in S. enterica subsp. enterica serovar Typhimurium; cobA was reported as non-essential in P. aeruginosa; and trmD was essential in P. aeruginosa, S. aureus, and S. typhi but non-essential in H. pylori and Acinetobacter sp. However, these databases do not take the interrelation between genes into account when recording gene essentiality, so genes listed as non-essential here could still become essential when the activity of another is inhibited. Since our work was concerned with multiple targets simultaneously and all these genes under study are SAM utilizers, they were taken for further study despite being non-essential in some instances.

Ligand database preparation

The close structural similarity of the indole ring to the adenosyl moiety of SAM (Fig. 3) makes indole derivatives probable candidates against the SAM binding pocket of MetK. Thus, indole derivatives were presumed to be potential ligand sources for virtual screening. A total of 715 indole derivatives were taken from the ZINC database, of which only 102 showed drug-like properties based on ADMET parameters (Fig. 4) and were used for molecular docking studies.

Fig. 3

Chemical structures of SAM and Indole

Fig. 4

Drug-likeness parameters used for initial screening of ligands

Molecular docking results

The 102 ligands that passed the ADME/T tests were docked against the MetK protein of N. gonorrhoeae (PDB id: 5T8S). Fifty-three exhibited stronger (more negative) binding energy in the SAM binding pocket of MetK than its native ligand SAM (Supplementary Table 3). The top 3 ligands, ZINC04899565 (B.E = −11.0 kcal/mol), ZINC06096559 (B.E = −10.9 kcal/mol), and ZINC19909549 (B.E = −10.9 kcal/mol), all binding more strongly than SAM (B.E = −8.7 kcal/mol), were investigated further for their binding modes in the catalytic site of MetK. The amino acid residues common to the interactions of these three top ligands were Lys274, Gly120, Ile103, Ile307, Phe235, Lys168, Ser191, Asp121, and Asp243, which could have contributed to the stronger binding affinities in the binding pocket of MetK (Fig. 5).
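The comparison against the re-docked native ligand is a simple threshold test: a ligand counts as a hit when its binding energy is more negative than that of SAM. The sketch below applies the test to the three energies quoted above; the function name and the data layout are illustrative only.

```python
REFERENCE_ENERGY = -8.7  # SAM re-docked into MetK, kcal/mol (from the text)

# Binding energies of the top three ligands, kcal/mol (from the text).
docked = {
    "ZINC04899565": -11.0,
    "ZINC06096559": -10.9,
    "ZINC19909549": -10.9,
}


def stronger_than_reference(energies, reference):
    """Return (name, energy) pairs more negative than the reference, best first."""
    hits = {name: e for name, e in energies.items() if e < reference}
    # Sort by energy, then by name so that ties are ordered deterministically.
    return sorted(hits.items(), key=lambda item: (item[1], item[0]))


ranked = stronger_than_reference(docked, REFERENCE_ENERGY)
for name, energy in ranked:
    print(f"{name}: {energy:.1f} kcal/mol ({energy - REFERENCE_ENERGY:+.1f} vs SAM)")
```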

Fig. 5

2D-interactions of a) ZINC04899565, b) ZINC06096559, and c) ZINC19909549 with amino acid residues at binding pocket of MetK of N. gonorrhoeae

NNScore, a neural-network-based scoring function, was then used to re-rank the small-molecule ligands, which resulted in 12 positive hits (good binders) as potential inhibitors of MetK in N. gonorrhoeae (Supplementary Table 3). ZINC04899565 remained among the top binders.

We further used a contact activity relationship (CAR) analysis to overcome the limitations of the scoring functions used for docking. AuPosSOM analyzes all the docking poses in the multiple conformations given by the docking algorithm and discriminates active from inactive compounds using only the calculated mean protein-contact footprints. The 12 ligands along with SAM were clustered into 10 different groups with varied scores. The score is determined by the combination of contact specificity and contact intensity of the ligands with various atoms of the protein molecule. The plot (Fig. 6) shows the ligands in leaves 0, 3, 4 and 5 as the most active ones. ZINC01494627 from cluster 0, ZINC14824027 and ZINC49171024 from cluster 3, ZINC04899565 from cluster 4 and ZINC15219763 from cluster 5 can thus be considered potential MetK inhibitors of N. gonorrhoeae.

Fig. 6

The scoring plot in AuPosSOM

CAR results showed that these 12 potential leads were distributed in 10 different clusters (0 to 9) with different protein-contact footprints, represented in the clustering tree (Fig. 7). The different clusters indicate differences in the protein residues that the ligands interact with. Cluster 0 contains 1 compound along with SAM (native ligand), which interacts predominantly with residues A41, E56, Z99, K274, D166 and I237. Cluster 5 contains 1 compound that interacts predominantly with residues P16, D166, I237 and G242, and so on. The similarity of the ligands' binding residues to those of SAM could have contributed to their potential to inhibit MetK. CAR analysis can thus be used to cluster compounds according to the protein residues they bind.
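How much two contact footprints overlap can be quantified with a simple set measure. The sketch below computes the Jaccard similarity between the predominant residues reported for clusters 0 and 5. This is not the self-organizing-map procedure used by AuPosSOM; it is a simpler stand-in chosen only to illustrate footprint comparison.

```python
def jaccard(a, b):
    """Jaccard similarity of two residue sets: |intersection| / |union|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Predominant contact residues for clusters 0 and 5, as listed in the text.
cluster_0 = {"A41", "E56", "Z99", "K274", "D166", "I237"}
cluster_5 = {"P16", "D166", "I237", "G242"}

shared = sorted(cluster_0 & cluster_5)
similarity = jaccard(cluster_0, cluster_5)
print(shared, similarity)
```

The two clusters share D166 and I237 out of eight distinct residues, which shows that ligands in different clusters can still overlap partly with the SAM footprint.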

Fig. 7

Tree representation of contact footprint clustering of the 12 potential ligands and SAM for MetK. The clusters are numbered 0 to 9, each leaf representing a particular ligand. The contact footprints for each cluster are represented by a letter indicating an amino acid followed by its position in the protein chain. For example, A41 refers to alanine at position 41. E, Z, S, K, P, D, I and G indicate glutamic acid, glutamine, serine, lysine, proline, aspartic acid, isoleucine, and glycine, respectively

Further docking of those 5 screened ligands against the SAM binding pocket of MetK from all the other pathogens under study resulted in 3 potential leads (Table 1). These 3 lead molecules were then docked against the other protein targets, CobA, Dam and TrmD (Tables 2, 3, 4), to assess their inhibition potential against multiple targets, resulting in 2 potential lead candidates (Table 5).

Table 1 Indole derivatives with MetK inhibition potential in all pathogens under study
Table 2 Indole derivatives with Dam inhibition potential in all pathogens under study
Table 3 Indole derivatives with CobA inhibition potential in all pathogens under study
Table 4 Indole derivatives with TrmD inhibition potential in all pathogens under study
Table 5 Molecular properties of the lead candidates obtained from OSIRIS property explorer

Cross-reactivity with human homologs

Questions about cross-reactivity with the human homolog of S-adenosylmethionine synthase (Uniprot id: Q00266) could be raised, as it has more than 50% structural similarity with the bacterial enzymes (Table 6). Unfortunately, both leads were potential inhibitors of the human homolog as well (Supplementary Table 4).

Table 6 Alignment of metK of Homo sapiens and pathogens

Nevertheless, inhibitors that also act on human MetK could still be used as antibacterial therapeutics because humans have SAM transporters [26], so the SAM requirement can be replenished from external sources. The lack of crystal structures of human SAM transporters prevented molecular docking studies of possible inhibition of the transport system.

Human homologs of the other target proteins were not considered because of their dissimilarity to the bacterial proteins.

Discussion

Drug target identification

Lead molecules that act on multiple target proteins in a single pathogen and on a common target in multiple pathogens are highly sought after [27] for developing broad-spectrum drugs. This strategy is designed to prevent easy development of resistance against new drugs. Developing resistance in multiple targets at once would be evolutionarily challenging for any pathogen, and it would probably be impossible for bacteria to survive such drugs. Hence, genome-level sequence alignment of major pathogens could yield new common target proteins for screening lead inhibitor molecules, MetK being the common target in this study (Fig. 1). From this, the respective metabolic pathway or other target proteins could be identified based on protein-protein interactions.

MetK catalyzes the formation of SAM from ATP and methionine. SAM is utilized by three major metabolic pathways (transmethylation, transsulfuration, and polyamine synthesis), making it an important molecule in normal cell functioning and survival [28]. In addition, SAM is the primary methyl donor in multiple reactions including corrin ring methylation [29], RNA methylation [30], and DNA methylation [31]; thus, these steps were also taken as lead targets.

Quorum sensing, one of the major causes of resistance in pathogens, relies on autoinducers whose synthesis in turn uses SAM as a substrate [32]. It also controls biofilm formation and virulence in bacteria [33]. The literature search further verified the essentiality of MetK in many different pathogens [34, 35]. Also, the lack of reports of SAM transporters in any of the target pathogens makes MetK a better target. So, along with MetK, other SAM-utilizing proteins, namely CobA, Dam, and TrmD, were taken as potential drug targets for virtual screening of compounds.

Indoles as potential drugs

The adenosyl moiety of SAM and the ATP-binding domain present in kinases [36] suggest that kinase inhibitors could also inhibit proteins that biosynthesize or utilize SAM. Protein kinase inhibitors represent an important class of targeted therapeutic agents, particularly as anticancer drugs [37]. Several indoles have been reported to possess kinase inhibition potential [38,39,40,41]. Also, indole is reported to be metabolized by human cytochrome P450 2A6 [42], and indole can arise from tryptophan metabolism by the gut microflora. This suggests that indoles are readily taken up through the gut and metabolized in humans, and thus may not pose toxicity [43]. In addition, indole has been suggested as a pharmaceutical scaffold for drug development [44], and its metabolized derivative indirubin has been reported to be a kinase inhibitor [45].

ADMET screening

The primary reasons lead molecules fail clinical trials are their inability to reach the target and perform the predicted function, and toxicity issues [46]. Thus, evaluating ADMET and pharmacokinetic properties in the early stages of drug discovery is a wise choice. The toxicity profiles and drug-likeness were predicted by OSIRIS using various parameters. Physicochemical and drug-relevant properties such as molecular weight, cLogP, cLogS, number of hydrogen bond donors and acceptors, topological polar surface area, number of rotatable bonds, and drug-likeness were analyzed for each of the lead molecules. The parameters (Fig. 4) were set based on Lipinski’s rule of five, which predicts whether a compound has chemical and physical properties that would make it likely to be an orally active drug in humans. The rule states that most orally active drugs have molecular weight ≤ 500, logP ≤ 5, hydrogen bond donors ≤ 5, and hydrogen bond acceptors ≤ 10. In addition, the aqueous solubility of a compound, measured as logS, significantly affects its absorption and distribution characteristics. Low solubility usually goes along with poor absorption, so poorly soluble compounds should be avoided. The number of rotatable bonds determines the flexibility of a compound and can predict oral bioavailability: 10 or fewer rotatable bonds in a molecule have been reported to indicate good oral bioavailability [47]. The polar surface area (PSA) of a molecule can predict membrane permeability, including crossing of the blood-brain barrier. Most known drugs have PSA values less than 120 Å² [48].
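The thresholds quoted above translate directly into a filter. The sketch below encodes the rule-of-five limits together with the rotatable-bond and PSA cutoffs from the text. The descriptor values are hypothetical, the class and function names are illustrative, and the exact parameter set of Fig. 4 is not reproduced. The sketch also requires every criterion to hold, which is stricter than the common practice of tolerating a single rule-of-five violation.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    mol_weight: float      # g/mol
    logp: float
    h_donors: int
    h_acceptors: int
    rotatable_bonds: int
    psa: float             # polar surface area, in square angstroms


def passes_filters(d):
    """Apply the rule-of-five limits plus rotatable-bond and PSA cutoffs."""
    lipinski = (
        d.mol_weight <= 500
        and d.logp <= 5
        and d.h_donors <= 5
        and d.h_acceptors <= 10
    )
    return lipinski and d.rotatable_bonds <= 10 and d.psa < 120


# Hypothetical descriptor sets (not taken from the study).
drug_like = Descriptors(350.4, 2.1, 2, 4, 3, 75.0)
too_heavy = Descriptors(612.7, 2.1, 2, 4, 3, 75.0)

print(passes_filters(drug_like), passes_filters(too_heavy))
```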

Lead candidates

ADMET evaluation of these two leads (Fig. 8) was done to assess their potential as drug candidates. Both met the parameters for drug-likeness, including Lipinski’s rule of five (Table 5).

Fig. 8

Lead candidates with potential broad-spectrum activity against the target pathogens under study: a) ZINC04899565, b) ZINC49171024

Apart from indole, ZINC04899565 has a benzene and a 2,5-diketopiperazine ring. Among the naturally occurring peptide antibiotics, those containing 2,5-diketopiperazine rings are some of the most numerous. Examples include cycloserine diketopiperazine, active against Mycobacterium tuberculosis; bicyclomycin, active against gram-negative bacteria; and avrainvillamide, which contains a 3-alkylidene-3H-indole-1-oxide function and is active even against multidrug-resistant bacteria [49]. Moreover, 2,5-diketopiperazines have also been reported to inhibit quorum sensing in certain gram-negative bacteria, thus blocking cell-to-cell communication and restraining virulence as well [50]. Structurally, the 2,5-diketopiperazines are the smallest possible cyclic peptides; they are peptidomimetic in nature, resembling a constrained protein beta turn. They have two cis-amide bonds and thus possess two H-bond acceptors and two H-bond donors. Although they contain conformationally constrained heterocyclic scaffolds, they are flexible, since the six-membered ring can exist in either a flat or a slightly puckered boat conformation. Moreover, they are stable to proteolysis [49]. All these features help them bind to a wide range of enzymes and receptors, and their good bioavailability and resistance to enzymatic degradation make them excellent drug candidates. Thus, based on this study, ZINC04899565 has the potential to be a broad-spectrum antimicrobial and could be pursued further.

ZINC49171024 is an indole with a pyrrolidine and a benzimidazole ring. Compounds containing a pyrrolidine moiety have been reported as antimicrobials and antifungals [51]. The strong lipophilic properties of benzimidazoles contribute to their antimicrobial effects [52]. All these features indicate a high probability that these two compounds could be developed as broad-spectrum antimicrobials.

Conclusion

CADD approaches were used in this study to discover potential methyltransferase inhibitory activities of indole derivatives. Multiple protein targets were subjected to molecular docking, which identified ZINC04899565 and ZINC49171024 as probable therapeutic drug candidates with multi-target potential. These candidates may also help manage antimicrobial resistance: because the targets are critical for pathogen survival, mutating them could be lethal, which makes it difficult for pathogens to evolve resistance against several of them at once.