Introduction

Scientific knowledge gained from the previous SARS and MERS outbreaks has hastened the quest to develop novel antiviral drugs against SARS-CoV-2. Non-structural viral proteins (3-chymotrypsin-like protease, papain-like protease, RNA-dependent RNA polymerase and its helicase), the viral structural S-glycoprotein and the host protein transmembrane serine protease 2 (TMPRSS2) are the major antiviral targets identified for their druggability. Human coronaviruses can enter the cell via two pathways: the endosomal pathway (i.e. mediated by cathepsins) and a cell-surface or early endosomal pathway mediated by TMPRSS2 [1,2,3]. SARS-CoV-2 is reported to use the latter path, in which its spike glycoprotein (S) engages host ACE2 and TMPRSS2 to allow cell entry [4,5,6,7,8]. Uncoating allows the genomic RNA to be used as mRNA for translation of the replicase polyproteins. Polyprotein 1a (pp1a) and polyprotein 1ab (pp1ab) are produced by translation of the replicase gene. Autoproteolytic cleavage of pp1a and pp1ab yields 11 non-structural proteins (nsp1–nsp11) and 15 non-structural proteins (nsp1–nsp10 and nsp12–nsp16), respectively. The nsp12 protein is the RNA-dependent RNA polymerase (replicase, RdRp). The replicase uses genomic RNA as a template to generate negative-sense genomic RNAs (gRNAs), which in turn serve as templates for progeny positive-sense RNA genomes [7, 8]. Through discontinuous transcription of the genome, the replicase synthesises a nested set of sub-genomic RNAs (sgRNAs), which are later translated into structural and accessory proteins. The structural proteins S, M and E, located in the endoplasmic reticulum (ER), are transported to the ER–Golgi intermediate compartment (ERGIC) for virion assembly [9]. The N proteins bind to progeny genomic RNA to form nucleocapsids. Smooth-walled vesicles transport the assembled virions from the ERGIC to the cell membrane, where the mature virus particles are released [9, 10].

TMPRSS2 is a serine protease of the S1A class, like Factor Xa and trypsin, that processes the S-protein at the S1/S2 cleavage site into two functional subunits: an N-terminal receptor-binding domain (S1) and a C-terminal membrane fusion domain (S2). The S1 domain facilitates ACE2 recognition and initiates a conformational change in the S2 subunit, leading to the insertion of fusion peptides into the host cell membrane to facilitate membrane fusion and delivery of the viral nucleocapsid into the cytoplasm [1, 4, 10,11,12,13,14,15,16,17,18,19,20,21]. The S2 domain contains a fusion peptide (FP), a second proteolytic site (S2′), an internal fusion peptide (IFP) and two heptad-repeat domains (HR1 and HR2) before the transmembrane domain (TM) (Fig. 1). Further studies suggested that both FP and IFP are involved in the viral entry process and that cleavage of the S-protein at both the S1/S2 and S2′ sites is essential [18,19,20].

Fig. 1

Structure of TMPRSS2. S1 N-terminal receptor-binding domain; S2 C-terminal membrane fusion domain; SP signal peptide; NTD N-terminal domain; RBD receptor-binding domain; FP fusion peptide; IFP internal fusion peptide; HR1 heptad repeat 1; HR2 heptad repeat 2; TM transmembrane domain. The SP, S1↓S2 and S2′ cleavage sites are indicated by arrows

TMPRSS2 is expressed in the prostate, stomach, colon, salivary glands and gastrointestinal, urogenital and respiratory epithelia in humans [22]. In prostate cancer, TMPRSS2 was found to be overexpressed under the control of androgen receptor signalling, and it initiates the metastatic cascade by activating hepatocyte growth factor (HGF). TMPRSS2 inhibitors are reported to inhibit prostate cancer metastasis [23,24,25]. Previous research has established that TMPRSS2 is an activating protease for respiratory influenza virus [26, 27]. In animal models of SARS-CoV and MERS-CoV infection, the role of host TMPRSS2 in spike protein activation was clearly demonstrated: the absence of TMPRSS2 (in TMPRSS2-knockout mice) significantly reduced airway infection and spread [13]. Furthermore, Tmprss2(−/−) mice infected with a re-assorted influenza A virus (IAV) with H10 subtype hemagglutinin (HA) exhibited no abnormal clinical signs, lung lesions, viral antigen or body weight loss when compared with wild-type mice [28]. Another study showed that TMPRSS2 is an important HA-activating protease of IAV and IBV (influenza B virus) in primary human type II pneumocytes and human bronchial cells [12]. TMPRSS2-positive VeroE6 cells are highly susceptible to SARS-CoV-2 infection, indicating a role for TMPRSS2 in viral entry into the host cell [15]. ACE2 and TMPRSS2, which SARS-CoV-2 uses for entry, have been found to be most abundant in bronchial transient secretory cells [16]. Recent studies confirm that SARS-CoV-2 takes advantage of host ACE2 for entry and of the serine protease TMPRSS2 for S-protein priming [10, 29,30,31]. Camostat, a TMPRSS2 inhibitor, inhibited SARS-CoV-2 entry into host cells [10, 32]. These findings strongly suggest that TMPRSS2 is a critical protein required for SARS-CoV-2 host cell entry and thus represents a potential therapeutic target.

As no 3D crystal structure of TMPRSS2 is available, we used the previously reported homology model of TMPRSS2, which was generated using TMPRSS15 (PDB ID: 4DGJ) as the template [18].

Methodology

The structure-based virtual screening

The virtual screening workflow of the Schrödinger software suite, version 2018–3 (Maestro 11.7, Schrödinger, LLC, New York, NY, 2020), was used to screen the ZINC database [33] against the active site of TMPRSS2.

Database and ligand preparation

In total, 182,651 molecules from the natural products category of the ZINC database were downloaded in 2D SDF format. The LigPrep module of the software was used to prepare these two-dimensional structures. In brief, the molecules were desalted and converted from 2D structures to low-energy 3D structures; tautomeric and ionisation states were generated (between pH 6.8 and 7.2, using the Epik module), along with all possible stereoisomers. The energies of the generated structures were minimised using the OPLS 2005 force field.

Homology model of TMPRSS2

The homology model of the TMPRSS2 protein was built from the TMPRSS15 crystal structure (PDB ID: 4DGJ), which shares 41% peptide sequence similarity with TMPRSS2 [20]. The template was chosen by aligning the sequences of all available S1A proteases, which identified TMPRSS15 as the most suitable for the current study. The model was built from 4DGJ using the Prime module of Schrödinger and was further validated by a 100 ns MD simulation.

Protein preparation and receptor grid generation

The Protein Preparation Wizard was used to prepare the homology model of TMPRSS2 built from TMPRSS15 (PDB ID: 4DGJ). Bond orders, missing atoms, tautomer/ionization states, water orientations and the hydrogen bond network were all checked in the protein. The OPLS 2005 force field was then applied for constrained energy minimization. The receptor grid was generated from the prepared protein. The centroid of the workspace ligand (benzamidine; A: BEN 245) was used to define the size and position of the receptor grid box, with a van der Waals scaling factor of 1.0 and a partial charge cut-off of 0.25.
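The idea of centring the grid box on the ligand centroid can be illustrated with a minimal sketch. It assumes an unweighted geometric centre of the ligand atoms; the function name and the coordinates are hypothetical and are not taken from 4DGJ or from the software used here.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float, float]


def centroid(coords: Iterable[Point]) -> Point:
    """Return the unweighted geometric centre of a set of atom coordinates."""
    points = list(coords)
    if not points:
        raise ValueError("at least one atom coordinate is required")
    n = len(points)
    return (
        sum(p[0] for p in points) / n,
        sum(p[1] for p in points) / n,
        sum(p[2] for p in points) / n,
    )


# Hypothetical ligand atom coordinates in Å (illustration only).
ligand_atoms = [(1.0, 2.0, 3.0), (3.0, 2.0, 1.0), (2.0, 5.0, 2.0)]
grid_centre = centroid(ligand_atoms)
```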

Virtual screening

The Glide Virtual Screening Workflow was used to perform the virtual screening, with the previously prepared ligands as input structures. Prior to Glide docking, the ADME properties of these ligands were predicted, and the ligands were prefiltered using Lipinski’s rule and reactive functional group criteria. All ligands that passed these prefilters were docked into the previously prepared receptor grid in three stages. In the first stage, the molecules were docked flexibly in Glide HTVS (High Throughput Virtual Screening) mode using the default settings. In the second stage, the top-scoring 10% of hits from the previous step were docked flexibly in Glide SP (Standard Precision) mode. In the final stage, the top-scoring 10% of hits from the SP step were docked flexibly in Glide XP (Extra Precision) mode.
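The staged funnel described above can be sketched in a few lines. This is an illustration of the selection logic only, not the Glide implementation: the three scoring functions are hypothetical stand-ins for HTVS, SP and XP docking, ligands are represented by integer IDs, and rounding the retained fraction down is an assumption.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Scorer = Callable[[int], float]  # hypothetical: maps a ligand ID to a docking score


def screening_funnel(
    ligands: Iterable[int],
    stages: Sequence[Scorer],
    keep_percent: int = 10,
) -> List[Tuple[int, float]]:
    """Score ligands stage by stage, passing only the best-scoring
    keep_percent of each stage on to the next one.

    Lower (more negative) scores are treated as better.
    Returns the (ligand, score) pairs of the final stage, best first.
    """
    if not stages:
        raise ValueError("at least one scoring stage is required")
    current = list(ligands)
    ranked: List[Tuple[int, float]] = []
    for index, score in enumerate(stages):
        ranked = sorted(((lig, score(lig)) for lig in current), key=lambda pair: pair[1])
        if index < len(stages) - 1:
            # Integer arithmetic avoids float rounding; flooring is an assumption.
            n_keep = max(1, len(ranked) * keep_percent // 100)
            current = [lig for lig, _ in ranked[:n_keep]]
    return ranked


# Toy scoring functions (hypothetical; real scores would come from docking).
def toy_htvs(lig: int) -> float:
    return -float(lig % 100)


def toy_sp(lig: int) -> float:
    return -float(lig % 7)


def toy_xp(lig: int) -> float:
    return -float(lig)


final_hits = screening_funnel(range(1000), [toy_htvs, toy_sp, toy_xp])
```

With 1000 toy ligands, 100 pass from the first stage to the second and 10 from the second to the third, so each stage spends its more expensive scoring on a tenth of the previous pool.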

MD simulation

Desmond software (Schrödinger software suite 2018–3 version) was used to run 100 ns molecular dynamics (MD) simulations on the top two hits (ZINC000095912839 and ZINC000085597504). The system, comprising the protein–ligand complex in SPC solvent, was built in an orthorhombic box with a buffer distance of 10 Å × 10 Å × 10 Å, and the charge was neutralised with counterions. The system was then minimised, and the simulation was run for 100 ns in the NPT ensemble at a temperature of 300.0 K and a pressure of 1.0 bar. The trajectory was recorded every 100.0 ps, giving 1000 frames from which the root mean square deviation (RMSD) was calculated.
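The frame count and the RMSD calculation can be made concrete with a short sketch. It assumes that each frame has already been superimposed on the reference structure (no alignment is performed here) and uses hypothetical function names and coordinates; it is not the analysis code of the MD package.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def n_frames(total_ns: float, interval_ps: float) -> int:
    """Number of recorded frames for a given run length and recording interval."""
    return round(total_ns * 1000.0 / interval_ps)


def rmsd(reference: Sequence[Point], frame: Sequence[Point]) -> float:
    """Root mean square deviation between two equally sized coordinate sets.

    Assumes the frame is already superimposed on the reference.
    """
    if len(reference) != len(frame) or not reference:
        raise ValueError("coordinate sets must be non-empty and of equal size")
    squared = sum(
        (a - b) ** 2
        for ref_atom, frame_atom in zip(reference, frame)
        for a, b in zip(ref_atom, frame_atom)
    )
    return math.sqrt(squared / len(reference))


frames = n_frames(total_ns=100.0, interval_ps=100.0)  # 100 ns at 100 ps steps

# Hypothetical two-atom example: every atom displaced by (3, 4, 0) Å.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
displaced = [(3.0, 4.0, 0.0), (4.0, 4.0, 0.0)]
example_rmsd = rmsd(reference, displaced)
```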

Results and discussion

The structure-based virtual screening

The HTVS docking of the prepared database ligands (182,651 molecules) to the active site of the homology model of TMPRSS2 resulted in 10,914 hits. Further docking of these hits in SP mode yielded 1091 hits. The final docking of these hits in XP mode produced 110 hits with a docking score range of −8.654 to −6.775 and a Glide energy range of −55.714 to −29.065 kcal/mol (Table 1; Fig. 2).

Table 1 Structure-based virtual screening results of natural molecules from the ZINC database against the TMPRSS2 active site (top 10 hits)
Fig. 2

Structures of the top ten TMPRSS2 hits, with their ZINC IDs, docking scores and Glide energies

Analysis of the docking complexes of the top 10 hits reveals that the ligands form H-bond, Pi-Pi stacking and salt bridge interactions with the active site residues of TMPRSS2 (Table 1; Fig. 3). Both aliphatic and aromatic hydroxyl and amine groups in these molecules formed H-bonds with the active site residues Val25, His41, Lys87, Asp180, Gln183, Gly184, Ser186, Gly207, Gly209 and Val218 (Table 1; Fig. 4). A Pi-Pi stacking interaction was observed between the benzene moiety of ZINC000000526288 and the active site residue His41. A salt bridge was observed between the aliphatic amine group of ZINC000000077285 and the active site residue Asp180 (Table 1; Fig. 3).

Fig. 3

Top ten TMPRSS2 hits’ 2D interaction diagrams

Fig. 4

RMSD, RMSF and protein–ligand contact diagram of ZINC000095912839 with TMPRSS2

ADME properties

The results of the in silico ADME analysis of the TMPRSS2 hits are given in Table 2. The properties of the molecules fall within the limits permitted by Lipinski’s rule of five and Jorgensen’s rule of three. The results, therefore, suggest that the hit molecules have acceptable ADME properties.
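As an illustration of how such rule-based checks work, the sketch below counts violations of both rules for a set of predicted descriptors. The thresholds are the commonly cited ones (Lipinski: molecular weight ≤ 500, logP ≤ 5, H-bond donors ≤ 5, H-bond acceptors ≤ 10; Jorgensen: logS > −5.7, Caco-2 permeability > 22 nm/s, fewer than 7 primary metabolites). The field names and the two example molecules are hypothetical and are not values from Table 2.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Predicted descriptors for one molecule (hypothetical field names)."""
    mol_weight: float        # g/mol
    logp: float              # octanol/water partition coefficient
    hb_donors: float
    hb_acceptors: float
    logs: float              # aqueous solubility, log mol/L
    caco2_nm_per_s: float    # predicted Caco-2 permeability
    n_metabolites: int       # predicted primary metabolites


def lipinski_violations(d: Descriptors) -> int:
    """Count violations of Lipinski's rule of five."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.hb_donors > 5,
        d.hb_acceptors > 10,
    ])


def jorgensen_violations(d: Descriptors) -> int:
    """Count violations of Jorgensen's rule of three."""
    return sum([
        d.logs <= -5.7,
        d.caco2_nm_per_s <= 22,
        d.n_metabolites >= 7,
    ])


# Two made-up molecules for illustration only.
compliant = Descriptors(350.4, 2.1, 3, 6, -3.2, 150.0, 4)
problematic = Descriptors(612.0, 5.6, 2, 11, -6.1, 10.0, 8)
```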

Table 2 ADME property analysis results of the top ten TMPRSS2 hits

MD simulation

MD simulations of the selected hits were carried out to follow the motion of the atoms of the ligand–receptor complex under physiological conditions and so gain insight into the protein–ligand interactions. For the complex of ZINC000095912839 with TMPRSS2, the ligand RMSD was stable at about 3.6 Å for the first 60 ns and then shifted to 4.4 Å. The protein RMSD was steady at around 1.9 Å for the first 80 ns and then rose to 2.4 Å (Fig. 4). The protein RMSF plot shows that the TMPRSS2 residues remain stable throughout the simulation (Fig. 4). Protein–ligand contact analysis shows that ZINC000095912839 interacts with more than 20 active site residues of TMPRSS2 through H-bonds, hydrophobic interactions, ionic interactions and water bridges. Among these, prominent H-bond interactions lasting throughout the simulation were observed with Asp180, Gly209 and Ser181 (Fig. 4).

For the complex of ZINC000085597504 with TMPRSS2, the ligand RMSD fluctuates more: it appears stable at around 8 Å for the first 28 ns, fluctuates until 78 ns and then stabilises at 28 Å for the rest of the simulation. In contrast, the protein RMSD appears relatively stable at around 2.1 Å throughout the simulation (Fig. 5). The protein RMSF plot shows that the TMPRSS2 residues remain stable throughout the simulation (Fig. 5). Protein–ligand contact analysis shows that ZINC000085597504 interacts with more than 50 active site residues of TMPRSS2 through H-bonds, hydrophobic interactions, ionic interactions and water bridges. Among these, prominent H-bond interactions were observed only with Asp180, Ser181, Ser183, Gly209 and Cys210 (Fig. 5).

Fig. 5

RMSD, RMSF and protein–ligand contact diagram of ZINC000085597504 with TMPRSS2

Overall, comparing the MD simulation results of the hits with those of Glide XP docking suggests a significant correction for ligand interaction with active site residues. The prominent residues commonly observed include Asp180, Gln183, Gly184, Ser186 and Gly209. Furthermore, the MD simulations also identify a notable H-bond interaction with Ser181 for the simulated hits, which was not recognised during Glide XP docking.
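The comparison can be reproduced as simple set operations on the residues named in the text. The sketch below uses only the H-bond residues reported above; note that the docking set is pooled over the top ten hits rather than listed per ligand, and the variable names are illustrative.

```python
# H-bond residues reported for the Glide XP docking of the top ten hits (pooled).
docking_hbond = {
    "Val25", "His41", "Lys87", "Asp180", "Gln183",
    "Gly184", "Ser186", "Gly207", "Gly209", "Val218",
}

# Prominent H-bond residues reported from the MD simulations of the two hits.
md_hbond = {
    "ZINC000095912839": {"Asp180", "Gly209", "Ser181"},
    "ZINC000085597504": {"Asp180", "Ser181", "Ser183", "Gly209", "Cys210"},
}

# Residues seen in the MD simulations of both hits.
shared_md = set.intersection(*md_hbond.values())

# Residues supported by both docking and MD, and residues found by MD only.
confirmed_by_both = shared_md & docking_hbond
md_only = shared_md - docking_hbond
```

On these reported residues, Asp180 and Gly209 are supported by both methods, while Ser181 appears only in the MD results.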

Conclusion

TMPRSS2 is a vital host protein that has been recognised as an important antiviral drug target against SARS-CoV-2 infection. There are no solved crystal structures of TMPRSS2 in the Protein Data Bank at this time; however, a homology structure derived from TMPRSS15 has been useful in the discovery and development of lead molecules against this target. In the current study, potential hit molecules from the ZINC natural products database were identified using structure-based virtual screening and molecular dynamics simulations. Further in vitro and in vivo studies with these molecules may shed more light on their potential benefits in the treatment of COVID-19.