Introduction

Over the last few years, the pharmaceutical industry has faced several challenges owing to the rising complexity of diseases worldwide. This obstacle has prompted a much-needed shift in current treatment strategies: multi-target therapeutics and herbo-synthetic combinations are being explored as complementary, and sometimes alternative, therapies [1]. A substantial body of literature indicates that secondary metabolites derived from plants are pharmacologically active and have been successful in treating a wide range of diseases [2, 3]. Plant-based medication is readily available and affordable, and it is generally associated with fewer side effects. In the search for therapeutically beneficial novel medications, medicinal plants that have been in use for more than a thousand years are being re-examined [4, 5]. Many of these plants and their phytochemical extracts are used globally with little or no awareness of their benefits. One such plant extract that is consumed daily is tea, prepared from the young leaves, buds, and stalks of Camellia sinensis (L.) Kuntze (Family: Theaceae) [6].

This plant is cultivated commercially in tropical and subtropical regions, primarily in the southern parts of Asia, and its leaf infusion, tea, is the most consumed beverage in the world. C. sinensis extracts have been reported to help treat common ailments such as hypertension, diabetes, and obesity. They also possess anticancer, anti-inflammatory, antioxidant, and cardiopulmonary-protective properties [7, 8]. Recently, tea was also reported to have a beneficial role in modulating human gut microbes [9]. Both black and green teas are rich in antioxidant polyphenols such as catechins [10]. These polyphenolic antioxidants are known to protect cells from intracellular ROS production, thereby preventing the oxidative destruction cycle, which includes cell membrane rupture and DNA damage and subsequently leads to necrosis, mutagenesis, and cell death [11, 12]. This information can be exploited to computationally map the diseases that may be treated using these phytochemicals.

Lately, computational methods have become critical in the design and discovery of novel medicines for various illnesses [13]. The in silico technique of molecular docking is currently regarded as one of the most dependable, cost-effective, and time-saving ways of screening potential medicines. The approach is widely advocated and accepted, given the abundance of phytochemicals with potential pharmacological effects [14]. Recently, in silico methods have been applied extensively to search and screen plant-derived drugs against various targets and conditions, such as acetylcholinesterase [15], microbial infections [16] and COVID-19 [17, 18]. The pharmacophore approach is often integrated with molecular docking to understand the chemistry of protein–ligand binding interactions. Pharmacophores describe the functional groups that are involved in such docking interactions. Recently, Oluyori et al. [19] identified SARS-CoV-2 inhibitors from bitter cola by using pharmacophore modeling. Molecular docking may also be coupled with statistical methodology to obtain accurate and valid results. In our previous studies, we successfully applied multivariate statistical models to validate our findings [20, 21]. A limiting factor of the conventional molecular docking method is that it targets a single protein rather than multiple proteins. Network pharmacology, also known as polypharmacology, was developed to target multiple proteins and their corresponding diseases [22]. Choudhary and Singh [23] used a network pharmacology approach to target various proteins with Piper longum phytochemicals. Li et al. [24] found that the phytochemicals of Shenlian extract could target 37 proteins by using a similar approach. Recently, Wu et al. [25] used this tool for three flower teas (Rosa rugosa, Chrysanthemum morifolium, and Citrus aurantium) targeting nonalcoholic steatohepatitis.
Overall, the network pharmacology approach is being applied extensively in computational biology; however, such studies are scarce for conventional tea based on C. sinensis leaves. Therefore, we believe that the numerous health benefits of tea warrant further study of its phytochemicals and the diseases they can be used to treat. Our study aims to give new insight into the health benefits of tea through a revised approach that combines statistics, pharmacophore alignment, virtual library screening and molecular docking.

In the present study, eight important tea compounds (epicatechin, epicatechin gallate, epigallocatechin gallate, catechin, quercetin, kaempferol, ellagic acid and gallic acid) were docked with thirty target proteins obtained from online databases. Finally, a tripartite network consisting of phytochemicals, proteins and associated diseases was constructed with the aid of multivariate principal component analysis (PCA) and pharmacophore-based analysis.

Materials and methods

Data collection

Tea phytochemicals, target proteins, and related illnesses were selected after extensive literature mining. Databases such as IMPPAT and PubChem were used to screen phytochemicals from Camellia sinensis. IMPPAT, or Indian Medicinal Plants, Phytochemistry, and Therapeutics (https://cb.imsc.res.in/imppat/home), is a curated database with material from more than 50 specialized books and 7000 research article abstracts on Indian traditional plants of ethnobotanical value [26]. The two-dimensional and three-dimensional structures of the compounds can be downloaded from this database in the sdf, mol, pdb, or pdbqt formats. PubChem (https://pubchem.ncbi.nlm.nih.gov/) is the world's largest open-access repository of publicly available chemical data, hosted at the National Institutes of Health (NIH) [27]. The phytochemicals identified via these servers were further analyzed to find interacting proteins and their corresponding diseases using the PCIDB (https://www.genome.jp/db/pcidb) web server. The Phytochemical Interactions Database, or PCIDB, is an open-source web server with over 100,000 records, most of them from plants and linked to different protein and disease databases [28]. The proteins that interacted with the shortlisted phytochemicals were looked up in the UniProt database (https://www.uniprot.org/), which houses freely accessible information about protein sequences and their respective functions [29]. Once these proteins were found, the associated diseases were identified using the KEGG DISEASE (https://www.genome.jp/kegg/disease/) and OMIM (https://www.ncbi.nlm.nih.gov/omim) databases. The Kyoto Encyclopedia of Genes and Genomes (KEGG) DISEASE database [30] and Online Mendelian Inheritance in Man (OMIM) [31] are both open-access databases that provide data on disease pathways and other gene-related information. Finally, the drugs designated to target the selected proteins were taken from the DrugBank database (https://go.drugbank.com/) [32].

Preparation of ligands

Avogadro software was used to prepare the ligand molecules before docking. This software serves as a cross-platform molecule editor and visualizer for computational chemistry, molecular modeling, bioinformatics, materials science, and related fields [33]. The three-dimensional structures of the phytochemicals selected for the study, namely gallic acid, catechin, epigallocatechin gallate, epicatechin, epicatechin gallate, quercetin, kaempferol and ellagic acid, were downloaded from the PubChem database (https://pubchem.ncbi.nlm.nih.gov/) in the .sdf format and converted to the .pdb format using Avogadro. Three-dimensional structures of the control drug molecules were downloaded from the DrugBank database (https://go.drugbank.com/) in the .pdb format and were subjected to energy minimization and structural optimization using Avogadro.

Evaluation of drug-likeness

The SwissADME server (http://www.swissadme.ch/) was employed to assess the drug-likeness of the selected phytochemicals. This tool computes physicochemical descriptors and estimates ADME parameters, pharmacokinetic characteristics, drug-like nature, and medicinal chemistry friendliness of one or more small compounds. The characteristics taken into account to predict the drug-likeness of the compounds included the topological polar surface area (TPSA, important for estimating brain permeability and gastrointestinal absorption of potential drugs), gastrointestinal absorption, PGP substrate status, lipophilicity (XLOGP3), and water solubility (Log S) [34]. Lipinski's rule of 5 was also used to predict the drug-likeness of the molecules. The rule of 5 indicates that poor absorption or permeation is more probable when a compound has more than 5 H-bond donors, more than 10 H-bond acceptors, a molecular weight (MWT) above 500, or a computed Log P (CLogP) greater than 5 [35].
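The rule of 5 reduces to four threshold checks, which can be sketched in a few lines of Python. This is only an illustration of the rule as stated above, not the SwissADME implementation; the `Descriptors` class, the function name and the example descriptor values are hypothetical and are not taken from Table 3.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Physicochemical descriptors of one compound (hypothetical container)."""
    name: str
    h_bond_donors: int
    h_bond_acceptors: int
    molecular_weight: float  # g/mol
    log_p: float             # computed octanol/water partition coefficient


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-5 thresholds a compound exceeds."""
    checks = [
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
        d.molecular_weight > 500.0,
        d.log_p > 5.0,
    ]
    return sum(checks)


# Illustrative values only; real descriptors would come from SwissADME output.
small = Descriptors("example_small", 4, 5, 170.0, 0.7)
large = Descriptors("example_large", 8, 11, 458.0, 1.2)
for compound in (small, large):
    print(compound.name, lipinski_violations(compound))
```

Counting violations rather than returning a pass/fail flag keeps the result comparable with the per-compound violation counts that drug-likeness tables usually report.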

Preparation of receptors

The three-dimensional structures of the protein targets were downloaded from the RCSB PDB database (https://www.rcsb.org/). This database stores three-dimensional structural data for large biological macromolecules, including proteins and nucleic acids [36]. Of the thirty proteins selected for the study, the native structures of 28 were retrieved from the PDB database itself. These proteins include carbonic anhydrase 12 [PDB ID: 1JCZ, 1.55 Å, X-ray diffraction], Dihydrofolate reductase [PDB ID: 1BOZ, 2.10 Å, X-ray diffraction], Epidermal Growth Factor and Receptor [PDB ID: 1IVO, 3.30 Å, X-ray diffraction], Prothrombin [PDB ID: 1A2C, 2.10 Å, X-ray diffraction], Carbonic anhydrase 2 [PDB ID: 12CA, 2.40 Å, X-ray diffraction], Receptor tyrosine protein kinase erbB-2 [PDB ID: 1MFG, 1.25 Å, X-ray diffraction], Protein kinase C gamma type (PRKCG) [PDB ID: 2E73, solution NMR], Neutrophil elastase [PDB ID: 1BOF, 2.20 Å, X-ray diffraction], 72 kDa type IV collagenase [PDB ID: 1CK7, 2.80 Å, X-ray diffraction], Hepatocyte growth factor receptor [PDB ID: 1FYR, 2.40 Å, X-ray diffraction], Lysosomal alpha-glucosidase [PDB ID: 5KZW, 2.00 Å, X-ray diffraction], Cytochrome P450 2D6 [PDB ID: 2F9Q, 3.00 Å, X-ray diffraction], Cyclin-dependent kinase 4 [PDB ID: 2W96, 2.30 Å, X-ray diffraction], Angiotensin-converting enzyme [PDB ID: 1O86, 2.00 Å, X-ray diffraction], Matrix metalloproteinase-9 [PDB ID: 1GKC, 2.30 Å, X-ray diffraction], P-selectin [PDB ID: 1G1Q, 2.40 Å, X-ray diffraction], Acetylcholinesterase [PDB ID: 1B41, 2.76 Å, X-ray diffraction], Carbonic anhydrase 4 [PDB ID: 1ZNC, 2.80 Å, X-ray diffraction], G1/S-specific cyclin-D1 [PDB ID: 2W96, 2.30 Å, X-ray diffraction], Dipeptidyl peptidase 4 [PDB ID: 1J2E, 2.60 Å, X-ray diffraction], Prostaglandin G/H synthase 2 [PDB ID: 5F19, 2.04 Å, X-ray diffraction], Receptor-type tyrosine-protein kinase FLT3 [PDB ID: 1RJB, 2.10 Å, X-ray diffraction], PI3-kinase subunit alpha [PDB ID: 2RD0, 3.05 Å, X-ray diffraction], Xanthine dehydrogenase/oxidase [PDB ID: 2CKZ, 3.20 Å, X-ray diffraction], Cyclin-dependent kinase 6 [PDB ID: 1BI7, 3.40 Å, X-ray diffraction], Protein kinase C theta type [PDB ID: 1XJD, 2.00 Å, X-ray diffraction], Potassium voltage-gated channel subfamily H member 2 or hERG-1 [PDB ID: 1BYM, solution NMR], and Serine-protein kinase ATM [PDB ID: 5NP1, 5.70 Å, electron microscopy]. Nonpolar hydrogens were removed and polar hydrogens were added to the receptor proteins using the AutoDock tool, maintaining physiological pH.

The structures of the remaining two proteins, UDP-glucuronosyltransferase 1–1 and UDP-glucuronosyltransferase 1–4, were unavailable, and hence their structures were predicted by homology modeling using the SWISS-MODEL server (https://swissmodel.expasy.org/) [37]. UCSF Chimera software was used to further optimize all 3D protein structures for docking [38]. The quality of the modeled proteins was assessed with the SWISS-MODEL MolProbity tool and the UCLA-DOE LAB SAVES v6.0 PROCHECK tool (https://saves.mbi.ucla.edu/).

Active site prediction, molecular docking and docking validation

The active sites of the target proteins were predicted with the CASTp (Computed Atlas of Surface Topography of proteins) server. CASTp (http://sts.bioe.uic.edu/castp/index.html?3trg) is a freely accessible server, and the sites it predicted were used to set the dimensions of the grid box for docking [39]. The binding grid coordinates of the thirty proteins are shown in Supplementary Table S3. Protein–ligand molecular docking was performed with AutoDock Vina software. AutoDock Vina relies on a sophisticated gradient optimization method, followed by prediction of the best binding orientation of the ligand within the protein cavity [40]. The eight tea phytochemicals were docked with all thirty proteins in the present study. The phytochemical–protein docking results were compared with those of the respective control drugs. Finally, the top-ranked pose of each ligand was superimposed on that of the respective control using BioVia Discovery Studio software, and the similarity of the docking poses was examined.
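One possible way of turning predicted pocket atoms into a docking grid box is sketched below. It is an assumption-laden illustration, not the procedure used in this study: the bounding-box-plus-padding rule, the `padding` value, the function names and the file names are all hypothetical. The keys written to the configuration text (`receptor`, `ligand`, `center_x`, `size_x`, `exhaustiveness`, and so on) are standard AutoDock Vina configuration options.

```python
def grid_box(pocket_coords, padding=4.0):
    """Return (center, size) of an axis-aligned box around pocket atoms.

    pocket_coords: iterable of (x, y, z) tuples in angstroms.
    padding: extra margin added on every side (assumed value, in angstroms).
    """
    xs, ys, zs = zip(*pocket_coords)
    center = tuple((max(a) + min(a)) / 2 for a in (xs, ys, zs))
    size = tuple((max(a) - min(a)) + 2 * padding for a in (xs, ys, zs))
    return center, size


def vina_config(receptor, ligand, center, size, exhaustiveness=8):
    """Format an AutoDock Vina configuration file as a string."""
    lines = [f"receptor = {receptor}", f"ligand = {ligand}"]
    for axis, value in zip("xyz", center):
        lines.append(f"center_{axis} = {value:.3f}")
    for axis, value in zip("xyz", size):
        lines.append(f"size_{axis} = {value:.3f}")
    lines.append(f"exhaustiveness = {exhaustiveness}")
    return "\n".join(lines)


# Hypothetical pocket atoms and file names, for illustration only.
center, size = grid_box([(0.0, 0.0, 0.0), (10.0, 20.0, 30.0)], padding=2.0)
print(vina_config("receptor.pdbqt", "ligand.pdbqt", center, size))
```

The resulting text could be saved to a file and passed to Vina through its `--config` option.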

Statistical analysis

Principal component analysis

Minitab 18 statistical software was used to perform principal component analysis (PCA) to group phytochemicals based on their docking scores (binding affinity in kcal mol−1). This statistical tool eases the interpretation of large data sets by reducing their dimensionality, improving interpretability while minimizing information loss. The first principal component (PC1) is the direction that maximizes the variance of the projected data, which is equivalent to minimizing the overall distance between the data points and their projections onto it. The second and subsequent PCs are chosen in the same way, subject to being uncorrelated with the preceding components [41, 42].
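To make the variance-maximizing step concrete, the sketch below extracts PC1 from a small score matrix by power iteration on the covariance matrix. It is a minimal illustration in plain Python, not a reproduction of the Minitab analysis: the use of the covariance (rather than correlation) matrix, the function names and the toy data are assumptions, and a numerical library would normally be used for a full decomposition.

```python
import math


def center_columns(rows):
    """Subtract each column mean (rows = ligands, columns = target proteins)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def covariance_matrix(centered):
    """Sample covariance matrix of column-centered data."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_component(cov, iterations=1000):
    """Leading eigenvector and eigenvalue of cov by power iteration."""
    p = len(cov)
    v = [1.0 + 0.01 * i for i in range(p)]  # arbitrary non-uniform start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
    return v, eigenvalue


def pc1_summary(rows):
    """Return PC1 loadings, PC1 scores and the fraction of variance explained."""
    centered = center_columns(rows)
    cov = covariance_matrix(centered)
    total_variance = sum(cov[i][i] for i in range(len(cov)))
    if total_variance == 0.0:
        raise ValueError("data have no variance")
    loadings, eigenvalue = first_component(cov)
    scores = [sum(a * b for a, b in zip(row, loadings)) for row in centered]
    return loadings, scores, eigenvalue / total_variance


# Toy matrix (hypothetical values): 3 ligands x 2 proteins.
loadings, scores, explained = pc1_summary([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
print(loadings, scores, explained)
```

Ligands with similar PC1 (and PC2) scores fall close together in the score plot, which is the basis for reading clusters off a PCA figure.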

Heatmap generation and clustering analysis

As a second, confirmatory analysis, the docking scores were clustered and a heat map was generated using TBtools software. Pearson correlation was used as the similarity measure for clustering. TBtools (a Toolkit for Biologists integrating various biological data-handling tools) is an open-access, stand-alone software suite with various integrated tools. The software is based on simple IOS logic (input, output and start) [43].
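The idea of correlation-based clustering can be sketched as follows: the distance between two ligands is taken as 1 minus the Pearson correlation of their score profiles, and the closest groups are merged repeatedly. This is a hedged illustration only; the average-linkage rule, the function names and the score profiles are assumptions and do not describe the internal algorithm of TBtools.

```python
def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("a profile with zero variance has no correlation")
    return sxy / (sxx * syy) ** 0.5


def correlation_clusters(profiles, k):
    """Agglomerative clustering into k groups using 1 - r as distance.

    profiles: dict mapping ligand name -> list of docking scores.
    Average linkage is an assumption made for this sketch.
    """
    clusters = [[name] for name in profiles]

    def distance(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(1.0 - pearson(profiles[a], profiles[b])
                   for a, b in pairs) / len(pairs)

    while len(clusters) > k:
        candidates = [(i, j) for i in range(len(clusters))
                      for j in range(i + 1, len(clusters))]
        i, j = min(candidates,
                   key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical score profiles (kcal/mol) across three proteins.
profiles = {
    "ligand_A": [-9.0, -8.0, -7.0],
    "ligand_B": [-9.1, -8.2, -7.1],
    "ligand_C": [-7.0, -8.0, -9.0],
}
print(correlation_clusters(profiles, k=2))
```

Because the correlation depends on the shape of a profile rather than its magnitude, two ligands with parallel score patterns are grouped together even if one binds uniformly more strongly.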

Structural classification and phytochemical alignment

Structural classification of the eight tea phytochemicals was performed with the ClassyFire server (http://classyfire.wishartlab.com/). ClassyFire performs automated structural classification of the input molecules. It relies on a hierarchical chemical classification principle and provides structural information based on the ChemOnt taxonomy [44].

PharmaGist, an open-source server, was used to perform pharmacophore analysis of the eight tea compounds. Pharmacophore screening helps to identify common descriptors or functional groups. Each of the clusters obtained from the multivariate statistical PCA modeling (Minitab) was subjected to pharmacophore analysis. PharmaGist analysis is based on the DUD (Directory of Useful Decoys) data set, which contains 2950 active ligands for 40 different receptors, with 36 decoy compounds for each active ligand [45]. Three-dimensional rendering of the output files was performed with PyMOL software to represent the common descriptors.

Tripartite network construction

Cytoscape (https://cytoscape.org/) allows global datasets and functional annotations to be projected and integrated, creating strong visual mappings that span these datasets [46]. To understand the interactions among phytochemicals, protein targets and associated diseases, a tripartite network was constructed using Cytoscape v3.7.2.
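A tripartite network of this kind is simply a list of compound-to-protein and protein-to-disease edges. The sketch below builds such a list and writes it in the tab-delimited simple interaction format (SIF) that Cytoscape can import. The function names and the interaction labels `targets` and `associated_with` are hypothetical; the example links for matrix metalloproteinase-9 follow the associations discussed in the Results.

```python
def build_tripartite_edges(compound_targets, target_diseases):
    """Return (source, interaction, target) triples for a tripartite network.

    compound_targets: dict mapping compound -> list of target proteins.
    target_diseases: dict mapping protein -> list of associated diseases.
    """
    edges = []
    for compound, proteins in compound_targets.items():
        for protein in proteins:
            edges.append((compound, "targets", protein))
    linked_proteins = sorted({p for ps in compound_targets.values() for p in ps})
    for protein in linked_proteins:
        for disease in target_diseases.get(protein, []):
            edges.append((protein, "associated_with", disease))
    return edges


def to_sif(edges):
    """Serialize edges as tab-delimited SIF text (one edge per line)."""
    return "\n".join("\t".join(edge) for edge in edges)


compound_targets = {
    "epicatechin": ["MMP-9 (1GKC)"],
    "kaempferol": ["MMP-9 (1GKC)"],
}
target_diseases = {
    "MMP-9 (1GKC)": ["intervertebral disc disease", "penile cancer"],
}
edges = build_tripartite_edges(compound_targets, target_diseases)
print(to_sif(edges))
```

Keeping the node type implicit in the interaction label is enough for a small network; for styling in Cytoscape, a separate node attribute table (compound, protein, disease) could be imported alongside the edge file.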

Molecular dynamic (MD) simulation

MD simulation was performed as described in our previous study [20]. Briefly, the GROMACS-2019.2 [47] biomolecular package was used to perform the molecular dynamics simulation of the epicatechin gallate (ECG)–hepatocyte growth factor receptor (PDB id 1FYR) complex, as facilitated by Simlab at the University of Arkansas for Medical Sciences (UAMS), Little Rock, USA [48]. The GROMOS96 43a1 force field was used for the simulation, and the ligand topology file was generated with PRODRG software [49]. A triclinic box was defined for the protein–ligand complex. The simulation was performed in SPC water with 0.15 M counter ions (Na+/Cl−). The system was maintained under NVT/NPT ensembles at a temperature of 300 K and a pressure of 1 bar. Pressure and temperature were maintained with the Parrinello-Rahman barostat and the Parrinello-Donadio-Bussi thermostat, respectively [50]. The resulting model was energy minimized for 5000 steps with the steepest descent integrator, and the run time was fixed at 100 ns. The parameters root-mean-square deviation (RMSD), radius of gyration (Rg), root-mean-square fluctuation (RMSF) and ligand hydrogen bonds were evaluated for the protein–ligand complex. These parameters were also compared between the ligand-bound and ligand-free protein.
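For readers unfamiliar with the trajectory metrics, the sketch below shows how RMSD and Rg are defined for a single frame. It is a simplified illustration, not the GROMACS implementation: it assumes the frame has already been superimposed on the reference (GROMACS tools such as `gmx rms` perform a least-squares fit first), the function names are hypothetical, and the coordinates are toy values.

```python
import math


def rmsd(reference, frame):
    """Root-mean-square deviation between two pre-aligned coordinate sets."""
    if len(reference) != len(frame):
        raise ValueError("coordinate sets must contain the same atoms")
    squared = sum((a - b) ** 2
                  for p, q in zip(reference, frame)
                  for a, b in zip(p, q))
    return math.sqrt(squared / len(reference))


def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration; unit masses are assumed if omitted."""
    if masses is None:
        masses = [1.0] * len(coords)
    total = sum(masses)
    center = [sum(m * c[i] for m, c in zip(masses, coords)) / total
              for i in range(3)]
    squared = sum(m * sum((c[i] - center[i]) ** 2 for i in range(3))
                  for m, c in zip(masses, coords))
    return math.sqrt(squared / total)


# Toy coordinates (arbitrary units), for illustration only.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(reference, frame))
print(radius_of_gyration([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]))
```

A stable complex typically shows an RMSD that plateaus over time and an Rg that stays roughly constant, which is why both are tracked for the ligand-bound and ligand-free protein.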

Free energy analysis by MM-PBSA calculation

The free energies of the epicatechin gallate (ECG)–hepatocyte growth factor receptor (PDB id 1FYR) complex (ΔG_van der Waals, ΔG_Electrostatic, ΔG_Polar, ΔG_Non-Polar and ΔG_Binding) were estimated by the molecular mechanics Poisson–Boltzmann surface area (MM-PBSA) method using the g_mmpbsa package [51].

The following equation was used to calculate ΔG_Bind (kJ mol−1):

$$ \Delta G_{\text{Bind}} = G_{\text{Comp}} - \left( G_{\text{Prot}} + G_{\text{Lig}} \right) $$

where G_Comp is the energy of the protein–ligand complex, and G_Prot and G_Lig are the individual energies of the protein and the ligand, respectively. The MM-PBSA calculation was performed over a 5-ns trajectory.
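The binding energy expression above can be applied frame by frame and then averaged over the analyzed trajectory. The sketch below shows that bookkeeping only; it is not the g_mmpbsa implementation, the function names are hypothetical, and the per-frame energies are invented values used purely for illustration.

```python
from statistics import mean, stdev


def binding_energy(g_comp, g_prot, g_lig):
    """Binding energy for one frame: G_Comp - (G_Prot + G_Lig), in kJ/mol."""
    return g_comp - (g_prot + g_lig)


def summarize(frames):
    """Mean and standard deviation of the binding energy over frames.

    frames: list of (g_comp, g_prot, g_lig) tuples, one per trajectory frame.
    """
    values = [binding_energy(*frame) for frame in frames]
    spread = stdev(values) if len(values) > 1 else 0.0
    return mean(values), spread


# Hypothetical per-frame energies (kJ/mol), not results from this study.
frames = [
    (-1500.0, -1200.0, -200.0),
    (-1510.0, -1205.0, -195.0),
    (-1495.0, -1198.0, -202.0),
]
average, spread = summarize(frames)
print(f"binding energy = {average:.1f} +/- {spread:.1f} kJ/mol")
```

A more negative average indicates a more favorable predicted binding, and the spread across frames gives a rough sense of how stable that estimate is over the trajectory.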

Results and discussion

Data collection

Tea (C. sinensis) leaves are rich in compounds such as catechin, epicatechin, epicatechin gallate and epigallocatechin gallate [3]. Catechin and its derivatives generally constitute about 30% of the dry weight of black tea leaves [52]. Epigallocatechin gallate has also been reported to be the principal constituent of green tea [53]. Furthermore, phytochemicals such as quercetin, kaempferol, ellagic acid and gallic acid have been reported in tea [54, 55]. These compounds were considered for this study because of their previously reported health benefits and their abundance in tea leaves. For instance, epicatechin and the catechin gallate esters epicatechin gallate and epigallocatechin gallate were reported to have antidiabetic and antioxidant properties [6]. Epigallocatechin gallate, in particular, also served as a hepatoprotective agent by suppressing cytotoxin-induced cell death [56]. Ellagic acid had antioxidant and anti-inflammatory properties. Gallic acid and kaempferol displayed anticancer and anti-inflammatory activities, while quercetin was deemed beneficial in cases of hypertension [3]. Based on these findings, eight tea phytochemicals, namely epicatechin, epicatechin gallate, epigallocatechin gallate, catechin, quercetin, kaempferol, ellagic acid and gallic acid, were selected for this study (Table 1). A schematic representation of the work is given in Fig. 1.

Table 1 List of selected tea phytochemicals and corresponding chemical classes
Fig. 1 Schematic representation of the study

Furthermore, we identified 30 proteins and their associated target diseases through database searching, as described in the Materials and methods. Table 2 lists the target proteins, their associated diseases and the control drugs for the respective proteins.

Table 2 Target proteins and associated diseases along with control drugs

Drug-likeness of the compounds

SwissADME was used to predict the drug-like properties of the eight tea phytochemicals. It has been noted in the literature, however, that drug-like behavior should not be judged from a single parameter. Hou et al. [83] reported that the values obtained for TPSA (topological polar surface area) did not correlate with another drug-likeness parameter. In general, TPSA reflects the bioavailability of drug candidates, and the recommended range is 20 to 140 Å2 [82]. All phytochemicals except two satisfied this criterion (Table 3). The water solubility of drug-like molecules is considered to have a significant impact on their ADME (absorption, distribution, metabolism and excretion) properties, i.e., on the bioavailability of the compounds. The parameter Log S represents the intrinsic solubility of candidate drugs in water. Log S values between −1 and −5 indicate a balance between solubility in water and in lipids [83]. In our study, all the phytochemicals fell well within this range (Log S −1.64 to −3.70). Further, transcellular passive diffusion is one of the major determinants of the permeability and absorption of drug-like compounds, and the ability to cross the barrier of the gastrointestinal (GI) tract reflects this property. Our study showed that, except for epicatechin gallate and epigallocatechin gallate, the other six phytochemicals had high GI absorption capacity. XLOGP3 is a computed estimate of the lipophilicity of drug-like molecules. The recommended range of XLOGP3 is 1.48–6.19, and all the compounds passed this criterion. P-glycoprotein (PGP) is an active transporter that generally removes drug molecules from cells; compounds that are PGP substrates are reported as PGP positive. Six of the eight compounds were found to be PGP negative [84]. Finally, we observed few Lipinski violations (0–2 per compound). Overall, every compound satisfied at least one of the drug-likeness criteria.

Table 3 Drug-likeness of the phytochemicals

Quality analysis of the homology modeled proteins

Homology modeling of the two proteins (UDP-glucuronosyltransferase 1–1, UniProt id P22309, and UDP-glucuronosyltransferase 1–4, UniProt id P22310) was performed with the SWISS-MODEL server, based on the templates PDB id 6KVJ.1.A and 6O86.1.A. Model quality was assessed using the MolProbity score, QMEAN and GMQE (Global Model Quality Estimate) scores. While QMEAN represents the degree of nativeness of the modeled structure, GMQE reflects the expected accuracy of the model [85]. The results of these quality estimates are shown in Supplementary Tables S1 and S2. Finally, Ramachandran plots revealed > 85% of residues in favorable regions for both modeled structures (Figs. S1 and S2). Overall, despite the unavailability of closely matching templates for our modeled proteins, we were able to obtain models of moderate quality.

Molecular docking

Molecular docking is a preferred tool for the initial screening of drug molecules. It reduces the need for the tedious, time-consuming and expensive techniques that were previously used for drug discovery. AutoDock Vina is one of the most established and robust open-source molecular docking programs. It is considered one of the top-ranking programs on the screening benchmark known as the directory of useful decoys, as reported by the Watowich group [86]. Gaillard [87] compared the performance of AutoDock and AutoDock Vina on the comparative assessment of scoring functions (CASF-2013) benchmark and reported superior performance of AutoDock Vina in all aspects. In our previous study, we successfully used AutoDock Vina to identify piperine as an inhibitor of Dengue and Ebola virus enzymes [21]. In this work, seven of the eight phytochemicals, all except gallic acid, showed substantial inhibitory potential against the target proteins. Of these, epigallocatechin gallate, epicatechin gallate, and quercetin showed the greatest coverage, with binding scores better than those of the respective controls for 22, 21 and 20 proteins, respectively (Table 4). From the docking results, we identified the best target protein for each phytochemical in terms of binding affinity (kcal mol−1). The target proteins hepatocyte growth factor receptor (PDB id 1FYR), dihydrofolate reductase (PDB id 1BOZ), prostaglandin G/H synthase 2 (PDB id 5F19) and angiotensin-converting enzyme (PDB id 1O86) were, on average, the most strongly targeted by the phytochemicals. Epigallocatechin gallate could inhibit the progression of tumor genes by blocking DNA methyltransferases, proteases, and dihydrofolate reductase (DHFR) activities [88]. In a recent study, Zong et al. [89] showed that this phytochemical prevented aminoglycoside-induced ototoxicity in zebrafish by blocking several enzymes, including DHFR.
In our work, epigallocatechin gallate scored considerably better against DHFR (−9.8 kcal mol−1) than against the other target proteins. Tea phytochemicals were earlier reported to have strong inhibitory potential against angiotensin-converting enzyme (ACE) by Fauzi et al. [90]. Epicatechin and kaempferol showed substantial binding potential with matrix metalloproteinase (MMP)-9 (PDB id 1GKC) compared with their respective controls, both with a score of −9 kcal mol−1. Kanbarkar and Mishra [91] recently showed that tea phytochemicals could inhibit MMP protein, and kaempferol could modulate the activity of MMP-2 and MMP-9 [92]. The anticancer activities of tea polyphenols have been widely reported in the literature [93, 94]. In their review, Cheng et al. [95] described the modulating effect of catechin and its derivatives against various types of cancer. The flavonoids quercetin and rutin could suppress the expression of multiple oncogenes, including prostaglandin synthase 2, and showed antiglioma effects [96]. Consistent with these findings, among all 30 target proteins, prostaglandin G/H synthase 2 (PDB id 5F19) was the best target for catechin and quercetin, with binding affinities of −9.2 and −9.4 kcal mol−1, respectively. Recently, Wang et al. [97] showed that the tea phytochemical catechin had strong binding affinity for ACE. The IC50 value of ellagic acid against ACE was reported as 2 mM, which was interpreted as strong inhibitory potential [98]. Consistent with these findings, ellagic acid showed its highest binding potential (−8.9 kcal mol−1) for ACE among all the proteins, exceeding that of the control. Epicatechin gallate achieved the best score overall (−10 kcal mol−1), against hepatocyte growth factor receptor, among all target proteins and phytochemicals.
Hepatocyte growth factor is one of the major elements in the development of hepatocellular carcinoma [99], and epicatechin gallate is one of the most prominent phytochemicals in tea leaves [100, 101]. Elsewhere, it was shown that green tea and epicatechin gallate effectively prevented and managed nonalcoholic fatty liver disease (NAFLD) [102]. Further, catechin derivatives were reported to be very effective against hepatocellular carcinoma [103]. Overall, these findings were consistent with various wet-laboratory studies, which in turn supports the accuracy and robustness of our outcomes.
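The coverage figures quoted above (the number of proteins for which a phytochemical outscored the control drug) amount to a simple comparison of docking scores, which can be sketched as follows. The function name, the protein labels and all scores in the example are hypothetical, and treating only strictly more negative scores as "better" is an assumption of this sketch.

```python
def coverage(ligand_scores, control_scores):
    """Count proteins where the ligand binds better than the control drug.

    Both arguments map protein name -> binding affinity in kcal/mol.
    A more negative score is treated as better; ties do not count.
    """
    return sum(
        1
        for protein, score in ligand_scores.items()
        if protein in control_scores and score < control_scores[protein]
    )


# Hypothetical scores for three proteins, for illustration only.
control = {"protein_1": -8.0, "protein_2": -7.5, "protein_3": -9.0}
ligand = {"protein_1": -9.1, "protein_2": -7.5, "protein_3": -8.2}
print(coverage(ligand, control))
```

Applying the same function to every row of a ligand-by-protein score table yields one coverage value per phytochemical, which makes the ranking of compounds straightforward to reproduce.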

Table 4 Binding affinity (kcal mol−1) between compounds and target proteins as generated by AutoDock Vina

Interaction analysis and docking validation

For amino acid–ligand interaction analysis and docking pose validation, the best protein–ligand complexes were selected, namely Epicatechin/Kaempferol-1GKC (UniProt P14780), Epicatechin gallate-1FYR (UniProt P08581), Epigallocatechin gallate-1BOZ (UniProt P00374), Catechin/Quercetin-5F19 (UniProt P35354), and Ellagic acid-1O86 (UniProt P12821). Control drugs were re-docked with the respective proteins. Overlaid 3D diagrams of the phytochemical and control drug binding pockets are shown in Fig. 2. Epicatechin and kaempferol, like the control drug marimastat, interacted with the amino acids LEU288A, ALA189A, TYR423A and HIS401A of matrix metalloproteinase-9 (PDB id 1GKC) through conventional H, Pi-H and other bonds. For hepatocyte growth factor receptor (PDB id 1FYR), the control drug crizotinib interacted with the amino acid VAL123A through a conventional H bond, as did epicatechin gallate. Methotrexate and epigallocatechin gallate both interacted with dihydrofolate reductase (PDB id 1BOZ) through the amino acids THR56A, SER59A, AL115A, ALA9A and LEU22A. The control drug icosapent shared a similar binding pocket (HIS207B, HIS386B, and HIS388B) of prostaglandin G/H synthase 2 (PDB id 5F19) with two phytocompounds, namely catechin and quercetin. Finally, ellagic acid, like its corresponding control drug perindopril, stabilized its interaction with angiotensin-converting enzyme through the common amino acids GLU123A, ARG124A, TYR135A and SER517A (Table 5).

Fig. 2 Superimposed three-dimensional docking interaction between control drugs and phytochemicals to target proteins (UniProt id), along with H bond interactions; catechin (CAT), epigallocatechin gallate (EGCG), epicatechin (EC), epicatechin gallate (ECG), quercetin (QUE), kaempferol (KAE), and ellagic acid (EA)

Table 5 Amino acid residues interacting with the phytochemicals

Statistical analysis

Combining advanced statistical tools with molecular docking can give insight into the mechanisms underlying protein–ligand binding affinities. Recently, PCA was successfully used to understand the molecular interaction of Mur enzymes and gallomyricitrin [104]. In our previous studies, we used PCA to categorize phytochemical groups and correlated their chemical classes with their docking scores [20, 105]. Docking outputs for the 8 ligands and 30 proteins, along with the control docking results, were taken as inputs for PCA, and the results are presented in Fig. 3a. The first principal component (PC1) and the second component (PC2) explained approximately 61.50 and 1.78% of the variance, respectively. Each principal component captures a different dimension of the measured dataset, and the components are mutually uncorrelated. We observed four distinct clusters in the analysis. Cluster 4 consisted only of the hydroxybenzoic acid derivative gallic acid, and cluster 2 contained the control drugs. Ellagic acid and the catechin gallate esters epigallocatechin gallate and epicatechin gallate were grouped into cluster 1, and the other flavonoids (quercetin, catechin, epicatechin and kaempferol) were found in cluster 3. The placement of gallic acid and the control drugs in separate clusters was indicative of their weaker binding affinities to the target proteins compared with the other compounds.

Fig. 3 Statistical analysis of binding affinities (kcal mol−1): a principal component analysis and b heat map with clustering

Further, to validate the findings from PCA, we performed heat map generation and clustering analysis based on the Pearson correlation (Fig. 3b). All clustering algorithms group items according to a similarity measure computed from the input data. Clustering is commonly used in gene expression experiments; however, we previously applied this statistical methodology in a molecular docking study [21]. Correlation-based clustering has an advantage over distance-based metrics in that it is insensitive to scaling and focuses primarily on the relative pattern of the output data [106]. In this study, we observed a clustering pattern similar to the PCA results.

Molecular alignment and pharmacophore analysis

The pharmacophore approach is used extensively in computer-aided drug discovery. A pharmacophore is defined as the ‘ensemble of various steric and electronic features’ of a particular molecule. Pharmacophores of ligand entities represent functional descriptors (groups), such as aromatic, acceptor, and donor groups, that are responsible for interaction with target proteins. Recently, we aligned three molecules, namely curcumin, piperine and chloroquine, to understand the common descriptors interacting with the COVID-19 spike protein as a target [20]. In a similar study, Oluyori et al. [19] showed that the aromatic rings of garcinia biflavonoid I, garcinia biflavonoid II, kolaflavone and amentoflavone were responsible for their interaction with the main protease and RNA-dependent RNA polymerase of SARS-CoV-2. In this study, based on the PCA/clustering outputs, we performed molecular alignment to identify the functional pharmacophores of the molecules. For PCA cluster 1 (ellagic acid, epigallocatechin gallate and epicatechin gallate), three H bond acceptors, two H bond donors and one aromatic group were identified as common descriptors. PCA cluster 3 (quercetin, catechin, epicatechin and kaempferol), on the other hand, had three H bond donors, one H bond acceptor and two aromatic groups as common functional groups (Fig. 4). Further, we observed alignment scores of 26.638 and 33.237 for the two aligned PCA clusters, respectively (Figs. S1 and S2). Alignment scores represent the overall alignment performance of the input molecules. Two-dimensional interaction diagrams of the best-ranked ligands with their respective target proteins showed the identified descriptors acting as functional pharmacophores (Fig. 5).

Fig. 4

Molecular alignment of phytochemicals showing pharmacophores (ACC acceptor, DON donor, AR aromatic)

Fig. 5

Protein–phytochemical interacting amino acids showing pharmacophores involved: catechin (CAT), epigallocatechin gallate (EGCG), epicatechin (EC), epicatechin gallate (ECG), quercetin (QUE), kaempferol (KAE), and ellagic acid (EA)

Phytochemical interaction with target proteins and respective diseases

Network pharmacology is an advanced technique that enables researchers to rapidly identify multiple targets for their candidate compounds. It is an emerging methodology in drug discovery and is used extensively in phytochemical-based research. In our earlier study, we successfully deployed a network pharmacology approach to identify multiple disease targets of ginger phytochemicals [107]. Recently, Sarkar et al. [108] used this methodology to identify targets for a mango constituent. Tea compounds have been reported to have multiple health benefits and disease-modulating roles; however, to the best of our knowledge, the application of a network pharmacology approach to these tea molecules is scarce. Nevertheless, a few such works have been presented for non-Camellia sinensis tea [109, 110]. In this work, a tripartite network consisting of phytochemicals, proteins and associated diseases was constructed based on the top docking score (binding affinity, kcal mol−1) for each of the phytocompounds (Fig. 6). Target proteins and their related diseases are presented in Table 2. Gallic acid did not qualify owing to its overall weak binding affinities toward the target proteins. Analysis of the results showed that epicatechin and kaempferol were both connected to a common target (1GKC, matrix metalloproteinase-9). Matrix metalloproteinase-9 is associated with diseases such as intervertebral disc disease and penile cancer.
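The construction of such a tripartite network can be sketched as follows. This is a minimal illustration and not the software pipeline used in this work: only the ECG-1FYR score (−10 kcal mol−1) and the 1GKC disease associations are taken from the text, whereas all other docking scores are hypothetical placeholders, and the function names are our own.

```python
from collections import defaultdict

# Docking scores in kcal/mol (more negative = stronger binding).
# Only ECG-1FYR = -10.0 is taken from the text; every other value is a
# hypothetical placeholder used purely to make the sketch runnable.
docking_scores = {
    "ECG": {"1FYR": -10.0, "1GKC": -8.0},
    "EC": {"1FYR": -7.0, "1GKC": -9.0},
    "KAE": {"1FYR": -7.5, "1GKC": -9.2},
}

# Protein-disease associations. The 1GKC entries come from the text; the
# diseases linked to 1FYR are listed in Table 2 and are not reproduced here.
protein_diseases = {
    "1GKC": ["intervertebral disc disease", "penile cancer"],
    "1FYR": [],
}

def top_targets(scores):
    """Map each phytochemical to the protein with its best (lowest) score."""
    return {ligand: min(per_protein, key=per_protein.get)
            for ligand, per_protein in scores.items()}

def tripartite_edges(scores, diseases):
    """Return ligand-protein and protein-disease edge lists of the network."""
    ligand_protein = sorted(top_targets(scores).items())
    proteins = {protein for _, protein in ligand_protein}
    protein_disease = sorted((p, d) for p in proteins for d in diseases.get(p, []))
    return ligand_protein, protein_disease

def shared_targets(scores):
    """Proteins that are the top-ranked target of more than one phytochemical."""
    by_protein = defaultdict(list)
    for ligand, protein in top_targets(scores).items():
        by_protein[protein].append(ligand)
    return {p: sorted(ligs) for p, ligs in by_protein.items() if len(ligs) > 1}

ligand_protein, protein_disease = tripartite_edges(docking_scores, protein_diseases)
print(ligand_protein)
print(protein_disease)
print(shared_targets(docking_scores))
```

Keeping only the top-ranked protein per phytochemical mirrors the stringent selection described above; a looser criterion (for example, all proteins below a score cutoff) would yield a denser network.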

Fig. 6

Tripartite network of phytochemicals, proteins and associated diseases: catechin (CAT), epigallocatechin gallate (EGCG), epicatechin (EC), epicatechin gallate (ECG), quercetin (QUE), kaempferol (KAE), and ellagic acid (EA)

Molecular dynamic simulation

Computer-assisted molecular dynamics (MD) simulation is very useful in computational biology for assessing the accuracy and stability of a docked complex in a near-native physiological environment. Kushwaha et al. [111] recently demonstrated the binding stability of quercetin-3-rutinoside-7-glucoside at the active site of SARS-CoV-2 Mpro using MD simulation. Similarly, Islam et al. [112] used MD simulation to evaluate the docking potential of selected phytochemicals against the main protease of SARS-CoV-2. In our study, the epicatechin gallate (ECG)-hepatocyte growth factor receptor (PDB id 1FYR) complex scored highest (−10 kcal mol−1) in terms of binding affinity and was therefore analyzed by MD simulation to understand its stability and conformational dynamics. The results obtained collectively from all four parameters, namely RMSD, RMSF, radius of gyration and ligand–protein H bonds, clearly showed that the complex formed by ECG was stable and that ECG could effectively bind to the target protein hepatocyte growth factor receptor.

Root mean square deviation (RMSD) measures the deviation of the protein structure from its reference conformation over time and is indicative of inter-residue interactions within the protein. We performed a 100-ns simulation for both the apo and bound states of the selected protein. We observed a gradual increase in RMSD in both states up to 20 ns; thereafter, the RMSD remained stable for the rest of the simulation, within a range of 0.25–0.35 nm on average. Overall, marginally higher RMSD values were seen for the bound-state protein (~0.31 nm) than for the apo state (~0.25 nm), possibly due to the introduction of the ligand (ECG) into the structure (Fig. 7a). A similar observation was made for the radius of gyration (Rg), which was used to analyze the compactness of the protein structure in the free and bound forms (Fig. 7b). Before the simulation began, the Rg values of the apo and bound states were calculated as 1.7 and 1.64 nm, respectively. After initial fluctuation up to 30 ns, both structures stabilized at Rg values of ~1.7 and ~1.5 nm for the apo and bound states, respectively. Initial fluctuations at the start of an MD trajectory are commonly reported in the literature. In addition, a marginal difference in output parameter values between the bound and free states of a protein has also been reported elsewhere as evidence of stable dynamics [113, 114].

Further, to study the residual mobility of the ligand-bound and ligand-unbound forms, we analyzed the root-mean-square fluctuations (RMSF) of individual amino acid residues. As shown in Fig. 7c, we did not observe any remarkable change in residual flexibility in the bound state compared with the apo form; the difference between the two was minimal. Stable hydrogen bond formation between ligand and protein is important for the structural stability of the complex. In its interaction with the protein 1FYR, ECG formed a maximum of 5 H bonds, and at least 2 H bonds were long-lived throughout the 100-ns simulation (Fig. 7d).
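For readers unfamiliar with these descriptors, the following minimal sketch shows how RMSD and Rg are computed from atomic coordinates. It is an illustration of the standard definitions only and is not the analysis code used in this work. It assumes that each frame has already been superimposed on the reference structure (MD analysis tools usually perform this fit first), and the coordinates and masses are toy values.

```python
import math

def rmsd(frame, reference):
    """Root-mean-square deviation between two pre-aligned coordinate sets (nm)."""
    n_atoms = len(frame)
    squared = sum((a - b) ** 2
                  for atom, ref_atom in zip(frame, reference)
                  for a, b in zip(atom, ref_atom))
    return math.sqrt(squared / n_atoms)

def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration about the center of mass (nm)."""
    total_mass = sum(masses)
    center = [sum(m * atom[k] for m, atom in zip(masses, coords)) / total_mass
              for k in range(3)]
    squared = sum(m * sum((atom[k] - center[k]) ** 2 for k in range(3))
                  for m, atom in zip(masses, coords))
    return math.sqrt(squared / total_mass)

# Toy three-atom "structure" (coordinates in nm); not data from this study.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
# A frame in which every atom is displaced by 0.3 nm along x.
frame = [(x + 0.3, y, z) for x, y, z in reference]

print("RMSD (nm):", round(rmsd(frame, reference), 3))
print("Rg (nm):", round(radius_of_gyration(reference, [12.0, 12.0, 12.0]), 3))
```

A plateau in the RMSD time series, as observed here after about 20 ns, indicates that the structure is no longer drifting away from its reference, while a steady Rg indicates that the overall compactness of the fold is preserved.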

Fig. 7

Molecular dynamic (MD) simulation of epicatechin gallate (ECG)-hepatocyte growth factor receptor (PDB id 1FYR) complex: a root-mean-square deviation (RMSD), b radius of gyration, c root-mean-square fluctuation (RMSF) and d ligand–protein H bonds

Free energy analysis by MM-PBSA calculation

The MM-PBSA method is considered an advanced methodology for the estimation of binding free energy. Although this method increases the computational cost significantly, it can provide more accurate results than the conventional score-based molecular docking technique [115]. The results showed that epicatechin gallate (ECG) had a very high binding affinity (−242.09 ± 10.97 kJ mol−1) toward the receptor protein hepatocyte growth factor receptor (PDB id 1FYR). The dynamic stability of the protein–ligand complex was further explained by the individual energy terms, namely van der Waals, electrostatic, nonpolar solvation and polar solvation energies. Among these terms, the polar solvation energy has the opposite effect, contributing an unfavorable positive value to the binding energy [116]. In this study, we observed that the van der Waals term contributed more negative energy than its electrostatic counterpart (Table 6), and the low contribution of polar solvation energy was important for stabilizing the ECG-1FYR complex. Overall, a stable energy trajectory was observed, as demonstrated by representative 5-nm snapshots on five energy terms, namely, ΔG_Van der Waal, ΔG_Electrostatic, ΔG_Polar, ΔG_Non-Polar and ΔG_binding (Fig. 8).
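How the individual terms combine into the binding energy can be shown with a short sketch. It assumes the commonly used additive decomposition in which the entropic contribution is neglected; the numerical values are illustrative round numbers and are not the values reported in Table 6.

```python
def mmpbsa_binding_energy(vdw, electrostatic, polar_solvation, nonpolar_solvation):
    """Sum the four energy terms (kJ/mol) into the binding energy.

    Assumed decomposition (entropy neglected in this sketch):
    dG_binding = dE_vdW + dE_electrostatic + dG_polar + dG_nonpolar
    """
    return vdw + electrostatic + polar_solvation + nonpolar_solvation

# Illustrative values in kJ/mol; NOT the values reported in Table 6.
terms = {
    "vdw": -200.0,               # favorable, typically the largest contribution
    "electrostatic": -50.0,      # favorable
    "polar_solvation": 100.0,    # unfavorable (positive), opposes binding
    "nonpolar_solvation": -20.0, # favorable, usually small
}

binding = mmpbsa_binding_energy(**terms)
print("Binding energy (kJ/mol):", binding)
```

The sketch makes the sign convention explicit: the more positive the polar solvation term, the less negative (weaker) the resulting binding energy, which is why a low polar solvation contribution favors a stable complex.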

Table 6 MM-PBSA calculations of binding free energy for ECG-P00374 complex
Fig. 8

Free energy terms of epicatechin gallate (ECG)-hepatocyte growth factor receptor (PDB id 1FYR) complex, a ΔG_Van der Waal, b ΔG_Electrostatic, c ΔG_Polar, d ΔG_Non-Polar and e ΔG_binding

Conclusion

In this study, we found that tea phytochemicals were capable of targeting multiple proteins associated with various ailments. The results, supported by multiple tools such as statistics, pharmacophore analysis, network pharmacology and molecular docking, showed that seven tea phytochemicals (gallic acid was excluded) could together target at least five proteins and ten associated diseases. This is a minimum estimate owing to the stringent screening applied in the present work. Further, among the eight phytochemicals studied, epicatechin gallate was found to bind strongly to the target protein hepatocyte growth factor receptor (PDB id 1FYR), with a binding affinity of −10 kcal mol−1. Epicatechin gallate formed a highly stable complex with the target protein in a simulated physiological environment, as shown by MD simulation.