1 Introduction

Papaya leaf curl virus (PaLCuV) infection causes severe symptoms in various agricultural plants, including papaya. The begomovirus encodes a multifunctional protein that is crucial for nearly every stage of its life cycle and structurally protects the viral DNA [1]. Interactions between this viral protein and host plant proteins must be tightly regulated during infection [2, 3]. Until recently, little was known about the interactions between papaya proteins and begomoviral proteins, which has hindered efforts worldwide to understand their functional and structural interplay. The PaLCuV genome encodes six distinct gene products, of which the CP and Rep proteins play pivotal roles in virion activation, host adaptation and disease progression [4].

Begomoviruses interact with multiple host proteins to modulate transcription and translation and thereby facilitate virus replication [5]. These viruses can adapt rapidly to environmental changes by altering their genetic information and establishing favourable protein complexes within their hosts, a mechanism that helps them evade plant immune responses [6]. They also regulate the levels of microRNAs involved in host development [7] and interact with various host proteins to establish disease [8]. For example, the βC1 protein associated with Tomato Yellow Leaf Curl China Virus (TYLCCNV) infection interacts with Asymmetric Leaves 1 (AS1) and disrupts normal leaf development. It also interferes with jasmonic acid-responsive genes, promoting infestation by the insect vector Bemisia tabaci [9]. Another significant interaction occurs between the Solanum lycopersicum ubiquitin-conjugating (E2) enzyme SlUBC3 and the βC1 protein of Cotton Leaf Curl Multan Betasatellite (CLCuMB), suggesting that βC1 may interfere with the ubiquitin–proteasome pathway through UBC [10].

Shen et al. [11] showed that sucrose non-fermenting 1 (SNF1)-related kinase 1 (SnRK1) plays a pivotal role in phosphorylating the Rep protein encoded by the geminivirus Tomato golden mosaic virus (TGMV). Through mutagenesis, they pinpointed the specific domains responsible for binding to the virus. These findings underscore the role of SnRK1 in regulating plant–virus interactions and in maintaining plant health and resilience under stress, and highlight its potential as a target for improving plant resistance against viruses and environmental stresses.

Docking studies provide a simulated platform to understand the optimal orientation and conformation of interacting proteins, aiding in the identification of key residues and binding sites crucial for the interaction [12]. Energy calculations contribute by evaluating the stability and dynamic behavior of the predicted protein complexes. Statistical tests are applied to assess the significance of identified interactions, and the results are visualized using graphical tools. Gaining insights into the structural determinants of protein–protein interactions holds the key to a deeper understanding of biological functions, diseases, and the development of therapeutics [13].

In this study, we used in-silico integrative modelling and interface analysis to examine the sequence–structure relationship and the mechanism of host–virus interaction, specifically the binding of papaya proteins to PaLCuV–PaLCuB proteins during infection. To this end, we analysed ten papaya proteins, one PaLCuB protein and all six PaLCuV proteins structurally and functionally through protein–protein interaction modelling. Because the individual viral genes differ greatly in function yet act together to drive disease onset, integrative modelling of the viral proteins is a promising way to identify conserved residues that mediate binding to plant proteins during infection. A key aspect of this approach is the ability to predict accurately the binding strength of a given protein–protein complex. This study may improve understanding of the structural basis of virus–host assembly and help in developing new antiviral drugs that target the binding regions [14].

2 Materials and methods

2.1 Sequence retrieval and primary sequence analysis of PaLCuV–PaLCuB protein

To elucidate the interaction between begomovirus and host proteins, the amino acid sequences (FASTA format) of all six PaLCuV (DNA-A) proteins, i.e., CP, Pre-CP, REn, Rep, TrAP and C4, and of the βC1 protein of PaLCuB from isolate PL-1 (av1_GKP) were used in this study (Table 1). Protein domains or functional regions were identified using the Conserved Domain Database (CDD). The ProtParam tool (https://web.expasy.org/protparam) [15] of the Expasy proteomics server was applied to the primary sequences of isolate PL-1 to obtain the general physicochemical properties of the viral proteins, including isoelectric point, molecular weight, aliphatic index, instability index and grand average of hydropathy (GRAVY) [16]. Phosphorylation sites were predicted for each begomovirus protein using the NetPhos 2.0 server [17].
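Two of the ProtParam indices follow simple published formulas: GRAVY is the mean Kyte–Doolittle hydropathy value over all residues, and the aliphatic index is Ikai's X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), with X in mole percent. The Python sketch below is a minimal illustration of both, assuming a sequence of the 20 standard one-letter residues; the function names and the toy peptide are hypothetical, and this is not the ProtParam implementation.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue.

    Raises KeyError for non-standard residue letters (an assumption of this sketch).
    """
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Ikai aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)),
    where X(.) is the mole percent of that residue in the sequence."""
    seq = seq.upper()

    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / len(seq)

    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


if __name__ == "__main__":
    peptide = "MAVILKDEGS"  # hypothetical toy peptide, not a PaLCuV protein
    # A negative GRAVY would indicate an overall hydrophilic protein
    print(f"GRAVY = {gravy(peptide):.3f}")
    print(f"Aliphatic index = {aliphatic_index(peptide):.1f}")
```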

Table 1 Physicochemical properties and domain predictions for genes of Papaya leaf curl virus (PaLCuV) and Papaya leaf curl betasatellite (PaLCuB), obtained with the Expasy ProtParam tool, the Conserved Domain Database (CDD) and the NetPhos 2.0 server

2.2 Structural modelling of virus protein

The Phyre2 (Protein Homology/analogY Recognition Engine v.2.0) web server (http://www.sbg.bio.ic.ac.uk/phyre2) was used to predict and analyse the structures of the begomovirus proteins. Phyre2 uses remote homology detection methods to build 3D models; in ‘Normal’ mode it produces a set of candidate 3D models based on PSI-BLAST searches and hidden Markov models (HMMs) of the query protein [18].

2.3 Sequence retrieval and structural assessment of papaya plant protein (receptor)

Plant (receptor) proteins reported in previous literature to interact with begomovirus proteins were identified, and the amino acid sequences (FASTA format) of their papaya counterparts were retrieved from the NCBI database; accession numbers are listed in Table S1. 3D models of the host proteins were built by homology modelling with SWISS-MODEL (https://swissmodel.expasy.org/interactive) [19], which models monomeric, homo-oligomeric or hetero-oligomeric complexes with the ProMod3 modelling engine and estimates model accuracy with QMEANDisCo [20] and QMEAN [21]. Templates matching each query sequence were searched using BLAST [22] and HHblits [23]; the template with the highest Global Model Quality Estimate (GMQE) and sequence identity was selected [24], and the model was built on it.
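The template choice described above amounts to ranking candidate templates on two criteria. The Python sketch below assumes that GMQE is compared first and that sequence identity only breaks ties; the text does not state the exact rule, and the template labels and scores are invented for illustration, not SWISS-MODEL output.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TemplateHit:
    template_id: str  # hypothetical template label, not a real PDB entry
    gmqe: float       # Global Model Quality Estimate, 0..1
    identity: float   # sequence identity to the query, percent


def best_template(hits: list[TemplateHit]) -> TemplateHit:
    """Pick the hit with the highest GMQE; break ties by higher identity."""
    if not hits:
        raise ValueError("no template hits to choose from")
    return max(hits, key=lambda h: (h.gmqe, h.identity))


if __name__ == "__main__":
    hits = [
        TemplateHit("template_A", 0.82, 71.5),
        TemplateHit("template_B", 0.91, 68.0),
        TemplateHit("template_C", 0.91, 74.2),
    ]
    # template_C ties with template_B on GMQE but has higher identity
    print(best_template(hits).template_id)  # template_C
```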

2.4 Model validation of virus and plant protein

The viral and plant protein models were submitted to the PDBsum server (https://www.ebi.ac.uk/thornton-srv/databases/pdbsum/Generate.html) for structural validation. PROCHECK [25], run through PDBsum, was used to assess the stereochemical quality and geometry of each predicted 3D model. The energy profile of each model was calculated with Protein Structure Analysis (ProSA) (https://prosa.services.came.sbg.ac.at/prosa.php) [26], and Verify3D [27] was used to assess the compatibility of each atomic model with its own amino acid sequence. In addition, ERRAT [28], which analyses the statistics of non-bonded interactions between different atom types, was used to estimate the overall quality of the modelled proteins.

2.5 Energy minimization and prediction of binding sites

To remove steric clashes in the modelled structures, energy minimization was performed using UCSF Chimera v.1.15 [29]. Interaction sites of both the viral and plant proteins were then predicted from their unbound 3D structures using the Solvent accessibility-based Protein–Protein Interface iDEntification and Recognition (SPPIDER) web server (https://sppider.cchmc.org/) [30].

2.6 Protein (virus)–protein (plant) (P–P) interaction through docking

Active residues of the viral and plant proteins for P–P interaction were predicted before docking. Docking of the begomovirus and plant proteins was performed with the HADDOCK v2.4 (High Ambiguity Driven Protein–Protein Docking) web server (https://wenmr.science.uu.nl/haddock2.4/submit/2) [31]. Interface residues from the SPPIDER search were supplied to HADDOCK as active residues for each viral protein and as passive residues for each plant protein. The docked complexes returned by HADDOCK were ranked by HADDOCK score, root mean square deviation (RMSD) and Z-score. To examine the host–virus interactions further, the binding affinity of each protein–protein complex, expressed as the predicted binding free energy (ΔG; more negative values indicate stronger binding), was estimated with the PROtein binDIng enerGY prediction (PRODIGY) web server [32].

2.7 Interface analysis of virus–plant docked complex

The P–P docked complexes were analyzed using the PDBsum server (https://www.ebi.ac.uk/thornton-srv/databases/pdbsum/Generate.html) [25], which reports the interface residues, non-bonded contacts and the types of bonds formed between interacting residues of a docked complex. PyMOL v.2.5.5 (The PyMOL Molecular Graphics System, Version 2.0, Schrödinger, LLC.) was then used to identify and visualize hotspot interacting residues in the plant–virus protein docked complexes.

3 Results

3.1 Sequence analysis of isolate PL-1 (PaLCuV–PaLCuB proteins)

The PaLCuV and PaLCuB proteins vary in length and in their conserved regions, which are listed in Table 1; these conserved regions were used for subsequent structural analysis. Primary sequence analysis showed that the Pre-CP and βC1 proteins are acidic (pI 6.94 and 4.91, respectively), whereas the remaining proteins are basic, with isoelectric points (pI) above 7.0; molecular masses (kDa) also vary (Table 1). The aliphatic index indicates protein stability over a broad temperature range. A protein is generally considered unstable if its instability index exceeds 40 and stable if it is below 40 [33]; by this criterion, the REn and Rep proteins of PaLCuV are stable, with instability indices of 36.06 and 36.20, respectively. The viral proteins contain varying numbers of positively and negatively charged residues, and their negative GRAVY values suggest that they are hydrophilic and interact favourably with water [34]. Phosphorylation sites were also predicted for the viral proteins (Table 1). These comprise variable numbers of serine, threonine and tyrosine sites; phosphorylation is crucial for controlling a viral protein's stability, activity and interactions with other viral and cellular proteins [13].
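The classification rules used in this paragraph can be written as simple threshold checks. This Python sketch applies them to values quoted above (the REn and Rep instability indices and the Pre-CP and βC1 isoelectric points); the function names are hypothetical, and treating an instability index of exactly 40 as unstable is an assumption of the sketch.

```python
def stability_class(instability_index: float, threshold: float = 40.0) -> str:
    """Below the threshold -> 'stable'; otherwise 'unstable' (40 itself counts as unstable here)."""
    return "stable" if instability_index < threshold else "unstable"


def charge_class(pi: float) -> str:
    """Classify a protein by its isoelectric point relative to pH 7.0."""
    if pi < 7.0:
        return "acidic"
    if pi > 7.0:
        return "basic"
    return "neutral"


def hydropathy_class(gravy_score: float) -> str:
    """A negative GRAVY suggests an overall hydrophilic protein."""
    return "hydrophilic" if gravy_score < 0 else "hydrophobic"


if __name__ == "__main__":
    # Values quoted in the text
    for name, index in [("REn", 36.06), ("Rep", 36.20)]:
        print(name, stability_class(index))  # both 'stable'
    for name, pi in [("Pre-CP", 6.94), ("βC1", 4.91)]:
        print(name, charge_class(pi))  # both 'acidic'
```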

3.2 Begomovirus protein 3D structure prediction and their model validation

The FASTA sequences of all seven viral proteins were submitted to the Phyre2 web server, which uses homology detection methods to build 3D models of a query protein [18]. For each protein, the best model was selected on the basis of the highest confidence and coverage of the single highest-scoring template (Fig. 1). The predicted models (PDB format) were validated with PROCHECK, which assesses stereochemical quality by analysing residue-by-residue and overall geometry through a Ramachandran plot (Table 2; Fig. 2). Across the models, > 72.3% of residues lay in favoured regions, < 24.1% in additional allowed regions, < 4.8% in generously allowed regions and < 6.2% in disallowed regions. ProSA calculates an energy profile for each model together with an overall z-score, which indicates model quality relative to experimentally determined structures. The ProSA z-score was highest for REn and lowest for CP, and was −3.27 for βC1; all models indicated good quality (Fig. 3). An ERRAT score > 50% indicates a good model and a higher score (> 80%) indicates a high-resolution model [34]. The Verify3D server showed that the REn model was the least compatible with its 1D amino acid sequence among the proteins [35, 36]. Details of the begomovirus protein structure validation are summarized in Table 2.
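The acceptance criteria quoted in this section and in Section 3.3 (more than 90% of residues in the most favoured Ramachandran regions, ERRAT above 50% or 80%, Verify3D compatibility above 50%) can be collected in one check. This Python sketch is illustrative only: the thresholds come from the text, while the class, the function and the example scores are hypothetical and are not output from PROCHECK, ERRAT or Verify3D.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ValidationScores:
    name: str
    rama_favoured: float  # % residues in most favoured Ramachandran regions (PROCHECK)
    errat: float          # ERRAT overall quality factor (%)
    verify3d: float       # Verify3D 3D-1D compatibility (%)


def assess(scores: ValidationScores) -> dict[str, bool]:
    """Apply the thresholds quoted in the text to one model."""
    return {
        "ramachandran_ok": scores.rama_favoured > 90.0,
        "errat_ok": scores.errat > 50.0,
        "errat_high": scores.errat > 80.0,
        "verify3d_ok": scores.verify3d > 50.0,
    }


if __name__ == "__main__":
    model = ValidationScores("hypothetical_model", rama_favoured=91.5, errat=86.0, verify3d=47.0)
    failed = [check for check, ok in assess(model).items() if not ok]
    print(model.name, "fails:", failed)  # hypothetical_model fails: ['verify3d_ok']
```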

Fig. 1

Three-dimensional ribbon structures of the PaLCuV–PaLCuB proteins Pre-CP, CP, REn, Rep, TrAP, C4 and βC1 generated using Phyre2, with the C-terminal region in blue and the N-terminal region in red

Table 2 Model validation scores of viral proteins using different web servers
Fig. 2

Ramachandran plots of the PaLCuV–PaLCuB protein models Pre-CP, CP, REn, Rep, TrAP, C4 and βC1, created with the PROCHECK program of the PDBsum web server

Fig. 3

Graphical representation of ProSA z-scores for the viral protein models. Z-scores of PDB structures determined by nuclear magnetic resonance (NMR) and X-ray crystallography are shown in dark blue and light blue, respectively; models generated by the Phyre2 web server are shown as circles

3.3 Sequence analysis of papaya plant protein and secondary structure validation

A total of ten papaya host proteins were used in the study (Table S1). Their FASTA sequences were used as input for template searches in the SWISS-MODEL web server. For each protein, the best template match had a sequence identity of > 70.67% and a GMQE value close to 1, indicating a reliable model (Table 3). Each protein model was built from the target–template alignment, and ribbon structures were rendered from the PDB files with UCSF Chimera (Fig. 5). The predicted models (.pdb) were validated with PROCHECK through Ramachandran plots (Table 3; Fig. 4). In each model, more than 90.22% of residues lay in the most favoured regions; since a high-quality model is expected to have more than 90% of residues in these regions, the plant protein models are acceptable. ProSA z-scores ranged from −5.18 to −12.27. ERRAT overall quality factors were satisfactory (> 84.61%), indicating consistent models. Verify3D compatibility between each atomic model (3D) and its own amino acid sequence (1D) was good (> 50%) for all proteins except CSN5 (47.04%) (Table 3).

Table 3 Structural assessment and model validation of host proteins using different web servers
Fig. 4

Ramachandran plots of the Carica papaya protein models ADK, CaM, CDK1, CSN5, ISC, CUL1, GSK3, hsp70, PCNA and SAMS, created with the PROCHECK program of the PDBsum web server

3.4 Energy minimization and prediction of binding sites

Interactions between viral and plant proteins must be tightly controlled to regulate each viral function precisely throughout infection. Because the modelled structures may contain steric clashes, energy minimization was performed before molecular docking using UCSF Chimera. Among the host proteins, CUL1 had the lowest initial energy (−60,174.87 kJ/mol) and CaM the highest (740.82 kJ/mol). After minimization, CUL1 again had the lowest final energy (−70,314.74 kJ/mol) and CaM the highest (−2274.73 kJ/mol) (Table 4). Among the begomovirus proteins, βC1 had the highest initial energy (53,203,841.81 kJ/mol) and TrAP the lowest (48,622.18 kJ/mol); after minimization, CP had the lowest final energy (−11,163.61 kJ/mol) and TrAP the highest (−2333.15 kJ/mol) (Table 4). In every case, the minimized energy was substantially lower than the initial energy, indicating that inappropriate geometry and steric clashes had been relieved. Binding site identification is important because it indicates which residues are likely to mediate the interaction between viral and plant proteins and gives a clear picture of the orientation of the active-site residues. Key interacting residues of the viral and plant proteins were predicted using the SPPIDER web server (Fig. 5; Table S2).
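The size of each decrease follows directly from the initial and final energies quoted above. This short Python sketch computes ΔE = E_final − E_initial for the three proteins for which both values are given in the text (CUL1, CaM and TrAP); the dictionary and function names are illustrative only.

```python
# Initial and minimized energies (kJ/mol) as quoted in the text
ENERGIES = {
    "CUL1": (-60_174.87, -70_314.74),
    "CaM": (740.82, -2_274.73),
    "TrAP": (48_622.18, -2_333.15),
}


def energy_change(initial: float, final: float) -> float:
    """Energy change on minimization; a negative value means the energy dropped."""
    return final - initial


if __name__ == "__main__":
    for protein, (e_initial, e_final) in ENERGIES.items():
        print(f"{protein}: dE = {energy_change(e_initial, e_final):,.2f} kJ/mol")
    # CUL1: dE = -10,139.87 kJ/mol
    # CaM: dE = -3,015.55 kJ/mol
    # TrAP: dE = -50,955.33 kJ/mol
```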

Table 4 Energy minimization of host and virus proteins using UCSF Chimera software
Fig. 5

Three-dimensional ribbon structures of the Carica papaya proteins ADK, CaM, CDK1, CSN5, ISC, CUL1, GSK3, hsp70, PCNA and SAMS generated using SWISS-MODEL, with the C-terminal region in blue and the N-terminal region in red

3.5 Protein (plant)–protein (virus) interaction through docking

During infection, begomovirus proteins interact with plant proteins and modulate the plant's biological and cellular systems by up- or down-regulating those proteins [7] (Table S1). Each viral protein was docked against all ten plant proteins, and the results were evaluated by HADDOCK score, Z-score and RMSD. To avoid misinterpreting the results, the dataset for each viral protein was filtered by HADDOCK score: docked complexes with a positive HADDOCK score were eliminated from the study, and, among the ten virus–plant protein pairs, the complexes with the most negative HADDOCK scores were selected and their docked structures downloaded in PDB format.
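The filtering rule above can be sketched in a few lines of Python. The sketch assumes the HADDOCK scores have already been collected into a dictionary keyed by (viral protein, plant protein) and keeps, for each viral protein, the single most negative non-positive score, which is one reading of the rule; the scores in the example are made up and are not the values in Table 5.

```python
from __future__ import annotations


def select_best_complexes(
    scores: dict[tuple[str, str], float],
) -> dict[str, tuple[str, float]]:
    """For each viral protein, drop complexes with positive HADDOCK scores and
    keep the plant partner with the most negative score."""
    best: dict[str, tuple[str, float]] = {}
    for (virus, plant), score in scores.items():
        if score > 0:
            continue  # positive HADDOCK score: excluded from further analysis
        if virus not in best or score < best[virus][1]:
            best[virus] = (plant, score)
    return best


if __name__ == "__main__":
    example_scores = {  # hypothetical HADDOCK scores (arbitrary units)
        ("Rep", "PCNA"): -95.2,
        ("Rep", "CDK1"): -80.4,
        ("Rep", "SAMS"): 12.3,
        ("C4", "CaM"): 5.1,
    }
    # C4 is dropped entirely because its only complex has a positive score
    print(select_best_complexes(example_scores))  # {'Rep': ('PCNA', -95.2)}
```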

This yielded the six best P–P docked complexes, all with favourable scores and statistics (Table 5): PCNA-Rep, SAMS-CP, CDK-Rep, ADK-REn, CaM-preCP and PCNA-βC1. The HADDOCK score reflects the overall quality of a docked model, whereas the Z-score and RMSD are cluster-level statistics: the Z-score indicates how many standard deviations a cluster's score lies from the mean of all clusters, and the RMSD is measured from the overall lowest-energy structure. The lower the Z-score, the better the model [28], and the smaller the RMSD between two structures, the more similar they are [33]. Among the six docked complexes, CDK-Rep had the lowest Z-score (−2.0) and RMSD (1.5 ± 1.1 Å).
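For readers less familiar with these two statistics, the Python sketch below shows the standard RMSD formula for two already-superimposed coordinate sets and a cluster Z-score computed from a list of cluster scores. It is not HADDOCK's code: no structural fitting (such as Kabsch superposition) is performed, and using the population standard deviation for the Z-score is an assumption.

```python
import math
import statistics


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long lists of (x, y, z)
    coordinates, assumed to be superimposed already (units as input, e.g. Å)."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def cluster_zscores(cluster_scores):
    """Number of standard deviations each cluster score lies from the mean of all
    clusters; for HADDOCK-style scores, more negative is better."""
    mean = statistics.mean(cluster_scores)
    sd = statistics.pstdev(cluster_scores)
    if sd == 0:
        return [0.0 for _ in cluster_scores]  # all clusters scored identically
    return [(s - mean) / sd for s in cluster_scores]


if __name__ == "__main__":
    print(rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)]))
    print(cluster_zscores([-100.0, -80.0, -60.0]))  # hypothetical cluster scores
```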

Table 5 Statistical report of best plant–virus protein–protein docked complex through HADDOCK server

The binding energy (kcal/mol) of each docked complex was predicted with the PRODIGY server (Table 5), which estimates the binding affinity of protein–protein complexes from their 3D structures. Among the six docked complexes, SAMS-CP and CDK-Rep had the most negative binding energies (−18.7 and −14.0 kcal/mol, respectively), indicating the strongest interactions between host and viral proteins. A negative value indicates that complex formation is energetically favourable and that binding is stable. Previous studies have shown that the PCNA-Rep interaction can enhance virus titer and viral transcription in infected host cells [37, 38]. TrAP binds adenosine kinase (ADK) and causes its overexpression in infected cells, which disturbs the plant cell cycle; TrAP also disturbs the plant methyl cycle by inhibiting S-adenosylmethionine decarboxylase 1 (SAMDC1) [39, 40]. In infected plants, the C4 protein of Cotton Leaf Curl Multan Virus interacts with SAMS and inhibits its enzymatic activity, which disrupts the plant silencing pathway and enhances infection [41]. Binding of βC1 to Nbrgs-CaM provides an energy supplement to geminiviruses and weakens the host antiviral system by repressing RDR6 expression [42]. The βC1 protein plays multiple roles during plant cell infection, including interference with DNA methylation, ubiquitination and nutrient metabolism, and operates in both the nucleus and the cytoplasm to ensure successful pathogenicity [43]. Previous studies suggest that similar protein–protein interactions may occur in papaya plants (Table S1) [5]. These findings suggest that the PCNA-Rep, SAMS-CP, CDK-Rep, ADK-REn, CaM-preCP and PCNA-βC1 interactions may disrupt the functions of various papaya proteins, creating an environment conducive to viral replication and transcription by modulating transcriptional and post-transcriptional gene silencing pathways, which leads to developmental abnormalities and severe disease symptoms.
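PRODIGY reports a dissociation constant alongside ΔG, and the two are linked by the standard thermodynamic relation ΔG = RT ln Kd. The Python sketch below converts the two values quoted above (−18.7 and −14.0 kcal/mol) to Kd at an assumed 25 °C; it illustrates the relation only and is not PRODIGY's own code, so its rounding may differ from the server output.

```python
import math

# Gas constant in kcal/(mol*K): 8.314462618 J/(mol*K) divided by 4184 J/kcal
R_KCAL = 8.314462618 / 4184.0


def kd_from_dg(dg_kcal: float, temp_c: float = 25.0) -> float:
    """Dissociation constant (M) from binding free energy via dG = RT ln(Kd)."""
    temp_k = temp_c + 273.15
    return math.exp(dg_kcal / (R_KCAL * temp_k))


def dg_from_kd(kd_molar: float, temp_c: float = 25.0) -> float:
    """Inverse relation: binding free energy (kcal/mol) from Kd (M)."""
    temp_k = temp_c + 273.15
    return R_KCAL * temp_k * math.log(kd_molar)


if __name__ == "__main__":
    for complex_name, dg in [("SAMS-CP", -18.7), ("CDK-Rep", -14.0)]:
        print(f"{complex_name}: dG = {dg} kcal/mol -> Kd ~ {kd_from_dg(dg):.1e} M")
```

A more negative ΔG gives a smaller Kd, i.e. tighter binding; at 25 °C, each additional −1.36 kcal/mol corresponds to roughly a tenfold drop in Kd.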

3.6 Interface and hotspot residue analysis of plant–virus protein docked complex

Interface statistics of each docked complex can be interpreted in terms of the number of interface residues, interface area, salt bridges, hydrogen bonds and non-bonded contacts. The key panel at the bottom shows how the residues interact. Some viral residues make more than one specific contact with the plant protein (Table 6). The SAMS-CP, ADK-REn and CaM-preCP complexes had the largest numbers of interface residues. Identifying hotspot residues and protein dimers helps to explain the probable binding interactions between the proteins and how these interactions might be modulated. The identified hotspot residues of the P–P complexes are shown in Table 6 and Fig. 6. The number of H-bond lines between any two residues indicates the number of potential hydrogen bonds between them; the SAMS-CP, ADK-REn and CaM-preCP complexes formed the most hydrogen bonds (19, 17 and 15, respectively). For non-bonded contacts, which can be numerous, the width of the striped line is proportional to the number of atomic contacts; these were again most numerous for the SAMS-CP, ADK-REn and PCNA-Rep complexes. Consistent with the docking results, SAMS-CP had the strongest predicted binding energy among the six complexes (−18.7 kcal/mol), together with the largest number of interface residues (CP 39, SAMS 50) and hydrogen bonds (19) (Table 6; Fig. 6).
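Interface residues are commonly defined by an inter-chain distance cutoff. The Python sketch below shows that idea on toy coordinates: a residue counts as interfacial if any of its atoms lies within the cutoff of an atom in the partner chain. The 5.0 Å cutoff, the Atom record and the toy atoms are assumptions for illustration; PDBsum applies its own criteria.

```python
from __future__ import annotations

import math
from typing import NamedTuple


class Atom(NamedTuple):
    chain: str
    resnum: int
    resname: str
    xyz: tuple[float, float, float]


def interface_residues(atoms, chain_a="A", chain_b="B", cutoff=5.0):
    """Return {(chain, resnum, resname)} for residues with any atom within
    `cutoff` Å of an atom in the other chain (brute-force pairwise check)."""
    atoms_a = [atom for atom in atoms if atom.chain == chain_a]
    atoms_b = [atom for atom in atoms if atom.chain == chain_b]
    interface = set()
    for a in atoms_a:
        for b in atoms_b:
            if math.dist(a.xyz, b.xyz) <= cutoff:
                interface.add((a.chain, a.resnum, a.resname))
                interface.add((b.chain, b.resnum, b.resname))
    return interface


if __name__ == "__main__":
    toy_atoms = [  # hypothetical coordinates: chain A = viral, chain B = plant
        Atom("A", 1, "TYR", (0.0, 0.0, 0.0)),
        Atom("A", 2, "GLY", (20.0, 0.0, 0.0)),
        Atom("B", 10, "ARG", (3.0, 0.0, 0.0)),
        Atom("B", 11, "LEU", (30.0, 0.0, 0.0)),
    ]
    print(sorted(interface_residues(toy_atoms)))  # [('A', 1, 'TYR'), ('B', 10, 'ARG')]
```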

Table 6 Statistical interface analysis of protein–protein docked complex
Fig. 6

Schematic diagram of interactions between protein chains of docked complex. Interacting chains are joined by coloured lines, each representing a different type of interaction

To support the PDBsum interface analysis, PyMOL was also used to identify, highlight and visualize the hotspot interacting residues of the top six plant–virus protein docked complexes listed in Table 6, by finding polar contacts, including hydrogen bonds, between the interacting chains. In each docked complex, chain A (green) is the viral protein and chain B (blue) is the plant protein. The highlighted interface residues of each complex may explain the binding mechanism underlying suppression of host defence. Tight interaction between viral and plant proteins is essential for viral protein function throughout infection. We therefore describe the key interacting residues of CP, Rep and βC1, which have important roles in plant gene silencing pathways and successful pathogenicity. In SAMS-CP, the key CP residues interacting with SAMS are Tyr 50, Arg 54, Arg 42, Lys 40, Tyr 248, Arg 204, Cys 72, Glu 78, Tyr 115, Lys 200, Lys 201, Val 199, Lys 119 and Lys 194. In PCNA-Rep, the key interacting residues are Arg 6, Arg 74, His 88, Gln 92, His 59, Lys 106, Tyr 103 and Arg 50, and in PCNA-βC1, Asp 80, Arg 35, Lys 41, Arg 67, Ile 91, Phe 57, Asp 58, Ile 54, Tyr 48 and Lys 49 form hydrogen bonds. The number of discrete hydrogen bonds formed between specific residues is considered one of the most important determinants of the stability of biomolecular complexes. The key interacting residues of the remaining complexes are shown in Fig. 7.
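Counting hydrogen bonds starts from polar atom pairs across the interface. The Python sketch below is a deliberately simplified, distance-only version: it lists nitrogen/oxygen pairs from different chains within an assumed 3.5 Å cutoff. Dedicated tools such as PyMOL or HBPLUS also apply angular and hydrogen-position criteria, so this is only an approximation, and the toy atoms are hypothetical.

```python
import math

POLAR_ELEMENTS = {"N", "O"}  # simplification: every N/O treated as donor or acceptor


def hbond_candidates(atoms_a, atoms_b, max_dist=3.5):
    """Atoms are (resname, resnum, element, (x, y, z)) tuples for each chain.

    Returns ((resname_a, resnum_a), (resname_b, resnum_b), distance) for every
    polar atom pair from the two chains within max_dist Å. No angle check.
    """
    pairs = []
    for res_a, num_a, elem_a, xyz_a in atoms_a:
        if elem_a not in POLAR_ELEMENTS:
            continue
        for res_b, num_b, elem_b, xyz_b in atoms_b:
            if elem_b not in POLAR_ELEMENTS:
                continue
            d = math.dist(xyz_a, xyz_b)
            if d <= max_dist:
                pairs.append(((res_a, num_a), (res_b, num_b), round(d, 2)))
    return pairs


if __name__ == "__main__":
    chain_a = [("SER", 1, "O", (0.0, 0.0, 0.0)), ("SER", 1, "C", (0.0, 0.0, 1.5))]
    chain_b = [("ASN", 5, "N", (2.9, 0.0, 0.0)), ("ALA", 6, "C", (1.0, 0.0, 0.0))]
    print(hbond_candidates(chain_a, chain_b))  # [(('SER', 1), ('ASN', 5), 2.9)]
```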

Fig. 7

Interaction analysis of plant–virus protein–protein complex with interacting residue and binding reorganization. Best protein–protein docked complex of ADK-REn; CDK1-Rep; CaM-preCP; PCNA-Rep; SAMS-CP

4 Discussion

Begomoviruses comprise a wide range of species and strains with diverse biological characteristics, including pathogenicity [44, 45]. PaLCuV was first identified as a novel begomovirus species in the family Geminiviridae by Saxena et al. (1998) [46]. Distributed globally, begomoviruses are recognized as the largest and most economically significant group of plant viruses, causing substantial damage to many crops. C. papaya, an important tropical fruit crop and a potential source of valuable pharmacological and bioactive compounds [4, 47], faces a substantial threat from begomoviruses, which impede its growth and reduce productivity worldwide [1]. To gain computational insight into the interaction between virus and host plant, we combined integrative modelling with analysis of protein–protein interactions. This holistic bioinformatics analysis aims to offer fresh perspectives on the binding mechanisms underlying viral disease.

Translating sequence data into functional knowledge through primary structure analysis, sequence annotation and physicochemical property analysis is crucial for understanding functional and biological mechanisms [48]. Knowledge of the three-dimensional (3D) structure of proteins is essential for addressing many biological questions, yet the number of sequenced genes and genomes is rapidly outpacing the number of experimentally determined structures. Comparative modelling remains a highly effective strategy for bridging this gap between a protein's sequence and its 3D structure, generating precise and reliable protein models [49].

Bioinformatic strategies offer valuable tools for exploring the involvement of proteins in plant–pathogen interactions. Computational analysis of protein–protein interactions (PPIs) involves several steps. Sequence-based approaches extract information from distinctive sequence motifs to predict protein secondary structure [50, 51]. Structure-based methods detect binding sites and predict interfacial residues from 3D structures [52]. Finally, machine-learning interface prediction methods learn from experimentally determined interacting residues and use the trained models to identify interacting residues in a query protein [53]. This comprehensive approach, combining sequence conservation analysis, energetics and binding site information, can be applied to predict interactions between viral and host proteins, such as begomovirus proteins and their papaya host proteins, and helps to highlight the residues within the interacting domain that determine binding affinity [14].

This study represents an initial effort to gain structural insights into begomovirus proteins through in-silico approaches. We have so far identified eleven begomovirus isolates, namely PL-1, PL-6, PL-10, PL-13, PL-20, PL-27, PL-29, PL-31, PL-36, PL-43 and PL-45, belonging to four major variants of papaya-infecting begomoviruses: PaLCuV, Tomato Leaf Curl New Delhi Virus, Cotton Leaf Curl Virus and Croton Yellow Vein Mosaic Virus. The study focuses on isolate PL-1 (PaLCuV) and its associated satellite, chosen to explore the binding affinity between the viral proteins and papaya proteins during infection, with the aim of elucidating how these interactions contribute to increased virus titer and the development of PaLCuD. 3D models of seven begomovirus proteins from isolate PL-1 were built from the best template–target alignments using several computational methods, and these proteins were docked with ten papaya proteins. Various computational tools were used to examine the structure and energetics of the proteins [13], and the analysis focused on the top-performing protein–protein complexes, assessed by HADDOCK score, Z-score and RMSD.

Kamal et al. [14] investigated in detail the interaction between Cotton Leaf Curl Multan Betasatellite βC1 (CLCuMB-βC1) and Gossypium hirsutum calmodulin-like protein 11 (Gh-CML11). This interaction, which occurs during infection, triggers overexpression of Gh-CML11, which ultimately serves as a calcium source for virus movement and transmission. In our study, the docked complexes were structurally stable, and SAMS-CP showed the strongest binding energy together with the largest numbers of interface residues and hydrogen bonds, followed by the PCNA-Rep, CDK-Rep, ADK-REn, CaM-Pre-CP and PCNA-βC1 P–P complexes. Efficient PPI structures involve a substantial number of conserved residues, suggesting that virus–host PPI complexes play crucial roles in various transcriptional pathways. Such interactions may raise viral titers and create an environment conducive to virus encapsidation and replication within the host cell, thereby establishing successful pathogenicity [12]. The modelling, structural stability and protein–protein interaction analyses presented here offer insights that may aid the development of novel therapeutic agents against this destructive plant virus and open new avenues for improving management and control strategies against this pathogen.

The interface analysis suggests that specific interacting residues of the viral proteins could be targeted [13], which could interfere with genome assembly and packaging of economically important viruses such as PaLCuV. Insight into the structural factors governing protein–protein interactions is crucial for understanding biological functions and diseases and for developing therapeutics [54, 55], and a critical component of this is accurate prediction of the binding strength of a given protein–protein complex. Our results improve understanding of the structural basis of virus–host assembly and may support the development of novel antiviral agents that target specific binding regions.

The emergence of this viral disease could pose a substantial threat to newly cultivated crop varieties in India and their promising yields [56]. Our findings may contribute to the development of antiviral agents that target the binding pockets of begomovirus proteins and potentially disrupt genome packaging and assembly in economically important plant–virus interactions. In summary, we anticipate that these insights into the interaction mechanisms between papaya proteins and PaLCuV–PaLCuB proteins, together with the functional, structural and physicochemical analysis of the viral proteins, will help pave the way for controlling begomoviral infections of papaya.

5 Conclusions

Understanding the role of a viral protein in infection requires a comprehensive understanding of its three-dimensional structure. Our study addresses this through molecular, functional and structural analyses. Using systematic homology-based structure prediction and in-silico analysis, we explored the binding affinity of targeted plant–virus protein interactions involved in the occurrence and progression of disease in papaya. The emergence of this new begomovirus infection poses a substantial threat to papaya production and could spread further. Future investigations should prioritize accurate detection methods and assessment of the extent of seed transmission, so that appropriate phytosanitary measures can be formulated and implemented to limit the spread of the disease. Experimental validation of the identified interactions is needed to confirm their implications and to clarify the molecular mechanisms of PaLCuD.

Table S1. Host–virus protein–protein interactions obtained from a literature survey [5] and interactions found in the present study associated with papaya leaf curl disease. Table S2. Energy minimization of host and virus proteins using UCSF Chimera software.