Introduction

Tick-borne infections are emerging at a high rate due to globalization, climate change and migration. Crimean-Congo haemorrhagic fever (CCHF) is a tick-borne disease caused by CCHF virus (CCHFV), for which Hyalomma ticks serve as the vector. CCHFV is endemic in Africa, Asia and Europe [1]. One hundred forty outbreaks of CCHFV have been reported since 1967 [2], and 10,000–15,000 cases with 500 deaths are recorded every year [3]. The first CCHFV outbreak was reported in 1944 in the Crimean region of Eastern Europe. In India, CCHFV was first reported in Ahmedabad in 2011 and, more recently, in Rajasthan in 2019 [4]. CCHFV has a fatality rate of 10–40% in humans [5].

Molecular clock and phylogenetic analysis of CCHF-infected patients in India revealed that the 2010–2011 outbreak strain shares homology with a Tajikistan strain [6] and the 2015 outbreak strain with an Afghanistan CCHFV strain [7]. These reports suggest that CCHFV emerged in India through migration of ticks from endemic regions (Afghanistan and Tajikistan) [6]. CCHFV has six genotypes (I–VI), classified according to their predominant geographical region [8]: genotypes I–III (Africa), genotype IV (Asia), genotype V (Europe) and genotype VI (Greece) [8, 9]. A global survey of CCHFV seroprevalence in domestic animals found seropositive cattle (in 34 countries), sheep (25 countries), goats (15 countries) and camels (11 countries), whilst no seropositive animals were found in New Zealand, Australia, Germany and the Netherlands, and no CCHFV cases have been reported from these countries [10]. The survey concluded that CCHFV is most prevalent in areas with seropositive domestic animals.

The tripartite negative-sense RNA genome of CCHFV encodes RNA polymerase (large segment, L), envelope glycoprotein (GP) (medium segment, M) and nucleocapsid protein (NP) (small segment, S) [11]. The viral RNA genome encapsidated by nucleoprotein forms a ribonucleoprotein, which binds RNA-dependent RNA polymerase (RdRp) and engages in viral replication [12], whereas GP interacts with the ribonucleoprotein to mediate virus assembly [12]. Structural GPs encoded by the M segment are responsible for viral attachment and for eliciting neutralizing antibodies [12]. The L segment has two domains: the ovarian tumour domain and the catalytic site domain [13, 14]. The ovarian tumour domain has deubiquitination (DUB) activity and participates in evading the host immune response, which suggests that DUB inhibitors could be designed for CCHF treatment [15]. Hawman et al. reported three mutations (S2007N, V2686 and P3281L) in the L segment after passaging the CCHFV Hoti strain in type I interferon-deficient mice and noted that the mutated residues are highly conserved in all CCHFV genotypes [16].

Protein functions are primarily executed by functional and structural elements termed domains [17]. Zheng et al. reported that virus-host interaction is primarily mediated by domains, as domains are the most stable and conserved regions of a protein [18]. Mutations can affect the stability, virus-host interaction and catalytic activity of proteins; for example, the Q266L mutation in the H7N9 strain of influenza virus enhances the binding affinity of the strain for its receptors and also led to an influenza pandemic [19]. Single amino acid mutant models of fibroblast growth factor receptor 1, which is involved in cancer, have shown the effect of mutation on the interaction ability of proteins: the mutant models formed fewer hydrogen bonds than the native models, and such mutations can also destabilize the protein [20, 21]. Salinas et al. reported that mutations can alter the shape of the binding site pocket and its ligand recognition properties [22]. Another study described enhanced stability due to mutations (A49L and Q106T) in interleukin-4 [23]. Mutations in the spike protein of SARS-CoV-2 that affect glycosylation at N331 and N343 reduce infectivity, the D614G mutation enhances infectivity, and a few mutations cause antibody resistance [24]. A molecular dynamic simulation study of the effect of mutation on the binding affinity of SARS-CoV-2 for the human ACE2 receptor reported that all the variants examined (delta plus, kappa, mu, iota, lambda and C.1.2) exhibit higher binding affinity than the wild type (PDBid: 6MOJ) [25]. Considering these facts, the current work aims to assess the phylogenetic relationship and the frequency of mutations in the L segment of CCHFV around the globe, followed by identification of conserved spots in the L segment and evaluation of the effect of mutations on the stability and function of its domains using in silico tools and molecular dynamic simulation.
Interestingly, phylogenetic analysis suggests that sequences belonging to the same genotype show less divergence and that genotype III sequences are most closely related to the reference sequence, which also belongs to genotype III. Mutation profiling revealed that the ovarian tumour (OTU) domain of the CCHFV L segment has no highly frequent mutations, indicating that it is conserved, whilst four highly frequent mutations (V2074I, I2134T/A, V2148A and Q2695H/R) were found in the catalytic site domain. In the molecular dynamic simulation study of the wild and six mutant catalytic site domains, the mutant domains showed larger deviation and fluctuation than the wild model over the 50 ns simulation run.

Methodology

Data acquisition and phylogenetic analysis

Protein sequences of the CCHFV L segment from human hosts were downloaded from the Virus Pathogen Resource database (ViPRbrc) in FASTA format. Multiple sequence alignment (MSA) was carried out using the Clustal W online server. Clustal W performs progressive alignment using sequence weighting, position-specific gap penalties and weight matrix choice [26]. The resulting MSA was taken as input for mutation analysis and for building the phylogenetic tree.

MEGA X provides a good platform for evolutionary studies [27]. Evolutionary analysis of the protein sequence dataset was carried out by constructing a maximum likelihood phylogenetic tree with 1000 bootstrap replications using Molecular Evolutionary Genetics Analysis (MEGA X) software. The tree was clustered according to the six genotypes of CCHFV reported in the literature [9, 28].

Mutation profiling

Mutations were recorded in all sequences across the length of the L segment w.r.t. the reference sequence (Accession No.: YP_325663.1) using BioEdit 7.2 sequence alignment editor software [29]. BioEdit 7.2 is user-friendly and offers various applications, such as amino acid or nucleotide variation identification, pairwise alignment, multiple sequence alignment and phylogenetic tree construction [29]. Mutation frequency was calculated as the number of sequences with a mutated amino acid at a position divided by the total number of sequences.
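The frequency definition above can be illustrated with a minimal Python sketch. This is not the BioEdit workflow used in the study; the function name `mutation_frequencies`, the toy sequences and the choice to ignore alignment gaps are assumptions made for illustration, and the sketch further assumes that the reference sequence contains no gaps so that alignment columns equal reference positions.

```python
from collections import Counter

def mutation_frequencies(reference, sequences, gap="-"):
    """Return {1-based position: (frequency, Counter of variant residues)}.

    All sequences are assumed to be aligned to the reference and of equal length.
    Gaps are not counted as mutations (an assumption of this sketch).
    """
    total = len(sequences)
    result = {}
    for i, ref_aa in enumerate(reference):
        variants = Counter(
            seq[i] for seq in sequences
            if seq[i] != ref_aa and seq[i] != gap
        )
        mutated = sum(variants.values())
        if mutated:
            # number of sequences with a mutated residue / total number of sequences
            result[i + 1] = (mutated / total, variants)
    return result

# Toy aligned sequences (hypothetical, not taken from the study dataset)
reference = "MVIQA"
sequences = ["MIIQA", "MIIHA", "MIIQA", "MVIRA"]

freqs = mutation_frequencies(reference, sequences)
for pos, (freq, variants) in sorted(freqs.items()):
    label = "/".join(sorted(variants))
    print(f"{reference[pos - 1]}{pos}{label}: {freq:.2f}")
```

In this toy input, a position with two different variant residues is reported in the same X123A/B style used for I2134T/A and Q2695H/R.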

Structural and solubility analysis

Protein solubility is crucial for protein homeostasis, which is necessary for the stability and function of proteins [30]. The solubility of mutant models was evaluated using the SODA web server, which calculates the difference in physico-chemical properties between wild and mutant molecules. SODA prediction is based on five components: secondary structure propensity (α-helix and β-sheet), aggregation, intrinsic disorder and hydrophobicity [31]. The HOPE server was employed to assess the effect of point mutations on the structure and function of the protein. HOPE (Have yOur Protein Explained) is a next-generation, fully automatic web application that combines a set of in silico resources: the WHAT IF web server (for 3D coordinate information), YASARA (for homology modelling) and UniProt (for sequence annotation) [32].

Homology modelling and refinement of wild and mutant catalytic domains

Three-dimensional structures of the wild and mutant models of the catalytic site domain were constructed using MODELLER v9.24 [33]. Thereafter, all models were refined using the GalaxyRefine server and validated by Ramachandran plot using PROCHECK. Furthermore, energy minimization of both wild and mutant structures was executed using Chimera 1.1.5 software [34]. PyMOL software was used for calculating the RMSD of mutant models w.r.t. the wild model.

Molecular dynamic simulation study

Molecular dynamic (MD) simulation of the wild and six mutant models was carried out using GROMACS v2021.3. The topology of each model was created with the charmm27 force field, and each model was solvated in a triclinic box of TIP3P water. The net charge of the system was neutralized by adding Na+ and Cl− ions, followed by energy minimization for 50,000 steps and then system equilibration in two stages, isothermal-isochoric ensemble (NVT) and isothermal-isobaric ensemble (NPT), for 100 ps. MD simulation was performed at 300 K temperature and 1 bar pressure. A 50 ns simulation was used to compute root mean square deviation (RMSD), root mean square fluctuation (RMSF) and intramolecular interactions (hydrogen bonds).
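For readers less familiar with the two trajectory metrics, the following Python sketch shows how RMSD and RMSF are defined. It is not the GROMACS implementation; the functions `rmsd` and `rmsf` and the two-atom toy coordinates are hypothetical, and the sketch assumes that every frame has already been superimposed on the reference structure.

```python
import math

def rmsd(frame, reference):
    """Root mean square deviation of one frame from a reference structure.

    Both arguments are lists of (x, y, z) tuples in the same atom order.
    """
    squared = sum(
        (a - b) ** 2
        for atom, ref_atom in zip(frame, reference)
        for a, b in zip(atom, ref_atom)
    )
    return math.sqrt(squared / len(reference))

def rmsf(frames):
    """Root mean square fluctuation of each atom around its mean position."""
    n_frames = len(frames)
    n_atoms = len(frames[0])
    values = []
    for j in range(n_atoms):
        mean = [sum(f[j][k] for f in frames) / n_frames for k in range(3)]
        msf = sum(
            sum((f[j][k] - mean[k]) ** 2 for k in range(3)) for f in frames
        ) / n_frames
        values.append(math.sqrt(msf))
    return values

# Toy two-atom, two-frame trajectory (hypothetical coordinates)
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)],
]

print(rmsd(frames[1], reference))  # deviation of the second frame from the reference
print(rmsf(frames))                # per-atom fluctuation over the trajectory
```

RMSD gives one value per frame and describes how far the whole structure has drifted, whereas RMSF gives one value per atom (or residue) and describes local flexibility over the run.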

Result

Data acquisition and phylogenetic analysis

A total of 106 sequences and 1 NCBI reference sequence (Accession No.: YP_325663.1) were downloaded from the ViPR database on 9 August 2021 in FASTA format.

CCHFV has six genotypes; each genotype is conserved within its region of origin, and one genotype can cover multiple regions. The Senegal sequence belongs to genotype I; Democratic Republic of Congo, Uganda and South Africa sequences to genotype II; South Africa, Namibia, Sudan, Nigeria, Spain and USA sequences to genotype III; Oman, India, Iran and United Arab Emirates sequences to genotype IV; Turkey, Russia and Kosovo sequences to genotype V; and Greece sequences to genotype VI. L segment protein sequences of human-host strains belonging to genotypes I and VI had not been reported in the database as of 9 August 2021. A phylogenetic tree of the 107 sequences, rooted with the NCBI reference sequence (Accession No.: YP_325663.1), was generated and clustered into genotypes. The reference sequence belongs to genotype III [35], and genotype III sequences showed less divergence from it (Fig. 1), whereas genotype V sequences showed a large evolutionary distance. All South Africa strains belong to genotype III except the SPU94_85_813055 strain (genotype II). The phylogenetic tree suggests that the USA strain is closest to the Spain, Sudan and Nigeria strains, which belong to genotype III; therefore, the USA strain of CCHFV was also classified as genotype III (Fig. 1).

Fig. 1 Phylogenetic tree of L segment protein sequences representing CCHFV genotypes in different colours (purple, genotype II; red, genotype III; green, genotype IV; blue, genotype V)

Mutation profiling

Phylogenetic analysis revealed divergence among sequences belonging to different genotypes; hence, mutation analysis was carried out. Amino acid mutations were recorded in 106 sequences with respect to (w.r.t.) the reference sequence (YP_325663.1) across the 3945-amino-acid length of the CCHFV L segment, and 729 positions were found to be mutated. Mutation frequencies were classified into five intervals (0–0.2, 0.21–0.4, 0.41–0.6, 0.61–0.8 and 0.81–1.0) (Table 1). The majority of mutations were infrequent: 563 mutations had a frequency below 0.2. Mutations in the 0.81–1.0 interval were considered highly frequent, and 38 such mutations were identified (Table 1). These mutations were present in all genotypes. The highly frequent mutations were mapped onto a schematic diagram of the L segment to locate them in its domains (Fig. 2). In the 0.81–1.0 frequency interval, 24 amino acids were mutated into a single variant and 14 into more than one amino acid variant (Fig. 2). The L segment has two domains: the ovarian tumour (OTU) domain and the catalytic site domain [14]. Four highly frequent mutations were found in the catalytic site domain, V2074I (0.99), I2134T/A (0.95), V2148A (0.99) and Q2695H (0.87), whereas no highly frequent mutation was found in the OTU domain (Fig. 2).
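The interval classification described above can be sketched in a few lines of Python. The function `bin_frequencies` is hypothetical, and the treatment of values that fall between the stated interval edges (for example 0.205, which this sketch places in the 0.21–0.4 interval) is an assumption, as the text does not specify it.

```python
def bin_frequencies(frequencies):
    """Count mutation frequencies per interval, using each interval's upper bound."""
    bins = [
        (0.2, "0-0.2"),
        (0.4, "0.21-0.4"),
        (0.6, "0.41-0.6"),
        (0.8, "0.61-0.8"),
        (1.0, "0.81-1.0"),
    ]
    counts = {label: 0 for _, label in bins}
    for f in frequencies:
        for upper, label in bins:
            if f <= upper:
                counts[label] += 1
                break
    return counts

# Frequencies of the four catalytic site mutations quoted in the text,
# plus three made-up low-frequency values added for illustration only
example = [0.99, 0.95, 0.99, 0.87, 0.05, 0.15, 0.35]
counts = bin_frequencies(example)
print(counts)
```

Positions counted in the last interval are the ones the study treats as highly frequent.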

Table 1 No. of mutated amino acid positions at different frequency intervals

Fig. 2 Schematic representation of mutation allocated in 0.81–1.0 frequency interval. OTU, ovarian tumour domain. All mutations have been identified in all genotypes of CCHFV. The mutations represented in catalytic domain are persistent in recent strains as well

Furthermore, we identified peptide fragments with 90% conservancy and then located the highly frequent mutations relative to them. Four and thirteen conserved fragments were identified in the OTU and catalytic site domains respectively (Table 2). Of the four mutated amino acid positions, two (V2074I and V2148A) were found within 90% conserved fragments (Table 2). I2134T/A has a mutation frequency of 0.95 but was not recorded in the 90% conserved fragments, as two amino acid variants were observed at this position. Q2695H has a mutation frequency of 0.87, below the 90% cut-off; therefore, glutamine (Q) was recorded at this position in the conserved fragment.
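One way to make the idea of a 90% conserved fragment concrete is the Python sketch below. The exact procedure used in the study is not described, so the rule applied here is an assumption: an alignment column counts as conserved when its most common residue occurs in at least 90% of sequences, and a fragment is a run of at least `min_length` consecutive conserved columns. The function name, the `min_length` parameter and the toy alignment are hypothetical.

```python
from collections import Counter

def conserved_fragments(sequences, threshold=0.9, min_length=3):
    """Return (start, end, consensus) for runs of conserved alignment columns.

    Positions are 1-based and inclusive. A column is conserved when its most
    common residue is not a gap and reaches the given threshold.
    """
    n = len(sequences)
    length = len(sequences[0])
    fragments = []
    start = None
    consensus = []
    for i in range(length):
        residue, count = Counter(seq[i] for seq in sequences).most_common(1)[0]
        if residue != "-" and count / n >= threshold:
            if start is None:
                start = i
                consensus = []
            consensus.append(residue)
        else:
            if start is not None and len(consensus) >= min_length:
                fragments.append((start + 1, i, "".join(consensus)))
            start = None
    if start is not None and len(consensus) >= min_length:
        fragments.append((start + 1, length, "".join(consensus)))
    return fragments

# Toy alignment (hypothetical): column 5 is only 80% conserved
toy_alignment = ["MKVLAGTW"] * 8 + ["MKVLCGTW"] * 2
fragments = conserved_fragments(toy_alignment)
print(fragments)
```

Under this rule, a position where a single variant reaches a frequency of 0.99 remains inside a conserved fragment, whereas a position whose most common residue falls below the threshold interrupts the fragment.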

Table 2 Ninety percent conserved fragments found in L segment domains

In addition, 42 new sequences released till 21 June 2022 (after 9 August 2021) were downloaded from NCBI database and analysed for whether the selected four highly frequent mutations of catalytic site are persistent or have been mutated over time. It was observed that three mutated amino acids in catalytic domain (V2074I, I2134T/A and V2148A) (Fig. 2) are persistent in all 42 recent sequences. Although Q2695H mutation was also located in 41 sequences, one sequence (Accession no.: QYF06534.1) collected from Gujarat, India, in 2019 displayed Q2695R mutation. Thereafter, highly frequent mutations identified in catalytic site domain were further evaluated to study the effect of point mutations on the stability of catalytic site domain with the aid of various bioinformatics tools and molecular dynamic simulation study.

Structural and solubility analysis

The selected highly frequent mutations (V2074I, I2134A, I2134T, V2148A, Q2695R and Q2695H) identified in the catalytic site domain (Fig. 2) were subjected to structural and solubility analysis using the Project HOPE and SODA web servers respectively. The SODA web server reported that the solubility of the catalytic domain increases upon introduction of the point mutations V2074I, I2134T, I2134A and V2148A, whereas the Q2695H and Q2695R mutations decrease the solubility of the protein (Table 3). The Project HOPE server stated that V2074I and V2148A may disrupt function, as both residues are located in the domain responsible for the main activity of the protein. I2134T/A and Q2695H/R are not predicted to be damaging, as the mutant residues are more acceptable and occur more often than the wild residues at both positions (Table 3). The stability of the catalytic domain was further evaluated by molecular dynamic simulation.

Table 3 Effect of point mutations on stability and solubility of catalytic site domain

Homology modelling and refinement of wild and mutant domains

To evaluate the effect of single amino acid variation on the stability and compactness of the CCHFV catalytic site domain, homology modelling of the wild and mutant catalytic domains was carried out. Six mutant models were developed by introducing a single point mutation in each model. A three-dimensional structure of the catalytic site domain is not available in the Protein Data Bank (PDB). The PDB structure of Ebola virus RdRp protein (PDBid: 7YES) was therefore used as the template for homology modelling, as it displayed maximum identity with the catalytic site domain. The wild catalytic site domain was constructed using this template, followed by refinement and energy minimization of the wild model. The refined and minimized wild model was then used as the template for constructing the six mutant models. All mutant models were also refined and minimized before the molecular dynamic simulation study. The wild and six mutant models displayed 90.3% and 91.4–91.8% (Table S2) of amino acids in favourable regions of the Ramachandran plot respectively (Figure S1 and Figure S2). Superimposition in PyMOL showed that all mutant models deviated from the wild type model by 1.139–1.689 Å RMSD (Fig. 3 and Table S2). All mutant residues superimposed on the wild residues, whilst I2134A and V2148A displayed the minimum deviations of 1.189 Å and 1.139 Å respectively (Fig. 3).

Fig. 3 Pictorial representation of template model superimposed on (a) V2074I, (b) I2134A, (c) I2134T, (d) V2148A, (e) Q2695H and (f) Q2695R. V2148A and I2134A displayed minimum deviation

Molecular dynamic simulation study

The stability of the wild and six mutant models of the catalytic site domain was assessed by a 50 ns MD simulation with the charmm27 force field in a TIP3P water box. All models were solvated with SOL molecules and neutralized by 3 Cl− ions each.

In root mean square deviation (RMSD) plot, wild type model has shown less average deviation (11.01 ± 1.24 Å) than V2074I (11.82 ± 1.37 Å), I2134A (12.92 ± 1.72 Å), I2134T (12.16 ± 1.49 Å), V2148A (12.82 ± 1.87 Å), Q2695R (11.84 ± 1.16 Å) and Q2695H (14.7 ± 2.51 Å) (Fig. 4).

Fig. 4 Average RMSD plot of wild and mutant models. All mutant models have shown more deviation than wild model

RMSD trajectories revealed that wild type model gained convergence after 5 ns and V2074I and Q2695H mutant models gained convergence before 5 ns, whereas I2134A/T and V2148A gained convergence after 10 ns (Fig. 5). Among all models, Q2695R displayed large deviation during 50 ns simulation and gained convergence after 25 ns.

Fig. 5 RMSD plot of catalytic site domain representing deviation for 50 ns simulation run of (a) all models, (b) wild model, (c) V2074I, (d) I2134T/A, (e) V2148A and (f) Q2695H/R. All mutant models have shown more deviation with respect to wild model

The root mean square fluctuation (RMSF) graph displays the fluctuation of each amino acid in the catalytic site domain of RdRp protein; in the numbering of the catalytic site domain, V2074I is mutated at position 34, I2134T/A at position 94, V2148A at position 108 and Q2695H/R at position 652. All mutant models displayed higher RMSF values than the wild model, as the core residues of all mutant models showed larger fluctuations. All mutant models except Q2695H/R displayed large fluctuations in the central residues (between 305 and 344) w.r.t. the wild model: V2074I and I2134T showed a large RMSF difference between amino acid (aa) residues 305 and 326, whereas I2134A and V2148A showed a large RMSF difference between aa residues 326 and 344 (Fig. 6). On the other hand, mutant models were also more stable than the wild model at some positions: V2074I gained stability at aa residues 19–34, 170–178 and 494–542 (Fig. 6), I2134T/A showed lower RMSF values than the wild model at aa residues 18–30 and 496–541, and V2148A showed stability at aa residues 494–511. Of Q2695H and Q2695R, Q2695R is highly unstable and showed large fluctuations w.r.t. the wild model. Among all mutant models, V2074I, V2148A and Q2695H exhibited minimum fluctuation, whereas the other mutant models displayed large fluctuations upon single amino acid mutation.

Fig. 6 RMSF plot of catalytic site domain representing fluctuation at each amino acid residue: (a) V34I (V2074I) vs wild model, (b) I94T/A (I2134T/A) vs wild model, (c) V2148A vs wild model, (d) Q2695H/R vs wild model

Overall, the RMSF plots indicate that the catalytic site domain lost stability upon introduction of the mutant residues; the large differences in RMSF values between the mutant and wild models suggest that point mutations have affected the stability of the catalytic site domain. The rationale behind the large fluctuation and deviation in mutant models was further assessed by hydrogen bond analysis. The mean number of intramolecular hydrogen bonds increased in most mutant models w.r.t. the wild model (358.2). Among all mutant models, V2074I has the maximum mean number of hydrogen bonds (369.1), followed by Q2695R (368.6), V2148A (364.9) and I2134T (363.1). Q2695H exhibits a mean number of hydrogen bonds (358.3) similar to that of the wild model, whereas the mean number was reduced in I2134A (355.4). At the 25th percentile, the wild, V2074I, I2134A, I2134T, V2148A, Q2695H and Q2695R models displayed 349, 359, 347, 352, 356, 347 and 357 hydrogen bonds respectively. At the 75th percentile, 370, 384, 368, 377, 378, 373 and 385 hydrogen bonds were recorded in the wild, V2074I, I2134A, I2134T, V2148A, Q2695H and Q2695R models respectively (Fig. 7). Overall, the MD trajectories indicate that local interactions were affected by a single mutant residue in the catalytic site domain. Of the six mutant models, four mutations (V2074I, I2134T, V2148A and Q2695R) induced additional local interactions, whereas I2134A displayed a reduction in local interactions. Q2695H displayed minimum deviation and fluctuation during the 50 ns simulation, and its local interactions remained stable.
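The summary statistics quoted above (mean, 25th percentile and 75th percentile of the hydrogen bond count over the trajectory) can be computed from a per-frame series as in the following Python sketch. The function `summarize_hbonds` and the nine per-frame counts are made up for illustration and are not data from the study; the "inclusive" quantile method is also an assumption, since the text does not state how the percentiles were obtained.

```python
import statistics

def summarize_hbonds(counts):
    """Return the mean and the 25th/75th percentiles of per-frame hydrogen bond counts."""
    q25, _median, q75 = statistics.quantiles(counts, n=4, method="inclusive")
    return {"mean": statistics.mean(counts), "q25": q25, "q75": q75}

# Made-up per-frame hydrogen bond counts for a single model (illustration only)
toy_counts = [350, 352, 355, 358, 360, 362, 365, 368, 370]
summary = summarize_hbonds(toy_counts)
print(summary)
```

Comparing such summaries between the wild and each mutant model shows whether a mutation shifts the whole distribution of hydrogen bond counts or only its mean.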

Fig. 7 Average number of intramolecular hydrogen bonds in (a) both wild and mutant models, (b) wild, (c) V2074I, (d) I2134T/A, (e) V2148A, (f) Q2695H/R

Structural variation of catalytic site domain among CCHFV genotypes

CCHFV has been classified into six genotypes based on their geographical preferences. One representative L segment protein sequence for each of genotypes II, III, IV and V (Table S3) was taken from the protein dataset to analyse structural variation. As L segment protein sequences of genotypes I and VI have not been reported to date, their structural variation was not analysed. The three-dimensional structure of the catalytic site domain of each genotype was built by homology modelling, using the modelled catalytic site domain of the NCBI reference sequence (Accession No.: YP_325663.1) as the template. The structure of each genotype was visualized and superimposed on the wild model to calculate the RMSD value using PyMOL software. Upon superimposition on the reference model, large structural variation was observed in genotype II (RMSD = 13.1 Å) and genotype IV (RMSD = 14.86 Å) (Fig. 8). Genotype III and genotype V displayed less structural variation, with RMSD values of 4.51 Å and 3.67 Å respectively (Fig. 8). As the NCBI reference sequence also belongs to genotype III, less variation was observed between genotype III and the reference sequence.

Fig. 8 Pictorial representation of catalytic site domain of NCBI reference sequence with (a) genotype II, (b) genotype III, (c) genotype IV and (d) genotype V

Discussion

Mutations aid the survival of viruses, resistance against drugs and the evolution of new strains. Mutation profiling provides insight into the effect of point mutations on virus virulence, structure, replication and host invasion [36, 37]. One study described the effect of multi-variable point mutations (D53Q/E/H/W) in HIV on the stability of the envelope glycoprotein; of the four mutations, D53W displayed higher binding affinity than the wild protein [38]. Another study evaluated the effect of point mutations in the Zika virus polyprotein in mice and Vero cells and found that a few mutations reduced replication fitness whilst a few aided virus persistence and transmission in future outbreaks [39]. In the current study, four highly frequent mutations were located in the catalytic domain of the RNA-dependent RNA polymerase (RdRp) protein, and molecular dynamic simulation analysis revealed that these point mutations affect the stability of the protein.

The CCHFV genome has three RNA segments (L, M and S); the virus has a fatality rate of 10–40% and has been classified into six genotypes based on their geographical predominance [35, 40]. The phylogenetic tree built from 107 sequences showed less divergence among sequences belonging to the same CCHFV genotype, in concordance with studies reported in the literature [8, 9, 35, 41]. The NCBI reference sequence (Accession No.: YP_325663.1) belongs to genotype III; therefore, genotype III sequences displayed less evolutionary distance from the reference sequence in the phylogenetic tree, whereas genotypes II, IV and V showed a large distance. As the phylogenetic tree displayed divergence among CCHFV genotypes, mutation profiling of 106 sequences was executed.

Researchers have reported in vitro and in vivo studies describing the effect of point mutations in viral RdRp on viral replication and ribavirin resistance. The G64S point mutation in poliovirus [42], A372V in Coxsackie virus [43], L123F in human enterovirus 71 [44], R84H in foot and mouth disease virus [45], V43I in influenza A virus [46] and V793I and G806R in West Nile virus [47] have caused ribavirin resistance, better survival of the virus and enhanced polymerase fidelity (which reflects the rate of error generation during polymerization) [42, 44, 45, 46, 47]. Mutations in the spike protein of SARS-CoV-2 that affect glycosylation at N331 and N343 reduce infectivity, the D614G mutation enhances infectivity, and a few mutations cause antibody resistance [48]. Considering these facts, mutation profiling of 106 sequences w.r.t. the NCBI reference sequence (Accession No.: YP_325663.1) of the L segment (encoding the RdRp protein) was carried out and identified a total of 729 mutated amino acid positions among 3945 amino acids. The majority of the mutations were infrequent, and 38 amino acid positions fell in the frequency interval of 0.81–1.0, which was considered highly frequent.

The function of proteins is primarily executed by functional and structural elements termed domains [17]. The L segment has two domains: the ovarian tumour (OTU) domain and the catalytic site domain. The OTU domain has deubiquitination (DUB) activity and participates in evading the host immune response, which suggests that DUB inhibitors could be designed for CCHF treatment [15]. The catalytic site domain of RdRp plays a key role in RNA-dependent polymerization (https://prosite.expasy.org/rule/PRU00539), which is crucial for virus survival [49]. Hence, the highly frequent mutations were mapped onto the L segment: four were found in the catalytic site domain (V2074I, I2134T/A, V2148A and Q2695H), whereas the OTU domain had no highly frequent mutated residue. Of the four mutated amino acid positions, two were located in 90% conserved fragments; I2134T/A was not located in these conserved fragments because two amino acid variants were identified at this position, and Q2695H, with a mutation frequency of 0.87, was also not found in a 90% conserved fragment. In addition, the persistence of the four mutated positions identified in the catalytic site domain was assessed in 42 recent sequences. Three mutated amino acid positions (V2074I, I2134T/A and V2148A) were persistent in all 42 sequences. The Q2695H mutation was present in 41 sequences, whereas one sequence (Accession no.: QYF06534.1) displayed the Q2695R mutation.

Researchers have reported that the catalytic activity of a protein decreases when its stability increases due to mutation [50]. Saini et al. described enhanced stability due to mutations (A49L and Q106T) in interleukin-4 [23]. Salinas et al. reported that mutations can alter the shape of the binding site pocket and its ligand recognition properties [22]. To study the effect of point mutations on the solubility and stability of the catalytic domain, three-dimensional structures of the wild and six mutant models were generated using MODELLER. The Ramachandran plots of the mutant models displayed more amino acids in favourable regions than that of the wild model (91.4–91.8% versus 90.3%). Solubility analysis indicated that the solubility of the protein might increase due to the V2074I, I2134T/A and V2148A point mutations, whereas the Q2695H/R point mutations might decrease it. The HOPE server suggests that V2074I and V2148A might disrupt protein function as they are located near a highly conserved region, whereas I2134T/A and Q2695H/R may not interfere with protein function, as the mutant residues are more acceptable at these positions than the wild residue. In addition, researchers have studied the effect of point mutations on protein stability with the aid of molecular dynamic simulation [51, 52]. It has been reported that the I591D point mutation in alpha-dystroglycan caused instability of the protein [51], whilst another study reported lower RMSD in mutant models than in the wild model of serum and glucocorticoid-regulated kinase 1 (SGK1) protein [52]. Zhang et al. observed less deviation in wild type human cytochrome P450 A2 protein than in the F186L mutant model [53]. Therefore, MD simulation of the wild and mutant catalytic domains was executed for 50 ns using GROMACS v2021.3 to assess their stability, with the topology of each molecule created with the charmm27 force field in a TIP3P water box.
Overall, all mutant models displayed more deviation and fluctuation w.r.t. the wild model. Among the six mutant models, V2074I, I2134T/A and Q2695H displayed minimum deviation over the 50 ns simulation run, but the RMSF plot of the V2074I model showed higher RMSF values for core aa residues than the wild model, whereas the Q2695H model displayed minimum deviation as well as less fluctuation in the core amino acid residues. Along with the Q2695H model, the RMSF plots of the V2074I and I2134T/A models also showed minimum fluctuation at the majority of core aa residues. At the multi-variable positions, I2134T and Q2695H were more stable than I2134A and Q2695R respectively. The predictions of the Project HOPE server are comparable with the molecular simulation results: HOPE predicted that, in the I2134T/A and Q2695H models, the mutant residues often occur at these positions, exhibit similar properties and therefore might not disrupt the stability of the protein, and these models indeed displayed deviation and fluctuation comparable to the wild model during the 50 ns simulation run. Our findings are in agreement with a study by Khan et al. describing an increase in flexibility and deviation of residues in mutant models of ribosomal protein S1 w.r.t. the wild type [54]. Another study evaluated the effect of mutations on intramolecular contacts and reported that the number of contacts in the Rab5a mutants Ala30Pro and Ala30Arg increased to 100 and 202 respectively [55]. Additionally, structural variation of the catalytic site domain relative to the NCBI reference sequence was assessed across genotypes, and large structural variation was observed in genotype II (RMSD: 13.1 Å) and genotype IV (RMSD: 14.86 Å). These structural variations are in accordance with the phylogenetic analysis, in which genotypes II and IV are distant from the reference sequence. However, genotype V is also distant from the reference sequence yet shows less structural variation (RMSD).
One of the reasons could be that the catalytic domain may not be undergoing much structural change in genotype V. Moreover, all the mutant residues are part of the catalytic site domain responsible for RNA polymerization, and they may affect viral fidelity and the efficacy of RdRp-targeted drugs, as other viruses have gained resistance due to point mutations in their RdRp protein [42, 44, 45, 46, 47].

Conclusion

Due to the high mutation rate of viruses, it is difficult to develop diagnostic kits and treatments that remain usable over years. The current study aimed to identify highly frequent mutations and to study the effect of these mutations on the stability of CCHFV L segment domains. The OTU domain was found to be a highly conserved region of the CCHFV L segment, as no highly frequent mutation was found in it, whereas four highly frequent mutations were found in the catalytic site domain. Interestingly, all mutant models displayed less stability and more deviation than the wild model in the 50 ns molecular dynamic simulation run, indicating that the stability of the catalytic site domain is affected by the introduction of these mutations, with large amino acid fluctuations. All catalytic site domain mutations were found to be persistent in all genotypes and in recent strains as well; therefore, they might lead to resistance against RdRp-targeted drugs due to high fidelity, as observed in other viruses.