In silico discrimination of nsSNPs in hTERT gene by means of local DNA sequence context and regularity
- Cite this article as:
- Doss, C.G.P., Chakraborty, C., Rajith, B. et al. J Mol Model (2013) 19: 3517. doi:10.1007/s00894-013-1888-7
Understanding and predicting the significance of novel genetic variants revealed by DNA sequencing, and integrating that interpretation into medical practice, is a major challenge in medical genetics. Recent studies have made significant advances in characterizing and predicting the association of single nucleotide polymorphisms in human TERT with various disorders, but the results remain inconclusive. In this context, we performed a computational comparative study of disease-causing and novel mutations in the hTERT gene. Out of 59 missense mutations, five variants were predicted by in silico tools to be less stable and to have the most deleterious effect on hTERT. Two of these (L584W and M970T) have not previously been reported to be involved in any human disorder. To gain insight into the structural and functional impact of the mutations, docking and interaction analyses were performed, followed by a 6 ns molecular dynamics simulation. These results may provide new perspectives for targeted drug discovery.
Keywords: Docking; hTERT; Molecular dynamics simulation; SNPs
DNA sequencing technology is becoming the method of choice for medical genetic diagnostics. However, a key challenge is the difficulty of interpreting novel sequence variants. Most geneticists use a combination of traditional genetic methods relying on segregation with the disease in families, frequency in controls, biochemical characterization, and evolutionary conservation at the variant position. Studying the molecular basis of diseases such as cancer with these methods is often time-consuming and laborious. Associations with polymorphisms in candidate genes have been confirmed in many diseases, and genome-wide association studies (GWAS) are identifying many novel associations in genes that had not been strong a priori candidates for the disease under test. However, the modest increase in risk conferred by such variants implies a need for large, well-designed and well-analyzed studies that incorporate robust computational methods to classify novel variants accurately. Computational methods can be harnessed for efficient screening and validation of genetic variants, which could be a valuable resource for pharmacogenomics.
Single nucleotide polymorphisms (SNPs) are the most common type of genetic variation in the human genome [3, 4]. However, not all SNPs are associated with human disease. Non-synonymous SNPs (nsSNPs) occur in coding regions and cause amino acid substitutions. Those that impart structural and functional changes on the protein are termed “deleterious”, whereas those that have no impact on protein function are termed “tolerated”. It is therefore necessary to distinguish deleterious from tolerated nsSNPs. This can improve our understanding of the genetic basis of human diseases and help identify molecular and potential therapeutic targets.
Overexpression experiments in human cells have shown that TERT has activities in cellular transformation, proliferation, cell survival and chromatin regulation [6, 7]. Telomerase is a specialized ribonucleoprotein complex that plays a crucial role in maintaining the integrity of telomeric DNA. It consists of a protein component with reverse transcriptase activity (TERT) and an RNA component (TERC) that provides the template for the telomeric repeat. The hTERT gene is located at chromosome 5p15.33 and encodes a protein of 1132 amino acids arranged in four domains: the N-terminal, RNA-binding, reverse transcriptase (RT) and C-terminal domains. Telomerase-associated proteins such as dyskerin, nucleolar protein-10 (NOP10), non-histone protein-2 (NHP2) and glycine arginine rich-1 (GAR1) are required for the assembly of a functional telomerase holoenzyme complex. Telomerase is active in some epithelial, haemopoietic and germ-line cells. Mutations in TERT are linked to certain inherited human disorders, including the haemopoietic disorders dyskeratosis congenita (DC) and aplastic anemia (AA), as well as idiopathic pulmonary fibrosis (IPF) [11–20]. Although deleterious nsSNPs in hTERT have received great attention from experimental biologists, their impact on protein structure and function has not yet been predicted using an in silico approach. We therefore carried out fine-mapping, followed by functional analysis of the associated SNPs in the coding region of the hTERT gene, using SIFT, PolyPhen and I-Mutant 2.0 [21–23]. However, the lack of a structural framework made it difficult to relate the results of polymorphism studies to their impact on protein function.
While the pursuit of a high-resolution experimental structure is underway, we generated a three-dimensional (3D) homology model from two TERT domain structures (3KYL and 2R4G) using the SWISS-MODEL workspace. To understand the molecular mechanism underlying the impact of the mutations, docking and binding analyses were then undertaken. Curcumin, a well-known hTERT inhibitor, was used to determine the binding affinity toward hTERT. An atomic-level view of protein dynamics from molecular dynamics simulations helped clarify the effects of these mutations on protein structure, allowing us to investigate how an amino acid variation can create a ripple effect throughout the protein structure and ultimately affect function. These findings may improve the understanding of telomerase biology and of the molecular details of how polymorphisms affect telomerase activity.
Materials and methods
Retrieval of SNPs
Defining the functional context of missense mutation
The pathogenic effects of missense mutations were analyzed using SIFT, PolyPhen and I-Mutant 2.0. The default parameters of all programs were applied, and only the protein sequence and missense variant were given as input information for each program.
SIFT is a sequence homology-based tool that classifies variants as “neutral” (tolerated) or “deleterious” using normalized probabilities calculated from the input multiple sequence alignment. It uses relevant multiple sequence alignments (MSAs) derived from pre-computed NCBI BLAST searches. Variants at positions with normalized probability scores of 0.05 or less (that is, 0 to 0.05) are predicted to be deleterious, and those with scores greater than 0.05 are predicted to be neutral.
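The normalized-probability idea behind SIFT can be illustrated with a minimal Python sketch for a single alignment column. It is a simplification under stated assumptions, not SIFT's implementation: a flat pseudocount stands in for SIFT's Dirichlet-mixture priors and sequence weighting, the alignment column is hypothetical, and the function names are illustrative. Each amino acid's probability is scaled by that of the most frequent residue, and the 0.05 cut-off from the text is then applied.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def normalized_probabilities(column, pseudocount=0.5):
    """Probability of each amino acid at one alignment column, scaled by the
    probability of the most frequent amino acid (SIFT-style normalization).

    Simplification (assumption): a flat pseudocount replaces SIFT's
    Dirichlet-mixture priors and sequence weighting.
    """
    residues = [aa for aa in column.upper() if aa in AMINO_ACIDS]  # drop gaps
    counts = Counter(residues)
    total = len(residues) + pseudocount * len(AMINO_ACIDS)
    probs = {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
    top = max(probs.values())
    return {aa: p / top for aa, p in probs.items()}

def classify_sift(score, threshold=0.05):
    """Scores of 0.05 or less are deleterious; higher scores are tolerated."""
    return "deleterious" if score <= threshold else "tolerated"

# Hypothetical alignment column from 40 homologs: mostly leucine
column = "L" * 36 + "I" * 3 + "V"
scores = normalized_probabilities(column)
for aa in "LIW":
    print(aa, round(scores[aa], 3), classify_sift(scores[aa]))
```

Here the conserved leucine scores 1.0, the occasionally observed isoleucine stays above the cut-off, and the never-observed tryptophan falls below it.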
PolyPhen predicts the possible impact of amino acid substitutions on protein structure and function using straightforward physical and evolutionary comparative considerations. The prediction is based on empirical rules applied to the sequence, phylogenetic and structural information characterizing the substitution. The input to PolyPhen is an amino acid sequence (FASTA) or a corresponding ID, together with the position of the amino acid variant. PolyPhen searches several databases for 3D protein structures, multiple alignments of homologous sequences and amino acid contact information. It then calculates position-specific independent counts (PSIC) scores for each of the two variants (wild type and mutant) and computes the difference between them. The higher the PSIC score difference, the greater the functional impact a particular amino acid substitution is likely to have. A PSIC score difference of 1.5 or above is considered damaging, and a difference below 1.5 is considered neutral.
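The logic of scoring a substitution by a profile-score difference can be sketched as follows. This is not PolyPhen's PSIC algorithm, which weights sequences to obtain independent counts. Here a simple pseudocount log-odds profile against an assumed uniform background stands in for the PSIC profile, the natural logarithm is assumed, and only the 1.5 cut-off is taken from the text. The alignment column and function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def log_odds_profile(column, background, pseudocount=0.5):
    """Log-odds of each amino acid at one alignment column versus background.

    Simplified stand-in for a PSIC profile (assumption): no sequence weighting.
    """
    residues = [aa for aa in column.upper() if aa in AMINO_ACIDS]  # drop gaps
    counts = Counter(residues)
    total = len(residues) + pseudocount * len(AMINO_ACIDS)
    return {
        aa: math.log(((counts[aa] + pseudocount) / total) / background[aa])
        for aa in AMINO_ACIDS
    }

def profile_score_difference(column, wild_type, mutant, background):
    """Profile score of the wild-type residue minus that of the mutant residue."""
    profile = log_odds_profile(column, background)
    return profile[wild_type] - profile[mutant]

def classify_by_difference(delta, threshold=1.5):
    """Difference >= 1.5 is damaging, < 1.5 is neutral (cut-off from the text)."""
    return "damaging" if delta >= threshold else "neutral"

# Assumed uniform background frequencies and a hypothetical alignment column
uniform = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
column = "L" * 20 + "I" * 12 + "V" * 8

deltas = {m: profile_score_difference(column, "L", m, uniform) for m in "IVW"}
for mutant, delta in deltas.items():
    print(f"L->{mutant}: {delta:.2f} {classify_by_difference(delta)}")
```

Conservative substitutions to residues already seen at the position (I, V) give small differences, while an unseen, dissimilar residue (W) gives a large one.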
I-Mutant 2.0 is a support vector machine (SVM)-based tool for the automatic prediction of protein stability changes upon single point mutations. Predictions can start either from the protein structure or, more importantly, from the protein sequence. I-Mutant 2.0 can be used both as a classifier for predicting the sign of the protein stability change upon mutation and as a regression estimator for predicting the related ΔΔG values. We used the sequence-based version of I-Mutant 2.0, which classifies predictions into two classes: (i) ΔΔG < 0, decreased stability; (ii) ΔΔG > 0, increased stability. The output shows the predicted free energy change ΔΔG (kcal mol-1), calculated as the unfolding Gibbs free energy change of the mutated protein minus that of the native protein: ΔΔG = ΔG(mutant) − ΔG(wild type). ΔΔG > 0 means that the mutated protein is more stable than the native protein, and ΔΔG < 0 means that it is less stable.
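The sign convention for ΔΔG described above can be expressed as a short Python sketch. I-Mutant 2.0 predicts ΔΔG directly from sequence with an SVM, and this sketch does not reproduce that predictor. It only computes ΔΔG from two unfolding free energies and applies the two-class rule. The energies in the example and the function names are hypothetical.

```python
def delta_delta_g(dg_unfold_mutant, dg_unfold_wild_type):
    """DDG (kcal/mol) = unfolding free energy of mutant minus that of wild type."""
    return dg_unfold_mutant - dg_unfold_wild_type

def stability_class(ddg):
    """Two-class scheme from the text; exactly zero falls outside both classes."""
    if ddg < 0:
        return "decrease stability"
    if ddg > 0:
        return "increase stability"
    return "no change"

# Hypothetical unfolding free energies (kcal/mol)
ddg = delta_delta_g(dg_unfold_mutant=3.8, dg_unfold_wild_type=5.0)
print(round(ddg, 2), stability_class(ddg))
```

A mutant that needs less free energy to unfold than the wild type yields a negative ΔΔG and is classed as destabilizing.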
Modeling the effect of deleterious nsSNPs
Homology modeling and structural validation of human TERT were carried out on the basis of two TERT domain structures (3KYL and 2R4G) using the SWISS-MODEL workspace and RAMPAGE [24, 29]. Mutation analysis was performed based on the results obtained from the in silico tools described above. Swiss-PdbViewer was used to introduce the mutations at their respective coordinates, and hydrogen atoms were added to the structures using MolProbity. MolProbity also evaluates all-atom contacts and flips asparagine and glutamine side chains when necessary. Visualizing the positions of the mutated residues makes it possible to suggest a physicochemical rationale for their effect on protein activity. The constructed models were energy-minimized by steepest descent using the GROMOS96 53a6 force field. The ligand structure was downloaded from PubChem in SMILES format and converted to Protein Data Bank (PDB) format using CORINA.
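Steepest-descent minimization, as applied to the constructed models, can be illustrated on a toy potential. The sketch follows the adaptive-step scheme described in the GROMACS manual: each step moves by at most a maximum displacement along the force, the step is enlarged by a factor of 1.2 after an accepted move, and it is shrunk by a factor of 0.2 after a rejected one. The two-dimensional quadratic potential stands in for a real force field, and the parameter values and function names are illustrative assumptions.

```python
import math

def potential(pos):
    """Toy energy surface with its minimum at (1, -2); stands in for a force field."""
    x, y = pos
    return (x - 1.0) ** 2 + 10.0 * (y + 2.0) ** 2

def force(pos):
    """Force is the negative gradient of the potential."""
    x, y = pos
    return (-2.0 * (x - 1.0), -20.0 * (y + 2.0))

def steepest_descent(pos, nsteps=5000, max_step=0.01, force_tol=1e-3):
    """Adaptive-step steepest descent; returns (position, energy, steps_taken)."""
    energy = potential(pos)
    for step in range(nsteps):
        f = force(pos)
        f_norm = math.hypot(*f)
        if f_norm < force_tol:          # converged: force below tolerance
            return pos, energy, step
        # Move by max_step along the (normalized) force direction
        trial = tuple(p + max_step * fi / f_norm for p, fi in zip(pos, f))
        trial_energy = potential(trial)
        if trial_energy < energy:       # accept the move and enlarge the step
            pos, energy = trial, trial_energy
            max_step *= 1.2
        else:                           # reject the move and shrink the step
            max_step *= 0.2
    return pos, energy, nsteps

final_pos, final_energy, steps = steepest_descent((0.0, 0.0))
print(final_pos, final_energy, steps)
```

Because only energy-lowering moves are accepted, the energy decreases monotonically, which is why steepest descent is a robust first step before equilibration even though it converges slowly near the minimum.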
Docking and interaction analysis of hTERT
We used PatchDock to dock the native and mutant hTERT structures with the drug curcumin. PatchDock performs docking based on molecular shape representation, surface patch matching, and filtering and scoring. It is efficient because its transformational search is driven by local feature matching rather than a brute-force search of the six-dimensional transformation space, and it further reduces computation time by using advanced data structures and spatial pattern detection techniques such as geometric hashing and pose clustering. The protein and ligand were given as input with the default root-mean-square deviation (RMSD) clustering value of 4.0 Å. PatchDock generated several candidate complexes ranked by docking score, and the complex with the best docking score was selected for further analysis. Electrostatic energy, van der Waals interactions and hydrogen bonds, which make the main contributions to the total interaction energy, play a major role in the dynamic stability of a ligand-receptor complex. The total interaction energy of each hTERT-curcumin complex was calculated with the PEARLS web server. A more negative total interaction energy indicates a more favorable interaction, and vice versa.
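The decomposition of the total interaction energy into electrostatic and van der Waals terms can be sketched as a pairwise sum over ligand and receptor atoms. PEARLS' exact energy function and parameters are not described here, so this sketch assumes a Coulomb term with a constant dielectric and a 12-6 Lennard-Jones term with Lorentz-Berthelot mixing, omits an explicit hydrogen-bond term, and uses hypothetical atoms and parameters. A more negative total corresponds to a more favorable interaction.

```python
import math

COULOMB_CONSTANT = 332.0636  # kcal mol^-1 Å e^-2, for charges in e and distances in Å

def interaction_energy(ligand, receptor, dielectric=1.0):
    """Return (electrostatic, van der Waals, total) interaction energies in kcal/mol.

    Each atom is a tuple (x, y, z, charge, epsilon, sigma); coordinates in Å,
    epsilon in kcal/mol, sigma in Å. Only ligand-receptor pairs are summed.
    """
    electrostatic = van_der_waals = 0.0
    for *l_xyz, l_q, l_eps, l_sig in ligand:
        for *r_xyz, r_q, r_eps, r_sig in receptor:
            r = math.dist(l_xyz, r_xyz)
            electrostatic += COULOMB_CONSTANT * l_q * r_q / (dielectric * r)
            eps = math.sqrt(l_eps * r_eps)      # Lorentz-Berthelot mixing rules
            sigma = 0.5 * (l_sig + r_sig)
            sr6 = (sigma / r) ** 6
            van_der_waals += 4.0 * eps * (sr6 * sr6 - sr6)
    return electrostatic, van_der_waals, electrostatic + van_der_waals

# Hypothetical two-atom example: opposite partial charges 3.5 Å apart
ligand = [(0.0, 0.0, 0.0, -0.5, 0.1, 3.0)]
receptor = [(3.5, 0.0, 0.0, 0.5, 0.1, 3.0)]
elec, vdw, total = interaction_energy(ligand, receptor)
print(f"electrostatic={elec:.2f} vdW={vdw:.3f} total={total:.2f} kcal/mol")
```

At 3.5 Å the pair sits just beyond the Lennard-Jones contact distance, so both terms are attractive and the total is negative.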
Molecular dynamics simulation
Molecular dynamics simulations were performed using the GROMACS 4.5.5 software package with the GROMOS96 53a6 force field. Each system was solvated with simple point charge (SPC) water in a simulation box extending 0.9 nm beyond the protein, and sufficient potassium and chloride ions were added to neutralize the system's charge. Each system was energy-minimized using the steepest descent algorithm for 5000 steps without constraints. The minimized system was then equilibrated with position-restrained simulations, first under an NVT ensemble (constant number of particles, volume and temperature) for 1000 ps to stabilize the temperature at 300 K with the Berendsen thermostat, and then under an NPT ensemble (constant number of particles, pressure and temperature) for 1000 ps to stabilize the pressure at 1.0 bar with Parrinello-Rahman pressure coupling. Finally, unrestrained MD simulations were run for 6 ns at 300 K (Berendsen thermostat) and 1.0 bar (Parrinello-Rahman pressure coupling). The trjconv, g_rms, g_sasa, g_rmsf and g_hbond utilities of GROMACS 4.5.5 were used to analyze the MD results. Backbone RMSD, C-alpha RMSF, number of hydrogen bonds, solvent-accessible surface area (SASA) and the projection of protein motion in phase space were plotted for all simulations using the Grace (Graphing, Advanced Computation and Exploration) program.
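Backbone RMSD and C-alpha RMSF, as reported by g_rms and g_rmsf, reduce to simple averages once the frames have been fitted to a reference. The minimal sketch below assumes that least-squares fitting has already been done (it performs no superposition itself), uses a hypothetical two-atom, three-frame trajectory, and is not the GROMACS implementation.

```python
import math

def rmsd(frame, reference):
    """RMSD between two equally ordered coordinate lists (frames assumed pre-fitted)."""
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(frame, reference))
    return math.sqrt(squared / len(reference))

def rmsf(trajectory):
    """Per-atom RMS fluctuation about each atom's time-averaged position."""
    n_frames = len(trajectory)
    fluctuations = []
    for i in range(len(trajectory[0])):
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        msd = sum(math.dist(frame[i], mean) ** 2 for frame in trajectory) / n_frames
        fluctuations.append(math.sqrt(msd))
    return fluctuations

# Hypothetical trajectory: 3 frames x 2 atoms, coordinates in nm
trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.8, 0.0, 0.0)],
]
rmsd_series = [rmsd(frame, trajectory[0]) for frame in trajectory]
fluct = rmsf(trajectory)
print("RMSD per frame:", [round(v, 4) for v in rmsd_series])
print("RMSF per atom:", [round(v, 4) for v in fluct])
```

RMSD is a per-frame measure of global drift from the reference, whereas RMSF is a per-atom measure of flexibility over the whole trajectory, which is why the two are plotted against time and residue number, respectively.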
Analysis of deleterious nsSNPs using SIFT
SIFT predicts whether an amino acid substitution affects protein function based on sequence homology and the physical properties of amino acids. In predicting the effect of a substitution, it focuses on sequence conservation over evolutionary time and on the nature of the amino acids involved. About 14 % of nsSNPs were predicted to be highly deleterious, with a SIFT score of 0.00, and 15 % had scores between 0.01 and 0.05 and were predicted to be deleterious. The remaining 71 % were predicted to be tolerated (Supplementary Table 1). Thus, 29 % of nsSNPs were predicted to be intolerant and could alter protein function.
Analysis of deleterious nsSNPs using PolyPhen
PolyPhen evaluates the location of the amino acid replacement within identified functional domains and 3D structures. All protein sequences submitted to SIFT were also submitted to PolyPhen. Unlike SIFT, PolyPhen relies not only on sequence homology but also on structural information to predict the functional effect of an SNP. PolyPhen predicted 25 % of the nsSNPs to be “probably damaging” and 27 % to be “possibly damaging”, and characterized the remaining 48 % as benign. Most of the mutations predicted to be deleterious by SIFT were also predicted to be damaging by PolyPhen (Supplementary Table 1).
Identification of functional nsSNPs using I-Mutant 2.0
All the nsSNPs submitted to SIFT and PolyPhen were also submitted to I-Mutant 2.0. Based on the predicted difference in Gibbs free energy between the mutant and wild-type proteins, 83 % of nsSNPs were predicted to destabilize the protein (ΔΔG < 0 kcal mol-1) (Supplementary Table 1).
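Combining the three tools into a consensus list of high-priority variants can be sketched as a simple filter. The sketch assumes that a variant is prioritized only when all three tools flag it, and that both “probably damaging” and “possibly damaging” count as damaging. The variant labels and scores are hypothetical placeholders, not data from this study.

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    variant: str
    sift_score: float    # SIFT normalized probability; <= 0.05 is deleterious
    polyphen_call: str   # "probably damaging", "possibly damaging" or "benign"
    ddg: float           # I-Mutant 2.0 DDG in kcal/mol; < 0 is destabilizing

    def sift_deleterious(self):
        return self.sift_score <= 0.05

    def polyphen_damaging(self):
        return self.polyphen_call in {"probably damaging", "possibly damaging"}

    def destabilizing(self):
        return self.ddg < 0

    def consensus(self):
        """Assumed rule: all three tools must flag the variant."""
        return self.sift_deleterious() and self.polyphen_damaging() and self.destabilizing()

def percent(flags):
    """Percentage of True values, rounded to the nearest integer."""
    flags = list(flags)
    return round(100 * sum(flags) / len(flags))

# Hypothetical predictions for illustration only
predictions = [
    Prediction("variant_A", 0.00, "probably damaging", -1.10),
    Prediction("variant_B", 0.03, "possibly damaging", -0.45),
    Prediction("variant_C", 0.21, "benign", -0.30),
    Prediction("variant_D", 0.01, "benign", 0.25),
]

print("SIFT deleterious:", percent(p.sift_deleterious() for p in predictions), "%")
print("Consensus:", [p.variant for p in predictions if p.consensus()])
```

Requiring agreement trades sensitivity for specificity. A looser rule (for example, two of three tools) would keep more candidates for structural follow-up.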
Mutation structural analysis
Summary of nsSNPs predicted to be deleterious by SIFT, PolyPhen and I-Mutant 2.0
Analysis of the local environment changes
Aplastic anemia (K570N)
AA was first found to be associated with mutations in the pseudoknot region of the telomerase RNA component (TERC). According to a previous report, the K570N mutation results in complete loss of the ability of telomerase to add hexameric repeats to telomeres, abolishing telomerase enzymatic function, which in turn causes AA. In K570N, the mutant residue asparagine is smaller than the wild-type residue lysine, which may create an empty space in the core of the protein. The mutation also replaces a positively charged lysine with a neutral asparagine, removing the charge at this position. The change in the interacting residues and polar contacts due to the mutation is shown in Fig. 3a.
Dyskeratosis congenita (P721R)
DC is an inherited disorder characterized by premature aging and an increased risk of cancer. In the P721R mutation, the mutant residue arginine (positively charged) is bigger than the wild-type residue proline (neutral). The mutated residue is located in a domain that is essential for the main activity of the protein, and the mutation may cause a loss of hydrophobic interactions in the core of the protein. The P721R substitution has a significant impact on telomerase structure and function because of its non-conservative nature and the general importance of proline residues in protein folding. The change in the interacting residues and polar contacts due to the mutation is shown in Fig. 3b.
Idiopathic pulmonary fibrosis (R865H)
IPF is a form of idiopathic interstitial pneumonia characterized by the progressive, chronic formation of fibrotic scar tissue in the lungs without any known causative agent. The wild-type residue arginine forms a hydrogen bond with the glutamic acid at position 325. The size difference between the wild-type and mutant residues may prevent the mutant histidine from forming the same hydrogen bond. The difference in charge will also disturb the ionic interaction made by the wild-type residue, which can cause loss of interactions with other molecules. The R865H mutation disrupts nucleotide positioning in the active site and therefore directly compromises the catalytic reaction. The change in the interacting residues and polar contacts due to the mutation is shown in Fig. 3c.
Variant L584W and M970T
The L584W and M970T mutations have not been reported previously, and their structural effects are unknown. To address this, we generated and compared native and mutant structural models. In L584W, the wild-type leucine is buried in the core of the protein. Because tryptophan is much larger than leucine, the mutant residue probably does not fit in the protein structure and may destabilize the protein. In M970T, the mutant threonine is smaller than the native methionine, which may create an empty space in the core of the protein, and the mutation will lead to a loss of hydrophobicity. Both nsSNPs might decrease protein stability, so proper validation is needed to establish their conformational and functional implications. The change in the interacting residues and polar contacts due to the mutations is shown in Fig. 3d and e.
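The size, charge and hydrophobicity arguments used for the five variants can be tabulated from published residue scales. This sketch uses residue volumes from Zamyatnin (1972) and Kyte-Doolittle hydropathy values, treats histidine as neutral at physiological pH (a simplification), and says nothing about burial or local contacts, which require the 3D model. The function name `compare` is illustrative.

```python
# Residue volumes in Å^3 (Zamyatnin, 1972) and Kyte-Doolittle hydropathy values,
# restricted to the residues involved in the five variants discussed above
VOLUME = {"H": 153.2, "K": 168.6, "L": 166.7, "M": 162.9, "N": 114.1,
          "P": 112.7, "R": 173.4, "T": 116.1, "W": 227.8}
HYDROPATHY = {"H": -3.2, "K": -3.9, "L": 3.8, "M": 1.9, "N": -3.5,
              "P": -1.6, "R": -4.5, "T": -0.7, "W": -0.9}
# Approximate side-chain charge near pH 7; histidine is treated as neutral
CHARGE = {"K": 1, "R": 1}

def compare(mutation):
    """Mutant minus wild-type volume, hydropathy and charge for a label like 'K570N'."""
    wild_type, mutant = mutation[0], mutation[-1]
    return {
        "volume_change": round(VOLUME[mutant] - VOLUME[wild_type], 1),
        "hydropathy_change": round(HYDROPATHY[mutant] - HYDROPATHY[wild_type], 1),
        "charge_change": CHARGE.get(mutant, 0) - CHARGE.get(wild_type, 0),
    }

for label in ["K570N", "P721R", "R865H", "L584W", "M970T"]:
    print(label, compare(label))
```

The signs match the qualitative arguments: K570N and R865H lose a positive charge, P721R and L584W add bulk, and M970T removes bulk and hydrophobicity.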
Docking and interaction analysis
Comparison of docking score, atomic contact energy (ACE) and ligand-receptor interaction energies
Columns: ligand-receptor electrostatic energy, ligand-receptor van der Waals energy and ligand-receptor total interaction energy (kcal mol-1)
Molecular dynamics simulation
In this comprehensive analysis, we provide computational evidence for the functional impact of disease-associated point mutations in the protein component of human telomerase. We present a list of nsSNPs that could serve as relevant genetic markers and may also be useful for disease association and linkage disequilibrium studies. Several criteria can help select the nsSNPs most likely to have severe effects on protein function and on the phenotype. Some amino acid variations are more likely than others to alter the 3D structure of a candidate protein, so the possible impact of an amino acid allelic variant on protein activity is a function of both the structural location of the nsSNP and its phylogenetic conservation. The basic criteria used by these computational methods are sequence homology, the physicochemical properties of the substituted residues and structural information. Studying the functional consequences of nsSNPs for the molecular basis of disease at the structural level requires integrating heterogeneous information such as the protein sequence, the 3D protein structure and the associated variants. Mapping deleterious nsSNPs onto 3D protein structures and analyzing them at the structural level can reveal the extent to which they alter protein activity. However, experimentally determined 3D structures of mutant proteins are not always available in the Protein Data Bank (PDB), so it is often necessary to model the 3D structure of the protein and map the mutation onto it. This is a simple way to assess the kinds of adverse effects a mutation can have on a protein. In silico approaches such as homology modeling and molecular dynamics can help elucidate the structural impact of deleterious nsSNPs at the molecular level. To identify functional nsSNPs in the hTERT gene, we used in silico tools with diverse approaches: SIFT, PolyPhen and I-Mutant 2.0.
SIFT, PolyPhen and I-Mutant 2.0 predicted 29 %, 52 % and 83 % of nsSNPs, respectively, to be deleterious, damaging or destabilizing, and 71 %, 48 % and 17 %, respectively, to be tolerated, benign or stabilizing. The variation between the SIFT and PolyPhen predictions is mainly due to differences in the protein sequence alignments and in the scores used to classify the variants. A recent analysis by Flanagan et al. (2010) confirmed the accuracy of SIFT and PolyPhen in predicting the effect of nsSNPs on protein function. Our group has also evaluated the accuracy of SIFT-, PolyPhen- and I-Mutant 2.0-based predictions for the ATM, G6PD, F8 and F9 genes [43–45]. To further validate the deleterious nsSNPs in hTERT, a 3D homology model was constructed. Validation with the RAMPAGE server indicated that the model was of good quality, so it was used for docking analysis followed by MD simulation. We first considered the functional impact of hTERT mutations recently identified in association with diseases such as AA, DC and IPF. Our analysis identified K570N, P721R and R865H as mutations that cause a drastic change in protein stability, in good concordance with experimental data. Notably, the K570N mutation abolished telomerase enzymatic function even though this residue is highly divergent among the telomerases of different species. Similarly, the P721R and R865H mutations drastically reduced telomerase enzymatic activity, suggesting that these seemingly non-conserved residues may be involved in either the structural formation or the functional properties of telomerase [16, 18]. The precise effects of L584W and M970T have not been elucidated experimentally, so we investigated them computationally using homology modeling and molecular dynamics. Calculating interaction energies is crucial for understanding the biological activity of a protein interacting with its partner.
All mutant models showed lower docking scores and higher (less negative) interaction energies than the native protein, indicating a loss of interaction between curcumin and hTERT. These results suggest that the mutations alter the residues surrounding the binding residues, thereby disturbing normal biological function. This information might provide molecular insights into the impact of the mutations on protein stability, folding and function. Furthermore, a novel finding of this study is the identification of two deleterious mutations, L584W and M970T, for which no information on biological role was found in the telomerase database or in a literature search.
In conclusion, we have addressed a problem faced by experimental biologists in identifying novel mutations. The main aim of this analysis is to suggest the impact of several important nsSNPs, both disease-causing and novel, that could impart structural and functional alterations in hTERT. A comparative analysis of the disease-associated and novel mutations suggests that L584W and M970T could play a major role in affecting telomerase activity. To the best of our knowledge, this is the first study to combine in silico tools with docking, interaction analysis and molecular dynamics to prioritize deleterious nsSNPs in the hTERT gene. The set of SNPs identified in silico here provides information for further investigating their mechanisms in polymorphism analyses, complementing the resources assembled in the telomerase database.
The authors take this opportunity to thank the management of Vellore Institute of Technology University for providing the facilities and encouragement to carry out this work.
Conflict of interest