Background

Phosphorylation is a common, reversible post-translational modification that occurs at serine (S), threonine (T), and tyrosine (Y) residues in proteins [1]. Phosphorylation can alter the structure, function, interactions, stability, and sub-cellular location of proteins [2–4], and therefore plays an indispensable role in the regulation of cellular processes such as signal transduction, gene expression, cytoskeletal regulation, apoptosis, homeostasis, cell cycle, and DNA damage recognition and repair [5–11]. The phosphorylation state of a protein is determined by the opposing actions of kinases and phosphatases [12]. Proteins may contain multiple phosphorylation sites, which may be targeted by different kinases/phosphatases [2]. The activity of kinases and phosphatases at different times and/or upon different stimuli provides a powerful means of control over the protein phosphorylation state, and thus over the biological processes in which the protein is involved.

In the post-genomic era, there is an expanding interest in the identification of single nucleotide polymorphisms (SNPs) that might affect protein function and thus contribute to disease susceptibility. Non-synonymous SNPs (nsSNPs) substitute encoded amino acids in proteins, and are therefore good candidates as disease-modifiers. A variety of approaches, based on criteria such as evolutionary conservation status or structural parameters, have been developed and applied to characterize and select the nsSNPs that are most likely to have functional consequences [13–19].

In this report, we predicted the potential effect of a set of nsSNPs [20, 21] in altering the phosphorylation status of DNA repair and cell cycle proteins using the NetPhos tool [22], which is an artificial neural network method that predicts phosphorylation sites with a sensitivity of 69–96%. DNA repair and cell cycle pathways interact during cell growth and division to maintain the genomic stability of dividing cells. Abnormalities in the DNA repair and/or the cell cycle pathways can lead to abnormal cell growth/division or cellular death [23], and are implicated in many human diseases, including cancer [24–30]. The functional significance of many phosphorylated residues of several DNA repair and cell cycle proteins has already been evaluated. For example, phosphorylation of STATα residue S727 is required for its maximal transcriptional activation [31] and enhances its binding to the BRCA1 protein [32]. Similarly, phosphorylation of S383 and S387 is required for the FANCG function during mitosis [33]. Likewise, mutations of the phosphorylated residues Ser366 and Thr387 of p53 affect its transactivation function [34]. Although SNPs of DNA repair and cell cycle proteins have already been shown to contribute to cancer risk [35–37], to our knowledge the potential role of nsSNPs in altering the phosphorylation patterns of proteins has not been evaluated before. Therefore, the approach described in this study may help build a bridge between variations in DNA repair/cell cycle function and predisposition to disease.

Methods

The nsSNPs extracted from public SNP databases were previously reported [20, 21]; however, only the nsSNPs that were found in ≥2 chromosomes in a sample panel of ≥46 chromosomes were included in that manuscript. A total of 89 nsSNPs from 47 genes involved in DNA repair and cell cycle constituted the final data set. The NetPhos [22] algorithm was used to predict putative phosphorylation sites for both the wild type and the variant protein sequences. Only the predictions that remove or create a site, either at the SNP location or within a kinase recognition motif, are included in this manuscript. Please note that the BRCA1 and NFKB1 proteins were initially identified as cell cycle protein interacting proteins [21]. However, in this manuscript, we classified BRCA1 as a DNA repair protein and NFKB1 as a cell cycle protein. The mouse orthologues were retrieved from the LocusLink resource of NCBI [38] and aligned with the human proteins using the ClustalW program [39] to identify the corresponding mouse residue.
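As a rough illustration of the wild type versus variant comparison described above, the following Python sketch applies an nsSNP to a protein sequence and compares two sets of per-residue prediction scores. It does not call NetPhos. The score dictionaries, the 0.5 cut-off, the function names, and the toy sequence are all assumptions introduced for illustration only.

```python
import re


def parse_substitution(label):
    """Split a label such as 'S311C' into (wild-type residue, 1-based position, variant residue)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"unrecognised substitution label: {label}")
    wild_type, position, variant = match.groups()
    return wild_type, int(position), variant


def apply_substitution(sequence, label):
    """Return the variant sequence after checking that the wild-type residue matches."""
    wild_type, position, variant = parse_substitution(label)
    if sequence[position - 1] != wild_type:
        raise ValueError(f"expected {wild_type} at position {position}")
    return sequence[:position - 1] + variant + sequence[position:]


def changed_sites(wt_scores, var_scores, threshold=0.5):
    """Compare {position: score} dictionaries (hypothetical predictor output).

    A position counts as a putative site when its score reaches the threshold;
    the 0.5 default is an assumption, not a value taken from this study.
    """
    wt_sites = {pos for pos, score in wt_scores.items() if score >= threshold}
    var_sites = {pos for pos, score in var_scores.items() if score >= threshold}
    return {
        "abolished": sorted(wt_sites - var_sites),
        "created": sorted(var_sites - wt_sites),
    }


# Toy sequence and made-up scores, for illustration only.
toy_wild_type = "AESLPSSSKMHA"
toy_variant = apply_substitution(toy_wild_type, "S7C")
toy_changes = changed_sites({3: 0.2, 6: 0.9, 7: 0.8}, {3: 0.2, 6: 0.3})
```

In this toy case the substitution at position 7 removes the site at the substituted residue and also lowers the score of the neighbouring residue 6 below the cut-off, mirroring the two kinds of change reported in this manuscript.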

Results and discussion

We used the NetPhos algorithm to predict putative phosphorylation sites along the DNA repair and cell cycle proteins, and studied whether 89 naturally occurring nsSNPs (64 from 28 DNA repair and 25 from 19 cell cycle genes) might alter the phosphorylation patterns in these proteins. The sensitivity of NetPhos prediction has been reported to be 69–96%, with a false-positive prediction rate of 0–26% for Y, 0–11% for S, and 0–14% for T [22]. The results obtained using the NetPhos software are shown in Table 1 and are summarized in Table 2. Our results show that 16.9% (15/89) of the nsSNPs studied are likely to abolish or create 17 putative phosphorylation sites in 29.8% (14/47) of the proteins. As summarized in Table 2, five nsSNPs (ERCC5-S311C, OGG1-S326C, XRCC3-T241M, CCND3-S259A, and CDKN1A-S31R) were predicted to abolish putative phosphorylation sites, whereas four nsSNPs (ERCC2-H201Y, ERCC4-P379S, LIG4-P231S, and XRCC1-P309S) were predicted to create putative phosphorylation sites in the proteins. These nsSNPs resulted in the addition or removal of an S, T, or Y residue at the predicted phosphorylation site.
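The distinction drawn above between nsSNPs that remove and nsSNPs that introduce an S, T, or Y residue can be checked directly from the substitution labels. The sketch below is a minimal illustration, with function and variable names chosen here rather than taken from the study. It inspects residue identity only and says nothing about whether NetPhos actually predicts the residue to be phosphorylated.

```python
from collections import Counter

PHOSPHO_RESIDUES = set("STY")  # serine, threonine, tyrosine


def classify_substitution(label):
    """Classify a label such as 'S311C' by its effect on S/T/Y content at that position."""
    wild_type, variant = label[0], label[-1]
    wt_is_target = wild_type in PHOSPHO_RESIDUES
    var_is_target = variant in PHOSPHO_RESIDUES
    if wt_is_target and not var_is_target:
        return "removes"
    if var_is_target and not wt_is_target:
        return "adds"
    if wt_is_target and var_is_target:
        return "exchanges"
    return "neither"


# The nine nsSNPs named in the paragraph above.
NSSNPS = [
    "ERCC5-S311C", "OGG1-S326C", "XRCC3-T241M", "CCND3-S259A", "CDKN1A-S31R",
    "ERCC2-H201Y", "ERCC4-P379S", "LIG4-P231S", "XRCC1-P309S",
]

effects = {snp: classify_substitution(snp.split("-")[1]) for snp in NSSNPS}
tally = Counter(effects.values())
```

For these nine labels the tally gives five substitutions that remove and four that add a phosphorylatable residue, in line with the counts stated in the text.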

Table 1 nsSNPs that abolish or create putative phosphorylated residues in DNA repair and cell cycle proteins. Only the NetPhos [22] predictions that remove or create a site, either at the SNP location or within a kinase recognition motif, are shown. The nsSNPs that create or abolish putative phosphorylation sites at the nsSNP position are shown in bold. Under the wild type and variant columns are the NetPhos outputs with the location of the amino acid, the phosphorylation motif (the putative phosphorylated residue is underlined), the score, and the residue being phosphorylated. 1 and 2 under the frequency column represent nsSNP minor allele frequencies of <5% and ≥5%, respectively [20, 21]. Please note that the BRCA1 and NFKB1 proteins were initially identified as cell cycle protein interacting proteins [21]. However, in this manuscript, we classified BRCA1 as a DNA repair protein and NFKB1 as a cell cycle protein. §The putative phosphorylation sites that are also predicted in mouse proteins.

Table 2 Distribution of the nsSNPs predicted to alter the phosphorylation sites.

The kinase recognition/interaction motif involves 7–12 amino acids around the phosphorylated residue [40], and the physicochemical characteristics of these amino acids determine the specificity of the protein kinases [41, 42]. Thus, amino acid substitutions within the kinase recognition motifs are likely to influence substrate recognition and the subsequent phosphorylation by kinases. Accordingly, we have identified seven nsSNPs (Tables 1 and 2) located within the phosphorylation motifs of six proteins (within 4 amino acids on either side of the putative phosphorylated residue, based on NetPhos outputs) that abolished eight putative phosphorylation sites (BRCA1-P871L at S868, BRCA1-S1040N at S1041, ERCC5-S311C at S310, IGHMBP2-T671A at S672, WRN-S1079L at S1083 and at S1084, CCNI-V207I at S208, and NFKB1-H712Q at T716). Interestingly, NetPhos predicts two overlapping phosphorylation motifs for the ERCC5-S311C nsSNP (S311 SLPSSSKMH and S310 ESLPSSSKM), both of which are completely abolished by the substitution of the serine residue at position 311 with a cysteine (Table 1). Similarly, the WRN-S1079L nsSNP was predicted to remove two putative overlapping phosphorylation motifs (S1083 SKTVSSGTK and S1084 KTVSSGTKE) simultaneously.
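The window criterion used above (within 4 amino acids on either side of the putative phosphorylated residue) can be sketched in a few lines of Python. The ERCC5 fragment below is reconstructed from the two 9-residue strings quoted in the text, on the assumption that the phosphorylated residue sits at the centre of each string. The function names and the fragment numbering are illustrative and are not part of NetPhos.

```python
def site_window(fragment, fragment_start, site_pos, flank=4):
    """Return the residues from site_pos - flank to site_pos + flank (1-based numbering).

    fragment_start is the protein position of the first residue in fragment.
    """
    index = site_pos - fragment_start
    if index - flank < 0 or index + flank >= len(fragment):
        raise ValueError("fragment does not cover the full window")
    return fragment[index - flank:index + flank + 1]


def snp_within_window(snp_pos, site_pos, flank=4):
    """True when the substituted residue lies inside the window around the site."""
    return abs(snp_pos - site_pos) <= flank


# ERCC5 residues 306-315, assembled from the overlapping 9-mers quoted in the text
# (assumes each 9-mer is centred on its putative phosphorylated serine).
ERCC5_FRAGMENT = "ESLPSSSKMH"
ERCC5_START = 306

window_s310 = site_window(ERCC5_FRAGMENT, ERCC5_START, 310)
window_s311 = site_window(ERCC5_FRAGMENT, ERCC5_START, 311)
```

Under these assumptions the two windows reproduce the quoted strings, and position 311 falls inside both, which is why a single substitution can affect two overlapping motifs.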

The Swiss-Prot [43], HPRD [44], PhosphoBase [45], and Phospho.ELM [46] databases and the existing literature did not reveal any experimentally verified phosphorylation at the predicted sites. Analysis of the mouse orthologues showed that the corresponding amino acids at the BRCA1-S1041, CCNI-S208, ERCC5-S310, IGHMBP2-S672, WRN-S1083, and XRCC3-T241 residues were also predicted to be phosphorylated, suggesting that these motifs/sites might have been evolutionarily conserved between the two species. On the other hand, the remaining phosphorylation sites, which are not detected in mouse proteins, may represent newly evolved phosphorylation motifs in human. However, considering the false-positive rate of NetPhos, as well as the possibility that negative selection acting on the nsSNP sites can result in higher false-positive rates, we cannot entirely rule out the possibility that the predictions in Table 2 are false positives. Nevertheless, these predictions are still of value, as they suggest possible phosphorylation sites that can be evaluated experimentally. In the future, when sufficient molecular data regarding the phosphorylation status of orthologous proteins are available, more systematic analyses can be performed to maximize the accuracy of phosphorylation predictions.
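The mouse comparison above depends on finding the mouse residue that corresponds to a given human position in a pairwise alignment. The sketch below shows one way to do this mapping from two gapped, equal-length aligned strings. It does not run or parse ClustalW, and the aligned toy sequences and function name are hypothetical.

```python
def map_position(aligned_query, aligned_target, query_pos):
    """Map a 1-based ungapped query position to the aligned target position.

    Returns (target_position, target_residue), or None when the query residue
    is aligned to a gap in the target.
    """
    if len(aligned_query) != len(aligned_target):
        raise ValueError("aligned sequences must have equal length")
    query_count = target_count = 0
    for query_char, target_char in zip(aligned_query, aligned_target):
        if query_char != "-":
            query_count += 1
        if target_char != "-":
            target_count += 1
        if query_char != "-" and query_count == query_pos:
            if target_char == "-":
                return None
            return target_count, target_char
    raise ValueError("query position lies beyond the end of the sequence")


# Hypothetical aligned fragments (gaps shown as '-'), for illustration only.
human_aligned = "MK--SLPSS"
mouse_aligned = "MKAASLP-S"

conserved = map_position(human_aligned, mouse_aligned, 3)
unaligned = map_position(human_aligned, mouse_aligned, 6)
```

Once the corresponding mouse position is known, the same prediction step can be applied to the mouse sequence to ask whether that residue is also a putative phosphorylation site.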

We have also performed an extensive literature review to investigate the role of the reported nsSNPs (minor allele frequencies ≥5%) in human cancer predisposition (Table 3). Supporting our hypothesis, three SNPs (CDKN1A-S31R, OGG1-S326C, and XRCC3-T241M) have already been found to be associated with altered cancer risk. The XRCC3-T241M nsSNP was reported to be associated with increased breast cancer [47, 48] and melanoma risk [49], and was also found to be protective against bladder cancer in heavy smokers [50]. XRCC3 is a key DNA repair protein involved in homologous recombination repair [29] and participates in repairing the alterations caused by many DNA damaging agents. Recently, the XRCC3-M241 variant has been associated with an increased incidence of tetraploid cells, frequently observed in cancers, through affecting the function of the XRCC3- and Rad52-associated RPA protein [51]. Similarly, the OGG1-S326C SNP was found to be associated with increased lung [52], orolaryngeal, and esophageal cancer risk [53, 54]. OGG1 is a DNA repair protein that is protective against the mutations induced by 8-hydroxyguanine. Yamane et al. [55] suggested that OGG1-C326, when compared to OGG1-S326, was associated with a lower repair capacity for 8-hydroxyguanine induced mutations in human cells. In the case of CDKN1A-S31R, CDKN1A-S31 was suggested to be associated with increased endometrial cancer risk [56], whereas CDKN1A-R31 was associated with increased primary open-angle glaucoma [57] and esophageal cancer risk [58]. The CDKN1A-R31 form of the protein was not significantly different from the CDKN1A-S31 form in terms of its ability to suppress colony formation [59]. However, it is not clear whether this result implies that CDKN1A-R31 is functionally equivalent to the wild type allele in the other cellular mechanisms in which the CDKN1A protein is involved, such as apoptosis, cell migration, and senescence [60, 61].

Table 3 Common nsSNPs with a possible role in cancer predisposition. Only the information derived from studies on protein function and from studies suggesting a disease association has been included. 1 and 2 under the frequency column represent nsSNPs with minor allele frequencies of <5% and ≥5%, respectively [20, 21].

In addition to the SNPs already implicated in cancer risk, we identified one relatively common nsSNP potentially altering the phosphorylation pattern of a major breast and ovarian cancer susceptibility gene, BRCA1. The BRCA1-P871L SNP was not found to be associated with either breast [62] or ovarian cancer risk [63]; however, further analysis is required to see whether this nsSNP or the other nsSNPs in Table 3 play a role in susceptibility to other cancer types.

How can we explain that commonly occurring nsSNPs (minor allele frequencies ≥5%) are likely to affect the phosphorylation, and thus the function, of the proteins? If the phosphorylation site is necessary for the function of the protein and the protein is necessary for the fitness of the organism (an indispensable/essential protein), then we would expect such nsSNPs (deleterious alleles) either to be removed from the population or to be kept at a low allele frequency by purifying selection. In this case, one could conclude that the common nsSNPs presented in this report may have been falsely predicted by the NetPhos program as removing/creating putative phosphorylation sites. However, the frequencies of deleterious alleles in proteins that are essential for fitness can also be higher than expected when the nsSNPs are a) created by hot-spot mutation mechanism(s) or b) subject to balancing selection [64]. Alternatively, even though the nsSNPs (and the abolished/created phosphorylation sites) have an important impact on the protein function, the protein and/or the altered protein function may not affect fitness, which could also explain the lack of purifying selection against such nsSNPs and their relatively high minor allele frequencies. In addition, the biological consequences of altered protein function may only be exerted under certain environmental conditions.

Conclusion

Here we report a set of nsSNPs in DNA repair and cell cycle genes that are predicted to alter the phosphorylation motifs of the encoded proteins, with possible consequences for protein function, structure, interaction, and stability. If the nsSNPs with a ≥5% minor allele frequency listed in Table 3 do indeed alter the phosphorylation state of the corresponding proteins, they represent important candidates for disease susceptibility studies, especially those relating to cancer risk. We suggest that our approach and the resulting data point to a novel mechanism of SNP action, namely the alteration of the functional characteristics of proteins through changes in phosphorylation, and that studying this mechanism may contribute significantly to our understanding of the molecular basis of complex diseases such as cancer. This study is unique in the sense that it systematically links the possible functional effects of SNPs on post-translational modification to disease (cancer) predisposition.