Novel “extended sequons” of human N-glycosylation sites improve the precision of qualitative predictions: an alignment-free study of pattern recognition using ProtDCal protein features

  • Original Article
Amino Acids

Abstract

N-Glycosylation is a common post-translational modification that plays an important role in the proper folding and function of many proteins. It depends largely on the presence of a sequence motif called a “sequon”, defined as Asn-Xxx-Ser/Thr. However, evidence shows that the presence of a sequon alone is insufficient to predict the occurrence of N-glycosylation with high precision. This study aims to elucidate patterns that more accurately predict N-glycosylation sites in human proteins. The novel motifs are evaluated on benchmarking data from 188 organisms. Performance is largely sustained relative to the human data, which supports the robustness of the extracted “extended sequons”. We therefore introduce new knowledge about the sequence-related factors that control N-glycosylation.
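
As an illustration of the classical sequon described above, the following minimal sketch scans a protein sequence for Asn-Xxx-Ser/Thr matches. It does not reproduce the "extended sequons" or the ProtDCal features of this study. The function name `find_sequons` and the `exclude_proline` option are hypothetical; excluding proline at the Xxx position is a widely used convention and an assumption here, since the abstract states only Asn-Xxx-Ser/Thr.

```python
import re


def find_sequons(sequence, exclude_proline=True):
    """Return 1-based positions of Asn residues that begin an N-X-S/T sequon.

    exclude_proline is an assumption (common convention), not taken from
    the abstract: when True, the middle residue may not be Pro.
    """
    middle = "[^P]" if exclude_proline else "."
    # A zero-width lookahead reports overlapping sequons (e.g. "NNSS").
    pattern = re.compile(rf"(?=N{middle}[ST])")
    return [match.start() + 1 for match in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    # Toy sequence for illustration only, not a real protein.
    print(find_sequons("MNGTANPSNNSS"))
```

A match from such a scan marks only a candidate site; as the abstract notes, the presence of a sequon by itself does not predict glycosylation with high precision.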

Author information

Corresponding author

Correspondence to Yasser B. Ruiz-Blanco.

Ethics declarations

Conflict of interest

None declared. The present report contains no research involving human and/or animal participants.

Additional information

Handling Editor: L. Taher.

About this article

Cite this article

Ruiz-Blanco, Y.B., Marrero-Ponce, Y., García-Hernández, E. et al. Novel “extended sequons” of human N-glycosylation sites improve the precision of qualitative predictions: an alignment-free study of pattern recognition using ProtDCal protein features. Amino Acids 49, 317–325 (2017). https://doi.org/10.1007/s00726-016-2362-5

  • DOI: https://doi.org/10.1007/s00726-016-2362-5
