Molecular Diversity, Volume 15, Issue 2, pp 427–433

Prediction of mucin-type O-glycosylation sites by a two-staged strategy

Full-Length Paper

Abstract

Mucin-type O-glycosylation is an important type of protein post-translational modification. The process is mediated by a family of UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferases, which transfer N-acetylgalactosamine (GalNAc) to serine or threonine residues and whose specificity is unknown. To determine the glycosylation sites of a given protein, we present a two-stage prediction method: the first stage determines whether a protein is a glycoprotein, and the second stage determines the glycosylation sites of each protein predicted to be glycosylated in the first stage. In the first stage, a protein is encoded using the protein families in PFAM, an annotated database of classified protein families, and is then classified by a predictor trained on the training set. In the second stage, nonapeptides of the predicted mucin-type glycoproteins, each with a serine or threonine residue at its fifth position, are represented by indices from AAIndex. A second predictor then decides whether GalNAc is attached to each nonapeptide; this predictor is built from features chosen by two feature selection methods, the Maximum Relevance Minimum Redundancy (mRMR) method and the Incremental Feature Selection (IFS) method. The prediction accuracy of the first stage is 94.9%, validated by leave-one-out cross-validation; the prediction accuracy of the second stage is 99.4%. These results suggest that the method is useful for studying mucin-type O-glycosylation. Analysis of the features used to construct the second-stage predictor confirms results previously obtained by other groups: the residues at positions −1 and +3 have a strong impact on the prediction. Among the other amino acid indices, those describing alpha and turn propensities and those describing the hydrophobicity of the residues in the nonapeptide also influence recognition by the GalNAc transferases. A web server is available at http://chemdata.shu.edu.cn/gal/.
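The second stage described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the Euclidean distance, the choice to skip serine or threonine residues that lie fewer than four positions from either terminus, and the toy data are all assumptions made for illustration. The ranked feature list passed to the IFS loop stands in for an mRMR ranking, which is not computed here.

```python
from typing import List, Sequence, Tuple


def nonapeptides(sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based site position, 9-residue window) for each S/T.

    Assumption: sites without four residues on both sides are skipped;
    the abstract does not say how termini are handled.
    """
    windows = []
    for i, residue in enumerate(sequence):
        if residue in "ST" and i >= 4 and i + 4 < len(sequence):
            windows.append((i + 1, sequence[i - 4:i + 5]))
    return windows


def loo_accuracy(features: Sequence[Sequence[float]],
                 labels: Sequence[int]) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier.

    Assumption: squared Euclidean distance is used as the metric.
    """
    correct = 0
    for i, x in enumerate(features):
        nearest = min(
            (j for j in range(len(features)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(x, features[j])),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(features)


def incremental_feature_selection(features: Sequence[Sequence[float]],
                                  labels: Sequence[int],
                                  ranked: Sequence[int]):
    """Grow the feature set along a given ranking; keep the best prefix."""
    best_k, best_acc = 0, -1.0
    for k in range(1, len(ranked) + 1):
        subset = [[row[c] for c in ranked[:k]] for row in features]
        acc = loo_accuracy(subset, labels)
        if acc > best_acc:  # ties keep the smaller feature set
            best_k, best_acc = k, acc
    return list(ranked[:best_k]), best_acc


if __name__ == "__main__":
    # Hypothetical sequence; only the serine at position 6 has a full window.
    print(nonapeptides("MKTAPSAGTLLS"))
    # Toy feature matrix: column 0 separates the classes, column 1 is noise.
    X = [[0.0, 5.0], [0.1, 1.0], [1.0, 5.1], [1.1, 1.1]]
    y = [0, 0, 1, 1]
    print(incremental_feature_selection(X, y, ranked=[0, 1]))
```

In a real pipeline each window would first be converted to numeric features by looking up every residue in the chosen AAIndex indices, and the ranking would come from an mRMR implementation rather than being supplied by hand.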

Keywords

Mucin-type O-glycosylation sites · PFAM · Minimum Redundancy Maximum Relevance · Feature Selection · Nearest Neighbor Algorithm · K-fold cross-validation

Abbreviations

mRMR: Maximum Relevance Minimum Redundancy
NNA: Nearest Neighbor Algorithm
AAIndex: Amino Acid Index
IFS: Incremental Feature Selection

Supplementary material

ESM 1: 11030_2010_9240_MOESM1_ESM.rar (RAR, 43,020 kb)
ESM 2: 11030_2010_9240_MOESM2_ESM.txt (TXT, 1,771 kb)
ESM 3: 11030_2010_9240_MOESM3_ESM.txt (TXT, 7,978 kb)
ESM 4: 11030_2010_9240_MOESM4_ESM.txt (TXT, 91 kb)
ESM 5: 11030_2010_9240_MOESM5_ESM.rar (RAR, 5,314 kb)

Copyright information

© Springer Science+Business Media B.V. 2010

Authors and Affiliations

  1. Institute of System Biology, Shanghai University, Shanghai, People’s Republic of China
  2. Department of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, People’s Republic of China
  3. Centre for Computational Systems Biology, Fudan University, Shanghai, People’s Republic of China
