Orphan Protein Function and Its Relation to Glycosylation

  • R. Gupta
  • L. J. Jensen
  • S. Brunak
Conference paper
Part of the Ernst Schering Research Foundation Workshop book series (SCHERING FOUND, volume 38)


Since the first bacterial genomes were completely sequenced, the surge in genome sequence data has overwhelmed the scientific community’s efforts towards elucidating protein function. Computational methods have made it possible to work with sequences from complete genomes and proteomes, and inference of protein function by exploiting direct sequence similarity indeed goes a long way in describing a proteome’s functional capacity. However, at least 40% of the gene products in newly sequenced genomes typically remain uncharacterised. Proteins without an annotated function are also known as orphan proteins since they do not belong to a functionally characterised protein family. Many sequences must, therefore, be compared using their features rather than by direct comparison in the conventional sequence space. Here we focus on one such feature — glycosylation — that is common in eukaryotic proteomes.


Keywords (machine-generated): Glycosylation site; Protein chain; Dictyostelium discoideum; Phylogenetic profile; Cellular role





Copyright information

© Springer-Verlag Berlin Heidelberg 2002
