Abstract
Completely sequenced genomes and other types of genomics data provide us with new information to predict protein function. While classical, homology-based function prediction provides information about a protein's molecular function (what does the protein do at a molecular scale?), the analysis of the sequence in the context of its genome or in other types of genomics data provides information about its functional context (what are the protein's interaction partners, and in which biological process does it play a role?). Genomic context data are, however, inherently noisy. Only by combining different types of genomic(s) context data (vertical comparative genomics) or by combining the same type of genomics data from different species (horizontal comparative genomics) do they become sufficiently reliable to be used for protein function prediction. Homology-based function prediction and context-based function prediction provide complementary information about a protein's function and can be combined to make predictions that are specific enough for experimental testing. Here we discuss the genomic coverage and reliability of combining genomics data for protein function prediction and survey predictions that have actually led to experimental confirmation. Using a number of examples, we illustrate how combining the information from various types of genomics data can lead to specific protein function predictions. These include the prediction that the Ribonuclease L inhibitor (RLI) is involved in the maturation of ribosomal RNA.
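As a rough illustration of why combining noisy evidence types can raise reliability, the sketch below merges per-source confidence values under an independence assumption (a "noisy-OR" rule). This is not necessarily the scoring scheme used in the chapter; the function name `combine_independent`, the evidence labels, and all numeric values are hypothetical and chosen only for illustration.

```python
from math import prod


def combine_independent(probabilities):
    """Combine per-source confidences, ASSUMING the sources are independent.

    Each value is the (hypothetical) probability that one evidence type
    correctly indicates a functional link between two proteins. The result
    is the probability that at least one source is correct.
    """
    values = list(probabilities)
    for p in values:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"confidence must lie in [0, 1], got {p}")
    # P(at least one correct) = 1 - P(all sources wrong)
    return 1.0 - prod(1.0 - p for p in values)


# Hypothetical confidences for one protein pair (illustrative numbers only).
evidence = {
    "gene_fusion": 0.4,
    "conserved_gene_order": 0.5,
    "phylogenetic_profile": 0.3,
}

combined = combine_independent(evidence.values())
print(f"combined confidence: {combined:.2f}")
```

Under these assumptions the combined confidence exceeds that of any single source. Real genomic context data are often correlated, so an independence-based combination of this kind is best read as an optimistic upper bound.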
Copyright information
© 2006 Landes Bioscience and Springer Science+Business Media
About this chapter
Cite this chapter
Huynen, M.A., Snel, B., Gabaldón, T. (2006). Reliable and Specific Protein Function Prediction by Combining Homology with Genomic(s) Context. In: Discovering Biomolecular Mechanisms with Computational Biology. Molecular Biology Intelligence Unit. Springer, Boston, MA. https://doi.org/10.1007/0-387-36747-0_2
DOI: https://doi.org/10.1007/0-387-36747-0_2
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-34527-7
Online ISBN: 978-0-387-36747-7
eBook Packages: Biomedical and Life Sciences (R0)
