
Comparative Genomics Approaches to Identifying Functionally Related Genes

  • Michael Y. Galperin
  • Eugene V. Koonin
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8542)

Abstract

The rapid progress in genome sequencing makes it possible to address fundamental problems of biology and gain critical insights into the functioning of living cells and entire organisms. However, the widening gap between the rapidly accumulating sequence data and our ability to properly annotate these data is a major problem that slows the progress of genome biology. This paper discusses the notion of “function” as it relates to computational biology, lists the most common ways of assigning function to new genes, particularly those that rely specifically on comparative genome analysis, and briefly reviews the drawbacks of the current algorithms for semi-automated high-throughput functional annotation of genomes.

Keywords

genome annotation genomic context gene neighborhood operon functional genomics orthology databases 

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Michael Y. Galperin
    • 1
  • Eugene V. Koonin
    • 1
  1. National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, USA
