Comparative Genomics pp 35-56
Comparative Genome Analysis in the Integrated Microbial Genomes (IMG) System
Summary
Comparative genome analysis is critical for effectively exploring the rapidly growing number of complete and draft microbial genome sequences. The Integrated Microbial Genomes (IMG) system (img.jgi.doe.gov) has been developed as a community resource that supports comparative analysis of microbial genomes in an integrated context. IMG allows users to navigate the multidimensional microbial genome data space and to focus their analysis on a subset of genes, genomes, and functions of interest. IMG provides graphical viewers, summaries, and occurrence profile tools for comparing genes, pathways, and functions (terms) across selected genomes. Genes can be examined further using gene neighborhood views and compared using sequence alignment tools.
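The occurrence profile idea mentioned in the summary can be illustrated with a small sketch. It is not IMG's implementation or API. It assumes that each genome has been reduced to a list of functional annotation identifiers (the genome names and COG-style IDs below are hypothetical), builds a gene-count profile for every function across an ordered set of selected genomes, and then runs one typical profile query: finding functions present in every genome of a chosen group and absent from all the others.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical input: each genome maps to the functional annotations
# (e.g. COG or Pfam identifiers) assigned to its genes. A repeated ID
# means several genes in that genome carry the same annotation.
GenomeAnnotations = Dict[str, List[str]]
Profile = Dict[str, Tuple[int, ...]]


def occurrence_profile(genomes: GenomeAnnotations, genome_order: List[str]) -> Profile:
    """For every function, return its gene count in each selected genome.

    Counts follow the order of `genome_order`; 0 means the function
    is absent from that genome.
    """
    counts = {g: Counter(genomes[g]) for g in genome_order}
    all_functions = sorted({f for g in genome_order for f in counts[g]})
    # Counter returns 0 for missing keys, which encodes absence.
    return {f: tuple(counts[g][f] for g in genome_order) for f in all_functions}


def present_in_all_absent_in_others(
    profile: Profile, genome_order: List[str], group: List[str]
) -> List[str]:
    """Functions found in every genome of `group` and in none of the rest."""
    in_group = [i for i, g in enumerate(genome_order) if g in group]
    out_group = [i for i, g in enumerate(genome_order) if g not in group]
    return [
        f
        for f, row in profile.items()
        if all(row[i] > 0 for i in in_group) and all(row[i] == 0 for i in out_group)
    ]


if __name__ == "__main__":
    # Hypothetical toy data, not taken from IMG.
    genomes = {
        "genome_A": ["COG0001", "COG0002", "COG0002", "COG0003"],
        "genome_B": ["COG0001", "COG0002", "COG0004"],
        "genome_C": ["COG0001", "COG0004"],
    }
    order = ["genome_A", "genome_B", "genome_C"]
    profile = occurrence_profile(genomes, order)
    for func, row in profile.items():
        print(func, row)
    print(present_in_all_absent_in_others(profile, order, ["genome_A", "genome_B"]))
```

Running the script prints one row per function (for example `COG0002 (2, 1, 0)`) followed by `['COG0002']`, the only function shared by genome_A and genome_B but missing from genome_C. Keeping gene counts rather than plain presence/absence flags lets the same profile answer both "is it there?" and "how many copies?" questions.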
Key Words
Comparative genome data analysis; integrated microbial genomes; occurrence profiles; microbial genome data management; gene occurrence profile; functional occurrence profile; gene model validation
Acknowledgments
We thank Krishna Palaniappan, Ernest Szeto, Frank Korzeniewski, Iain Anderson, Natalia Ivanova, Athanasios Lykidis, Kostas Mavrommatis, Phil Hugenholtz, Anu Padki, Kristen Taylor, Xueling Zhao, Shane Brubaker, Greg Werner, and Inna Dubchak for their contribution to the development and maintenance of IMG. Krishna Palaniappan and Iain Anderson helped improve the examples in this chapter with their comments and suggestions. Eddy Rubin and James Bristow provided support, advice, and encouragement throughout the IMG project. IMG uses tools and data from a number of publicly available resources, and their availability and value are gratefully acknowledged. The work presented in this paper was supported by the Director, Office of Science, Office of Biological and Environmental Research, Life Sciences Division, US Department of Energy under contract no. DE-AC03-76SF00098.