Annotation and Comparative Genomics of Prokaryotes Made Easy
  • Alessandro Romualdi
  • Marius Felder
  • Dominic Rose
  • Ulrike Gausmann
  • Markus Schilhabel
  • Gernot Glöckner
  • Matthias Platzer
  • Jürgen Sühnel
Part of the Methods in Molecular Biology™ book series (MIMB, volume 395)


GenColors is a new web-based software/database system aimed at improved and accelerated annotation of prokaryotic genomes; it takes information on related genomes into account and makes extensive use of genome comparison. It offers a seamless integration of data from ongoing sequencing projects and annotated genomic sequences obtained from GenBank. A variety of export/import filters manage an effective data flow from sequence assembly and manipulation programs (e.g., GAP4) to GenColors and back, as well as to standard GenBank files. The genome comparison tools include best bidirectional hits, gene conservation, syntenies, and gene core sets. Precomputed UniProt matches allow effective annotation and analysis. In addition to these analysis options, base-specific quality data (coverage and confidence) can also be handled if available. The GenColors system can be used both for annotation in ongoing genome projects and as an analysis tool for finished genomes. GenColors comes in two types: dedicated genome browsers and the Jena Prokaryotic Genome Viewer (JPGV). Dedicated genome browsers contain genomic information on a set of related genomes and offer a large number of options for genome comparison. The system has been efficiently used in the genomic sequencing of Borrelia garinii and is currently applied to various ongoing genome projects on Borrelia, Legionella, Escherichia, and Pseudomonas genomes. One of these dedicated browsers, the Spirochetes Genome Browser with Borrelia, Leptospira, and Treponema genomes, is freely accessible. The others will be released after finalization of the corresponding genome projects. JPGV offers information on almost all finished bacterial genomes, but with reduced genome comparison functionality compared with the dedicated browsers. As of January 2006, this viewer includes 632 genomic elements (e.g., chromosomes and plasmids) of 293 species.
The system provides versatile quick and advanced search options for all currently known prokaryotic genomes and generates circular and linear genome plots. Gene information sheets contain basic gene information, database search options, and links to external databases. GenColors is also available on request for local installation.
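
Of the comparison tools named above, best bidirectional hits lends itself to a short illustration. The following is a minimal sketch, not the GenColors implementation: it assumes that pairwise similarity scores (for example, BLAST bit scores) have already been computed in both directions between two genomes. The gene identifiers, scores, and function names are hypothetical.

```python
from typing import Dict, List, Tuple

# (query_gene, subject_gene) -> similarity score, e.g. a BLAST bit score
Hits = Dict[Tuple[str, str], float]


def best_hits(hits: Hits) -> Dict[str, str]:
    """For every query gene, keep the subject gene with the highest score.

    On exact score ties the first hit encountered is kept.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in hits.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def best_bidirectional_hits(a_vs_b: Hits, b_vs_a: Hits) -> List[Tuple[str, str]]:
    """Return gene pairs (a, b) where b is the best hit of a and a is the best hit of b."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Hypothetical scores for genes of genome A searched against genome B and vice versa.
a_vs_b = {("a1", "b1"): 250.0, ("a1", "b2"): 90.0,
          ("a2", "b2"): 180.0, ("a3", "b2"): 120.0}
b_vs_a = {("b1", "a1"): 245.0, ("b2", "a2"): 175.0, ("b2", "a3"): 110.0}

pairs = best_bidirectional_hits(a_vs_b, b_vs_a)
print(pairs)
```

In this toy data, gene a3 finds b2 as its best hit, but b2 prefers a2, so a3 is not part of any bidirectional pair. A real pipeline would typically also apply score or coverage thresholds before accepting a pair.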


Genome analysis; genome comparison; bioinformatics; prokaryotic genomes



The help of Kerstin Wagner in setting up and maintaining the SGB external link page as well as in icon design is gratefully acknowledged. We are also grateful to Andreas Petzold who has contributed code to GenColors. This work was supported by the grants 0312704E and 0313652D of the German Ministry for Education and Research.



Copyright information

© Humana Press Inc. 2007

Authors and Affiliations

  • Alessandro Romualdi (1)
  • Marius Felder (1)
  • Dominic Rose (2)
  • Ulrike Gausmann (3)
  • Markus Schilhabel (3)
  • Gernot Glöckner (3)
  • Matthias Platzer (3)
  • Jürgen Sühnel (1)
  1. Biocomputing Group, Leibniz Institute for Age Research – Fritz Lipmann Institute, Jena Centre for Bioinformatics, Jena, Germany
  2. Bioinformatics Group, Department of Computer Science, University of Leipzig, Leipzig, Germany
  3. Genome Analysis Group, Leibniz Institute for Age Research – Fritz Lipmann Institute, Jena Centre for Bioinformatics, Jena, Germany
