Gene Function Analysis pp 93-108
Sybil: Methods and Software for Multiple Genome Comparison and Visualization
With the successful completion of genome sequencing projects for a variety of model organisms, the selection of candidate organisms for future sequencing efforts has been guided increasingly by a desire to enable comparative genomics. This trend has both depended on and encouraged the development of software tools that can elucidate and capitalize on the similarities and differences between genomes. “Sybil,” one such tool, is a largely web-based software package whose primary goal is to facilitate the analysis and visualization of comparative genome data, with a particular emphasis on protein and gene cluster data. Herein, two methods are described: a two-phase protein clustering algorithm, used to generate protein clusters suitable for analysis through Sybil, and a method for creating graphical displays of protein or gene clusters that span multiple genomes. When combined, these two relatively simple techniques provide the user of the Sybil software (The Institute for Genomic Research [TIGR] Bioinformatics Department) with a browsable graphical display of his or her “input” genomes, showing which genes are conserved based on the parameters supplied to the protein clustering algorithm. For any given protein cluster, the graphical display consists of a local alignment of the genomes in which the clustered genes are located. The genomes are arranged in a vertical stack, as in a multiple alignment, and shaded areas are used to connect genes in the same cluster, thus displaying conservation at the protein level in the context of the underlying genomic sequences. The authors have found this display, and slight variants thereof, useful for a variety of annotation and comparison tasks, ranging from identifying “missed” gene models or single-exon discrepancies between orthologous genes, to finding large or small regions of conserved gene synteny and investigating the properties of the breakpoints between such regions.
Key Words: Bioinformatics; Bioperl; comparative genomics; ortholog; paralog; protein clustering; visualization
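The display described in the abstract, with genomes stacked vertically and shaded areas joining genes that share a protein cluster, can be illustrated with a minimal Python sketch. This is not the Sybil implementation: the `Gene` record, the function names `connector_polygons` and `render_svg`, the layout constants, and the example coordinates are all hypothetical, and the sketch assumes gene coordinates can be used directly as drawing units and that shading is drawn only between vertically adjacent genomes.

```python
from collections import defaultdict
from typing import NamedTuple


class Gene(NamedTuple):
    """Hypothetical gene record: genome name, start/end coordinates, cluster id."""
    genome: str
    start: int
    end: int
    cluster: str


def connector_polygons(genes, genome_order, row_height=60, gene_height=12):
    """Return (cluster, points) for every pair of same-cluster genes that lie
    on vertically adjacent genomes. Each polygon joins the bottom edge of the
    upper gene to the top edge of the lower gene."""
    row = {name: i for i, name in enumerate(genome_order)}
    by_cluster = defaultdict(list)
    for gene in genes:
        by_cluster[gene.cluster].append(gene)

    polygons = []
    for cluster, members in sorted(by_cluster.items()):
        members.sort(key=lambda g: (row[g.genome], g.start))
        for upper in members:
            for lower in members:
                # Assumption of this sketch: only adjacent rows are connected.
                if row[lower.genome] - row[upper.genome] != 1:
                    continue
                y_top = row[upper.genome] * row_height + gene_height
                y_bottom = row[lower.genome] * row_height
                points = [
                    (upper.start, y_top), (upper.end, y_top),
                    (lower.end, y_bottom), (lower.start, y_bottom),
                ]
                polygons.append((cluster, points))
    return polygons


def render_svg(genes, genome_order, row_height=60, gene_height=12):
    """Render the stacked genomes as an SVG string: shaded connectors first,
    then one rectangle per gene drawn on top of them."""
    row = {name: i for i, name in enumerate(genome_order)}
    width = max(g.end for g in genes)
    height = (len(genome_order) - 1) * row_height + gene_height
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" '
             f'width="{width}" height="{height}">']
    for _, points in connector_polygons(genes, genome_order,
                                        row_height, gene_height):
        pts = " ".join(f"{x},{y}" for x, y in points)
        parts.append(f'<polygon points="{pts}" fill="lightgray"/>')
    for g in genes:
        y = row[g.genome] * row_height
        parts.append(f'<rect x="{g.start}" y="{y}" width="{g.end - g.start}" '
                     f'height="{gene_height}" fill="steelblue"/>')
    parts.append("</svg>")
    return "\n".join(parts)


# Made-up example data: three genomes and two protein clusters.
EXAMPLE_ORDER = ["A", "B", "C"]
EXAMPLE_GENES = [
    Gene("A", 10, 50, "c1"), Gene("B", 30, 80, "c1"), Gene("C", 20, 60, "c1"),
    Gene("A", 100, 140, "c2"), Gene("C", 120, 170, "c2"),
]

if __name__ == "__main__":
    print(render_svg(EXAMPLE_GENES, EXAMPLE_ORDER))
```

In this toy input, cluster `c2` has members only in genomes A and C, so the adjacent-rows rule leaves it unshaded; a fuller display would need a policy for clusters that skip a genome.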