Recent advances in genome sequencing technology and algorithms have made it possible to determine the sequence of a whole genome quickly and cost-effectively. As a result, more than 200 genomes have been completely sequenced. However, annotating a genome remains a challenging task. One of the most effective ways to annotate a newly sequenced genome is to compare it with well-annotated, closely related genomes using computational tools and databases. Such comparisons require a number of computational tools and produce a large amount of output that genome annotators must analyze. Because of this difficulty, genome projects are mostly carried out at large genome sequencing centers. To reduce the need for expert knowledge of computational tools and databases, we have developed a web-based genome annotation system called CGAS (a comparative genome annotation system; http://platcom.org/CGAS). This chapter describes how to use CGAS and provides the necessary background on the computational tools and resources. As an example, a Bacillus subtilis genome is treated as an unannotated target genome and compared with several reference genomes, including Bacillus halodurans, Oceanobacillus iheyensis HTE831, and Bacillus cereus group genomes (representative strains of Bacillus cereus and Bacillus anthracis).
Key Words: Comparative genomics; genome annotation; Bidirectional Best Hit (BBH); sequence clustering; protein domain; genome context
This research was partially supported by NSF CAREER Award DBI-0237901, INGEN (Indiana Genomics Initiatives), and the AVIDD (Analysis and Visualization of Instrument-Driven Data) Linux cluster.