Genome Mapping Statistics and Bioinformatics
The unprecedented availability of genome sequences, coupled with user-friendly, web-enabled search and analysis tools, allows practitioners to locate interesting genome features or sequence tracts with relative ease. Although many public model-organism and genome-mapping resources offer pre-mapped genome browsing, biologists still need to perform de novo mapping analyses. Correct interpretation of the results in genome annotation databases, or of one’s own analyses, requires at least a conceptual understanding of the statistics and mechanics of genome searches, the results expected from statistical considerations, and the algorithms used by different search tools. This chapter introduces the basic statistical results that underlie the mapping of nucleotide sequences to genomes and briefly surveys the common programs and algorithms used to perform genome mapping, all of which are available via publicly hosted web sites. Selecting the appropriate sequence search and mapping tool will often demand tradeoffs between sensitivity and specificity that relate to the statistics of the search.
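As a rough illustration of the kind of statistical expectation referred to above, the following minimal sketch estimates how many times a short query would be expected to match a genome purely by chance. It is not taken from the chapter: it assumes a simplified model in which the four bases are independent and equally likely, it ignores the special case of queries that equal their own reverse complement, and the function names and the 3-billion-base genome length are hypothetical choices made for this example.

```python
def expected_chance_matches(query_len: int, genome_len: int,
                            both_strands: bool = True) -> float:
    """Expected number of exact chance matches of a query in a random genome.

    Assumes (for illustration only) independent, equiprobable bases,
    so any given position matches a k-mer with probability (1/4)**k.
    """
    # Number of start positions at which a query of this length can be placed.
    positions = max(genome_len - query_len + 1, 0)
    # Searching the reverse strand as well doubles the number of trials.
    strands = 2 if both_strands else 1
    return strands * positions * 0.25 ** query_len


def min_unique_length(genome_len: int, threshold: float = 1.0,
                      both_strands: bool = True) -> int:
    """Smallest query length whose expected chance matches fall below threshold."""
    k = 1
    while expected_chance_matches(k, genome_len, both_strands) >= threshold:
        k += 1
    return k


if __name__ == "__main__":
    # Hypothetical genome length of roughly human scale (an assumption).
    genome_len = 3_000_000_000
    for k in (12, 16, 20, 25):
        e = expected_chance_matches(k, genome_len)
        print(f"k={k:2d}  expected chance matches = {e:.3g}")
    print("shortest length with < 1 expected chance match:",
          min_unique_length(genome_len))
```

Under these assumptions, very short queries are expected to match many locations by chance, which is one reason a search tool's minimum match length involves a tradeoff between sensitivity and specificity. Real genomes contain repeats and biased base composition, so actual counts can differ substantially from this idealized estimate.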
Key Words: Bioinformatics; genome annotation; genome mapping; genomics; human genome; mammalian genome; sequence alignment; sequence analysis; sequence search