Abstract
Comparative gene finding is a relatively new field within computational biology. Comparative gene finders have several advantages over their single-species predecessors, including higher prediction accuracy and the ability to annotate a wider variety of gene features that previously eluded computational approaches. In Chap. 2 we described some of the most common algorithms underlying single-species gene finding. In this chapter we present the corresponding algorithms in comparative gene finding, ranging from similarity-based techniques, to pair hidden Markov models and generalized pair hidden Markov models, to gene mapping. Finally, we present some of the first attempts to extend the pairwise approaches to multiple-sequence gene finding. Each section concludes with an example of gene finding software that uses the method in question.
Copyright information
© 2010 Springer-Verlag London
About this chapter
Cite this chapter
Axelson-Fisk, M. (2010). Comparative Gene Finding. In: Comparative Gene Finding. Computational Biology, vol 11. Springer, London. https://doi.org/10.1007/978-1-84996-104-2_4
DOI: https://doi.org/10.1007/978-1-84996-104-2_4
Publisher Name: Springer, London
Print ISBN: 978-1-84996-103-5
Online ISBN: 978-1-84996-104-2
eBook Packages: Computer Science (R0)