Abstract
Identifying protein-coding genes is one of the most important tasks in analyzing newly sequenced genomes. As the number of experimentally verified gene annotations grows, it becomes feasible to identify genes in a newly sequenced genome by comparing it with genes annotated in phylogenetically close organisms. Here, we propose a program, GeneAlign, which predicts the genes on one sequence by measuring the similarity between the predicted sequence and related genes annotated on another genome. The program applies CORAL, a heuristic linear-time alignment tool, to determine whether the regions flanked by candidate signals are similar to the annotated exons. This approach, which exploits the conservation of gene structures and the sequence homology between protein-coding regions, increases prediction accuracy. GeneAlign was tested on the Projector data set of 449 human-mouse homologous sequence pairs. At the gene level, the sensitivity and specificity of GeneAlign are 80%; at the exon level, both exceed 96%.
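To make the exon-level figures concrete, the following minimal sketch shows how such accuracy measures are commonly computed. It assumes the convention widely used in gene prediction evaluation, where sensitivity is the fraction of annotated exons predicted exactly and specificity is the fraction of predicted exons that are exactly correct; the abstract does not state the definitions used. The function name `exon_level_accuracy` and the coordinates are hypothetical illustrations, not part of GeneAlign or the Projector data set.

```python
from typing import Iterable, Tuple

Exon = Tuple[int, int]  # (start, end) coordinates of one exon


def exon_level_accuracy(
    annotated: Iterable[Exon], predicted: Iterable[Exon]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at the exon level.

    Assumed convention: an exon counts as correct only if both of its
    boundaries match an annotated exon exactly.
    """
    annotated_set = set(annotated)
    predicted_set = set(predicted)
    correct = len(annotated_set & predicted_set)

    # Sensitivity: correct exons / annotated exons.
    sensitivity = correct / len(annotated_set) if annotated_set else 0.0
    # Specificity (gene-prediction sense): correct exons / predicted exons.
    specificity = correct / len(predicted_set) if predicted_set else 0.0
    return sensitivity, specificity


# Hypothetical toy data: four annotated exons, five predicted exons.
annotated = [(100, 200), (300, 450), (600, 720), (900, 1000)]
predicted = [(100, 200), (300, 450), (610, 720), (900, 1000), (1200, 1300)]

sn, sp = exon_level_accuracy(annotated, predicted)
print(f"sensitivity={sn:.2f} specificity={sp:.2f}")
```

In this toy case three of the four annotated exons are recovered exactly, and three of the five predictions are correct. The exon predicted at (610, 720) misses one boundary and therefore counts as wrong under the exact-match assumption.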
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hsieh, S.J., Lin, C.Y., Liu, N.H., Tang, C.Y. (2006). Comparative Gene Prediction Based on Gene Structure Conservation. In: Rajapakse, J.C., Wong, L., Acharya, R. (eds) Pattern Recognition in Bioinformatics. PRIB 2006. Lecture Notes in Computer Science, vol 4146. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11818564_5
DOI: https://doi.org/10.1007/11818564_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37446-6
Online ISBN: 978-3-540-37447-3
eBook Packages: Computer Science, Computer Science (R0)