Comparative Gene Prediction Based on Gene Structure Conservation

Hsieh, Shu Ju; Lin, Chun Yuan; Liu, Ning Han; Tang, Chuan Yi

doi:10.1007/11818564_5


Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4146)

Included in the following conference series:

International Workshop on Pattern Recognition in Bioinformatics

Abstract

Identifying protein-coding genes is one of the most important tasks in annotating newly sequenced genomes. With an increasing number of gene annotations verified by experiments, it has become feasible to identify genes in a newly sequenced genome by comparing it with genes annotated in phylogenetically close organisms. Here, we propose a program, GeneAlign, which predicts the genes on one sequence by measuring the similarity between the predicted sequence and related genes annotated on another genome. The program applies CORAL, a heuristic linear-time alignment tool, to determine whether the regions flanked by candidate signals are similar to the annotated exons. The approach, which exploits the conservation of gene structures and the sequence homology between protein-coding regions, increases prediction accuracy. GeneAlign was tested on the Projector data set of 449 human-mouse homologous sequence pairs. At the gene level, the sensitivity and specificity of GeneAlign are both 80%; at the exon level, both exceed 96%.
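
The idea of testing regions flanked by candidate signals against an annotated exon can be illustrated with a minimal Python sketch. It is not the GeneAlign implementation: CORAL is replaced here by the standard-library `difflib.SequenceMatcher` as a stand-in similarity measure, the only signals considered are the canonical `AG` acceptor and `GT` donor dinucleotides, and the function names, `min_len`, and `min_ratio` are hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher


def candidate_exons(seq, min_len=6):
    """Yield (start, end) for every region lying between an AG acceptor
    and a downstream GT donor. Assumption: only canonical splice signals
    are used; start and stop codons are ignored for brevity."""
    acceptors = [i + 2 for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    for start in acceptors:
        for end in donors:
            if end - start >= min_len:
                yield start, end


def best_match(seq, annotated_exon, min_ratio=0.8):
    """Return (start, end, ratio) of the candidate region most similar to
    the annotated exon, or None if no candidate reaches min_ratio.
    SequenceMatcher.ratio() is a stand-in for CORAL, not a substitute."""
    best = None
    for start, end in candidate_exons(seq):
        matcher = SequenceMatcher(None, seq[start:end], annotated_exon,
                                  autojunk=False)
        ratio = matcher.ratio()
        if ratio >= min_ratio and (best is None or ratio > best[2]):
            best = (start, end, ratio)
    return best


# Toy target sequence: upstream intron tail, one exon, downstream intron head.
target = "CCCAG" + "ATGGCCAAACTGACC" + "GTAAGT"
# Hypothetical annotated exon from the related genome (one substitution).
annotated = "ATGGCCAAGCTGACC"
hit = best_match(target, annotated)
print(hit)
```

Enumerating every acceptor-donor pair is quadratic in the number of signals, which is acceptable for a toy sequence; a practical tool would need to prune candidates and would also check reading-frame compatibility between neighbouring exons.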



Author information

Authors and Affiliations

Department of Computer Science,
Shu Ju Hsieh, Ning Han Liu & Chuan Yi Tang
Institute of Molecular and Cellular Biology, National Tsing-Hua University, Hsinchu, Taiwan, ROC
Chun Yuan Lin

Editor information

Editors and Affiliations

Singapore-MIT Alliance, 50 Nanyang Avenue, N2-B2C-15, Singapore
Jagath C. Rajapakse
School of Computing, National University of Singapore, Singapore
Limsoon Wong
Computer Science and Engineering, The Penn State University, USA
Raj Acharya

About this paper

Cite this paper

Hsieh, S.J., Lin, C.Y., Liu, N.H., Tang, C.Y. (2006). Comparative Gene Prediction Based on Gene Structure Conservation. In: Rajapakse, J.C., Wong, L., Acharya, R. (eds) Pattern Recognition in Bioinformatics. PRIB 2006. Lecture Notes in Computer Science, vol 4146. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11818564_5

DOI: https://doi.org/10.1007/11818564_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37446-6
Online ISBN: 978-3-540-37447-3
eBook Packages: Computer Science, Computer Science (R0)
