The Prediction of Human Genes in DNA Based on a Generalized Hidden Markov Model

  • Rui Guo
  • Ke Yan
  • Wei He
  • Jian Zhang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9967)


The Generalized Hidden Markov Model (GHMM) has proved to be an excellent general probabilistic model of the gene structure of human genomic sequences. It can simultaneously incorporate signal descriptions, such as splice sites, and content descriptions, such as the compositional features of exons and introns. Building on its flexibility and sound probabilistic underpinnings, we integrate several modified submodels and implement a program for predicting human genes in DNA. The program can predict multiple genes in a sequence, handle partial as well as complete genes, and predict consistent sets of genes occurring on either or both DNA strands. More importantly, it also performs well on longer sequences containing an unknown number of genes. In our experiments, the proposed method achieves higher prediction accuracy than some existing methods, and over 70% of exons are identified exactly.
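To make the idea of a GHMM concrete, the following is a minimal sketch of segment-level Viterbi decoding, in which each state emits a variable-length stretch of sequence rather than a single base. It is an illustration under strong assumptions, not the program described above: the two states, the emission probabilities, the geometric length distributions, the length cap `MAX_LEN`, and the function names are all hypothetical, and signal models, reading frames, and the reverse strand are left out. Prefix sums of per-base log-probabilities let each candidate segment be scored in constant time.

```python
import math

# Hypothetical toy parameters for illustration only (not from the paper).
STATES = ("exon", "intron")
EMIT = {
    "exon":   {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "intron": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}
END_PROB = {"exon": 0.1, "intron": 0.05}  # geometric length parameters
MAX_LEN = 50                              # assumed cap on segment length


def length_logp(state, d):
    """Log-probability that a segment of `state` has length d (geometric)."""
    p = END_PROB[state]
    return (d - 1) * math.log(1 - p) + math.log(p)


def prefix_sums(seq):
    """For each state, cumulative sums of per-base emission log-probabilities."""
    sums = {}
    for s in STATES:
        acc = [0.0]
        for base in seq:
            acc.append(acc[-1] + math.log(EMIT[s][base]))
        sums[s] = acc
    return sums


def decode(seq):
    """Return the best-scoring parse as a list of (state, start, end) segments."""
    n = len(seq)
    pre = prefix_sums(seq)
    neg_inf = float("-inf")
    # best[j][s]: best log score of a parse of seq[:j] whose last segment is s
    best = [{s: neg_inf for s in STATES} for _ in range(n + 1)]
    back = [{s: None for s in STATES} for _ in range(n + 1)]
    for j in range(1, n + 1):
        for s in STATES:
            other = STATES[1 - STATES.index(s)]  # states must alternate
            for d in range(1, min(MAX_LEN, j) + 1):
                i = j - d
                # Content score of seq[i:j] in O(1) via prefix sums.
                seg = pre[s][j] - pre[s][i] + length_logp(s, d)
                if i == 0:
                    cand, prev = seg + math.log(0.5), None  # uniform start
                else:
                    cand, prev = best[i][other] + seg, other
                if cand > best[j][s]:
                    best[j][s] = cand
                    back[j][s] = (i, prev)
    # Trace back from the better-scoring final state.
    s = max(STATES, key=lambda t: best[n][t])
    segments, j = [], n
    while j > 0:
        i, prev = back[j][s]
        segments.append((s, i, j))
        j, s = i, prev
    return segments[::-1]


if __name__ == "__main__":
    for state, start, end in decode("GCGCGCGCGCGCATATATATATAT"):
        print(state, start, end)
```

The explicit loop over segment lengths is what distinguishes this from an ordinary HMM decoder: the length distribution can be replaced by any function of `d` without changing the recursion. A real gene finder would add many more states and score segment boundaries with signal models.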


Keywords: Gene prediction; WWAM; IMM; GHMM; Prefix sum arrays; Method based on similarity weighting of sequence patterns



This work was partly supported by the Shenzhen Municipal Science and Technology Innovation Council (No. JCYJ20140904154645958).



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. Bio-Computing Research Center, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, China
