Neural Computing & Applications, Volume 14, Issue 2, pp 122–131

A neural network based multi-classifier system for gene identification in DNA sequences

Original Article

Abstract

This paper presents a neural network based multi-classifier system for identifying Escherichia coli promoter sequences in strings of DNA. Because each gene in DNA is preceded by a promoter sequence, successfully locating an E. coli promoter leads to the identification of the corresponding E. coli gene in the DNA sequence. A set of 324 known E. coli promoters and a set of 429 known non-promoter sequences were encoded using four different encoding methods. The encoded sequences were then used to train four different neural networks. The classification results of the four individual networks were combined through an aggregation function that used a variation of the logarithmic opinion pool method, with the weights of this function determined by a genetic algorithm. The multi-classifier system was then tested on 159 known promoter sequences and 171 non-promoter sequences not contained in the training set. The results show that the same data set, when presented to neural networks in different forms, can give slightly varying results. They also show that when the opinions of several classifiers on the same input data are integrated within a multi-classifier system, the results are better than those of the individual neural networks. The multi-classifier system outperforms the other prediction systems for E. coli promoters developed so far.
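The aggregation step can be illustrated with a minimal sketch of a standard weighted logarithmic opinion pool for a two-class problem. The abstract only states that "a variation" of this method was used, so the exact form below is an assumption rather than the authors' function. The name `log_opinion_pool`, the clipping constant `eps`, and the example weights are hypothetical; in the paper the weights are found by a genetic algorithm, which is not shown here.

```python
import math


def log_opinion_pool(probs, weights, eps=1e-12):
    """Pool per-classifier promoter probabilities (standard form, assumed).

    probs   -- P(promoter | x) reported by each classifier, each in [0, 1]
    weights -- one non-negative weight per classifier (hypothetical values)
    Returns the pooled P(promoter | x), renormalised over the two classes.
    """
    if len(probs) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    # Weighted sum of log-probabilities = log of the weighted geometric product.
    log_pos = sum(w * math.log(max(p, eps)) for p, w in zip(probs, weights))
    log_neg = sum(w * math.log(max(1.0 - p, eps)) for p, w in zip(probs, weights))
    # Subtract the larger score before exponentiating for numerical stability.
    shift = max(log_pos, log_neg)
    pos = math.exp(log_pos - shift)
    neg = math.exp(log_neg - shift)
    return pos / (pos + neg)


# Four network outputs for one sequence, with illustrative (not learned) weights.
network_outputs = [0.9, 0.6, 0.7, 0.4]
example_weights = [0.4, 0.2, 0.3, 0.1]
pooled = log_opinion_pool(network_outputs, example_weights)
is_promoter = pooled > 0.5
```

Because the pool multiplies opinions rather than averaging them, a single classifier that is very confident a sequence is not a promoter pulls the pooled score down sharply; the weights control how much each network is allowed to do so.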

Keywords

Neural networks; Neural network optimization; Multi-classifier systems; Promoter recognition; Genetic algorithms

Copyright information

© Springer-Verlag London Limited 2004

Authors and Affiliations

  1. Computing Laboratory, University of Oxford, UK
