Abstract
This paper presents a neural-network-based multi-classifier system for identifying Escherichia coli promoter sequences in strings of DNA. Because each gene in DNA is preceded by a promoter sequence, successfully locating an E. coli promoter leads to the identification of the corresponding E. coli gene in the DNA sequence. A set of 324 known E. coli promoters and a set of 429 known non-promoter sequences were encoded using four different encoding methods, and the encoded sequences were used to train four different neural networks. The classification results of the four individual networks were then combined through an aggregation function based on a variation of the logarithmic opinion pool method, with the weights of this function determined by a genetic algorithm. The multi-classifier system was tested on 159 known promoter sequences and 171 non-promoter sequences not contained in the training set. The results show that the same data set, when presented to neural networks in different forms, can produce slightly varying results. They also show that integrating the differing opinions of several classifiers on the same input data within a multi-classifier system yields results better than the individual performances of the neural networks. Our multi-classifier system outperforms the other prediction systems for E. coli promoters developed so far.
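As a rough illustration of the aggregation step, the sketch below implements the standard two-class logarithmic opinion pool, in which the pooled probability is proportional to the product of each classifier's probability raised to its weight. The abstract only says that a variation of this method was used, so this is not the authors' exact function. The name `log_opinion_pool`, the clipping constant `eps`, and the hand-picked weights are assumptions made for the example; in the paper the weights come from a genetic algorithm.

```python
import math


def log_opinion_pool(probs, weights, eps=1e-12):
    """Pool per-classifier promoter probabilities with a weighted
    logarithmic opinion pool and return the pooled promoter probability.

    probs   -- promoter probability reported by each classifier
    weights -- one non-negative weight per classifier (hypothetical values;
               the paper tunes its weights with a genetic algorithm)
    """
    if len(probs) != len(weights):
        raise ValueError("need exactly one weight per classifier")

    # Weighted sum of log-probabilities for each class; eps guards log(0).
    log_pos = sum(w * math.log(max(p, eps)) for p, w in zip(probs, weights))
    log_neg = sum(w * math.log(max(1.0 - p, eps)) for p, w in zip(probs, weights))

    # Normalise over the two classes, subtracting the max for stability.
    m = max(log_pos, log_neg)
    pos = math.exp(log_pos - m)
    neg = math.exp(log_neg - m)
    return pos / (pos + neg)


# Four networks score the same sequence; illustrative numbers only.
network_outputs = [0.9, 0.8, 0.7, 0.6]
pool_weights = [0.4, 0.3, 0.2, 0.1]
pooled = log_opinion_pool(network_outputs, pool_weights)
is_promoter = pooled >= 0.5
```

With a single classifier of weight 1 the pool returns that classifier's probability unchanged, which is a convenient sanity check when experimenting with different weightings.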
Notes
http://www.embl-heidenberg.de/predictprotein/predictprotein.html
The consensus sequence is an ideal sequence for the interaction with its regulatory protein.
false positives and false negatives
Cite this article
Ranawana, R., Palade, V. A neural network based multi-classifier system for gene identification in DNA sequences. Neural Comput & Applic 14, 122–131 (2005). https://doi.org/10.1007/s00521-004-0447-7
DOI: https://doi.org/10.1007/s00521-004-0447-7