Abstract
The computational identification of genes in DNA sequences has become crucially important because of the large number of DNA molecules currently being sequenced. We present MultiNNProm, a novel neural-network-based multi-classifier system for identifying promoter regions in E. coli DNA sequences. The DNA sequences were encoded using four different encoding methods, and each encoding was used to train a separate neural network. The classification results of these four networks were then aggregated using a variation of the LOP method, with the aggregation weights obtained through a genetic algorithm. We show that neural networks trained on the same data can produce slightly different results when the data are encoded differently. We also show that combining several neural classifiers yields better accuracy than the individual networks.
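As a rough illustration of the aggregation step, the sketch below assumes that LOP denotes the logarithmic opinion pool and shows its standard form, a weighted geometric mean of the classifiers' class probabilities followed by renormalisation. It is not the modified LOP variant described in the chapter. The function name, the example network outputs, and the equal weights are hypothetical; in MultiNNProm the weights come from a genetic algorithm, which is not reproduced here.

```python
import math


def log_opinion_pool(probabilities, weights, eps=1e-12):
    """Combine per-classifier class probabilities with a standard
    logarithmic opinion pool: p(c) is proportional to prod_i p_i(c) ** w_i.

    probabilities: one list of class probabilities per classifier.
    weights: one non-negative weight per classifier (assumed given).
    """
    n_classes = len(probabilities[0])
    # Work in log space: a weighted sum of log-probabilities per class.
    log_scores = [
        sum(w * math.log(max(p[c], eps)) for p, w in zip(probabilities, weights))
        for c in range(n_classes)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_scores)
    unnormalised = [math.exp(s - peak) for s in log_scores]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


# Hypothetical outputs of four networks for a single sequence,
# each given as [P(promoter), P(non-promoter)].
outputs = [[0.70, 0.30], [0.55, 0.45], [0.80, 0.20], [0.40, 0.60]]
# Equal weights as a placeholder for the genetically optimised weights.
weights = [0.25, 0.25, 0.25, 0.25]

combined = log_opinion_pool(outputs, weights)
predicted_class = "promoter" if combined[0] > combined[1] else "non-promoter"
```

With these placeholder values, three of the four networks favour the promoter class, and the pooled distribution does as well. Raising the weight of a single network pulls the combined result towards that network's output, which is the degree of freedom a weight search such as a genetic algorithm would tune.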
© 2006 Springer
About this paper
Cite this paper
Ranawana, R., Palade, V. (2006). MultiNNProm: A Multi-Classifier System for Finding Genes. In: Abraham, A., de Baets, B., Köppen, M., Nickolay, B. (eds) Applied Soft Computing Technologies: The Challenge of Complexity. Advances in Soft Computing, vol 34. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-31662-0_35
DOI: https://doi.org/10.1007/3-540-31662-0_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31649-7
Online ISBN: 978-3-540-31662-6
eBook Packages: Engineering, Engineering (R0)