Abstract
This paper stresses the importance of stochastic machine learning theory for analyzing genetic information such as protein sequences. It is widely recognized that machine learning theory will play an essential role in extracting important information from the enormous amounts of raw genetic information generated by biologists. However, more flexible and robust learning methodologies are required to deal with the divergence that occurs in genetic information. For this purpose, we adopt stochastic knowledge representations and stochastic learning algorithms, and demonstrate their effectiveness with a stochastic motif extraction system. The system aims to extract stable common patterns conserved within a protein category. In the system, common patterns (stochastic motifs) are represented by stochastic decision predicates, and a genetic algorithm combined with Rissanen's minimum description length principle is used to select “good stochastic motifs” from the viewpoint of increasing prediction performance.
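To make the selection criterion concrete, the following is a minimal sketch of how a minimum description length score might rank candidate motifs. It is not the system described in the paper: the regular-expression patterns, the two-part code with a (k/2) log2 N parameter cost, the toy sequences, and the names `bernoulli_bits` and `description_length` are all assumptions made for illustration, and an exhaustive minimum over a tiny candidate list stands in for the genetic algorithm search.

```python
import math
import re


def bernoulli_bits(positives, total):
    """Bits needed to encode `total` binary labels, `positives` of them 1,
    using the maximum-likelihood Bernoulli estimate (assumed coding scheme)."""
    if total == 0 or positives == 0 or positives == total:
        return 0.0
    p = positives / total
    return -(positives * math.log2(p) + (total - positives) * math.log2(1 - p))


def description_length(pattern, sequences, labels):
    """Two-part description length of a hypothetical stochastic motif.

    The motif is modelled as a rule with two probabilities: P(category | match)
    and P(category | no match). Data cost is the label code length under those
    estimates; model cost is (k/2) * log2(N) with k = 2 parameters.
    """
    matched = [y for s, y in zip(sequences, labels) if re.search(pattern, s)]
    unmatched = [y for s, y in zip(sequences, labels) if not re.search(pattern, s)]
    data_bits = (bernoulli_bits(sum(matched), len(matched))
                 + bernoulli_bits(sum(unmatched), len(unmatched)))
    model_bits = (2 / 2) * math.log2(len(labels))
    return data_bits + model_bits


# Toy data (invented for this sketch): label 1 marks membership in the category.
sequences = ["ACDCH", "GCKCH", "AAAAA", "GGGGG"]
labels = [1, 1, 0, 0]
candidates = ["C.C", "A", "G"]

# Exhaustive search over candidates, standing in for the genetic algorithm.
best = min(candidates, key=lambda p: description_length(p, sequences, labels))
print(best, description_length(best, sequences, labels))
```

In this sketch a pattern that separates the category cleanly costs only the model bits, whereas an uninformative pattern pays the full cost of encoding the labels, so shorter descriptions correspond to motifs with better predictive value.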
References
Aitken, A. (1990). Identification of Protein Consensus Sequences. Ellis Horwood Series in Biochemistry and Biotechnology.
Rissanen, J. (1978). Modeling by shortest data description. Automatica, 14, 465–471.
Yamanishi, K. (1990). A learning criterion for stochastic rules. Proceedings of the 3rd Annual Workshop on Computational Learning Theory (pp. 67–81). Rochester, NY: Morgan Kaufmann. Its full version is to appear in Jr. on Machine Learning.
Yamanishi, K. & Konagaya, A. (1991). Learning Stochastic Motifs from Genetic Sequences. In Proc. of the Eighth International Workshop of Machine Learning.
Rissanen, J. (1983). A universal prior for integers and estimation by minimum description length. Annals of Statistics, 11, 416–431.
Goldberg, D.E. (1989). Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley Publishing Company, Inc.
Breiman, L., Friedman, J.H., Olshen, R.A., & Stone, C.J. (1984). Classification and regression trees. Wadsworth Statistics/Probability Series.
Copyright information
© 1993 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Konagaya, A. (1993). A stochastic approach to genetic information processing. In: Doshita, S., Furukawa, K., Jantke, K.P., Nishida, T. (eds) Algorithmic Learning Theory. ALT 1992. Lecture Notes in Computer Science, vol 743. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-57369-0_25
DOI: https://doi.org/10.1007/3-540-57369-0_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-57369-2
Online ISBN: 978-3-540-48093-8
eBook Packages: Springer Book Archive