A machine discovery from amino acid sequences by decision trees over regular patterns

Arikawa, Setsuo; Miyano, Satoru; Shinohara, Ayumi; Kuhara, Satoru; Mukouchi, Yasuhito; Shinohara, Takeshi

doi:10.1007/BF03037183

Special Issue
Invited
Published: September 1993

Volume 11, pages 361–375 (1993)
New Generation Computing

Setsuo Arikawa¹,
Satoru Miyano¹,
Ayumi Shinohara¹,
Satoru Kuhara²,
Yasuhito Mukouchi³ &
Takeshi Shinohara⁴

Abstract

This paper describes a machine learning system that discovered a “negative motif” in transmembrane domain identification from amino acid sequences, and reports its experiments on protein data from the PIR database. We introduce a decision tree whose nodes are labeled with regular patterns. As a hypothesis, the system produces such a decision tree from a small number of randomly chosen positive and negative examples from PIR. Experiments show that our system is very successful at finding reasonable hypotheses. As a theoretical foundation, we show that the class of languages defined by decision trees of depth at most d over k-variable regular patterns is polynomial-time learnable in the sense of probably approximately correct (PAC) learning for any fixed d, k ≥ 0.
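To make the hypothesis space concrete, the following is a minimal Python sketch of how a sequence could be classified by a decision tree whose internal nodes are labeled with regular patterns. It assumes the common convention that a regular pattern is a string of constant symbols and variables in which each variable occurs at most once, and that each variable is replaced by a non-empty string. The `*` notation, the names `parse_pattern`, `matches`, `Node`, `Leaf` and `classify`, and the toy tree are hypothetical and are not taken from the paper. The sketch covers only evaluating a given tree, not the learning algorithm.

```python
from dataclasses import dataclass
from typing import Union


def parse_pattern(text: str) -> tuple[str, ...]:
    """Split a pattern into the constant segments between its variables.

    Hypothetical notation: each '*' stands for a distinct variable,
    so '*LL*' denotes the regular pattern x LL y.
    """
    return tuple(text.split("*"))


def matches(segments: tuple[str, ...], s: str) -> bool:
    """Return True if s belongs to the language of the regular pattern.

    Assumption: every variable is replaced by a NON-empty string.
    """
    if len(segments) == 1:           # no variables: the pattern is a constant string
        return s == segments[0]
    head, tail = segments[0], segments[-1]
    end = len(s) - len(tail)         # the constant suffix must occupy s[end:]
    if end < len(head) or not s.startswith(head) or not s.endswith(tail):
        return False
    pos = len(head)
    for seg in segments[1:-1]:
        # Leave at least one symbol for the preceding variable and one for the
        # final variable; the leftmost occurrence leaves the most room later on.
        i = s.find(seg, pos + 1, end - 1)
        if i == -1:
            return False
        pos = i + len(seg)
    return end - pos >= 1            # the last variable must also be non-empty


@dataclass
class Leaf:
    label: str


@dataclass
class Node:
    pattern: tuple[str, ...]
    if_match: "Tree"
    if_no_match: "Tree"


Tree = Union[Leaf, Node]


def classify(tree: Tree, s: str) -> str:
    # Walk down from the root, taking the "match" branch when s is in the
    # node's pattern language, until a leaf label is reached.
    while isinstance(tree, Node):
        tree = tree.if_match if matches(tree.pattern, s) else tree.if_no_match
    return tree.label


# A toy depth-2 tree; its patterns and labels are invented for illustration
# and are NOT the hypothesis reported in the paper.
toy = Node(parse_pattern("*LL*"),
           Node(parse_pattern("*D*"), Leaf("negative"), Leaf("positive")),
           Leaf("negative"))

for seq in ["AILLVA", "AILLDA", "AVVA"]:
    print(seq, classify(toy, seq))
```

Because each variable occurs at most once, membership reduces to locating the constant segments in order, which is why a single greedy left-to-right scan suffices here.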

References

Arikawa, S., Kuhara, S., Miyano, S., Shinohara, A. and Shinohara, T., “A Learning Algorithm for Elementary Formal Systems and Its Experiments on Identification of Transmembrane Domains,” in Proc. 25th Hawaii Int. Conf. on Sys. Sci., pp. 675–684, IEEE, 1992.
Bairoch, A., “PROSITE: A Dictionary of Sites and Patterns in Proteins,” Nucleic Acids Res., 19, pp. 2241–2245, 1991.
Blumer, A., Ehrenfeucht, A., Haussler, D. and Warmuth, M. K., “Learnability and the Vapnik-Chervonenkis Dimension,” JACM, 36, pp. 929–965, 1989.
Ehrenfeucht, A. and Haussler, D., “Learning Decision Trees from Random Examples,” Inform. Comput., 82, pp. 231–246, 1989.
Engelman, D. M., Steitz, T. A. and Goldman, A., “Identifying Nonpolar Transbilayer Helices in Amino Acid Sequences of Membrane Proteins,” Ann. Rev. Biophys. Biophys. Chem., 15, pp. 321–353, 1986.
Gusev, V. and Chuzhanova, N., “The Algorithms for Recognition of the Functional Sites in Genetic Texts,” in Proc. 1st Workshop on Algorithmic Learning Theory, Tokyo, pp. 109–119, 1990.
Hartmann, E., Rapoport, T. A. and Lodish, H. F., “Predicting the Orientation of Eukaryotic Membrane-Spanning Proteins,” Proc. Natl. Acad. Sci. U.S.A., 86, pp. 5786–5790, 1989.
Holley, L. H. and Karplus, M., “Protein Secondary Structure Prediction with a Neural Network,” Proc. Natl. Acad. Sci. U.S.A., 86, pp. 152–156, 1989.
Kyte, J. and Doolittle, R. F., “A Simple Method for Displaying the Hydropathic Character of a Protein,” J. Mol. Biol., 157, pp. 105–132, 1982.
Lipp, J., Flint, N., Haeuptle, M. T. and Dobberstein, B., “Structural Requirements for Membrane Assembly of Proteins Spanning the Membrane Several Times,” J. Cell Biol., 109, pp. 2013–2022, 1989.
Miyano, S., Shinohara, A. and Shinohara, T., “Which Classes of Elementary Formal Systems are Polynomial-Time Learnable?” in Proc. 2nd Algorithmic Learning Theory, Tokyo, pp. 139–150, 1991.
Natarajan, B. K., “On Learning Sets and Functions,” Machine Learning, 4, pp. 67–97, 1989.
Protein Identification Resource, National Biomedical Research Foundation.
Quinlan, J. R., “Induction of Decision Trees,” Machine Learning, 1, pp. 81–106, 1986.
Quinlan, J. R. and Rivest, R. L., “Inferring Decision Trees using the Minimum Description Length Principle,” Inform. Comput., 80, pp. 227–248, 1989.
Rao, J. K. M. and Argos, P., “A Conformational Preference Parameter to Predict Helices in Integral Membrane Proteins,” Biochim. Biophys. Acta, 869, pp. 197–214, 1986.
Shinohara, T., “Polynomial Time Inference of Pattern Languages and its Applications,” in Proc. 7th IBM Symp. Mathematical Foundations of Computer Science, pp. 191–209, 1982.
Shinohara, T., “Polynomial Time Inference of Regular Pattern Languages,” in Proc. RIMS Symp. Software Science and Engineering (Lecture Notes in Computer Science, 147), pp. 115–127, 1983.
Utgoff, P. E., “Incremental Induction of Decision Trees,” Machine Learning, 4, pp. 161–186, 1989.
Valiant, L., “A Theory of the Learnable,” Commun. ACM, 27, pp. 1134–1142, 1984.
von Heijne, G., “Transcending the Impenetrable: How Proteins Come to Terms with Membranes,” Biochim. Biophys. Acta, 947, pp. 307–333, 1988.
Wu, C. H., Whitson, G. M. and Montllor, G. J., “PROCANS: A Protein Classification System Using a Neural Network,” IJCNN Int. Joint Conf. Neural Networks, 2, pp. 91–96, 1990.

Author information

Authors and Affiliations

Research Institute of Fundamental Information Science, Kyushu University 33, 812, Fukuoka, Japan
Setsuo Arikawa, Satoru Miyano & Ayumi Shinohara
Graduate School of Genetic Resources Technology, Kyushu University 46, 812, Fukuoka, Japan
Satoru Kuhara
Department of Information Systems, Kyushu University 39, 816, Kasuga, Japan
Yasuhito Mukouchi
Department of Artificial Intelligence, Kyushu Institute of Technology, 820, Iizuka, Japan
Takeshi Shinohara

Additional information

Setsuo Arikawa, Ph. D.: He is a Professor and the Director of the Research Institute of Fundamental Information Science, Kyushu University. He received the B. S. degree in 1964, the M. S. degree in 1966 and the Dr. Sci. degree in 1969, all in Mathematics, from Kyushu University. He has been working on algorithmic learning theory, logic and inference in AI, and information retrieval systems.

Satoru Kuhara, Ph. D.: He is an Associate Professor in the Graduate School of Genetic Resources Technology, Kyushu University. He received the B. A. degree in 1974, the M. Agr. degree in 1976 and the Dr. Agr. degree in 1980 from Kyushu University. His present interests include computer analysis of genetic information and protein structures.

Satoru Miyano, Ph. D.: He received the B. S. degree in 1977, the M. S. degree in 1979 and the Dr. Sci. degree in 1984 from Kyushu University. Presently, he is a Professor at the Research Institute of Fundamental Information Science, Kyushu University. He has been conducting research on computational complexity, parallel algorithms, algorithmic learning theory, and genome informatics.

Yasuhito Mukouchi: He is currently a doctoral student in the Department of Information Systems, Kyushu University. He received the B. E. and the M. A. degrees from the University of Osaka Prefecture in 1987 and 1991, respectively. His research interests are inductive inference and computational learning theory.

Ayumi Shinohara: He received the B. S. degree in Mathematics in 1988 and the M. S. degree in Information Systems in 1990 from Kyushu University. Currently, he is an Assistant at the Research Institute of Fundamental Information Science, Kyushu University. He has been working on computational learning theory and genome informatics.

Takeshi Shinohara, Ph. D.: He is an Associate Professor in the Department of Artificial Intelligence, Kyushu Institute of Technology. He received the B. S. degree in 1980 from Kyoto University, and the M. S. and Dr. Sci. degrees from Kyushu University in 1982 and 1986, respectively. His present interests are information retrieval, string pattern matching algorithms and computational learning theory.

About this article

Cite this article

Arikawa, S., Miyano, S., Shinohara, A. et al. A machine discovery from amino acid sequences by decision trees over regular patterns. New Gener Comput 11, 361–375 (1993). https://doi.org/10.1007/BF03037183

Received: 07 December 1992
Issue Date: September 1993
DOI: https://doi.org/10.1007/BF03037183
