A machine discovery from amino acid sequences by decision trees over regular patterns

  • Special Issue
  • Invited
New Generation Computing

Abstract

This paper describes a machine learning system that discovered a “negative motif” in transmembrane domain identification from amino acid sequences, and reports experiments on protein data from the PIR database. We introduce decision trees whose nodes are labeled with regular patterns. As a hypothesis, the system produces such a decision tree from a small number of randomly chosen positive and negative examples from PIR. Experiments show that our system finds reasonable hypotheses very successfully. As a theoretical foundation, we show that the class of languages defined by decision trees of depth at most d over k-variable regular patterns is polynomial-time learnable in the sense of probably approximately correct (PAC) learning for any fixed d, k ≥ 0.
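
To make the hypothesis class concrete, the following is a minimal sketch of how a decision tree over regular patterns could classify a sequence. It is not the authors' implementation. It assumes that a regular pattern is a string in which each variable occurs at most once, that variables are written as lowercase letters and constants as uppercase amino acid codes, and that variables are replaced by non-empty strings. The names `pattern_to_regex`, `Node`, and `classify`, and the example tree itself, are hypothetical and are not taken from the paper.

```python
import re
from dataclasses import dataclass
from typing import Union


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Compile a regular pattern into a regex.

    Assumed convention: lowercase letters are variables, everything else
    is a constant symbol. Each variable may occur at most once.
    """
    seen = set()
    parts = []
    for ch in pattern:
        if ch.islower():
            if ch in seen:
                raise ValueError(f"variable {ch!r} occurs twice: not a regular pattern")
            seen.add(ch)
            parts.append(".+")  # assumption: non-empty substitution
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


@dataclass
class Node:
    """Internal node: test a pattern, then follow one of two branches."""
    pattern: str
    if_match: Union["Node", bool]
    if_no_match: Union["Node", bool]


def classify(tree: Union[Node, bool], sequence: str) -> bool:
    """Walk from the root to a leaf; leaves are plain booleans."""
    while not isinstance(tree, bool):
        matched = pattern_to_regex(tree.pattern).fullmatch(sequence) is not None
        tree = tree.if_match if matched else tree.if_no_match
    return tree


# Hypothetical depth-2 tree over 2-variable patterns: a sequence is labeled
# positive only if neither "xDy" nor "xEy" matches it.
example_tree = Node("xDy", False, Node("xEy", False, True))

print(classify(example_tree, "LLAVLLAL"))
print(classify(example_tree, "LLADLLAL"))
```

A tree of this shape labels a sequence positive because certain patterns are absent from it, which is one way to read the notion of a "negative motif". Note that under the non-empty substitution assumption, `xDy` does not match a sequence whose only `D` is at its first or last position.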

Author information

Setsuo Arikawa, Ph. D.: He is a Professor and the Director of the Research Institute of Fundamental Information Science, Kyushu University. He received the B. S. degree in 1964, the M. S. degree in 1966 and the Dr. Sci. degree in 1969, all in Mathematics, from Kyushu University. He has been working on algorithmic learning theory, logic and inference in AI, and information retrieval systems.

Satoru Kuhara, Ph. D.: He is an Associate Professor at the Graduate School of Genetic Resources Technology, Kyushu University. He received the B. A. degree in 1974, the M. Agr. degree in 1976 and the Dr. Agr. degree in 1980 from Kyushu University. His present interests include computer analysis of genetic information and protein structures.

Satoru Miyano, Ph. D.: He received the B. S. degree in 1977, the M. S. degree in 1979 and the Dr. Sci. degree in 1984 from Kyushu University. Presently, he is a Professor at the Research Institute of Fundamental Information Science, Kyushu University. He has been conducting research on computational complexity, parallel algorithms, algorithmic learning theory, and genome informatics.

Yasuhito Mukouchi: He is currently a doctoral student in the Department of Information Systems, Kyushu University. He received the B. E. and the M. A. degrees from University of Osaka Prefecture in 1987 and 1991, respectively. His research interests are inductive inference and computational learning theory.

Ayumi Shinohara: He received the B. S. degree in 1988 in Mathematics and the M. S. degree in 1990 in Information Systems from Kyushu University. Currently, he is an Assistant at the Research Institute of Fundamental Information Science, Kyushu University. He has been working on computational learning theory and genome informatics.

Takeshi Shinohara, Ph. D.: He is an Associate Professor in the Department of Artificial Intelligence, Kyushu Institute of Technology. He received the B. S. degree in 1980 from Kyoto University, and the M. S. and the Dr. Sci. degrees from Kyushu University in 1982 and 1986, respectively. His present interests are information retrieval, string pattern matching algorithms and computational learning theory.

About this article

Cite this article

Arikawa, S., Miyano, S., Shinohara, A. et al. A machine discovery from amino acid sequences by decision trees over regular patterns. New Gener Comput 11, 361–375 (1993). https://doi.org/10.1007/BF03037183

  • DOI: https://doi.org/10.1007/BF03037183