Use of Adaptive Networks to Define Highly Predictable Protein Secondary-Structure Classes

Lapedes, Alan S.; Steeg, Evan W.; Farber, Robert M.

doi:10.1023/A:1022621815529

Published: October 1995

Machine Learning, Volume 21, pages 103–124 (1995)

Abstract

We present an adaptive neural-network method that determines new classes of protein secondary structure that are significantly more predictable from local amino-acid sequence than conventional classifications. Accurate prediction of the conventional secondary-structure classes (alpha-helix, beta-strand, and coil) from primary sequence has long been an important problem in computational molecular biology, with many ramifications, including multiple-sequence alignment, prediction of functionally important regions of proteins, and prediction of tertiary structure from primary sequence. The algorithm presented here uses adaptive networks to examine sequence and structure data simultaneously, as available from, for example, the Brookhaven Protein Data Bank, and to determine new secondary-structure classes that can be predicted from sequence with high accuracy. These new classes have both similarities to, and differences from, conventional secondary-structure classes. They represent a new, nontrivial classification of protein secondary structure that is predictable from primary sequence.
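The abstract describes the idea only in outline: examine sequence and structure together, and let the data choose structure classes that the sequence can predict. The toy sketch below illustrates one way such a two-view objective can work. It is an assumption, not the authors' method. A linear score of a sequence feature vector and a linear score of a structure feature vector (standing in for, say, backbone dihedral angles) are trained jointly by gradient ascent to maximise their Pearson correlation, and a threshold on the structure-side score then defines two classes. Because a constant output has zero variance, collapsing every residue into a single class cannot score well. The synthetic data, the feature layout and every function name (`toy_data`, `train`, `define_classes`, `corr_and_grads`) are hypothetical; the paper's networks, inputs, number of classes and training objective may differ.

```python
# Toy two-view sketch (assumption, not the paper's method): learn a sequence-side
# score u = a.x and a structure-side score v = b.y whose correlation is maximal,
# then define classes by thresholding v and predict them by thresholding u.
import math
import random


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def unit(w):
    """Rescale a weight vector to unit length (correlation ignores scale)."""
    norm = math.sqrt(dot(w, w))
    return [wi / norm for wi in w]


def corr_and_grads(xs, ys, a, b):
    """Pearson correlation of u_i = a.x_i and v_i = b.y_i, with gradients in a and b."""
    n = len(xs)
    u = [dot(a, x) for x in xs]
    v = [dot(b, y) for y in ys]
    mu, mv = sum(u) / n, sum(v) / n
    du = [ui - mu for ui in u]
    dv = [vi - mv for vi in v]
    su = math.sqrt(dot(du, du) / n)
    sv = math.sqrt(dot(dv, dv) / n)
    rho = dot(du, dv) / (n * su * sv)
    # d(rho)/d(u_i) = (dv_i / (su*sv) - rho * du_i / su**2) / n, and symmetrically for v_i.
    gu = [(q / (su * sv) - rho * p / su ** 2) / n for p, q in zip(du, dv)]
    gv = [(p / (su * sv) - rho * q / sv ** 2) / n for p, q in zip(du, dv)]
    # Chain rule through u_i = a.x_i and v_i = b.y_i.
    ga = [sum(g * x[k] for g, x in zip(gu, xs)) for k in range(len(a))]
    gb = [sum(g * y[k] for g, y in zip(gv, ys)) for k in range(len(b))]
    return rho, ga, gb


def train(xs, ys, steps=300, lr=0.2, seed=0):
    """Gradient ascent on the correlation, renormalising both weight vectors each step."""
    rng = random.Random(seed)
    a = unit([rng.gauss(0, 1) for _ in xs[0]])
    b = unit([rng.gauss(0, 1) for _ in ys[0]])
    for _ in range(steps):
        _, ga, gb = corr_and_grads(xs, ys, a, b)
        a = unit([ai + lr * g for ai, g in zip(a, ga)])
        b = unit([bi + lr * g for bi, g in zip(b, gb)])
    return a, b


def define_classes(xs, ys, a, b):
    """Two classes from the structure side; the sequence side tries to predict them."""
    u = [dot(a, x) for x in xs]
    v = [dot(b, y) for y in ys]
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    structure_class = [int(vi > mv) for vi in v]
    predicted_class = [int(ui > mu) for ui in u]
    agreement = sum(s == p for s, p in zip(structure_class, predicted_class)) / len(u)
    return structure_class, predicted_class, agreement


def toy_data(n=400, seed=1):
    """Hypothetical residues: a hidden two-state label leaks into one sequence
    feature and one structure feature; every other feature is pure noise."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = 2.0 if rng.random() < 0.5 else 0.0
        xs.append([z + rng.gauss(0, 0.5), rng.gauss(0, 1), rng.gauss(0, 1)])
        ys.append([z + rng.gauss(0, 0.3), rng.gauss(0, 1)])
    return xs, ys


xs, ys = toy_data()
a, b = train(xs, ys)
rho, _, _ = corr_and_grads(xs, ys, a, b)
_, _, agreement = define_classes(xs, ys, a, b)
print(f"correlation={rho:.2f}  sequence/structure class agreement={agreement:.2f}")
```

On this synthetic data the structure-side threshold should recover the hidden state, and the sequence-side score should predict it for most residues. With real sequence windows and structure descriptors, the two linear scores would presumably be replaced by networks with hidden units and more than two output classes.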

Author information

Authors and Affiliations

Complex Systems Group (T13), LANL, Los Alamos, NM, 87545
Alan S. Lapedes & Robert M. Farber
The Santa Fe Institute, Santa Fe, NM
Alan S. Lapedes & Robert M. Farber
Department of Computer Science, University of Toronto, Toronto, ON M5S 1A4, Canada
Evan W. Steeg

About this article

Cite this article

Lapedes, A.S., Steeg, E.W. & Farber, R.M. Use of Adaptive Networks to Define Highly Predictable Protein Secondary-Structure Classes. Machine Learning 21, 103–124 (1995). https://doi.org/10.1023/A:1022621815529

Issue Date: October 1995
DOI: https://doi.org/10.1023/A:1022621815529
