Featureless Pattern Recognition in an Imaginary Hilbert Space and Its Application to Protein Fold Classification
The featureless pattern recognition methodology, which relies on numerical measures of similarity between pairs of entities rather than on explicit feature vectors, is applied to the problem of protein fold classification. In computational biology, a commonly adopted way of measuring the likelihood that two proteins share an evolutionary origin is to calculate the so-called alignment score between their amino acid sequences, which exhibits the properties of an inner product rather than those of a similarity measure. Therefore, to determine the membership of a protein, given by its amino acid sequence (primary structure), in one of several preset fold classes (spatial structure), we treat the set of all feasible amino acid sequences as a subset of isolated points in an imaginary space in which the linear operations and the inner product are defined in an arbitrary, unknown manner and no assumption is made about the dimension, i.e., as a Hilbert space.
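To illustrate what working with inner products alone can look like, the following is a minimal sketch, not the authors' method. It swaps in a simple nearest-class-mean rule: the squared distance from an object to a class mean is expanded so that only pairwise inner products appear, and no coordinates or dimension are ever needed. The names `dot_score`, `sq_dist_to_class_mean`, and `classify` are hypothetical, and the toy `dot_score` on short numeric tuples is only a stand-in for an alignment score between amino acid sequences.

```python
from typing import Callable, Dict, Hashable, List, Sequence

Score = Callable[[Sequence[float], Sequence[float]], float]


def dot_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Toy stand-in for an alignment score (assumption): a plain dot product."""
    return sum(x * y for x, y in zip(a, b))


def sq_dist_to_class_mean(x: Sequence[float],
                          members: List[Sequence[float]],
                          score: Score) -> float:
    """Squared distance from x to the mean of `members`, via inner products only.

    ||x - m||^2 = <x, x> - (2/n) * sum_i <x, x_i> + (1/n^2) * sum_ij <x_i, x_j>
    """
    n = len(members)
    self_term = score(x, x)
    cross_term = sum(score(x, m) for m in members) / n
    mean_term = sum(score(a, b) for a in members for b in members) / (n * n)
    return self_term - 2.0 * cross_term + mean_term


def classify(x: Sequence[float],
             classes: Dict[Hashable, List[Sequence[float]]],
             score: Score) -> Hashable:
    """Assign x to the class whose mean is nearest in the implied space."""
    return min(classes,
               key=lambda label: sq_dist_to_class_mean(x, classes[label], score))


# Hypothetical toy data: two "fold classes" with two training objects each.
toy_classes = {
    "fold_A": [(1.0, 0.0), (0.9, 0.1)],
    "fold_B": [(0.0, 1.0), (0.1, 0.9)],
}
predicted = classify((0.8, 0.2), toy_classes, dot_score)
```

In a real setting, `score` would be replaced by a function that returns the alignment score of two sequences. Whether such a score behaves closely enough like an inner product for the expansion above to remain meaningful is an assumption that would need to be checked on the data.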