A Weighted Nearest Neighbor Algorithm for Learning with Symbolic Features

Abstract

In the past, nearest neighbor algorithms for learning from examples have worked best in domains in which all features had numeric values. In such domains, the examples can be treated as points and distance metrics can use standard definitions. In symbolic domains, a more sophisticated treatment of the feature space is required. We introduce a nearest neighbor algorithm for learning in domains with symbolic features. Our algorithm calculates distance tables that allow it to produce real-valued distances between instances, and attaches weights to the instances to further modify the structure of feature space. We show that this technique produces excellent classification accuracy on three problems that have been studied by machine learning researchers: predicting protein secondary structure, identifying DNA promoter sequences, and pronouncing English text. Direct experimental comparisons with the other learning algorithms show that our nearest neighbor algorithm is comparable or superior in all three domains. In addition, our algorithm has advantages in training speed, simplicity, and perspicuity. We conclude that experimental evidence favors the use and continued development of nearest neighbor algorithms for domains such as the ones studied here.
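The abstract names two mechanisms without giving formulas: per-feature distance tables over symbolic values, and weights attached to stored instances. The Python sketch below illustrates one way these could fit together. It assumes a value-difference style table, in which two symbolic values are close when they occur with each class at similar rates, and it assumes that an instance weight simply multiplies the distance to that instance. The function names, the exponents `k` and `r`, the handling of unseen values, and the toy data are all hypothetical and are not taken from the article.

```python
from collections import Counter, defaultdict
from itertools import product


def build_distance_tables(examples, labels, k=1):
    """Return one table per feature mapping (value_a, value_b) to a real distance.

    Assumption: the distance between two values is the sum over classes of
    |P(class | value_a) - P(class | value_b)| ** k, estimated from counts.
    """
    classes = sorted(set(labels))
    tables = []
    for f in range(len(examples[0])):
        value_counts = Counter()            # how often each value occurs
        class_counts = defaultdict(Counter)  # value -> class -> count
        for x, y in zip(examples, labels):
            value_counts[x[f]] += 1
            class_counts[x[f]][y] += 1
        table = {}
        for a, b in product(value_counts, repeat=2):
            table[(a, b)] = sum(
                abs(class_counts[a][c] / value_counts[a]
                    - class_counts[b][c] / value_counts[b]) ** k
                for c in classes
            )
        tables.append(table)
    return tables


def distance(x, exemplar, tables, weight=1.0, r=2):
    """Weighted distance from a query x to a stored exemplar.

    Assumption: values never seen in training are treated as identical to
    themselves (0.0) and maximally far from everything else (2.0).
    """
    total = 0.0
    for f, (a, b) in enumerate(zip(x, exemplar)):
        total += tables[f].get((a, b), 0.0 if a == b else 2.0) ** r
    return weight * total


def classify(x, exemplars, labels, tables, weights):
    """Return the label of the stored exemplar with the smallest weighted distance."""
    best = min(
        range(len(exemplars)),
        key=lambda i: distance(x, exemplars[i], tables, weights[i]),
    )
    return labels[best]


# Toy data (hypothetical): two-symbol sequences with a class label each.
examples = ["AC", "AG", "TC", "TG"]
labels = ["+", "+", "-", "-"]
tables = build_distance_tables(examples, labels)
weights = [1.0] * len(examples)  # how weights are learned is not shown here
predicted = classify("TC", examples, labels, tables, weights)
```

In this toy data the first position separates the classes completely, so `A` and `T` end up far apart, while `C` and `G` in the second position occur equally often with both classes and end up at distance zero. A weight above 1.0 makes an exemplar look farther away from every query, which is one way a less reliable stored instance could be made less likely to be chosen as the nearest neighbor.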

Cost, S., Salzberg, S. A Weighted Nearest Neighbor Algorithm for Learning with Symbolic Features. Machine Learning 10, 57–78 (1993). https://doi.org/10.1023/A:1022664626993

  • DOI: https://doi.org/10.1023/A:1022664626993

Keywords

  • Nearest neighbor
  • exemplar-based learning
  • protein structure
  • text pronunciation
  • instance-based learning