Machine Learning, Volume 38, Issue 3, pp. 257–286

Reduction Techniques for Instance-Based Learning Algorithms

  • D. Randall Wilson
  • Tony R. Martinez

Abstract

Instance-based learning algorithms are often faced with the problem of deciding which instances to store for use during generalization. Storing too many instances can result in large memory requirements and slow execution speed, and can cause an oversensitivity to noise. This paper has two main purposes. First, it provides a survey of existing algorithms used to reduce storage requirements in instance-based learning algorithms and other exemplar-based algorithms. Second, it proposes six additional reduction algorithms called DROP1–DROP5 and DEL (three of which were first described in Wilson & Martinez, 1997c, as RT1–RT3) that can be used to remove instances from the concept description. These algorithms and 10 algorithms from the survey are compared on 31 classification tasks. Of those algorithms that provide substantial storage reduction, the DROP algorithms have the highest average generalization accuracy in these experiments, especially in the presence of uniform class noise.
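
To make the flavor of these reduction rules concrete, the following Python sketch implements a DROP-style decremental pass: an instance is removed when the stored instances that count it among their k nearest neighbors (its "associates") are classified at least as accurately without it as with it. This is an illustrative simplification under stated assumptions, not the paper's exact DROP1–DROP5 definitions (which differ in instance ordering, noise filtering, and neighbor bookkeeping); the function names and the brute-force distance computations are choices made for the sketch.

```python
import numpy as np

def knn_vote(X, y, kept, q, k, exclude):
    """Majority class among X[q]'s k nearest neighbors drawn from `kept`,
    skipping any index in `exclude` (y must hold non-negative int labels)."""
    idx = [i for i in kept if i not in exclude]
    d = np.linalg.norm(X[idx] - X[q], axis=1)
    nearest = [idx[j] for j in np.argsort(d)[:k]]
    return np.bincount(y[nearest]).argmax()

def drop_reduce(X, y, k=3):
    """One decremental pass over the training set: drop instance p when its
    associates are classified at least as well without p as with it."""
    kept = set(range(len(X)))
    for p in range(len(X)):
        # Associates of p: kept instances whose k nearest kept neighbors
        # (excluding themselves) include p.
        assoc = []
        for a in kept - {p}:
            idx = [i for i in kept if i != a]
            d = np.linalg.norm(X[idx] - X[a], axis=1)
            if p in (idx[j] for j in np.argsort(d)[:k]):
                assoc.append(a)
        # Leave-one-out accuracy of the associates with and without p.
        with_p = sum(knn_vote(X, y, kept, a, k, {a}) == y[a] for a in assoc)
        without_p = sum(knn_vote(X, y, kept, a, k, {a, p}) == y[a] for a in assoc)
        if without_p >= with_p:  # ties favor removal
            kept.discard(p)
    return sorted(kept)

if __name__ == "__main__":
    # Toy usage on a synthetic two-class problem.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    kept = drop_reduce(X, y, k=3)
    print(f"kept {len(kept)} of {len(X)} instances")
```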

Keywords: instance-based learning, nearest neighbor, instance reduction, pruning, classification

References

  1. Aha, D. W. (1992). Tolerating noisy, irrelevant and novel attributes in instance-based learning algorithms. International Journal of Man-Machine Studies, 36, 267–287.
  2. Aha, D. W., Kibler, D., & Albert, M. K. (1991). Instance-based learning algorithms. Machine Learning, 6, 37–66.
  3. Batchelor, B. G. (1978). Pattern recognition: Ideas in practice. New York: Plenum Press.
  4. Biberman, Y. (1994). A context similarity measure. Proceedings of the European Conference on Machine Learning (ECML-94) (pp. 49–63). Catania, Italy: Springer-Verlag.
  5. Brodley, C. E. (1993). Addressing the selective superiority problem: Automatic algorithm/model class selection. Proceedings of the Tenth International Machine Learning Conference (pp. 17–24). Amherst, MA.
  6. Broomhead, D. S. & Lowe, D. (1988). Multi-variable functional interpolation and adaptive networks. Complex Systems, 2, 321–355.
  7. Cameron-Jones, R. M. (1995). Instance selection by encoding length heuristic with random mutation hill climbing. Proceedings of the Eighth Australian Joint Conference on Artificial Intelligence (pp. 99–106).
  8. Carpenter, G. A. & Grossberg, S. (1987). A massively parallel architecture for a self-organizing neural pattern recognition machine. Computer Vision, Graphics, and Image Processing, 37, 54–115.
  9. Chang, C.-L. (1974). Finding prototypes for nearest neighbor classifiers. IEEE Transactions on Computers, 23(11), 1179–1184.
  10. Conover, W. J. (1971). Practical nonparametric statistics (pp. 206–209). New York: John Wiley.
  11. Cover, T. M. & Hart, P. E. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1), 21–27.
  12. Dasarathy, B. V. (1991). Nearest neighbor (NN) norms: NN pattern classification techniques. Los Alamitos, CA: IEEE Computer Society Press.
  13. DeGroot, M. H. (1986). Probability and statistics (2nd ed.). Reading, MA: Addison-Wesley.
  14. Diday, E. (1974). Recent progress in distance and similarity measures in pattern recognition. Second International Joint Conference on Pattern Recognition (pp. 534–539).
  15. Dietterich, T. G. (1989). Limitations on inductive learning. Proceedings of the Sixth International Conference on Machine Learning (pp. 124–128). San Mateo, CA: Morgan Kaufmann.
  16. Domingos, P. (1995). Rule induction and instance-based learning: A unified approach. Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI-95) (pp. 1226–1232). Montreal, Canada: Morgan Kaufmann.
  17. Domingos, P. (1996). Unifying instance-based and rule-based induction. Machine Learning, 24, 141–168.
  18. Dudani, S. A. (1976). The distance-weighted k-nearest-neighbor rule. IEEE Transactions on Systems, Man and Cybernetics, 6(4), 325–327.
  19. Gates, G. W. (1972). The reduced nearest neighbor rule. IEEE Transactions on Information Theory, 18(3), 431–433.
  20. Hart, P. E. (1968). The condensed nearest neighbor rule. IEEE Transactions on Information Theory, 14, 515–516.
  21. Hecht-Nielsen, R. (1987). Counterpropagation networks. Applied Optics, 26(23), 4979–4984.
  22. Kibler, D. & Aha, D. W. (1987). Learning representative exemplars of concepts: An initial case study. Proceedings of the Fourth International Workshop on Machine Learning (pp. 24–30). Irvine, CA: Morgan Kaufmann.
  23. Kubat, M. & Matwin, S. (1997). Addressing the curse of imbalanced training sets: One-sided selection. In D. Fisher (Ed.), Machine Learning: Proceedings of the Fourteenth International Conference (ICML'97) (pp. 179–186). San Francisco, CA: Morgan Kaufmann.
  24. Lowe, D. G. (1995). Similarity metric learning for a variable-kernel classifier. Neural Computation, 7(1), 72–85.
  25. Merz, C. J. & Murphy, P. M. (1996). UCI repository of machine learning databases. Irvine, CA: University of California Irvine, Department of Information and Computer Science. http://www.ics.uci.edu/~mlearn/MLRepository.html.
  26. Michalski, R. S., Stepp, R. E., & Diday, E. (1981). A recent advance in data analysis: Clustering objects into classes characterized by conjunctive concepts. In L. N. Kanal & A. Rosenfeld (Eds.), Progress in Pattern Recognition (Vol. 1, pp. 33–56). New York: North-Holland.
  27. Mitchell, T. M. (1980). The need for biases in learning generalizations. In J. W. Shavlik & T. G. Dietterich (Eds.), Readings in Machine Learning (pp. 184–191). San Mateo, CA: Morgan Kaufmann.
  28. Nadler, M. & Smith, E. P. (1993). Pattern recognition engineering. New York: Wiley.
  29. Papadimitriou, C. H. & Steiglitz, K. (1982). Combinatorial optimization: Algorithms and complexity. Englewood Cliffs, NJ: Prentice-Hall.
  30. Papadimitriou, C. H. & Bentley, J. L. (1980). A worst-case analysis of nearest neighbor searching by projection. Lecture Notes in Computer Science (Vol. 85): Automata, Languages and Programming (pp. 470–482). New York: Springer-Verlag.
  31. Renals, S. & Rohwer, R. (1989). Phoneme classification experiments using radial basis functions. Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN'89) (Vol. 1, pp. 461–467).
  32. Ritter, G. L., Woodruff, H. B., Lowry, S. R., & Isenhour, T. L. (1975). An algorithm for a selective nearest neighbor decision rule. IEEE Transactions on Information Theory, 21(6), 665–669.
  33. Rumelhart, D. E. & McClelland, J. L. (1986). Parallel distributed processing (Ch. 8, pp. 318–362). MIT Press.
  34. Salzberg, S. (1991). A nearest hyperrectangle learning method. Machine Learning, 6, 277–309.
  35. Schaffer, C. (1994). A conservation law for generalization performance. Proceedings of the Eleventh International Conference on Machine Learning (ML'94) (pp. 259–265). New Brunswick, NJ: Morgan Kaufmann.
  36. Skalak, D. B. (1994). Prototype and feature selection by sampling and random mutation hill climbing algorithms. Proceedings of the Eleventh International Conference on Machine Learning (ML'94) (pp. 293–301). Morgan Kaufmann.
  37. Specht, D. F. (1992). Enhancements to probabilistic neural networks. Proceedings of the International Joint Conference on Neural Networks (IJCNN'92) (Vol. 1, pp. 761–768).
  38. Sproull, R. F. (1991). Refinements to nearest-neighbor searching in k-dimensional trees. Algorithmica, 6, 579–589.
  39. Stanfill, C. & Waltz, D. (1986). Toward memory-based reasoning. Communications of the ACM, 29, 1213–1228.
  40. Tomek, I. (1976). An experiment with the edited nearest-neighbor rule. IEEE Transactions on Systems, Man, and Cybernetics, 6(6), 448–452.
  41. Tversky, A. (1977). Features of similarity. Psychological Review, 84(4), 327–352.
  42. Wasserman, P. D. (1993). Advanced methods in neural computing (pp. 147–176). New York, NY: Van Nostrand Reinhold.
  43. Watson, I. & Marir, F. (1994). Case-based reasoning: A review. The Knowledge Engineering Review, 9(4). Cambridge, UK: Cambridge University Press.
  44. Wess, S., Althoff, K.-D., & Richter, M. M. (1993). Using k-d trees to improve the retrieval step in case-based reasoning. Topics in Case-Based Reasoning, First European Workshop (EWCBR-93) (pp. 67–181). Springer-Verlag.
  45. Wettschereck, D. (1994). A hybrid nearest-neighbor and nearest-hyperrectangle algorithm. In F. Bergadano & L. De Raedt (Eds.), Proceedings of the 7th European Conference on Machine Learning (LNAI, Vol. 784, pp. 323–335).
  46. Wettschereck, D. & Dietterich, T. G. (1995). An experimental comparison of nearest-neighbor and nearest-hyperrectangle algorithms. Machine Learning, 19(1), 5–28.
  47. Wettschereck, D., Aha, D. W., & Mohri, T. (1997). A review and comparative evaluation of feature weighting methods for a class of lazy learning algorithms. Artificial Intelligence Review, 11 (special issue on lazy learning), 273–314.
  48. Wilson, D. R. & Martinez, T. R. (1996). Heterogeneous radial basis functions. Proceedings of the International Conference on Neural Networks (ICNN'96) (Vol. 2, pp. 1263–1267).
  49. Wilson, D. R. & Martinez, T. R. (1997a). Improved heterogeneous distance functions. Journal of Artificial Intelligence Research (JAIR), 6(1), 1–34.
  50. Wilson, D. R. & Martinez, T. R. (1997b). Improved center point selection for radial basis function networks. Proceedings of the International Conference on Artificial Neural Networks and Genetic Algorithms (ICANNGA'97) (pp. 514–517).
  51. Wilson, D. R. & Martinez, T. R. (1997c). Instance pruning techniques. In D. Fisher (Ed.), Machine Learning: Proceedings of the Fourteenth International Conference (ICML'97) (pp. 403–411). San Francisco, CA: Morgan Kaufmann.
  52. Wilson, D. L. (1972). Asymptotic properties of nearest neighbor rules using edited data. IEEE Transactions on Systems, Man, and Cybernetics, 2(3), 408–421.
  53. Wolpert, D. H. (1993). On overfitting avoidance as bias. Technical Report SFI TR 92-03-5001. Santa Fe, NM: The Santa Fe Institute.
  54. Zhang, J. (1992). Selecting typical instances in instance-based learning. Proceedings of the Ninth International Conference on Machine Learning (pp. 470–479). Aberdeen, Scotland: Morgan Kaufmann.

Copyright information

© Kluwer Academic Publishers 2000

Authors and Affiliations

  • D. Randall Wilson (1)
  • Tony R. Martinez (2)
  1. Neural Network & Machine Learning Laboratory, Computer Science Department, Brigham Young University, Provo, USA
  2. Neural Network & Machine Learning Laboratory, Computer Science Department, Brigham Young University, Provo, USA
