# Reduction Techniques for Instance-Based Learning Algorithms

- 2.8k Downloads
- 519 Citations

## Abstract

Instance-based learning algorithms are often faced with the problem of deciding which instances to store for use during generalization. Storing too many instances can result in large memory requirements and slow execution speed, and can cause an oversensitivity to noise. This paper has two main purposes. First, it provides a survey of existing algorithms used to reduce storage requirements in instance-based learning algorithms and other exemplar-based algorithms. Second, it proposes six additional reduction algorithms called DROP1–DROP5 and DEL (three of which were first described in Wilson & Martinez, 1997c, as RT1–RT3) that can be used to remove instances from the concept description. These algorithms and 10 algorithms from the survey are compared on 31 classification tasks. Of those algorithms that provide substantial storage reduction, the DROP algorithms have the highest average generalization accuracy in these experiments, especially in the presence of uniform class noise.

## References

- Aha, D. W. (1992). Tolerating noisy, irrelevant and novel attributes in instance-based learning algorithms.
*International Journal of Man-Machine Studies*,*36*, 267–287.Google Scholar - Aha, D.W., Kibler, D., & Albert, M. K. (1991). Instance-based learning algorithms.
*Machine Learning*,*6*, 37–66.Google Scholar - Batchelor, B. G. (1978).
*Pattern recognition: ideas in practice*. New York: Plenum Press.Google Scholar - Biberman, Y. (1994).Acontext similarity measure.
*Proceedings of the European Conference on Machine Learning (ECML-94)*(pp. 49–63). Catania, Italy: Springer Verlag.Google Scholar - Brodley, C. E. (1993). Addressing the selective superiority problem: automatic algorithm/model class selection.
*Proceedings of the Tenth International Machine Learning Conference*, Amherst, MA (pp. 17–24).Google Scholar - Broomhead, D. S. & Lowe, D. (1988). Multi-variable functional interpolation and adaptive networks.
*Complex Systems*,*2*, 321–355.Google Scholar - Cameron-Jones, R. M. (1995). Instance selection by encoding length heuristic with random mutation hill climbing.
*Proceedings of the Eighth Australian Joint Conference on Artificial Intelligence*(pp. 99–106).Google Scholar - Carpenter, G. A. & Grossberg, S. (1987). A massively parallel architecture for a self-organizing neural pattern recognition machine.
*Computer Vision, Graphics, and Image Processing*,*37*, 54–115.Google Scholar - Chang, C.-L. (1974). Finding prototypes for nearest neighbor classifiers.
*IEEE Transactions on Computers*,*23*(11), 1179–1184.Google Scholar - Conover, W. J. (1971).
*Practical nonparametric statistics*(pp. 206–209). New York: John Wiley.Google Scholar - Cover, T. M. & Hart, P. E. (1967). Nearest neighbor pattern classification.
*Institute of Electrical and Electronics Engineers Transactions on Information Theory*,*13*(1), 21–27.Google Scholar - Dasarathy, B. V. (1991).
*Nearest neighbor (NN) norms: NN pattern classification techniques*. Los Alamitos, CA: IEEE Computer Society Press.Google Scholar - DeGroot, M. H. (1986).
*Probability and statistics*, 2nd edn. Reading, MA: Addison-Wesley.Google Scholar - Diday, E. (1974). Recent progress in distance and similarity measures in pattern recognition.
*Second International Joint Conference on Pattern Recognition*(pp. 534–539).Google Scholar - Dietterich, T. G. (1989). Limitations on inductive learning.
*Proceedings of the Sixth International Conference on Machine Learning*(pp. 124–128). San Mateo, CA: Morgan Kaufmann.Google Scholar - Domingos, P. (1995). Rule induction and instance-based learning: a unified approach.
*Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI-95)*(pp. 1226–1232). Montreal, Canada: Morgan Kaufmann.Google Scholar - Domingos, P. (1996). Unifying instance-based and rule-based induction.
*Machine Learning*,*24*, 141–168.Google Scholar - Dudani, S. A. (1976). The distance-weighted k-nearest-neighbor rule.
*IEEE Transactions on Systems, Man and Cybernetics*,*6*(4), 325–327.Google Scholar - Gates, G.W. (1972). The reduced nearest neighbor rule.
*IEEE Transactions on Information Theory*,*18*(3), 431–433.Google Scholar - Hart, P. E. (1968). The condensed nearest neighbor rule.
*IEEE Transactions on Information Theory*,*14*, 515–516.Google Scholar - Hecht-Nielsen, R. (1987). Counterpropagation networks.
*Applied Optics*,*26*(23), 4979–4984.Google Scholar - Kibler, D. & Aha, D.W. (1987). Learning representative exemplars of concepts: an initial case study.
*Proceedings of the Fourth International Workshop on Machine Learning*(pp. 24–30). Irvine, CA: Morgan Kaufmann.Google Scholar - Kubat, M. & Matwin, S. (1997). Addressing the curse of imbalanced training sets: one-sided selection. In D. Fisher (Ed.),
*Machine Learning: Proceedings of the Fourteenth International Conference (ICML'97)*(pp. 179–186). San Francisco, CA: Morgan Kaufmann Publishers.Google Scholar - Lowe, D. G. (1995). Similarity Metric Learning for aVariable-Kernel Classifier.
*Neural Computation*,*7*(1), 72–85.Google Scholar - Merz, C. J. & Murphy, P. M. (1996).
*UCI repository of machine learning databases*. Irvine, CA: University of California Irvine, Department of Information and Computer Science. Internet: http://www.ics.uci.edu/ ~mlearn/ MLRepository.html.Google Scholar - Michalski, R. S., Stepp, R. E., & Diday, E. (1981). In L. N. Kanal & Azriel Rosenfeld (Eds.). A recent advance in data analysis: clustering objects into classes characterized by conjunctive concepts.
*Progress in Pattern Recognition*(Vol. 1, pp. 33–56). New York: North-Holland.Google Scholar - Mitchell, T. M. (1980). The need for biases in learning generalizations. In J.W. Shavlik & T. G. Dietterich (Eds.),
*Readings in Machine Learning*(pp. 184–191). San Mateo, CA: Morgan Kaufmann.Google Scholar - Nadler, M. & Smith, E. P. (1993).
*Pattern recognition engineering*. New York: Wiley.Google Scholar - Papadimitriou, C. H. & Steiglitz, K. (1982).
*Combinatorial Optimization: Algorithms and Complexity*, Englewood Cliffs, NJ, Prentice-Hall.Google Scholar - Papadimitriou, C. H. & Bentley, J. L. (1980). A worst-case analysis of nearest neighbor searching by projection.
*Lecture Notes in Computer Science*(Vol. 85):*Automata, Languages and Programming*(pp. 470–482). New York: Springer-Verlag.Google Scholar - Renals, S. & Rohwer, R. (1989). Phoneme classification experiments using radial basis functions.
*Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN'89)*(Vol. 1, pp. 461–467).Google Scholar - Ritter, G. L., Woodruff, H. B., Lowry, S. R.,& Isenhour, T. L. (1975). An algorithm for a selective nearest neighbor decision rule.
*IEEE Transactions on Information Theory*,*21*(6), 665–669.Google Scholar - Rumelhart, D. E. & McClelland, J. L. (1986).
*Parallel distributed processing*(Ch. 8, pp. 318–362). MIT Press.Google Scholar - Salzberg, S. (1991). A nearest hyperrectangle learning method.
*Machine Learning*,*6*, 277–309.Google Scholar - Schaffer, C. (1994). A conservation law for generalization performance.
*Proceedings of the Eleventh International Conference on Machine Learning (ML'94)*(pp. 259–265). New Brunswick, NJ: Morgan Kaufmann.Google Scholar - Skalak, D. B. (1994). Prototype and feature selection by sampling and random mutation hill climbing algorithms.
*Proceedings of the Eleventh International Conference on Machine Learning (ML94)*(pp. 293–301). Morgan Kaufmann.Google Scholar - Specht, D. F. (1992). Enhancements to probabilistic neural networks.
*Proceedings International Joint Conference on Neural Networks (IJCNN '92)*(Vol. 1, pp. 761–768).Google Scholar - Sproull, R. F. (1991). Refinements to nearest-neighbor searching in k-dimensional trees.
*Algorithmica*,*6*, 579–589.Google Scholar - Stanfill, C. & Waltz, D. (1986). Toward memory-based reasoning.
*Communications of the ACM*,*29*, 1213–1228.Google Scholar - Tomek, I. (1976). An experiment with the edited nearest-neighbor rule.
*IEEE Transactions on Systems, Man, and Cybernetics*,*6*(6), 448–452.Google Scholar - Wasserman, P. D. (1993).
*Advanced methods in neural computing*(pp. 147–176). New York, NY: Van Nostrand Reinhold.Google Scholar - Watson, I. & Marir, F. (1994). Case-based reasoning: a review.
*The knowledge engineering review, 9*(4), Cambridge, UK: Cambridge University Press.Google Scholar - Wess, S., Althoff, K.-D.,& Richter, M.M. (1993). Using k-d trees to improve the retrieval step in case-based reasoning.
*Topics in Case-Based Reasoning, First European Workshop (EWCBR-93)*(pp. 67–181). Springer-Verlag.Google Scholar - Wettschereck, D. (1994). A hybrid nearest-neighbor and nearest-hyperrectangle algorithm, In F. Bergadano & Raedt, L. de (Eds.),
*Proceedings of the 7th European Conference on Machine Learning*(pp. 323–335). LNAI, Vol. 784.Google Scholar - Wettschereck, D. & Dietterich, T. G. (1995). An experimental comparison of nearest-neighbor and nearesthyperrectangle algorithms.
*Machine Learning*,*19*(1), 5–28.Google Scholar - Wettschereck, D., Aha, D.W. & Mohri, T. (1997). A review and comparative evaluation of feature weighting methods for a class of lazy learning algorithms.
*Artificial Intelligence Review*,*11*(Special issue on Lazy Learning), pp. 273–314.Google Scholar - Wilson, D.R. & Martinez, T.R.(1996) Heterogeneous radial basis functions
*Proceedings of the International Conference on Neural Networks (ICNN'96)*(Vol.2pp.1263–1267).Google Scholar - Wilson, D. R. & Martinez, T. R. (1997a). Improved heterogeneous distance functions.
*Journal of Artificial Intelligence Research (JAIR)*,*6*(1), 1–34.Google Scholar - Wilson, D. R. & Martinez, T. R. (1997b). Improved center point selection for radial basis function networks.
*Proceedings of the International Conference on Artificial Neural Networks and Genetic Algorithms (ICANNGA'97)*(pp. 514–517).Google Scholar - Wilson, D. R. & Martinez, T. R. (1997c). Instance pruning techniques. In Fisher, D. (Ed.),
*Machine Learning: Proceedings of the Fourteenth International Conference (ICML'97)*(pp.403–411). San Francisco, CA: Morgan Kaufmann Publishers.Google Scholar - Wilson, D. L. (1972). Asymptotic properties of nearest neighbor rules using edited data.
*IEEE Transactions on Systems, Man, and Cybernetics*,*2*(3), 408–421.Google Scholar - Wolpert, D. H. (1993).
*On overfitting avoidance as bias*. Technical Report SFI TR 92–03–5001. Santa Fe, NM: The Santa Fe Institute.Google Scholar - Zhang, J. (1992). Selecting typical instances in instance-based learning.
*Proceedings of the Ninth International Conference on Machine Learning*(pp. 470–479). Aberdeen, Scotland: Morgan Kaufmann.Google Scholar