Abstract
Data reduction extracts a representative subset from a dataset. It reduces storage requirements and increases classification efficiency. Using the subset as training data can maintain classification accuracy, and sometimes even improve it by eliminating noise. The key is to choose representative samples while discarding noise at the same time. Many instance selection algorithms are based on the nearest neighbor decision rule (NN). Such algorithms typically follow one of two strategies, incremental or decremental. Incremental algorithms start from a small sample set and iteratively add instances whose class label differs from that of their nearest sample. Decremental algorithms remove instances whose class label differs from that of the majority of their k nearest neighbors. In contrast, we propose an algorithm based on the Reverse Nearest Neighbor (RNN), called Reverse Nearest Neighbor Reduction (RNNR). RNNR selects samples that can represent other instances of the same class. Moreover, RNNR does not need to scan the dataset iteratively, which saves considerable processing time. Experimental results show that RNNR achieves comparable accuracy while selecting fewer samples than its comparators.
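The full RNNR algorithm is not given in this abstract; as a minimal illustrative sketch, the reverse-nearest-neighbor idea can be expressed as follows. Here we assume (as an illustration, not the paper's exact procedure) that an instance is selected as a sample when at least one same-class instance has it as its nearest neighbor, i.e. its same-class RNN set is nonempty. The function names `nearest_neighbor` and `rnn_select` are our own, hypothetical.

```python
import math
from collections import defaultdict

def nearest_neighbor(points, i):
    """Index of the nearest neighbor of points[i], excluding itself."""
    best, best_d = None, math.inf
    for j, q in enumerate(points):
        if j == i:
            continue
        d = math.dist(points[i], q)
        if d < best_d:
            best, best_d = j, d
    return best

def rnn_select(points, labels):
    """Illustrative RNN-based selection: keep instances whose
    same-class reverse-nearest-neighbor set is nonempty."""
    # rnn[s] collects the instances that have s as their nearest neighbor.
    rnn = defaultdict(list)
    for i in range(len(points)):
        rnn[nearest_neighbor(points, i)].append(i)
    # Select s if some same-class instance points back to it.
    return [s for s, members in rnn.items()
            if any(labels[m] == labels[s] for m in members)]

# Toy usage: two well-separated classes of three points each.
pts = [(0, 0), (0, 1), (0, 2), (5, 5), (5, 6), (5, 7)]
labs = [0, 0, 0, 1, 1, 1]
samples = rnn_select(pts, labs)  # a proper subset of the six instances
```

Unlike the incremental and decremental schemes described above, this selection is computed in a single pass over the precomputed nearest-neighbor relation, mirroring the abstract's claim that RNNR avoids iterative rescanning of the dataset.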
© 2011 Springer-Verlag Berlin Heidelberg
Dai, BR., Hsu, SM. (2011). An Instance Selection Algorithm Based on Reverse Nearest Neighbor. In: Huang, J.Z., Cao, L., Srivastava, J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2011. Lecture Notes in Computer Science(), vol 6634. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20841-6_1
Print ISBN: 978-3-642-20840-9
Online ISBN: 978-3-642-20841-6