Abstract
Data reduction extracts a representative subset from a dataset. It reduces storage requirements and increases classification efficiency. Using the subset as training data can maintain classification accuracy, and sometimes even improve it by eliminating noise. The key is to choose representative samples while discarding noise at the same time. Many instance selection algorithms are based on the nearest neighbor decision rule (NN). Such algorithms typically follow one of two strategies, incremental or decremental. Incremental algorithms start from a small sample set and iteratively add instances whose class label differs from that of their nearest sample. Decremental algorithms remove instances whose class label differs from that of the majority of their k nearest neighbors. In contrast, we propose an algorithm based on the Reverse Nearest Neighbor (RNN), called Reverse Nearest Neighbor Reduction (RNNR). RNNR selects samples that can represent other instances of the same class. Moreover, RNNR does not need to scan the dataset iteratively, which saves considerable processing time. Experimental results show that RNNR achieves comparable accuracy while selecting fewer samples than its comparators.
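The full RNNR algorithm is not given in this abstract; as a minimal illustrative sketch, the reverse-nearest-neighbor idea can be expressed as follows. Here we assume (as an illustration, not the paper's exact procedure) that an instance is selected as a sample when at least one same-class instance has it as its nearest neighbor, i.e. its same-class RNN set is nonempty. The function names `nearest_neighbor` and `rnn_select` are our own, hypothetical.

```python
import math
from collections import defaultdict

def nearest_neighbor(points, i):
    """Index of the nearest neighbor of points[i], excluding itself."""
    best, best_d = None, math.inf
    for j, q in enumerate(points):
        if j == i:
            continue
        d = math.dist(points[i], q)
        if d < best_d:
            best, best_d = j, d
    return best

def rnn_select(points, labels):
    """Illustrative RNN-based selection: keep instances whose
    same-class reverse-nearest-neighbor set is nonempty."""
    # rnn[s] collects the instances that have s as their nearest neighbor.
    rnn = defaultdict(list)
    for i in range(len(points)):
        rnn[nearest_neighbor(points, i)].append(i)
    # Select s if some same-class instance points back to it.
    return [s for s, members in rnn.items()
            if any(labels[m] == labels[s] for m in members)]

# Toy usage: two well-separated classes of three points each.
pts = [(0, 0), (0, 1), (0, 2), (5, 5), (5, 6), (5, 7)]
labs = [0, 0, 0, 1, 1, 1]
samples = rnn_select(pts, labs)  # a proper subset of the six instances
```

Unlike the incremental and decremental schemes described above, this selection is computed in a single pass over the precomputed nearest-neighbor relation, mirroring the abstract's claim that RNNR avoids iterative rescanning of the dataset.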
© 2011 Springer-Verlag Berlin Heidelberg
Dai, BR., Hsu, SM. (2011). An Instance Selection Algorithm Based on Reverse Nearest Neighbor. In: Huang, J.Z., Cao, L., Srivastava, J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2011. Lecture Notes in Computer Science(), vol 6634. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20841-6_1
Print ISBN: 978-3-642-20840-9
Online ISBN: 978-3-642-20841-6