An Instance Selection Algorithm Based on Reverse Nearest Neighbor

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6634)

Abstract

Data reduction extracts a subset from a dataset. Its advantages are a lower storage requirement and more efficient classification. Using the subset as training data can maintain classification accuracy, and sometimes even improve it, because noisy instances are eliminated. The key is to choose representative samples while ignoring noise. Many instance selection algorithms are based on the nearest neighbor decision rule (NN) and follow one of two strategies, incremental or decremental. Incremental algorithms start with a few instances as samples and iteratively add each instance whose class label differs from that of its nearest sample. Decremental algorithms remove each instance whose class label differs from the majority label of its k nearest neighbors (kNN). In contrast, we propose an algorithm based on the Reverse Nearest Neighbor (RNN), called Reverse Nearest Neighbor Reduction (RNNR). RNNR selects samples that can represent the other instances of their class, and it does not need to scan the dataset iteratively, which saves considerable processing time. Experimental results show that RNNR achieves accuracy comparable to its competitors while selecting fewer samples.
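
To make the preceding description concrete, the following is a minimal sketch, in Python, of one way reverse nearest neighbors can drive instance selection: each instance's nearest neighbor is computed, the reverse-nearest-neighbor (RNN) sets are assembled, and an instance is kept as a sample only if it is the nearest neighbor of at least one other instance of its own class. The distance measure (Euclidean), the single-neighbor setting, and the keep-if-same-class criterion are assumptions made for illustration; the paper's actual RNNR variants and selection conditions may differ.

    import numpy as np

    def reverse_nearest_neighbors(X):
        """Return, for each instance, the indices of the instances whose
        nearest neighbor (self excluded) is that instance."""
        n = len(X)
        # Pairwise squared Euclidean distances.
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
        np.fill_diagonal(d2, np.inf)      # never match an instance to itself
        nn = d2.argmin(axis=1)            # nearest neighbor of each instance
        rnn = [[] for _ in range(n)]
        for i, j in enumerate(nn):
            rnn[j].append(i)              # i's nearest neighbor is j, so i belongs to RNN(j)
        return rnn

    def rnn_based_selection(X, y):
        """Keep an instance if its RNN set contains at least one instance of
        the same class (an illustrative criterion, not the paper's exact rule)."""
        X = np.asarray(X, dtype=float)
        rnn = reverse_nearest_neighbors(X)
        return [i for i, members in enumerate(rnn)
                if any(y[m] == y[i] for m in members)]

    # Toy usage: two well-separated classes in the plane.
    X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0]]
    y = [0, 0, 0, 1, 1]
    print(rnn_based_selection(X, y))      # [0, 1, 3, 4]

On this toy data, the one instance that is nobody's nearest neighbor is dropped, so the training set shrinks from five points to four in a single pass over the distance matrix, with no repeated scans of the dataset.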

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dai, B.R., Hsu, S.M. (2011). An Instance Selection Algorithm Based on Reverse Nearest Neighbor. In: Huang, J.Z., Cao, L., Srivastava, J. (eds.) Advances in Knowledge Discovery and Data Mining. PAKDD 2011. Lecture Notes in Computer Science (LNAI), vol. 6634. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20841-6_1

  • DOI: https://doi.org/10.1007/978-3-642-20841-6_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20840-9

  • Online ISBN: 978-3-642-20841-6
