Advertisement

EHC: Non-parametric Editing by Finding Homogeneous Clusters

  • Stefanos Ougiaroglou
  • Georgios Evangelidis
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8367)

Abstract

Editing is a crucial data mining task in the context of k-Nearest Neighbor classification. Its purpose is to improve classification accuracy by improving the quality of training datasets. To obtain such datasets, editing algorithms try to remove noisy and mislabeled data as well as smooth the decision boundaries between the discrete classes. In this paper, a new fast and non-parametric editing algorithm is proposed. It is called Editing through Homogeneous Clusters (EHC) and is based on an iterative execution of a clustering procedure that forms clusters containing items of a specific class only. Contrary to other editing approaches, EHC is independent of input (tuning) parameters. The performance of EHC is experimentally compared to three state-of-the-art editing algorithms on ten datasets. The results show that EHC is faster than its competitors and achieves high classification accuracy.

Keywords

k-NN classification clustering editing noisy items 

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Aha, D.W., Kibler, D., Albert, M.K.: Instance-based learning algorithms. Mach. Learn. 6(1), 37–66 (1991), http://dx.doi.org/10.1023/A:1022689900470 Google Scholar
  2. 2.
    Alcalá-Fdez, J., Fernández, A., Luengo, J., Derrac, J., García, S.: Keel data-mining software tool: Data set repository, integration of algorithms and experimental analysis framework. Multiple-Valued Logic and Soft Computing 17(2-3), 255–287 (2011)Google Scholar
  3. 3.
    Barandela, R., Gasca, E.: Decontamination of training samples for supervised pattern recognition methods. In: Ferri, F.J., Iñesta, J.M., Amin, A., Pudil, P. (eds.) SSPR&SPR 2000. LNCS, vol. 1876, pp. 621–630. Springer, Heidelberg (2000)Google Scholar
  4. 4.
    Brighton, H., Mellish, C.: Advances in instance selection for instance-based learning algorithms. Data Min. Knowl. Discov. 6(2), 153–172 (2002), http://dx.doi.org/10.1023/A:1014043630878 CrossRefzbMATHMathSciNetGoogle Scholar
  5. 5.
    Dasarathy, B.V.: Nearest neighbor (NN) norms: NN pattern classification techniques. IEEE Computer Society Press (1991)Google Scholar
  6. 6.
    Dasarathy, B.V., Snchez, J.S., Townsend, S.: Nearest neighbour editing and condensing tools synergy exploitation. Pattern Analysis & Applications 3(1), 19–30 (2000), http://dx.doi.org/10.1007/s100440050003 CrossRefGoogle Scholar
  7. 7.
    Devijver, P.A., Kittler, J.: On the edited nearest neighbor rule. In: Proceedings of the Fifth International Conference on Pattern Recognition. The Institute of Electrical and Electronics Engineers (1980)Google Scholar
  8. 8.
    Garcia, S., Derrac, J., Cano, J., Herrera, F.: Prototype selection for nearest neighbor classification: Taxonomy and empirical study. IEEE Trans. Pattern Anal. Mach. Intell. 34(3), 417–435 (2012), http://dx.doi.org/10.1109/TPAMI.2011.142 CrossRefGoogle Scholar
  9. 9.
    García-Borroto, M., Villuendas-Rey, Y., Carrasco-Ochoa, J.A., Martínez-Trinidad, J.F.: Using maximum similarity graphs to edit nearest neighbor classifiers. In: Bayro-Corrochano, E., Eklundh, J.-O. (eds.) CIARP 2009. LNCS, vol. 5856, pp. 489–496. Springer, Heidelberg (2009)CrossRefGoogle Scholar
  10. 10.
    Grochowski, M., Jankowski, N.: Comparison of instance selection algorithms ii. results and comments. In: Rutkowski, L., Siekmann, J.H., Tadeusiewicz, R., Zadeh, L.A. (eds.) ICAISC 2004. LNCS (LNAI), vol. 3070, pp. 580–585. Springer, Heidelberg (2004)CrossRefGoogle Scholar
  11. 11.
    Han, J., Kamber, M., Pei, J.: Data Mining: Concepts and Techniques. The Morgan Kaufmann Series in Data Management Systems. Elsevier Science (2011)Google Scholar
  12. 12.
    Hattori, K., Takahashi, M.: A new edited k-nearest neighbor rule in the pattern classification problem. Pattern Recognition 33(3), 521–528 (2000), http://www.sciencedirect.com/science/article/pii/S0031320399000680 CrossRefGoogle Scholar
  13. 13.
    Grochowski, M., Jankowski, N.: Comparison of instances seletion algorithms i. algorithms survey. In: Rutkowski, L., Siekmann, J.H., Tadeusiewicz, R., Zadeh, L.A. (eds.) ICAISC 2004. LNCS (LNAI), vol. 3070, pp. 598–603. Springer, Heidelberg (2004)CrossRefGoogle Scholar
  14. 14.
    Jiang, Y., Zhou, Z.-H.: Editing training data for knn classifiers with neural network ensemble. In: Yin, F.-L., Wang, J., Guo, C. (eds.) ISNN 2004. LNCS, vol. 3173, pp. 356–361. Springer, Heidelberg (2004)CrossRefGoogle Scholar
  15. 15.
    Lozano, M.: Data Reduction Techniques in Classification processes (Phd Thesis). Universitat Jaume I (2007)Google Scholar
  16. 16.
    McQueen, J.: Some methods for classification and analysis of multivariate observations. In: Proc. of 5th Berkeley Symp. on Math. Statistics and Probability, pp. 281–298. University of California Press, Berkeley (1967)Google Scholar
  17. 17.
    Olvera-López, J.A., Carrasco-Ochoa, J.A., Martínez-Trinidad, J.F., Kittler, J.: A review of instance selection methods. Artif. Intell. Rev. 34(2), 133–143 (2010), http://dx.doi.org/10.1007/s10462-010-9165-y CrossRefGoogle Scholar
  18. 18.
    Ougiaroglou, S., Evangelidis, G.: Efficient dataset size reduction by finding homogeneous clusters. In: Proceedings of the Fifth Balkan Conference in Informatics, BCI 2012, pp. 168–173. ACM, New York (2012), http://doi.acm.org/10.1145/2371316.2371349 Google Scholar
  19. 19.
    Ougiaroglou, S., Nanopoulos, A., Papadopoulos, A.N., Manolopoulos, Y., Welzer-Druzovec, T.: Adaptive k-nearest-neighbor classification using a dynamic number of nearest neighbors. In: Ioannidis, Y., Novikov, B., Rachev, B. (eds.) ADBIS 2007. LNCS, vol. 4690, pp. 66–82. Springer, Heidelberg (2007)CrossRefGoogle Scholar
  20. 20.
    Sánchez, J.S., Barandela, R., Marqués, A.I., Alejo, R., Badenas, J.: Analysis of new techniques to obtain quality training sets. Pattern Recogn. Lett. 24(7), 1015–1022 (2003), http://dx.doi.org/10.1016/S0167-86550200225-8 Google Scholar
  21. 21.
    Segata, N., Blanzieri, E., Delany, S.J., Cunningham, P.: Noise reduction for instance-based learning with a local maximal margin approach. J. Intell. Inf. Syst. 35(2), 301–331 (2010), http://dx.doi.org/10.1007/s10844-009-0101-z CrossRefGoogle Scholar
  22. 22.
    Snchez, J., Pla, F., Ferri, F.: On the use of neighbourhood-based non-parametric classifiers. Pattern Recognition Letters 18(11–13), 1179–1186 (1997), http://www.sciencedirect.com/science/article/pii/S0167865597001128 CrossRefGoogle Scholar
  23. 23.
    Snchez, J., Pla, F., Ferri, F.: Prototype selection for the nearest neighbour rule through proximity graphs. Pattern Recognition Letters 18(6), 507–513 (1997), http://www.sciencedirect.com/science/article/pii/S0167865597000354 CrossRefGoogle Scholar
  24. 24.
    Tomek, I.: An experiment with the edited nearest-neighbor rule. IEEE Transactions on Systems, Man, and Cybernetics 6, 448–452 (1976)CrossRefzbMATHMathSciNetGoogle Scholar
  25. 25.
    Toussaint, G.: Proximity graphs for nearest neighbor decision rules: Recent progress. In: 34th Symposium on the INTERFACE, pp. 17–20 (2002)Google Scholar
  26. 26.
    Triguero, I., Derrac, J., Garcia, S., Herrera, F.: A taxonomy and experimental study on prototype generation for nearest neighbor classification. Trans. Sys. Man Cyber Part C 42(1), 86–100 (2012), http://dx.doi.org/10.1109/TSMCC.2010.2103939 CrossRefGoogle Scholar
  27. 27.
    Vázquez, F., Sánchez, J.S., Pla, F.: A stochastic approach to wilson’s editing algorithm. In: Marques, J.S., de la Pérez Blanca, N., Pina, P. (eds.) IbPRIA 2005. LNCS, vol. 3523, pp. 35–42. Springer, Heidelberg (2005)CrossRefGoogle Scholar
  28. 28.
    Wilson, D.R., Martinez, T.R.: Reduction techniques for instance-basedlearning algorithms. Mach. Learn. 38(3), 257–286 (2000), http://dx.doi.org/10.1023/A:1007626913721 CrossRefzbMATHGoogle Scholar
  29. 29.
    Wilson, D.L.: Asymptotic properties of nearest neighbor rules using edited data. IEEE Trans. on Systems, Man, and Cybernetics 2(3), 408–421 (1972)CrossRefzbMATHGoogle Scholar

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Stefanos Ougiaroglou
    • 1
  • Georgios Evangelidis
    • 1
  1. 1.Department of Applied Informatics, School of Information SciencesUniversity of MacedoniaThessalonikiGreece

Personalised recommendations