
EHC: Non-parametric Editing by Finding Homogeneous Clusters

  • Conference paper
Foundations of Information and Knowledge Systems (FoIKS 2014)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8367)

Abstract

Editing is a crucial data mining task in the context of k-Nearest Neighbor classification. Its purpose is to improve classification accuracy by raising the quality of the training dataset: editing algorithms remove noisy and mislabeled data and smooth the decision boundaries between the discrete classes. In this paper, a new fast, non-parametric editing algorithm is proposed. It is called Editing through Homogeneous Clusters (EHC) and is based on the iterative execution of a clustering procedure that forms clusters containing items of a single class only. In contrast to other editing approaches, EHC does not depend on input (tuning) parameters. The performance of EHC is experimentally compared to three state-of-the-art editing algorithms on ten datasets. The results show that EHC is faster than its competitors and achieves high classification accuracy.
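
The abstract describes the procedure only at a high level. As an illustration, the sketch below is a minimal, hypothetical Python rendering of the stated idea: a cluster is split recursively (here by a single k-means-style assignment step seeded with one class mean per class) until every cluster is class-homogeneous, and items that end up in very small homogeneous clusters are discarded as likely noise or mislabeled data. The class-mean seeding, the single assignment step, and the min_cluster_size noise rule are assumptions made for illustration, not the authors' exact algorithm.

```python
# Hedged sketch of editing by homogeneous clusters, based only on the abstract.
# Assumptions (not from the paper): class-mean seeding, one assignment step per
# split, and "tiny homogeneous cluster = noise" as the editing criterion.
import numpy as np


def find_homogeneous_clusters(X, y):
    """Split (X, y) recursively until every cluster holds items of one class only."""
    queue = [(np.asarray(X, dtype=float), np.asarray(y))]
    homogeneous = []
    while queue:
        cx, cy = queue.pop()
        classes = np.unique(cy)
        if len(classes) == 1:                       # already homogeneous: keep as-is
            homogeneous.append((cx, cy))
            continue
        # Seed one centroid per class with the class mean (assumed seeding rule).
        means = np.array([cx[cy == c].mean(axis=0) for c in classes])
        # One assignment step: each item joins its nearest class-mean centroid.
        dists = np.linalg.norm(cx[:, None, :] - means[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        parts = [np.flatnonzero(assign == k) for k in range(len(classes))]
        parts = [p for p in parts if p.size > 0]
        if len(parts) < 2:                          # no progress: split by label so the recursion ends
            parts = [np.flatnonzero(cy == c) for c in classes]
        for idx in parts:
            queue.append((cx[idx], cy[idx]))
    return homogeneous


def ehc_edit(X, y, min_cluster_size=2):
    """Keep only items that fall in homogeneous clusters of at least min_cluster_size."""
    kept_X, kept_y = [], []
    for cx, cy in find_homogeneous_clusters(X, y):
        if len(cy) >= min_cluster_size:             # tiny clusters discarded as noise (assumption)
            kept_X.append(cx)
            kept_y.append(cy)
    return np.vstack(kept_X), np.concatenate(kept_y)
```

For a hypothetical training set `X` (an n-by-d array) and label vector `y`, `ehc_edit(X, y)` would return the edited training set on which a k-NN classifier is subsequently built.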

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Ougiaroglou, S., Evangelidis, G. (2014). EHC: Non-parametric Editing by Finding Homogeneous Clusters. In: Beierle, C., Meghini, C. (eds) Foundations of Information and Knowledge Systems. FoIKS 2014. Lecture Notes in Computer Science, vol 8367. Springer, Cham. https://doi.org/10.1007/978-3-319-04939-7_14

  • DOI: https://doi.org/10.1007/978-3-319-04939-7_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-04938-0

  • Online ISBN: 978-3-319-04939-7

  • eBook Packages: Computer Science, Computer Science (R0)
