Abstract
The aim of this paper is to introduce a novel classification algorithm based on distance to class centroids with weighted Euclidean distance metric. Features are weighted by their predictive powers and in-class homogeneities. For predictive power, information value metric is used. For in-class homogeneity different measures are used. The algorithm is memory based but only the centroid information needs to be stored. The experimentations are carried at 45 benchmark datasets and 5 randomly generated datasets. The results are compared against Nearest Centroid, Logistic Regression, K-Nearest Neighbors and Decision Tree algorithms. The parameters of the new algorithm and of these traditional classification algorithms are tuned before comparison. The results are promising and has potential to trigger further research.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Wolpert, D.H.: The supervised learning no-free-lunch theorems. Soft Comput. Ind. 25–42 (2002)
Hastie, T., Tibshirani, R., Friedman, J.H.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction, vol. 2, p. 670. Springer, New York (2009)
Shmueli, G.: To explain or to predict? Stat. Sci. 25(3), 289–310 (2010)
Kuncheva, L.I.: Prototype classifiers and the big fish: the case of prototype (instance) selection. IEEE Syst. Man Cybern. Mag. 6(2), 49–56 (2020)
Tibshirani, R., Hastie, T., Narasimhan, B., Chu, G.: Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc. Natl. Acad. Sci. 99(10), 6567–6572 (2002)
Hartigan, J.A., Wong, M.A.: Algorithm AS 136: a k-means clustering algorithm. J. Roy. Stat. Soc. Ser. C (Appl. Stat.) 28(1), 100–108 (1979)
Cover, T., Hart, P.: Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 13(1), 21–27 (1967)
Alpaydin, E.: Voting over multiple condensed nearest neighbors. In: Aha, D.W. (eds.) Lazy Learning, pp. 115–132. Springer, Dordrecht (1997). https://doi.org/10.1007/978-94-017-2053-3_4
Gou, J., et al.: A representation coefficient-based k-nearest centroid neighbor classifier. Expert Syst. Appl. 194, 116529 (2022)
Elen, A., Avuçlu, E.: Standardized Variable Distances: a distance-based machine learning method. Appl. Soft Comput. 98, 106855 (2021)
Baesens, B., Van Gestel, T., Viaene, S., Stepanova, M., Suykens, J., Vanthienen, J.: Benchmarking state-of-the-art classification algorithms for credit scoring. J. Oper. Res. Soc. 54, 627–635 (2003)
Lessmann, S., Baesens, B., Seow, H.V., Thomas, L.C.: Benchmarking state-of-the-art classification algorithms for credit scoring: an update of research. Eur. J. Oper. Res. 247(1), 124–136 (2015)
Siddiqi, N.: Intelligent Credit Scoring: Building and Implementing Better Credit Risk Scorecards, pp.186–197. Wiley (2017)
Vanschoren, J., Van Rijn, J.N., Bischl, B., Torgo, L.: OpenML: networked science in machine learning. ACM SIGKDD Explor. Newslett. 15(2), 49–60 (2014)
Feurer, M., et al.: OpenML-Python: an extensible Python API for OpenML. J. Mach. Learn. Res. 22(1), 4573–4577 (2021)
Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Acknowledgements
The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Özçelik, M.H., Bulkan, S. (2024). Nearest Centroid Classifier Based on Information Value and Homogeneity. In: Şen, Z., Uygun, Ö., Erden, C. (eds) Advances in Intelligent Manufacturing and Service System Informatics. IMSS 2023. Lecture Notes in Mechanical Engineering. Springer, Singapore. https://doi.org/10.1007/978-981-99-6062-0_5
Download citation
DOI: https://doi.org/10.1007/978-981-99-6062-0_5
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-6061-3
Online ISBN: 978-981-99-6062-0
eBook Packages: EngineeringEngineering (R0)