Nearest Centroid Classifier Based on Information Value and Homogeneity

Özçelik, Mehmet Hamdi; Bulkan, Serol

doi:10.1007/978-981-99-6062-0_5

Part of the book series: Lecture Notes in Mechanical Engineering (LNME)

Included in the following conference series:

International Symposium on Intelligent Manufacturing and Service Systems

Abstract

This paper introduces a novel classification algorithm based on the distance to class centroids under a weighted Euclidean distance metric. Features are weighted by their predictive power and in-class homogeneity. Predictive power is measured with the information value metric; several different measures are used for in-class homogeneity. The algorithm is memory-based, but only the centroid information needs to be stored. Experiments were carried out on 45 benchmark datasets and 5 randomly generated datasets. The results are compared against the Nearest Centroid, Logistic Regression, K-Nearest Neighbors, and Decision Tree algorithms. The parameters of the new algorithm and of these traditional classification algorithms are tuned before comparison. The results are promising and have the potential to trigger further research.
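The core idea of classifying by weighted Euclidean distance to class centroids can be illustrated with a minimal sketch. The abstract does not give the formula that combines information value and in-class homogeneity into feature weights, so the `weights` vector below is a hypothetical placeholder supplied by hand; the function names and the toy data are likewise assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def fit_centroids(X, y):
    """Return the sorted class labels and one mean vector (centroid) per class."""
    classes = np.unique(y)
    centroids = np.vstack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict_weighted(X, classes, centroids, weights):
    """Assign each row of X to the class with the nearest centroid under the
    weighted squared Euclidean distance sum_j w_j * (x_j - c_j)**2."""
    diff = X[:, None, :] - centroids[None, :, :]   # (n_samples, n_classes, n_features)
    dist2 = (weights * diff ** 2).sum(axis=2)      # (n_samples, n_classes)
    return classes[dist2.argmin(axis=1)]


# Toy data (assumed): feature 0 separates the classes, feature 1 is noisy.
X_train = np.array([[0.0, 0.0], [1.0, 10.0], [4.0, 0.0], [5.0, 4.0]])
y_train = np.array([0, 0, 1, 1])
classes, centroids = fit_centroids(X_train, y_train)

X_new = np.array([[1.5, 1.0]])

# Hypothetical weights: trust feature 0, ignore feature 1. In the paper these
# would be derived from information value and in-class homogeneity.
pred_weighted = predict_weighted(X_new, classes, centroids, np.array([1.0, 0.0]))

# Uniform weights reduce to the plain Nearest Centroid rule.
pred_plain = predict_weighted(X_new, classes, centroids, np.array([1.0, 1.0]))
```

On this toy input the two weightings disagree: with the noisy feature down-weighted the point is assigned to class 0, while uniform weights assign it to class 1. Only the centroids and the weight vector need to be stored at prediction time, which matches the abstract's remark about memory use.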

Acknowledgements

The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.

Author information

Authors and Affiliations

Applied Analytics AA, İstanbul, Türkiye
Mehmet Hamdi Özçelik
Department of Industrial Engineering, Marmara University, İstanbul, Türkiye
Mehmet Hamdi Özçelik & Serol Bulkan

Authors

Mehmet Hamdi Özçelik
Serol Bulkan

Corresponding author

Correspondence to Mehmet Hamdi Özçelik.

Editor information

Editors and Affiliations

Istanbul Medipol University, Istanbul, Türkiye
Zekâi Şen
Department of Industrial Engineering, Sakarya University, Serdivan, Sakarya, Türkiye
Özer Uygun
Faculty of Applied Sciences, Sakarya University of Applied Sciences, Kaynarca, Sakarya, Türkiye
Caner Erden

About this paper

Cite this paper

Özçelik, M.H., Bulkan, S. (2024). Nearest Centroid Classifier Based on Information Value and Homogeneity. In: Şen, Z., Uygun, Ö., Erden, C. (eds) Advances in Intelligent Manufacturing and Service System Informatics. IMSS 2023. Lecture Notes in Mechanical Engineering. Springer, Singapore. https://doi.org/10.1007/978-981-99-6062-0_5

DOI: https://doi.org/10.1007/978-981-99-6062-0_5
Published: 02 October 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-6061-3
Online ISBN: 978-981-99-6062-0
eBook Packages: Engineering, Engineering (R0)
