Nearest Centroid Classifier Based on Information Value and Homogeneity

  • Conference paper
Advances in Intelligent Manufacturing and Service System Informatics (IMSS 2023)

Abstract

This paper introduces a novel classification algorithm that assigns an instance to the class with the nearest centroid under a weighted Euclidean distance metric. Features are weighted by their predictive power and their in-class homogeneity. Predictive power is measured with the information value metric; for in-class homogeneity, different measures are used. The algorithm is memory-based, but only the centroid information needs to be stored. Experiments are carried out on 45 benchmark datasets and 5 randomly generated datasets. The results are compared against the Nearest Centroid, Logistic Regression, K-Nearest Neighbors, and Decision Tree algorithms. The parameters of the new algorithm and of these traditional classification algorithms are tuned before comparison. The results are promising and have the potential to trigger further research.
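The idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact weighting formulas, so the code below assumes a binary target, the standard binned information value (the sum over bins of (p - q) * ln(p / q)), and a hypothetical homogeneity measure, 1 / (1 + pooled within-class standard deviation). The names `information_value` and `WeightedNearestCentroid`, the bin count, and the smoothing constant `eps` are all illustrative assumptions.

```python
import numpy as np


def information_value(x, y, bins=5, eps=0.5):
    """Binned information value of one numeric feature for a 0/1 target.

    Assumption: quantile binning with additive smoothing `eps` so that
    empty bins do not produce log(0). The paper may bin differently.
    """
    edges = np.unique(np.quantile(x, np.linspace(0.0, 1.0, bins + 1)))
    n_bins = max(len(edges) - 1, 1)
    idx = np.digitize(x, edges[1:-1])  # bin index in [0, n_bins - 1]
    pos = np.bincount(idx[y == 1], minlength=n_bins) + eps
    neg = np.bincount(idx[y == 0], minlength=n_bins) + eps
    p, q = pos / pos.sum(), neg / neg.sum()
    return float(np.sum((p - q) * np.log(p / q)))


class WeightedNearestCentroid:
    """Nearest centroid with a feature-weighted squared Euclidean distance.

    Assumption: weight_j = IV_j * 1 / (1 + pooled within-class std_j).
    Only the centroids and the weight vector are stored after fitting.
    """

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        if len(self.classes_) != 2:
            raise ValueError("this sketch handles binary targets only")
        # One centroid (feature-wise mean) per class.
        self.centroids_ = np.array(
            [X[y == c].mean(axis=0) for c in self.classes_]
        )
        # Predictive power: information value of each feature.
        target = (y == self.classes_[1]).astype(int)
        iv = np.array(
            [information_value(X[:, j], target) for j in range(X.shape[1])]
        )
        # Homogeneity (hypothetical measure): features that vary little
        # inside each class receive a larger weight.
        pooled_std = np.mean(
            [X[y == c].std(axis=0) for c in self.classes_], axis=0
        )
        self.weights_ = iv / (1.0 + pooled_std)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # diff has shape (n_samples, n_classes, n_features).
        diff = X[:, None, :] - self.centroids_[None, :, :]
        dist2 = (self.weights_ * diff ** 2).sum(axis=2)
        return self.classes_[dist2.argmin(axis=1)]
```

Under these assumptions, a feature whose distribution is identical in both classes receives an information value of zero and therefore no longer contributes to the distance, while a feature that separates the classes and is tightly clustered within each class dominates it.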

Acknowledgements

The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.

Author information

Corresponding author

Correspondence to Mehmet Hamdi Özçelik.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Özçelik, M.H., Bulkan, S. (2024). Nearest Centroid Classifier Based on Information Value and Homogeneity. In: Şen, Z., Uygun, Ö., Erden, C. (eds) Advances in Intelligent Manufacturing and Service System Informatics. IMSS 2023. Lecture Notes in Mechanical Engineering. Springer, Singapore. https://doi.org/10.1007/978-981-99-6062-0_5

  • DOI: https://doi.org/10.1007/978-981-99-6062-0_5

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-6061-3

  • Online ISBN: 978-981-99-6062-0

  • eBook Packages: Engineering, Engineering (R0)