Visual K-Means Approach for Handling Class Imbalance Learning

  • Ch. N. Santhosh Kumar
  • K. Nageswara Rao
  • A. Govardhan
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 381)


In this paper, a novel clustering algorithm called Visual K-Means (VKM) is proposed. The algorithm addresses the uniform effect, the tendency of k-means to produce clusters of similar size, which is clearly visible when k-means is applied to data sources with skewed class distributions. The proposed algorithm is evaluated on 10 imbalanced datasets against five benchmark algorithms using six evaluation metrics. The simulation results indicate that the proposed algorithm is one of the best alternatives for handling imbalanced datasets effectively.
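
The uniform effect mentioned in the abstract can be illustrated with a small sketch. This is not the VKM algorithm, which the abstract does not specify; it is plain Lloyd's k-means run on a synthetic one-dimensional dataset with a 900:100 class ratio. The dataset, the helper names `kmeans` and `size_cv`, and the use of the coefficient of variation of group sizes as a skew measure are all assumptions made for illustration.

```python
import numpy as np

def kmeans(X, init_centroids, n_iter=100):
    """Plain Lloyd's k-means (illustrative helper); returns (labels, centroids)."""
    centroids = np.asarray(init_centroids, dtype=float).copy()
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid; keep the old one if a cluster is empty.
        new_centroids = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(len(centroids))
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

def size_cv(labels, k):
    """Coefficient of variation of group sizes: higher means more skewed."""
    sizes = np.bincount(labels, minlength=k)
    return sizes.std(ddof=1) / sizes.mean()

# Synthetic imbalanced data (assumed): a wide majority class, a compact minority class.
majority = np.linspace(0.0, 10.0, 900).reshape(-1, 1)
minority = np.linspace(11.0, 12.0, 100).reshape(-1, 1)
X = np.vstack([majority, minority])
y_true = np.repeat([0, 1], [900, 100])

# Start from the true class means to give k-means the best possible chance.
labels, centroids = kmeans(X, init_centroids=[[5.0], [11.5]])

true_cv = size_cv(y_true, 2)
cluster_cv = size_cv(labels, 2)
print("true class sizes:   ", np.bincount(y_true, minlength=2), f"CV = {true_cv:.3f}")
print("k-means cluster sizes:", np.bincount(labels, minlength=2), f"CV = {cluster_cv:.3f}")
```

Even when initialised at the true class means, k-means in this sketch moves the boundary into the majority class, so the two clusters end up far closer in size than the true classes. That gap between the true and the clustered size distributions is the behaviour an imbalance-aware variant would need to counter.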


Imbalanced data · k-Means clustering algorithms · Under sampling · Visual k-means


Copyright information

© Springer India 2016

Authors and Affiliations

  • Ch. N. Santhosh Kumar (1)
  • K. Nageswara Rao (2)
  • A. Govardhan (3)

  1. Department of CSE, JNTUH, Hyderabad, India
  2. PSCMR College of Engineering and Technology, Vijayawada, India
  3. SIT, CSE, JNTUH, Hyderabad, India
