A hybrid heuristic algorithm for evolving models in simultaneous scenarios of classification and clustering

  • Mariela Cerrada
  • Jose Aguilar
  • Junior Altamiranda
  • René-Vinicio Sánchez
Regular Paper

Abstract

Machine Learning is currently an important research field that attracts interest due to its value for discovering hidden knowledge or patterns in large datasets. In this paper, we propose a heuristic algorithm that can address classification only, clustering only, or combined classification and clustering, by creating models that evolve to a new class/cluster configuration without retraining when new data arrive. The algorithm combines supervised and unsupervised learning principles for the incremental construction of both classes and clusters, drawing on the main guidelines of two classical methods: distance-based classification (KNN) and prototype-based clustering (K-means). It handles labeled and unlabeled samples as inputs in order to create new groups (classes or clusters) and to merge or reconfigure existing ones. The creation of new groups follows three sequential steps: (i) locate the provisional group for an input sample using K-means, (ii) using 1NN, locate the nearest sample to the input sample, considering only the samples in the provisional group, and (iii) merge or reconfigure existing groups following specific guidelines. Several classification and clustering benchmarks were evaluated with the proposed algorithm, and the results were compared with those of classical algorithms. In addition, artificial datasets with labeled and unlabeled samples were created to show the algorithm's ability to solve combined classification and clustering in the hybrid context. The results show that the algorithm creates clusters and classes simultaneously when required. Finally, a real case study of fault diagnosis in rotating machinery is presented, in which new groups that might represent patterns in unknown data are discovered.
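The three-step flow described above can be illustrated with a minimal sketch. The function names, the data layout, and the distance-threshold rule used for step (iii) below are assumptions for illustration only, not the authors' exact merge/reconfigure guidelines; they show how a nearest-centroid (K-means style) step followed by a 1NN search restricted to the provisional group could decide where an incoming sample belongs.

```python
# Illustrative sketch only: names, the threshold, and the step-(iii) rule
# below are assumptions, not the paper's exact formulation.
import numpy as np

def assign_incoming_sample(x, groups, new_group_threshold):
    """Place an incoming sample x using the three steps from the abstract."""
    # Step (i): provisional group = group whose centroid is nearest (K-means style).
    centroids = {g: members.mean(axis=0) for g, members in groups.items()}
    provisional = min(centroids, key=lambda g: np.linalg.norm(x - centroids[g]))

    # Step (ii): 1NN search restricted to the provisional group's samples.
    members = groups[provisional]
    nn_dist = np.linalg.norm(members - x, axis=1).min()

    # Step (iii): simplified stand-in for the merge/reconfigure guidelines:
    # if the sample is too far from its nearest neighbour, open a new group;
    # otherwise it joins the provisional group.
    if nn_dist > new_group_threshold:
        new_id = max(groups) + 1
        groups[new_id] = x[None, :]
        return new_id
    groups[provisional] = np.vstack([members, x])
    return provisional

# Toy usage: two seed groups and one unlabeled incoming sample.
groups = {0: np.array([[0.0, 0.0], [0.1, 0.2]]),
          1: np.array([[5.0, 5.0], [5.2, 4.9]])}
print(assign_incoming_sample(np.array([0.05, 0.1]), groups, new_group_threshold=1.0))
```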

Keywords

Hybrid learning · Evolving learning · Incremental learning · Data mining · Classification · Clustering

Notes

Acknowledgements

The authors gratefully acknowledge M.Sc. Fannia Pacheco for her contribution to this work. The results and guidelines proposed in her Master's Thesis (https://bit.ly/2EQtOgm) underpinned the development of this approach. This work was sponsored by the Prometeo Project of the Secretariat for Higher Education, Science, Technology and Innovation (SENESCYT) of the Republic of Ecuador. The experiment of the real case application was carried out at the Vibration Lab of the GIDTEC research group, Universidad Politécnica Salesiana, Cuenca, Ecuador.

Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  1. GIDTEC, Universidad Politécnica Salesiana, Cuenca, Ecuador
  2. Computing Sciences Department, Universidad de Los Andes, Mérida, Venezuela