Evolving Systems

Volume 3, Issue 3, pp 135–151

A dynamic split-and-merge approach for evolving cluster models

Original Paper

Abstract

This paper describes new dynamic split-and-merge operations for evolving cluster models, which are learned incrementally and expanded on-the-fly from data streams. These operations are needed to resolve the effects of cluster fusion and cluster delamination, which can arise over time during data stream learning. We propose two new criteria for cluster merging: a touching criterion and a homogeneity criterion for two ellipsoidal clusters. The splitting criterion for an updated cluster applies a 2-means algorithm to its sub-samples and compares the quality of the split cluster with that of the original cluster using a penalized Bayesian information criterion; the cluster partition of higher quality is retained for the next incremental update cycle. This new approach is evaluated on two-dimensional and high-dimensional streaming clustering data sets, in which feature ranges are extended and clusters evolve over time, and on two large streams of classification data, each containing around 500K samples. The results show that the new split-and-merge approach (a) produces more reliable cluster partitions than conventional evolving clustering techniques and (b) reduces the impurity and entropy of cluster partitions evolved on the classification data sets.
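The splitting decision described above lends itself to a compact illustration. The Python sketch below is a minimal, hypothetical rendering of that step, assuming diagonal-Gaussian log-likelihoods, a BIC-style penalty with a tunable weight, and scikit-learn's KMeans for the 2-means sub-clustering; the function names (should_split, bic, gaussian_log_likelihood) and the exact penalty form are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of a penalized-BIC split test for one cluster's sub-sample.
# Assumptions (not from the paper): diagonal-Gaussian likelihood per cluster,
# standard BIC penalty scaled by a user-chosen factor, scikit-learn 2-means.
import numpy as np
from sklearn.cluster import KMeans


def gaussian_log_likelihood(X):
    """Log-likelihood of samples X under one Gaussian with diagonal covariance."""
    n, d = X.shape
    var = X.var(axis=0) + 1e-9                      # guard against zero variance
    return (-0.5 * n * (d * np.log(2 * np.pi) + np.log(var).sum())
            - 0.5 * ((X - X.mean(axis=0)) ** 2 / var).sum())


def bic(X, labels, n_clusters, penalty=1.0):
    """Penalized BIC: summed log-likelihood minus penalty * 0.5 * #params * log(n)."""
    n, d = X.shape
    ll = sum(gaussian_log_likelihood(X[labels == c]) for c in range(n_clusters))
    n_params = n_clusters * 2 * d + (n_clusters - 1)  # means, variances, weights
    return ll - penalty * 0.5 * n_params * np.log(n)


def should_split(X, penalty=1.0):
    """Split X with 2-means and report whether the 2-cluster partition scores higher.

    A production version would also guard against degenerate sub-clusters
    (e.g. fewer points than dimensions), which is omitted here for brevity.
    """
    score_one = bic(X, np.zeros(len(X), dtype=int), 1, penalty)
    labels = KMeans(n_clusters=2, n_init=10).fit(X).labels_
    score_two = bic(X, labels, 2, penalty)
    return score_two > score_one, labels
```

In an evolving setting, such a test would be applied to the buffered sub-sample of the cluster that was just updated, and the winning partition would be carried into the next incremental update cycle, as the abstract describes.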

Keywords

Evolving cluster models · Cluster fusion and delamination · Dynamic split-and-merge · Touching and homogeneity criteria · Penalized Bayesian information criterion


Copyright information

© Springer-Verlag 2012

Authors and Affiliations

1. Department of Knowledge-based Mathematical Systems, Johannes Kepler University of Linz, Linz, Austria