A dynamic split-and-merge approach for evolving cluster models

  • Original Paper
  • Evolving Systems

Abstract

This paper describes new dynamic split-and-merge operations for evolving cluster models, which are learned incrementally and expanded on the fly from data streams. These operations are needed to resolve the effects of cluster fusion and cluster delamination, which may appear over time in data stream learning. We propose two new criteria for cluster merging: a touching criterion and a homogeneity criterion for two ellipsoidal clusters. The splitting criterion for an updated cluster applies a 2-means algorithm to its sub-samples and uses a penalized Bayesian information criterion to compare the quality of the split cluster with that of the original cluster; the cluster partition of higher quality is retained for the next incremental update cycle. The new approach is evaluated on two-dimensional and high-dimensional streaming clustering data sets, in which feature ranges are extended and clusters evolve over time, and on two large streams of classification data, each containing around 500K samples. The results show that the new split-and-merge approach (a) produces more reliable cluster partitions than conventional evolving clustering techniques and (b) reduces the impurity and entropy of the cluster partitions evolved on the classification data sets.
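The splitting test outlined in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the exact form of the penalized Bayesian information criterion, so the sketch assumes a plain BIC for a hard-assignment mixture of spherical Gaussians with one shared variance (the paper works with ellipsoidal clusters). The `penalty` weight and all function names (`two_means`, `bic`, `should_split`, `make_blob`) are hypothetical and introduced only for illustration.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def two_means(points, iters=50):
    """Lloyd's algorithm with k=2 and a deterministic farthest-point start."""
    centre = centroid(points)
    c0 = max(points, key=lambda p: sq_dist(p, centre))
    c1 = max(points, key=lambda p: sq_dist(p, c0))
    centres = [list(c0), list(c1)]
    labels = None
    for _ in range(iters):
        new_labels = [0 if sq_dist(p, centres[0]) <= sq_dist(p, centres[1]) else 1
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if one side is empty
                centres[j] = centroid(members)
    return labels


def bic(points, labels, k, penalty=1.0):
    """BIC (lower is better) under the assumed spherical, shared-variance model.

    `penalty` scales the complexity term; it is a hypothetical knob, not the
    penalization used in the paper.
    """
    n, d = len(points), len(points[0])
    sse, log_mix = 0.0, 0.0
    for j in range(k):
        group = [p for p, lab in zip(points, labels) if lab == j]
        if not group:
            continue
        c = centroid(group)
        sse += sum(sq_dist(p, c) for p in group)
        log_mix += len(group) * math.log(len(group) / n)
    var = max(sse / (n * d), 1e-12)  # maximum-likelihood variance, floored
    log_lik = log_mix - 0.5 * n * d * (math.log(2.0 * math.pi * var) + 1.0)
    n_params = k * d + (k - 1) + 1  # centres + mixing weights + variance
    return -2.0 * log_lik + penalty * n_params * math.log(n)


def should_split(points, penalty=1.0):
    """Return True if the 2-means partition scores better than the single cluster."""
    split_labels = two_means(points)
    single_labels = [0] * len(points)
    return bic(points, split_labels, 2, penalty) < bic(points, single_labels, 1, penalty)


def make_blob(centre, n, rng):
    """Draw n two-dimensional points around `centre` with unit standard deviation."""
    return [(rng.gauss(centre[0], 1.0), rng.gauss(centre[1], 1.0)) for _ in range(n)]


rng = random.Random(0)
one_blob = make_blob((0.0, 0.0), 200, rng)
two_blobs = make_blob((0.0, 0.0), 100, rng) + make_blob((10.0, 10.0), 100, rng)
print(should_split(one_blob), should_split(two_blobs))
```

Under these assumptions, the split partition is kept only when its gain in likelihood outweighs both the extra parameters and the cost of the additional mixing weight, so a single compact cluster is left intact while two well-separated groups that were absorbed into one cluster are separated. In a streaming setting, `points` would stand for the sub-samples retained for the updated cluster.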

Acknowledgments

This work was funded by the Austrian fund for promoting scientific research (FWF, contract number I328-N23, acronym IREFS). This publication reflects only the authors’ views. The author also acknowledges Eyke Hüllermeier for providing valuable comments on the paper.

Author information

Corresponding author

Correspondence to Edwin Lughofer.

About this article

Cite this article

Lughofer, E. A dynamic split-and-merge approach for evolving cluster models. Evolving Systems 3, 135–151 (2012). https://doi.org/10.1007/s12530-012-9046-5

  • DOI: https://doi.org/10.1007/s12530-012-9046-5
