A dynamic split-and-merge approach for evolving cluster models

Lughofer, Edwin

doi:10.1007/s12530-012-9046-5

Original Paper
Published: 05 February 2012

Volume 3, pages 135–151 (2012)

Evolving Systems

Edwin Lughofer¹

Abstract

This paper describes new dynamic split-and-merge operations for evolving cluster models, which are learned incrementally and expanded on-the-fly from data streams. These operations are needed to resolve the effects of cluster fusion and cluster delamination, which may arise over time in data stream learning. We propose two new criteria for cluster merging: a touching criterion and a homogeneity criterion for two ellipsoidal clusters. The splitting criterion for an updated cluster applies a 2-means algorithm to its sub-samples and compares the quality of the split cluster with that of the original cluster by using a penalized Bayesian information criterion; the cluster partition of higher quality is retained for the next incremental update cycle. The new approach is evaluated on two-dimensional and high-dimensional streaming clustering data sets, in which feature ranges are extended and clusters evolve over time, and on two large streams of classification data, each containing around 500K samples. The results show that the new split-and-merge approach (a) produces more reliable cluster partitions than conventional evolving clustering techniques and (b) reduces the impurity and entropy of cluster partitions evolved on the classification data sets.
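The abstract names a split test (2-means on a cluster's samples, scored with a penalized BIC) and a touching criterion for ellipsoidal clusters, but gives no formulas. The Python/NumPy sketch below illustrates the general idea under stated assumptions and is not the paper's method: each cluster is modeled as one full-covariance Gaussian, the two-way split is scored with a hard-assignment likelihood, a hypothetical `penalty` multiplier stands in for the paper's penalization of the BIC, and a hypothetical touching test checks whether two ellipsoids of Mahalanobis radius `r` overlap along the line joining their centers. All function names and parameters are illustrative.

```python
import numpy as np


def two_means(X, n_iter=50):
    """Lloyd's 2-means with a deterministic farthest-point initialization."""
    c0 = X[np.argmax(((X - X.mean(axis=0)) ** 2).sum(axis=1))]
    c1 = X[np.argmax(((X - c0) ** 2).sum(axis=1))]
    centers = np.stack([c0, c1]).astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance of every sample to both centers, shape (n, 2).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        if labels.min() == labels.max():
            break  # all samples on one side: nothing to split
        new_centers = np.stack([X[labels == k].mean(axis=0) for k in (0, 1)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def gaussian_loglik(X, ridge=1e-6):
    """Log-likelihood of X under its ML full-covariance Gaussian (small ridge added)."""
    n, d = X.shape
    diff = X - X.mean(axis=0)
    cov = diff.T @ diff / n + ridge * np.eye(d)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * n * (d * np.log(2 * np.pi) + logdet + d)


def prefers_split(X, penalty=1.0, ridge=1e-6):
    """Assumed split test: True if the 2-means split has a lower (penalized) BIC."""
    n, d = X.shape
    labels = two_means(X)
    parts = [X[labels == k] for k in (0, 1)]
    if min(len(p) for p in parts) <= d:
        return False  # too few samples to estimate a covariance in each half
    k_gauss = d + d * (d + 1) // 2  # mean plus symmetric covariance parameters
    ll_one = gaussian_loglik(X, ridge)
    # Hard-assignment (classification) log-likelihood of the two-component model.
    ll_two = sum(len(p) * np.log(len(p) / n) + gaussian_loglik(p, ridge) for p in parts)
    # Hypothetical penalization: scale the usual BIC complexity term by `penalty`.
    bic_one = -2.0 * ll_one + penalty * k_gauss * np.log(n)
    bic_two = -2.0 * ll_two + penalty * (2 * k_gauss + 1) * np.log(n)
    return bool(bic_two < bic_one)


def ellipsoids_touch(c1, cov1, c2, cov2, r=2.0):
    """Hypothetical touching test for ellipsoids (x-c)^T cov^-1 (x-c) <= r^2:
    do they overlap along the line joining their centers?"""
    c1, c2 = np.asarray(c1, float), np.asarray(c2, float)
    delta = c2 - c1
    dist = np.linalg.norm(delta)
    if dist == 0.0:
        return True
    u = delta / dist
    # Extent of each ellipsoid from its center along the unit direction u.
    ext1 = r / np.sqrt(u @ np.linalg.solve(cov1, u))
    ext2 = r / np.sqrt(u @ np.linalg.solve(cov2, u))
    return bool(ext1 + ext2 >= dist)
```

A lower BIC means a better trade-off between fit and model size, so `prefers_split` keeps the split only when the two-cluster model wins; raising `penalty` makes splits rarer. Overlap along the center line is a sufficient but not a necessary condition for two ellipsoids to intersect, so this touching test is deliberately conservative.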

Acknowledgments

This work was funded by the Austrian fund for promoting scientific research (FWF, contract number I328-N23, acronym IREFS). This publication reflects only the author's views. The author also thanks Eyke Hüllermeier for valuable comments on the paper.

Author information

Authors and Affiliations

Department of Knowledge-based Mathematical Systems, Johannes Kepler University of Linz, Linz, Austria
Edwin Lughofer

Corresponding author

Correspondence to Edwin Lughofer.

About this article

Cite this article

Lughofer, E. A dynamic split-and-merge approach for evolving cluster models. Evolving Systems 3, 135–151 (2012). https://doi.org/10.1007/s12530-012-9046-5

Received: 20 September 2011
Accepted: 12 January 2012
Published: 05 February 2012
Issue Date: September 2012
DOI: https://doi.org/10.1007/s12530-012-9046-5
