Improved Gath–Geva clustering for fuzzy segmentation of hydrometeorological time series

Wang, Nini; Liu, Xiaodong; Yin, Jianchuan

doi:10.1007/s00477-011-0542-0

Original Paper
Published: 26 November 2011

Volume 26, pages 139–155, (2012)
Stochastic Environmental Research and Risk Assessment

Nini Wang^1,2,
Xiaodong Liu^1,2 &
Jianchuan Yin^3,4

Abstract

In this paper, an improved Gath–Geva clustering algorithm is proposed for automatic fuzzy segmentation of univariate and multivariate hydrometeorological time series. The algorithm treats time series segmentation as a Gath–Geva clustering problem and uses the minimum message length criterion to select the segmentation order. The improved algorithm has three characteristics. First, it is unsupervised and determines the optimal segmentation order automatically. Second, it applies a modified component-wise expectation maximization algorithm within Gath–Geva clustering, which avoids two drawbacks of the classical expectation maximization algorithm: sensitivity to initialization and the need to avoid the boundary of the parameter space. Third, it improves numerical stability by integrating segmentation order selection into the model parameter estimation procedure. The proposed algorithm has been tested experimentally on artificial and hydrometeorological time series, and the results demonstrate its effectiveness.

References

Abonyi J, Feil B, Nemeth S, Arva P (2003) Fuzzy clustering based segmentation of time-series. In: Lecture notes in computer science, pp 275–286
Abonyi J, Feil B, Nemeth S, Arva P (2005) Modified Gath–Geva clustering for fuzzy segmentation of multivariate time-series. Fuzzy Sets Syst 149:39–56
Aksoy H, Unal NE, Gedikli A (2007) Letter to the editor. Stoch Environ Res Risk Assess 21:447–449
Aksoy H, Gedikli A, Unal NE, Kehagias A (2008) Fast segmentation algorithms for long hydrometeorological time series. Hydrol Process 22:4600–4608
Aksoy H, Unal NE, Pektas AO (2008) Smoothed minima baseflow separation tool for perennial and intermittent streams. Hydrol Process 22:4467–4476
Athanasiadis EI, Cavouras DA, Spyridonos PP, Glotsos DT, Kalatzis IK, Nikiforidis GC (2009) Complementary DNA microarray image processing based on the fuzzy Gaussian mixture model. IEEE Trans Inf Technol Biomed 13(4):419–425
Beeferman D, Berger A, Lafferty J (1999) Statistical models for text segmentation. Mach Learn 34:177–210
Bezdek JC, Dunn JC (1975) Optimal fuzzy partitions: a heuristic for estimating the parameters in a mixture of normal distributions. IEEE Trans Comput 835–838
Celeux G, Chretien S, Forbes F, Mkhadri A (1999) A component-wise EM algorithm for mixtures. Technical report 3746, INRIA, France
Chatzis S, Varvarigou T (2008) Robust fuzzy clustering using mixtures of Student’s-t distributions. Pattern Recognit Lett 29:1901–1905
Figueiredo M, Jain AK (2002) Unsupervised learning of finite mixture models. IEEE Trans Pattern Anal Mach Intell 24(3):381–396
Fisch D, Gruber T, Sick B (2011) Swiftrule: mining comprehensible classification rules for time series analysis. IEEE Trans Knowl Data Eng 23(5):774–787
Fu Z, Robles-Kelly A, Zhou J (2010) Mixing linear SVMs for nonlinear classification. IEEE Trans Neural Netw 21:1963–1975
Fuchs E, Gruber T, Nitschke J, Sick B (2009) On-line motif detection in time series with swiftmotif. Pattern Recognit 42:3015–3031
Fuchs E, Gruber T, Nitschke J, Sick B (2010) Online segmentation of time series based on polynomial least-squares approximations. IEEE Trans Pattern Anal Mach Intell 32(12):2232–2245
Gath I, Geva AB (1989) Unsupervised optimal fuzzy clustering. IEEE Trans Pattern Anal Mach Intell 7:773–780
Gedikli A, Aksoy H, Unal NE (2008) Segmentation algorithm for long time series analysis. Stoch Environ Res Risk Assess 22(3):291–302
Gedikli A, Aksoy H, Unal NE, Kehagias A (2010) Modified dynamic programming approach for offline segmentation of long hydrometeorological time series. Stoch Environ Res Risk Assess 24:547–557
Hanlon B, Forbes C (2002) Model selection criteria for segmented time series from a Bayesian approach to information compression. Working paper, Department of Econometrics and Statistics, Monash University, Melbourne, Australia
Hubert P (2000) The segmentation procedure as a tool for discrete modeling of hydrometeorological regimes. Stoch Environ Res Risk Assess 14:297–304
Kehagias A (2004) A hidden Markov model segmentation procedure for hydrological and environmental time series. Stoch Environ Res Risk Assess 18:117–130
Kehagias A, Fortin V (2006) Time series segmentation with shifting means hidden Markov models. Nonlinear Process Geophys 13:339–352
Kehagias A, Nidelkou E, Petridis V (2005) A dynamic programming segmentation procedure for hydrological and environmental time series. Stoch Environ Res Risk Assess 20:77–94
Kehagias A, Petridis V, Nidelkou E (2007) Reply by the authors to the letter by Aksoy et al. Stoch Environ Res Risk Assess 21:451–455
Keogh E, Kasetty S (2003) On the need for time series data mining benchmarks: a survey and empirical demonstration. Data Mining Knowl Discov 7(4):349–371
Lanterman AD (2001) Schwarz, Wallace, and Rissanen intertwining themes in theories of model order estimation. Int Stat Rev 69(2):185–212
Liu X, Lin Z, Wang H (2008) Novel online methods for time series segmentation. IEEE Trans Knowl Data Eng 20:1616–1626
Nascimento JC, Figueiredo M, Marques JS (2010) Trajectory classification using switched dynamical hidden Markov models. IEEE Trans Image Process 19(5):1338–1348
Povinelli R, Johnson M, Lindgren A, Ye J (2004) Time series classification using Gaussian mixture models of reconstructed phase spaces. IEEE Trans Knowl Data Eng 16(6):779–783
Seghouane A, Amari S (2007) The AIC criterion and symmetrizing the Kullback-Leibler divergence. IEEE Trans Neural Netw pp 97–106
Vernieuwe H, De Baets B, Verhoest NEC (2006) Comparison of clustering algorithms in the identification of Takagi-Sugeno models: a hydrological case study. Fuzzy Sets Syst 157:2876–2896
Warren Liao T (2005) Clustering of time series data-a survey. Pattern Recognit 38:1857–1874

Acknowledgements

The authors sincerely thank Professor Victor Leiva (Associate editor), Professor George Christakos (Editor), and the anonymous referees for their kind advice and comments. Their suggestions have led to a major improvement of the paper. This work is supported by the National Natural Science Foundation of China under Grants (No. 61175041) and the Fundamental Research Funds for the Central Universities (No. 2011QN147).

Author information

Authors and Affiliations

Research Center of Information and Control, Dalian University of Technology, Dalian, 116024, China
Nini Wang & Xiaodong Liu
Department of Mathematics, Dalian Maritime University, Dalian, 116026, China
Nini Wang & Xiaodong Liu
Navigation College, Dalian Maritime University, Dalian, 116026, China
Jianchuan Yin
School of Naval Architecture, Ocean and Civil Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
Jianchuan Yin

Corresponding author

Correspondence to Nini Wang.

Appendices

Appendix 1: Gath–Geva clustering algorithm

Inputs: data set ${\cal{X}}=\{{\bf x}_{k}| 1\leq k\leq n\}$, number of clusters $1 < c < n$, weighting exponent $m > 1$, termination tolerance $\varepsilon>0$, and initial parameters $\widehat{\varvec\theta}(0)=\{\widehat{\varvec\theta}_{1},\ldots,\widehat{\varvec\theta}_{c},\widehat{P}_{1},\ldots,\widehat{P}_{c}\}. $

Output: optimal parameters $\widehat{\varvec\theta}_{opt}. $

Initialize: partition matrix ${\bf U}(0)=\big[\mu_{i,k}^{(0)}\big]_{c\times n}$ such that Eq. 3 holds.

Repeat for $l=1,2,\ldots$

Calculate parameters $\widehat{\varvec\theta}(l). $

$$ \begin{aligned} {\mathbf v}_{i}(l)&=\frac{\sum_{k=1}^{n}\left(\mu_{i,k}^{(l-1)}\right)^{m}{\mathbf x}_{k}}{\sum_{k=1}^{n}\left(\mu_{i,k}^{(l-1)}\right)^{m}},\\ {\mathbf F}_{i}(l)&=\frac{\sum_{k=1}^{n}\left(\mu_{i,k}^{(l-1)}\right)^{m} \left({\mathbf x}_{k}-{\mathbf v}_{i}(l)\right)\left({\mathbf x}_{k}-{\mathbf v}_{i}(l)\right)^{T}}{\sum_{k=1}^{n}\left(\mu_{i,k}^{(l-1)}\right)^{m}},\\ P_{i}(l)&=\frac{1}{n}\sum_{k=1}^{n}\mu_{i,k}^{(l-1)}, \quad 1\leq i \leq c. \end{aligned} $$

(47)

Compute the distance measure $D({\mathbf x}_{k},{\mathbf v}_{i})^{2}. $

$$ \begin{aligned} D({\mathbf x}_{k},{\mathbf v}_{i})^{2} &=\frac{1}{P_{i}(l)G\left({\mathbf x}_{k}; {\mathbf v}_{i}(l), {\mathbf F}_{i}(l)\right)} =\frac{(2\pi)^{q/2}\sqrt{det({\mathbf F}_{i}(l))}}{P_{i}(l)}\\ &\cdot\exp\left(\frac{1}{2}({\mathbf x}_{k}-{\mathbf v}_{i}(l))^{T}({\mathbf F}_{i}(l))^{-1}({\mathbf x}_{k}-{\mathbf v}_{i}(l))\right). \end{aligned} $$

(48)

Update the partition matrix ${\bf U}(l)=\big[\mu_{i,k}^{(l)}\big]_{c\times n}. $

$$ \mu_{i,k}^{(l)}=\frac{1}{\sum_{j=1}^{c}(D({\mathbf x}_{k},{\mathbf v}_{i})/D({\mathbf x}_{k},{\mathbf v}_{j}))^{2/(m-1)}},\quad 1 \leq i\leq c, 1 \leq k\leq n. $$

(49)

until $\|{\bf U}(l)-{\bf U}(l-1)\|< \varepsilon. $
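The iteration in Eqs. 47–49 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `gath_geva`, the use of the Frobenius norm in the stopping test, the `max_iter` cap, and the small `ridge` term added to each covariance matrix are all introduced here for the example. The distances of Eq. 48 are kept in logarithmic form, so that Eq. 49 becomes a normalised exponential that does not overflow.

```python
import numpy as np

def gath_geva(X, U0, m=2.0, eps=1e-6, max_iter=200, ridge=1e-9):
    """Sketch of Appendix 1.

    X  : (n, q) data matrix, one sample x_k per row.
    U0 : (c, n) initial partition matrix whose columns sum to one.
    ridge, max_iter : assumptions of this sketch, not part of the appendix.
    """
    n, q = X.shape
    c = U0.shape[0]
    U = np.asarray(U0, dtype=float).copy()
    for _ in range(max_iter):
        W = U ** m                                  # weights (mu_ik)^m
        Wsum = W.sum(axis=1, keepdims=True)         # (c, 1)
        V = (W @ X) / Wsum                          # Eq. 47: centers v_i
        P = U.mean(axis=1)                          # Eq. 47: priors P_i
        F = np.empty((c, q, q))
        log_d2 = np.empty((c, n))
        for i in range(c):
            diff = X - V[i]
            # Eq. 47: fuzzy covariance F_i (plus the assumed ridge term)
            F[i] = (W[i][:, None] * diff).T @ diff / Wsum[i, 0] + ridge * np.eye(q)
            _, logdet = np.linalg.slogdet(F[i])
            maha = np.einsum("kj,jl,kl->k", diff, np.linalg.inv(F[i]), diff)
            # logarithm of Eq. 48
            log_d2[i] = (0.5 * q * np.log(2.0 * np.pi) + 0.5 * logdet
                         - np.log(P[i]) + 0.5 * maha)
        # Eq. 49 rewritten as exp(-log D^2 / (m-1)), normalised over clusters
        A = -log_d2 / (m - 1.0)
        A -= A.max(axis=0, keepdims=True)           # shift for numerical stability
        U_new = np.exp(A)
        U_new /= U_new.sum(axis=0, keepdims=True)
        converged = np.linalg.norm(U_new - U) < eps  # assumed Frobenius norm
        U = U_new
        if converged:
            break
    return U, V, F, P
```

The returned `V`, `F`, and `P` are the parameters computed from the partition matrix of the previous iteration, which mirrors the order of the steps in the appendix.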

Appendix 2: Bottom-up segmentation method

Create initial fine approximation by segment boundaries $0=t_{n_{0}}<t_{n_{1}}<\cdots<t_{n_{c}}=t_{n}. $

Find the cost of merging for each pair of segments: $ {\it {mergecost}}(i) = cost(t_{{n}_{i}}+1, t_{{n}_{i+2}}) $

while min(mergecost) < maxerror

Find the cheapest pair to merge: $i = \arg\min_{i} (mergecost(i)).$

Merge the two segments, update the $ (t_{{n}_{i}}, t_{{n}_{i+1}}) $ boundary indices, and recalculate the merge costs.

$ {\it {mergecost}}(i) = cost(t_{{n}_{i}}+1, t_{{n}_{i+2}}) $,

$ {\it {mergecost}}(i-1) = cost(t_{{n}_{i-1}}+1, t_{{n}_{i+1}}) $.

end
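The merge loop above can be sketched in plain Python as follows. The function name `bottom_up` and the argument names are assumptions made for this example, and the segment cost is left as a caller-supplied function `cost(a, b)` that scores a single segment covering samples `a` to `b` (1-based, inclusive), so any cost definition can be plugged in.

```python
def bottom_up(boundaries, cost, max_error):
    """Sketch of the bottom-up merge loop.

    boundaries : increasing indices [t_{n_0} = 0, ..., t_{n_c} = n];
                 segment i covers samples boundaries[i]+1 .. boundaries[i+1].
    cost       : hypothetical callable cost(a, b) for one segment a..b.
    max_error  : merging stops once the cheapest merge reaches this value.
    """
    b = list(boundaries)
    # mergecost(i): cost of fusing segment i with segment i+1
    merge_cost = [cost(b[i] + 1, b[i + 2]) for i in range(len(b) - 2)]
    while merge_cost and min(merge_cost) < max_error:
        i = merge_cost.index(min(merge_cost))   # cheapest pair to merge
        del b[i + 1]                            # drop the shared boundary
        del merge_cost[i]
        # only the two neighbouring merge costs change after the merge
        if i < len(merge_cost):
            merge_cost[i] = cost(b[i] + 1, b[i + 2])
        if i > 0:
            merge_cost[i - 1] = cost(b[i - 1] + 1, b[i + 1])
    return b
```

Recomputing only the two affected entries keeps each merge at two cost evaluations, which matches the two update formulas listed in the loop.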

Let the covariance matrix ${\bf F}_{i}^{x}$ be decomposed into the matrix $\varvec\Uplambda_{i}$, which contains the eigenvalues of ${\bf F}_{i}^{x}$ on its diagonal in decreasing order, and the matrix ${\bf U}_{i}$, whose columns are the eigenvectors corresponding to these eigenvalues, i.e., ${\bf F}_{i}^{x}={\bf U}_{i}\varvec\Uplambda_{i}{\bf U}_{i}^{T}. $ The segmentation cost can be taken as the reconstruction error of this segment

$$ cost(t_{n_{i}}+1, t_{n_{i+1}})=\frac{1}{t_{n_{i+1}}-t_{n_{i}}+1}\sum_{k=t_{n_{i}}+1}^{t_{n_{i+1}}}Q_{i,k}, $$

where $Q_{i,k}={\bf x}_{k}^{T}({\bf I}-{\bf U}_{i,p}{\bf U}_{i,p}^{T}){\bf x}_{k}$, and ${\bf U}_{i,p}$ contains the eigenvectors corresponding to the first $p$ nonzero eigenvalues. The segmentation cost can also be taken as the Hotelling $T^{2}$ measure of this segment

$$ cost(t_{n_{i}}+1, t_{n_{i+1}})=\frac{1}{t_{n_{i+1}}-t_{n_{i}}+1}\sum_{k=t_{n_{i}}+1}^{t_{n_{i+1}}}T_{i,k}^{2}, $$

where $T_{i,k}^{2}={\bf y}_{i,k}^{T}{\bf y}_{i,k}, {\bf y}_{i,k}=\varvec\Uplambda_{i,p}^{-\frac{1}{2}}{\bf U}_{i,p}^{T}{\bf x}_{k}. $ The interested reader can find more details about the bottom-up method in Abonyi et al. (2005).
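Both segment costs can be computed from one eigendecomposition, as the following NumPy sketch shows. Several choices here are assumptions made for illustration only: the function name `pca_segment_costs`, the use of the sample covariance (`np.cov`) for ${\bf F}_{i}^{x}$, and the application of the formulas to ${\bf x}_{k}$ exactly as written, without additional centering. The normaliser follows the printed formulas, so for a segment covering samples `a` to `b` it equals `b - a + 2`.

```python
import numpy as np

def pca_segment_costs(X, a, b, p):
    """Sketch of the two segment costs for samples a..b (1-based, inclusive).

    X : (n, q) array whose rows are x_1, ..., x_n.
    p : number of retained principal components.
    Returns (reconstruction_cost, hotelling_cost).
    """
    seg = X[a - 1:b]
    # assumed estimate of F_i^x: sample covariance of the segment
    F = np.atleast_2d(np.cov(seg, rowvar=False))
    eigvals, eigvecs = np.linalg.eigh(F)        # eigh returns ascending order
    order = np.argsort(eigvals)[::-1][:p]       # keep the p largest eigenvalues
    lam_p, U_p = eigvals[order], eigvecs[:, order]
    # Q_{i,k} = x_k^T (I - U_p U_p^T) x_k
    resid = seg - seg @ U_p @ U_p.T
    Q = np.einsum("kj,kj->k", seg, resid)
    # y_{i,k} = Lambda_p^{-1/2} U_p^T x_k and T^2_{i,k} = y^T y
    Y = (seg @ U_p) / np.sqrt(lam_p)
    T2 = np.einsum("kj,kj->k", Y, Y)
    norm = b - a + 2                            # t_{n_{i+1}} - t_{n_i} + 1 as printed
    return Q.sum() / norm, T2.sum() / norm
```

Either return value can serve as the per-segment cost in a bottom-up merge, provided the retained eigenvalues are strictly positive so that the scaling by $\varvec\Uplambda_{i,p}^{-1/2}$ is defined.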

About this article

Cite this article

Wang, N., Liu, X. & Yin, J. Improved Gath–Geva clustering for fuzzy segmentation of hydrometeorological time series. Stoch Environ Res Risk Assess 26, 139–155 (2012). https://doi.org/10.1007/s00477-011-0542-0

Published: 26 November 2011
Issue Date: January 2012
DOI: https://doi.org/10.1007/s00477-011-0542-0
