Adaptive evolutionary clustering

Xu, Kevin S.; Kliger, Mark; Hero III, Alfred O.

doi:10.1007/s10618-012-0302-x

Adaptive evolutionary clustering

Published: 10 January 2013

Volume 28, pages 304–336, (2014)
Cite this article

Data Mining and Knowledge Discovery Aims and scope Submit manuscript

Kevin S. Xu¹,
Mark Kliger² &
Alfred O. Hero III¹

3094 Accesses
85 Citations
1 Altmetric
Explore all metrics

Abstract

In many practical applications of clustering, the objects to be clustered evolve over time, and a clustering result is desired at each time step. In such applications, evolutionary clustering typically outperforms traditional static clustering by producing clustering results that reflect long-term trends while being robust to short-term variations. Several evolutionary clustering algorithms have recently been proposed, often by adding a temporal smoothness penalty to the cost function of a static clustering method. In this paper, we introduce a different approach to evolutionary clustering by accurately tracking the time-varying proximities between objects followed by static clustering. We present an evolutionary clustering framework that adaptively estimates the optimal smoothing parameter using shrinkage estimation, a statistical approach that improves a naïve estimate using additional information. The proposed framework can be used to extend a variety of static clustering algorithms, including hierarchical, k-means, and spectral clustering, into evolutionary clustering algorithms. Experiments on synthetic and real data sets indicate that the proposed framework outperforms static clustering and existing evolutionary clustering algorithms in many scenarios.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Particle swarm optimization algorithm: an overview

Article 17 January 2017

A Comprehensive Survey of Clustering Algorithms

Article 01 June 2015

Density-Based Clustering Based on Hierarchical Density Estimates

Notes

The term “evolutionary clustering” has also been used to refer to clustering algorithms motivated by biological evolution, which are unrelated to the methods discussed in this paper.
It is also sometimes used to refer to the simple approach of performing static clustering at each time step.

References

Ahmed A, Xing EP (2008) Dynamic non-parametric mixture models and the recurrent chinese restaurant process: with applications to evolutionary clustering. Proceedings of the SIAM international conference on data mining, Atlanta
Anderson TW (2003) An introduction to multivariate statistical analysis, 3rd edn. Wiley, Hoboken
MATH Google Scholar
Bródka P, Saganowski S, Kazienko P (2012) GED: the method for group evolution discovery in social networks. Soc Netw Anal Min. doi: 10.1007/s13278-012-0058-8
Carmi A, Septier F, Godsill SJ (2009) The Gaussian mixture MCMC particle algorithm for dynamic cluster tracking. Proceedings of the 12th international conference on information fusion, Seattle
Chakrabarti D, Kumar R, Tomkins A (2006) Evolutionary clustering. Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining, Philadelphia
Charikar M, Chekuri C, Feder T, Motwani R (2004) Incremental clustering and dynamic information retrieval. SIAM J Comput 33(6):1417–1440
Article MATH MathSciNet Google Scholar
Chen Y, Wiesel A, Eldar YC (2010) Shrinkage algorithms for MMSE covariance estimation. IEEE Trans Signal Process 58(10):5016–5029
Article MathSciNet Google Scholar
Chi Y, Song X, Zhou D, Hino K, Tseng BL (2009) On evolutionary spectral clustering. ACM Trans Knowl Discov Data 3(4):17
Article Google Scholar
Chung FRK (1997) Spectral graph theory. American Mathematical Society, Providence
Eagle N, Pentland A, Lazer D (2009) Inferring friendship network structure by using mobile phone data. Proc Nat Acad Sci 106(36):15274–15278
Article Google Scholar
Falkowski T, Bartelheimer J, Spiliopoulou M (2006) Mining and visualizing the evolution of subgroups in social networks. Proceedings of the IEEE/WIC/ACM international conference on web intelligence, Hong Kong
Fenn DJ, Porter MA, McDonald M, Williams S, Johnson NF, Jones NS (2009) Dynamic communities in multichannel data: an application to the foreign exchange market during the 2007–2008 credit crisis. Chaos 19(033):119
Google Scholar
Gavrilov M, Anguelov D, Indyk P, Motwani R (2000) Mining the stock market: Which measure is best? Proceedings of 6th ACM SIGKDD international conference on knowledge discovery and data mining. ACM Press, New York, pp 487–496
Greene D, Doyle D, Cunningham P (2010) Tracking the evolution of communities in dynamic social networks. Proceedings of international conference on advanced social network analysis and mining, pp 176–183
Gretton A, Borgwardt KM, Rasch M, Schölkopf B, Smola AJ (2007) A kernel approach to comparing distributions. Proceedings of the 22nd AAAI conference on artificial intelligence
Gupta C, Grossman R (2004) GenIc: a single pass generalized incremental algorithm for clustering. Proceedings SIAM conference on data mining, Lake Buena Vista
Harvey AC (1989) Forecasting, structural time series models and the Kalman filter. Cambridge University Press, Cambridge
Google Scholar
Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning: data mining, inference, and prediction. Springer, New York
Book Google Scholar
Haykin S (2001) Kalman filtering and neural networks. Wiley-Interscience, New York
Book Google Scholar
Hossain MS, Tadepalli S, Watson LT, Davidson I, Helm RF, Ramakrishnan N (2010) Unifying dependent clustering and disparate clustering for non-homogeneous data. Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining, pp 593–602
Infochimps-WWW (2012) NASDAQ Exchange Daily 1970–2010 Open, Close, High, Low and Volume data set. http://www.infochimps.com/datasets/nasdaq-exchange-daily-1970-2010-open-close-high-low-and-volume
Ji X, Xu W (2006) Document clustering with prior knowledge. Proceedings of the 29th annual international ACM SIGIR conference on research and development in information retrieval, New York, pp 405–412
Kuhn HW (1955) The Hungarian method for the assignment problem. Nav Res Logist Quart 2(1–2):83–97
Article Google Scholar
Ledoit O, Wolf M (2003) Improved estimation of the covariance matrix of stock returns with an application to portfolio selection. J Empir Financ 10(5):603–621
Article Google Scholar
Li Y, Han J, Yang J (2004) Clustering moving objects. Proceedings of the 10th ACM SIGKDD international conference on knowledge discovery and data mining
Lin YR, Chi Y, Zhu S, Sundaram H, Tseng BL (2009) Analyzing communities and their evolutions in dynamic social networks. ACM Trans Knowl Discov Data 3(2):8
Article Google Scholar
Lütkepohl H (1997) Handbook of matrices. Wiley, New York
Google Scholar
MacQueen J (1967) Some methods for classification and analysis of multivariate observations. Proceedings of the 5th Berkeley symposium on mathematical statistics and probability
Mankad S, Michailidis G, Kirilenko A (2011) Smooth plaid models: a dynamic clustering algorithm with application to electronic financial markets. Tech Rep. http://ssrn.com/abstract=1787577
Milligan GW, Cooper MC (1985) An examination of procedures for determining the number of clusters in a data set. Psychometrika 50(2):159–179
Article Google Scholar
MIT-WWW (2005) MIT academic calendar 2004–2005. http://web.mit.edu/registrar/www/calendar0405.html
Mucha PJ, Richardson T, Macon K, Porter MA, Onnela JP (2010) Community structure in time-dependent, multiscale, and multiplex networks. Science 328(5980):876–878
Article MATH MathSciNet Google Scholar
NASDAQ-WWW (2012) NASDAQ Companies. http://www.nasdaq.com/screening/companies-by-industry.aspx?exchange=NASDAQ
Newman MEJ (2006) Modularity and community structure in networks. Proc Nat Acad Sci 103(23): 8577–8582
Google Scholar
Ng AY, Jordan MI, Weiss Y (2001) On spectral clustering: analysis and an algorithm. Adv Neural Inf Process Syst 14:849–856
Google Scholar
Ning H, Xu W, Chi Y, Gong Y, Huang TS (2010) Incremental spectral clustering by efficiently updating the eigen-system. Pattern Recog 43(1):113–127
Article MATH Google Scholar
Parker C (2007) Boids pseudocode. http://www.vergenet.net/conrad/boids/pseudocode.html
Rand WM (1971) Objective criteria for the evaluation of clustering methods. J Am Stat Assoc 66(336): 846–850
Google Scholar
Reynolds CW (1987) Flocks, herds, and schools: A distributed behavioral model. Proceedings of 14th annual conference on computer graphics and interactive techniques, Anaheim
Rosswog J, Ghose K (2008) Detecting and tracking spatio-temporal clusters with adaptive history filtering. Proceedings of the 8th IEEE international conference on data mining workshops, Pisa
Rousseeuw PJ (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Computat Appl Math 20:53–65
Google Scholar
Schäfer J, Strimmer K (2005) A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Stat Appl Genet Mol Biol 4(1):32
MathSciNet Google Scholar
Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8):888–905
Article Google Scholar
Sun J, Papadimitriou S, Yu PS, Faloutsos C (2007) Graphscope: Parameter-free mining of large time-evolving graphs. Proceedings of 13th ACM SIGKDD conference on knowledge discovery and data mining
Tadepalli S, Ramakrishnan N, Watson LT, Mishra B, Helm RF (2009) Gene expression time courses by analyzing cluster dynamics. J Bioinforma Comput Biol 7(2):339–356
Article Google Scholar
Tang L, Liu H, Zhang J, Nazeri Z (2008) Community evolution in dynamic multi-mode networks. Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining
Tantipathananandh C, Berger-Wolf T, Kempe D (2007) A framework for community identification in dynamic social networks. Proceedings of 13th ACM SIGKDD international conference on knowledge discovery and data mining
von Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17(4):395–416
Article MathSciNet Google Scholar
Wagstaff K, Cardie C, Rogers S, Schroedl S (2001) Constrained K-means clustering with background knowledge. Proceedings of the 18th international conference on machine learning, pp 577–584
Wang X, Davidson I (2010) Flexible constrained spectral clustering. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining, pp 563–572
Wang Y, Liu SX, Feng J, Zhou L (2007) Mining naturally smooth evolution of clusters from dynamic data. Proceedings of SIAM conference on data mining
Xu KS, Kliger M, Hero AO III (2010) Evolutionary spectral clustering with adaptive forgetting factor. Proceeding of IEEE international conference on acoustics, speech, and signal processing
Xu T, Zhang Z, Yu PS, Long B (2008a) Dirichlet process based evolutionary clustering. Proceedings of the 8th IEEE international conference on data mining
Xu T, Zhang Z, Yu PS, Long B (2008b) Evolutionary clustering by hierarchical Dirichlet process with hidden Markov state. Proceedings of the 8th IEEE international conference on data mining
Yahoo-WWW (2012) IXIC Historical Prices|NASDAQ composite stock—Yahoo! Finance. http://finance.yahoo.com/q/hp?s=IXIC+Historical+Prices
Yang T, Chi Y, Zhu S, Gong Y, Jin R (2011) Detecting communities and their evolutions in dynamic social networks—a Bayesian approach. Mach Learn 82(2):157–189
Article MATH MathSciNet Google Scholar
Zhang J, Song Y, Chen G, Zhang C (2009) On-line evolutionary exponential family mixture. Proceedings of the 21st international joint conference on artificial intelligence, Pasadena
Zhang J, Song Y, Zhang C, Liu S (2010) Evolutionary hierarchical Dirichlet processes for multiple correlated time-varying corpora. Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining

Download references

Acknowledgments

We would like to thank the anonymous reviewers for their suggestions to improve this article. Kevin Xu was partially supported by an award from the Natural Sciences and Engineering Research Council of Canada. This study was partially supported by the National Science Foundation Grant CCF 0830490 and the US Army Research Office Grant No. W911NF-09-1-0310.

Author information

Authors and Affiliations

EECS Department, University of Michigan, 1301 Beal Avenue, Ann Arbor, MI, 48109-2122, USA
Kevin S. Xu & Alfred O. Hero III
Omek Interactive, Bet Shemesh, Israel
Mark Kliger

Authors

Kevin S. Xu
View author publications
You can also search for this author in PubMed Google Scholar
Mark Kliger
View author publications
You can also search for this author in PubMed Google Scholar
Alfred O. Hero III
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Kevin S. Xu.

Additional information

Responsible editor: Ian Davidson.

Appendix

1.1 True similarity matrix for dynamic Gaussian mixture model

We derive the true similarity matrix $\varPsi $ and the matrix of variances of similarities ${\text{ var}}(W)$, where the similarity is taken to be the dot product, for data sampled from the dynamic Gaussian mixture model described in Sect. 3.3. These matrices are required in order to calculate the oracle forgetting factor for the experiments in Sects. 5.1 and 5.2. We drop the superscript $t$ to simplify the notation.

Consider two arbitrary objects $\mathbf{x}_i \sim N(\varvec{\mu }_c,\Sigma _c)$ and $\mathbf{x}_j \sim N(\varvec{\mu }_d, \Sigma _d)$ where the entries of $\varvec{\mu }_c$ and $\Sigma _c$ are denoted by $\mu _{ck}$ and $\sigma _{ckl}$, respectively. For any distinct $i,j$ the mean is

$$\begin{aligned} {\text{ E}}\left[\mathbf{x}_i \mathbf{x}_j^T\right] = \sum _{k=1}^p {\text{ E}}\left[x_{ik}x_{jk}\right] = \sum _{k=1}^p \mu _{ck} \mu _{dk}, \end{aligned}$$

and the variance is

$$\begin{aligned} {\text{ var}}\left(\mathbf{x}_i \mathbf{x}_j^T\right)&= {\text{ E}}\left[\left(\mathbf{x}_i \mathbf{x}_j^T\right)^2\right] - {\text{ E}}\left[\mathbf{x}_i \mathbf{x}_j^T\right]^2 \\&= \sum _{k=1}^p \sum _{l=1}^p \left\{ {\text{ E}}\left[x_{ik}x_{jk}x_{il}x_{jl} \right] - \mu _{ck}\mu _{dk}\mu _{cl}\mu _{dl}\right\} \\&= \sum _{k=1}^p \sum _{l=1}^p \left\{ \left(\sigma _{ckl} + \mu _{ck}\mu _{cl}\right) \left(\sigma _{dkl} + \mu _{dk}\mu _{dl}\right) - \mu _{ck}\mu _{dk}\mu _{cl}\mu _{dl}\right\} \\&= \sum _{k=1}^p \sum _{l=1}^p \left\{ \sigma _{ckl}\sigma _{dkl} + \sigma _{ckl}\mu _{dk}\mu _{dl} + \sigma _{dkl}\mu _{ck} \mu _{cl}\right\} \end{aligned}$$

by independence of $\mathbf{x}_i$ and $\mathbf{x}_j$. This holds both for $\mathbf{x}_i,\mathbf{x}_j$ in the same cluster, i.e. $c=d$, and for $\mathbf{x}_i, \mathbf{x}_j$ in different clusters, i.e. $c \ne d$. Along the diagonal,

$$\begin{aligned} {\text{ E}}\left[\mathbf{x}_i \mathbf{x}_i^T\right] = \sum _{k=1}^p {\text{ E}}\left[x_{ik}^2\right] = \sum _{k=1}^p \left(\sigma _{ckk} + \mu _{ck}^2\right). \end{aligned}$$

The calculation for the variance is more involved. We first note that

$$\begin{aligned} {\text{ E}}\left[x_{ik}^2 x_{il}^2\right] = \mu _{ck}^2\mu _{cl}^2 + \mu _{ck}^2\sigma _{cll} + 4\mu _{ck}\mu _{cl}\sigma _{ckl} + \mu _{cl}^2\sigma _{ckk} + 2\sigma _{ckl}^2 + \sigma _{ckk}\sigma _{cll}, \end{aligned}$$

which can be derived from the characteristic function of the multivariate Gaussian distribution (Anderson 2003). Thus

$$\begin{aligned} {\text{ var}}\left(\mathbf{x}_i \mathbf{x}_i^T\right)&= \sum _{k=1}^p \sum _{l=1}^p \left\{ {\text{ E}}\left[x_{ik}^2 x_{il}^2\right] - \left(\sigma _{ckk} + \mu _{ck}^2\right) \left(\sigma _{cll} + \mu _{cl}^2\right)\right\} \\&= \sum _{k=1}^p \sum _{l=1}^p \left\{ 4\mu _{ck}\mu _{cl}\sigma _{ckl} + 2\sigma _{ckl}^2\right\} . \end{aligned}$$

The calculated means and variances are then substituted into (13) to compute the oracle forgetting factor. Since the expressions for the means and variances depend only on the clusters and not any objects in particular, it is confirmed that both $\varPsi $ and ${\text{ var}}(W)$ do indeed possess the assumed block structure discussed in Sect. 3.3.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Xu, K.S., Kliger, M. & Hero III, A.O. Adaptive evolutionary clustering. Data Min Knowl Disc 28, 304–336 (2014). https://doi.org/10.1007/s10618-012-0302-x

Download citation

Received: 10 April 2011
Accepted: 21 December 2012
Published: 10 January 2013
Issue Date: March 2014
DOI: https://doi.org/10.1007/s10618-012-0302-x

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Adaptive evolutionary clustering

Abstract

Access this article

Similar content being viewed by others

Particle swarm optimization algorithm: an overview

A Comprehensive Survey of Clustering Algorithms

Density-Based Clustering Based on Hierarchical Density Estimates

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Additional information

Appendix

1.1 True similarity matrix for dynamic Gaussian mixture model

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Adaptive evolutionary clustering

Abstract

Access this article

Similar content being viewed by others

Particle swarm optimization algorithm: an overview

A Comprehensive Survey of Clustering Algorithms

Density-Based Clustering Based on Hierarchical Density Estimates

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Additional information

Appendix

Appendix

1.1 True similarity matrix for dynamic Gaussian mixture model

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation