Adaptive evolutionary clustering
 Kevin S. Xu,
 Mark Kliger,
 Alfred O. Hero III
Abstract
In many practical applications of clustering, the objects to be clustered evolve over time, and a clustering result is desired at each time step. In such applications, evolutionary clustering typically outperforms traditional static clustering by producing clustering results that reflect long-term trends while being robust to short-term variations. Several evolutionary clustering algorithms have recently been proposed, often by adding a temporal smoothness penalty to the cost function of a static clustering method. In this paper, we introduce a different approach to evolutionary clustering by accurately tracking the time-varying proximities between objects followed by static clustering. We present an evolutionary clustering framework that adaptively estimates the optimal smoothing parameter using shrinkage estimation, a statistical approach that improves a naïve estimate using additional information. The proposed framework can be used to extend a variety of static clustering algorithms, including hierarchical, k-means, and spectral clustering, into evolutionary clustering algorithms. Experiments on synthetic and real data sets indicate that the proposed framework outperforms static clustering and existing evolutionary clustering algorithms in many scenarios.
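The two-stage idea in the abstract (track the proximities over time, then apply a static clustering method) can be illustrated with a minimal Python sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the smoothed proximity matrix is modeled as a convex combination of the previous smoothed matrix and the current one, the smoothing parameter `alpha` is fixed by hand rather than estimated adaptively by shrinkage, and the static step is a simple two-way spectral split. The function names are hypothetical.

```python
import numpy as np


def smooth_proximity(prev_smoothed, current, alpha):
    """Blend the previous smoothed proximity matrix with the current one.

    ASSUMPTION: a convex combination with a fixed weight `alpha`; the
    adaptive, shrinkage-based choice of the smoothing parameter described
    in the abstract is not implemented here.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * prev_smoothed + (1.0 - alpha) * current


def track_proximities(proximities, alpha=0.5):
    """Return one smoothed proximity matrix per time step."""
    smoothed = [np.asarray(proximities[0], dtype=float)]
    for w in proximities[1:]:
        w = np.asarray(w, dtype=float)
        smoothed.append(smooth_proximity(smoothed[-1], w, alpha))
    return smoothed


def spectral_bipartition(w):
    """Static two-way split of a symmetric similarity matrix.

    Uses the sign of the eigenvector for the second-smallest eigenvalue
    of the unnormalized graph Laplacian (an illustrative static method).
    """
    w = np.asarray(w, dtype=float)
    laplacian = np.diag(w.sum(axis=1)) - w
    _, vecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    return (vecs[:, 1] > 0).astype(int)


def evolutionary_bipartition(proximities, alpha=0.5):
    """Smooth the proximities over time, then cluster each time step."""
    return [spectral_bipartition(w) for w in track_proximities(proximities, alpha)]
```

Because the static step only ever sees a proximity matrix, any static method that accepts one could be substituted for `spectral_bipartition` in this sketch.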
 Title
 Adaptive evolutionary clustering
 Journal

Data Mining and Knowledge Discovery
Volume 28, Issue 2, pp 304–336
 Cover Date
 2014-03-01
 DOI
 10.1007/s10618-012-0302-x
 Print ISSN
 1384-5810
 Online ISSN
 1573-756X
 Publisher
 Springer US
 Keywords

 Evolutionary clustering
 Similarity measures
 Clustering algorithms
 Tracking
 Data smoothing
 Adaptive filtering
 Shrinkage estimation
 Authors

 Kevin S. Xu ^{(1)}
 Mark Kliger ^{(2)}
 Alfred O. Hero III ^{(1)}
 Author Affiliations

 1. EECS Department, University of Michigan, 1301 Beal Avenue, Ann Arbor, MI, 48109-2122, USA
 2. Omek Interactive, Bet Shemesh, Israel