Data Mining and Knowledge Discovery

, Volume 28, Issue 2, pp 304–336 | Cite as

Adaptive evolutionary clustering

Article

Abstract

In many practical applications of clustering, the objects to be clustered evolve over time, and a clustering result is desired at each time step. In such applications, evolutionary clustering typically outperforms traditional static clustering by producing clustering results that reflect long-term trends while being robust to short-term variations. Several evolutionary clustering algorithms have recently been proposed, often by adding a temporal smoothness penalty to the cost function of a static clustering method. In this paper, we introduce a different approach to evolutionary clustering by accurately tracking the time-varying proximities between objects followed by static clustering. We present an evolutionary clustering framework that adaptively estimates the optimal smoothing parameter using shrinkage estimation, a statistical approach that improves a naïve estimate using additional information. The proposed framework can be used to extend a variety of static clustering algorithms, including hierarchical, k-means, and spectral clustering, into evolutionary clustering algorithms. Experiments on synthetic and real data sets indicate that the proposed framework outperforms static clustering and existing evolutionary clustering algorithms in many scenarios.

Keywords

Evolutionary clustering Similarity measures Clustering algorithms Tracking Data smoothing Adaptive filtering Shrinkage estimation 

References

  1. Ahmed A, Xing EP (2008) Dynamic non-parametric mixture models and the recurrent chinese restaurant process: with applications to evolutionary clustering. Proceedings of the SIAM international conference on data mining, AtlantaGoogle Scholar
  2. Anderson TW (2003) An introduction to multivariate statistical analysis, 3rd edn. Wiley, HobokenMATHGoogle Scholar
  3. Bródka P, Saganowski S, Kazienko P (2012) GED: the method for group evolution discovery in social networks. Soc Netw Anal Min. doi: 10.1007/s13278-012-0058-8
  4. Carmi A, Septier F, Godsill SJ (2009) The Gaussian mixture MCMC particle algorithm for dynamic cluster tracking. Proceedings of the 12th international conference on information fusion, SeattleGoogle Scholar
  5. Chakrabarti D, Kumar R, Tomkins A (2006) Evolutionary clustering. Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining, PhiladelphiaGoogle Scholar
  6. Charikar M, Chekuri C, Feder T, Motwani R (2004) Incremental clustering and dynamic information retrieval. SIAM J Comput 33(6):1417–1440CrossRefMATHMathSciNetGoogle Scholar
  7. Chen Y, Wiesel A, Eldar YC (2010) Shrinkage algorithms for MMSE covariance estimation. IEEE Trans Signal Process 58(10):5016–5029CrossRefMathSciNetGoogle Scholar
  8. Chi Y, Song X, Zhou D, Hino K, Tseng BL (2009) On evolutionary spectral clustering. ACM Trans Knowl Discov Data 3(4):17CrossRefGoogle Scholar
  9. Chung FRK (1997) Spectral graph theory. American Mathematical Society, ProvidenceGoogle Scholar
  10. Eagle N, Pentland A, Lazer D (2009) Inferring friendship network structure by using mobile phone data. Proc Nat Acad Sci 106(36):15274–15278CrossRefGoogle Scholar
  11. Falkowski T, Bartelheimer J, Spiliopoulou M (2006) Mining and visualizing the evolution of subgroups in social networks. Proceedings of the IEEE/WIC/ACM international conference on web intelligence, Hong KongGoogle Scholar
  12. Fenn DJ, Porter MA, McDonald M, Williams S, Johnson NF, Jones NS (2009) Dynamic communities in multichannel data: an application to the foreign exchange market during the 2007–2008 credit crisis. Chaos 19(033):119Google Scholar
  13. Gavrilov M, Anguelov D, Indyk P, Motwani R (2000) Mining the stock market: Which measure is best? Proceedings of 6th ACM SIGKDD international conference on knowledge discovery and data mining. ACM Press, New York, pp 487–496Google Scholar
  14. Greene D, Doyle D, Cunningham P (2010) Tracking the evolution of communities in dynamic social networks. Proceedings of international conference on advanced social network analysis and mining, pp 176–183Google Scholar
  15. Gretton A, Borgwardt KM, Rasch M, Schölkopf B, Smola AJ (2007) A kernel approach to comparing distributions. Proceedings of the 22nd AAAI conference on artificial intelligenceGoogle Scholar
  16. Gupta C, Grossman R (2004) GenIc: a single pass generalized incremental algorithm for clustering. Proceedings SIAM conference on data mining, Lake Buena VistaGoogle Scholar
  17. Harvey AC (1989) Forecasting, structural time series models and the Kalman filter. Cambridge University Press, CambridgeGoogle Scholar
  18. Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning: data mining, inference, and prediction. Springer, New YorkCrossRefGoogle Scholar
  19. Haykin S (2001) Kalman filtering and neural networks. Wiley-Interscience, New YorkCrossRefGoogle Scholar
  20. Hossain MS, Tadepalli S, Watson LT, Davidson I, Helm RF, Ramakrishnan N (2010) Unifying dependent clustering and disparate clustering for non-homogeneous data. Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining, pp 593–602Google Scholar
  21. Infochimps-WWW (2012) NASDAQ Exchange Daily 1970–2010 Open, Close, High, Low and Volume data set. http://www.infochimps.com/datasets/nasdaq-exchange-daily-1970-2010-open-close-high-low-and-volume
  22. Ji X, Xu W (2006) Document clustering with prior knowledge. Proceedings of the 29th annual international ACM SIGIR conference on research and development in information retrieval, New York, pp 405–412Google Scholar
  23. Kuhn HW (1955) The Hungarian method for the assignment problem. Nav Res Logist Quart 2(1–2):83–97CrossRefGoogle Scholar
  24. Ledoit O, Wolf M (2003) Improved estimation of the covariance matrix of stock returns with an application to portfolio selection. J Empir Financ 10(5):603–621CrossRefGoogle Scholar
  25. Li Y, Han J, Yang J (2004) Clustering moving objects. Proceedings of the 10th ACM SIGKDD international conference on knowledge discovery and data miningGoogle Scholar
  26. Lin YR, Chi Y, Zhu S, Sundaram H, Tseng BL (2009) Analyzing communities and their evolutions in dynamic social networks. ACM Trans Knowl Discov Data 3(2):8CrossRefGoogle Scholar
  27. Lütkepohl H (1997) Handbook of matrices. Wiley, New YorkGoogle Scholar
  28. MacQueen J (1967) Some methods for classification and analysis of multivariate observations. Proceedings of the 5th Berkeley symposium on mathematical statistics and probabilityGoogle Scholar
  29. Mankad S, Michailidis G, Kirilenko A (2011) Smooth plaid models: a dynamic clustering algorithm with application to electronic financial markets. Tech Rep. http://ssrn.com/abstract=1787577
  30. Milligan GW, Cooper MC (1985) An examination of procedures for determining the number of clusters in a data set. Psychometrika 50(2):159–179CrossRefGoogle Scholar
  31. MIT-WWW (2005) MIT academic calendar 2004–2005. http://web.mit.edu/registrar/www/calendar0405.html
  32. Mucha PJ, Richardson T, Macon K, Porter MA, Onnela JP (2010) Community structure in time-dependent, multiscale, and multiplex networks. Science 328(5980):876–878CrossRefMATHMathSciNetGoogle Scholar
  33. Newman MEJ (2006) Modularity and community structure in networks. Proc Nat Acad Sci 103(23): 8577–8582Google Scholar
  34. Ng AY, Jordan MI, Weiss Y (2001) On spectral clustering: analysis and an algorithm. Adv Neural Inf Process Syst 14:849–856Google Scholar
  35. Ning H, Xu W, Chi Y, Gong Y, Huang TS (2010) Incremental spectral clustering by efficiently updating the eigen-system. Pattern Recog 43(1):113–127CrossRefMATHGoogle Scholar
  36. Parker C (2007) Boids pseudocode. http://www.vergenet.net/conrad/boids/pseudocode.html
  37. Rand WM (1971) Objective criteria for the evaluation of clustering methods. J Am Stat Assoc 66(336): 846–850Google Scholar
  38. Reynolds CW (1987) Flocks, herds, and schools: A distributed behavioral model. Proceedings of 14th annual conference on computer graphics and interactive techniques, AnaheimGoogle Scholar
  39. Rosswog J, Ghose K (2008) Detecting and tracking spatio-temporal clusters with adaptive history filtering. Proceedings of the 8th IEEE international conference on data mining workshops, PisaGoogle Scholar
  40. Rousseeuw PJ (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Computat Appl Math 20:53–65Google Scholar
  41. Schäfer J, Strimmer K (2005) A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Stat Appl Genet Mol Biol 4(1):32MathSciNetGoogle Scholar
  42. Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8):888–905CrossRefGoogle Scholar
  43. Sun J, Papadimitriou S, Yu PS, Faloutsos C (2007) Graphscope: Parameter-free mining of large time-evolving graphs. Proceedings of 13th ACM SIGKDD conference on knowledge discovery and data miningGoogle Scholar
  44. Tadepalli S, Ramakrishnan N, Watson LT, Mishra B, Helm RF (2009) Gene expression time courses by analyzing cluster dynamics. J Bioinforma Comput Biol 7(2):339–356CrossRefGoogle Scholar
  45. Tang L, Liu H, Zhang J, Nazeri Z (2008) Community evolution in dynamic multi-mode networks. Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data miningGoogle Scholar
  46. Tantipathananandh C, Berger-Wolf T, Kempe D (2007) A framework for community identification in dynamic social networks. Proceedings of 13th ACM SIGKDD international conference on knowledge discovery and data miningGoogle Scholar
  47. von Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17(4):395–416CrossRefMathSciNetGoogle Scholar
  48. Wagstaff K, Cardie C, Rogers S, Schroedl S (2001) Constrained K-means clustering with background knowledge. Proceedings of the 18th international conference on machine learning, pp 577–584Google Scholar
  49. Wang X, Davidson I (2010) Flexible constrained spectral clustering. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining, pp 563–572Google Scholar
  50. Wang Y, Liu SX, Feng J, Zhou L (2007) Mining naturally smooth evolution of clusters from dynamic data. Proceedings of SIAM conference on data miningGoogle Scholar
  51. Xu KS, Kliger M, Hero AO III (2010) Evolutionary spectral clustering with adaptive forgetting factor. Proceeding of IEEE international conference on acoustics, speech, and signal processingGoogle Scholar
  52. Xu T, Zhang Z, Yu PS, Long B (2008a) Dirichlet process based evolutionary clustering. Proceedings of the 8th IEEE international conference on data miningGoogle Scholar
  53. Xu T, Zhang Z, Yu PS, Long B (2008b) Evolutionary clustering by hierarchical Dirichlet process with hidden Markov state. Proceedings of the 8th IEEE international conference on data miningGoogle Scholar
  54. Yahoo-WWW (2012) IXIC Historical Prices|NASDAQ composite stock—Yahoo! Finance. http://finance.yahoo.com/q/hp?s=IXIC+Historical+Prices
  55. Yang T, Chi Y, Zhu S, Gong Y, Jin R (2011) Detecting communities and their evolutions in dynamic social networks—a Bayesian approach. Mach Learn 82(2):157–189CrossRefMATHMathSciNetGoogle Scholar
  56. Zhang J, Song Y, Chen G, Zhang C (2009) On-line evolutionary exponential family mixture. Proceedings of the 21st international joint conference on artificial intelligence, PasadenaGoogle Scholar
  57. Zhang J, Song Y, Zhang C, Liu S (2010) Evolutionary hierarchical Dirichlet processes for multiple correlated time-varying corpora. Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data miningGoogle Scholar

Copyright information

© The Author(s) 2013

Authors and Affiliations

  • Kevin S. Xu
    • 1
  • Mark Kliger
    • 2
  • Alfred O. Hero III
    • 1
  1. 1.EECS DepartmentUniversity of MichiganAnn ArborUSA
  2. 2.Omek InteractiveBet ShemeshIsrael

Personalised recommendations