Abstract
Data stream mining is an emerging topic in data mining; it concerns many applications involving large, temporal data sets such as telephone records, banking data, and multimedia data. For mining such data, one crucial strategy is the analysis of packet data. In this paper, we present an exploratory analysis of strategies for clustering data streams based on a sub-window approach and an efficient clustering algorithm called DCA (Difference of Convex functions Algorithm). Our approach consists of separating the data into different sub-windows and then applying a DCA clustering algorithm to each sub-window. Two clustering strategies are investigated: global clustering (on the whole data set) and independent local clustering (i.e., clustering independently on each sub-window). Our aims are to study (1) the efficiency of independent local clustering, and (2) the agreement between local clustering and global clustering based on the same DCA clustering algorithm. Comparative experiments with data stream clustering using K-Means, a standard clustering method, on different data sets are presented.
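As a rough illustration of the sub-window idea described in the abstract, the following Python sketch splits a stream into fixed-size sub-windows and clusters each one independently. It is not the authors' method: the DCA clustering algorithm is not reproduced here, and plain Lloyd-style K-Means (the baseline named in the abstract) is used as a stand-in. The function names (`sub_windows`, `kmeans`, `cluster_stream`), the fixed window size, and the random initialisation are all assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
        for p in points
    ]


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd-style K-Means. A stand-in for DCA, not DCA itself."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumed initialisation: k random points
    for _ in range(iterations):
        labels = assign(points, centers)
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's center unchanged
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, assign(points, centers)


def sub_windows(stream, window_size):
    """Yield consecutive, non-overlapping sub-windows of the stream."""
    for start in range(0, len(stream), window_size):
        yield stream[start:start + window_size]


def cluster_stream(stream, k, window_size):
    """Independent local clustering: cluster each sub-window on its own."""
    results = []
    for window in sub_windows(stream, window_size):
        # Guard against a short final window with fewer than k points.
        results.append(kmeans(window, min(k, len(window))))
    return results


if __name__ == "__main__":
    stream = [(0, 0), (0, 1), (10, 10), (10, 11), (1, 0), (11, 10), (0, 2), (10, 12)]
    for centers, labels in cluster_stream(stream, k=2, window_size=4):
        print(centers, labels)
```

In this sketch, global clustering corresponds to calling `kmeans(stream, k)` on the whole list, so the two strategies can be compared by running both on the same data.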
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ta, M.T., Le Thi, H.A., Boudjeloud-Assala, L. (2012). Clustering Data Stream by a Sub-window Approach Using DCA. In: Perner, P. (eds) Machine Learning and Data Mining in Pattern Recognition. MLDM 2012. Lecture Notes in Computer Science, vol 7376. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31537-4_22
DOI: https://doi.org/10.1007/978-3-642-31537-4_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31536-7
Online ISBN: 978-3-642-31537-4
eBook Packages: Computer Science, Computer Science (R0)