
Clustering Data Stream by a Sub-window Approach Using DCA

  • Conference paper
Machine Learning and Data Mining in Pattern Recognition (MLDM 2012)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7376)

Abstract

Data stream mining is an emerging topic in data mining. It concerns many applications involving large, temporal data sets such as telephone records, banking data, multimedia data, etc. For mining such data, one crucial strategy is the analysis of packet data. In this paper, we present an exploratory analysis of strategies for clustering data streams based on a sub-window approach and an efficient clustering algorithm called DCA (Difference of Convex functions Algorithm). Our approach consists of separating the data into different sub-windows and then applying a DCA clustering algorithm to each sub-window. Two clustering strategies are investigated: global clustering (on the whole data set) and independent local clustering (i.e., clustering independently on each sub-window). Our aims are to study (1) the efficiency of independent local clustering, and (2) the agreement between local clustering and global clustering based on the same DCA clustering algorithm. Comparative experiments with data stream clustering using K-Means, a standard clustering method, on different data sets are presented.
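
The two strategies named in the abstract can be illustrated with a minimal sketch. It assumes consecutive, non-overlapping sub-windows of fixed size, and it uses plain Lloyd's K-Means (the baseline the abstract mentions) as a stand-in for the DCA clustering algorithm, which is not specified here. All names (`split_into_subwindows`, `kmeans`, `local_clustering`, `global_clustering`, `window_size`) are hypothetical and chosen for illustration only.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def split_into_subwindows(stream: Sequence[Point], window_size: int) -> List[List[Point]]:
    """Cut the stream into consecutive, non-overlapping sub-windows (an assumption)."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    return [list(stream[i:i + window_size]) for i in range(0, len(stream), window_size)]


def squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points: Sequence[Point], centers: Sequence[Point]) -> List[int]:
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
            for p in points]


def kmeans(points: Sequence[Point], k: int, iterations: int = 50,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Plain Lloyd's K-Means; a stand-in for the DCA clustering step, not DCA itself."""
    rng = random.Random(seed)
    centers = rng.sample(list(points), k)
    for _ in range(iterations):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # Coordinate-wise mean of the cluster members.
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                # Keep the previous center if the cluster is empty.
                new_centers.append(centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


def local_clustering(stream: Sequence[Point], window_size: int, k: int):
    """Independent local clustering: cluster each sub-window on its own."""
    return [kmeans(w, min(k, len(w)))
            for w in split_into_subwindows(stream, window_size)]


def global_clustering(stream: Sequence[Point], k: int):
    """Global clustering: cluster the whole data set at once."""
    return kmeans(stream, k)


if __name__ == "__main__":
    # Toy stream alternating between two well-separated groups.
    stream = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0),
              (1.0, 0.0), (11.0, 10.0), (1.0, 1.0), (11.0, 11.0)]
    print(global_clustering(stream, k=2)[0])
    for centers, _ in local_clustering(stream, window_size=4, k=2):
        print(centers)
```

Comparing the per-window centers with the global centers gives one simple way to look at how closely local and global clustering agree; the paper's own evaluation criteria are not described in the abstract.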

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ta, M.T., Le Thi, H.A., Boudjeloud-Assala, L. (2012). Clustering Data Stream by a Sub-window Approach Using DCA. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2012. Lecture Notes in Computer Science, vol 7376. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31537-4_22

  • DOI: https://doi.org/10.1007/978-3-642-31537-4_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31536-7

  • Online ISBN: 978-3-642-31537-4

  • eBook Packages: Computer Science, Computer Science (R0)
