Bipartite Graphs for Monitoring Clusters Transitions

  • Márcia Oliveira
  • João Gama
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6065)

Abstract

The study of evolution has become an important research issue, especially in the last decade, owing to a greater awareness of our world's volatility. As a consequence, a new paradigm, Change Mining, has emerged to respond more effectively to a new class of problems in Data Mining. In this paper we address the problem of monitoring the evolution of clusters and propose the MClusT framework, which was developed along the lines of this paradigm. MClusT includes a taxonomy of transitions, a tracking method based on graph theory, and a transition detection algorithm. To demonstrate its feasibility and applicability, we present real-world case studies using datasets extracted from Banco de Portugal and the Portuguese Institute of Statistics. We also test our approach on a benchmark dataset from the Time Series Data Library (TSDL). The results are encouraging and demonstrate the ability of the MClusT framework to provide an efficient diagnosis of cluster transitions.
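
The abstract names the three components of MClusT but gives no details, so the following Python sketch is only an illustration of the general idea of tracking clusters with a weighted bipartite graph. Everything specific in it is an assumption rather than the authors' method: the function names `transition_edges` and `label_transitions`, the edge weight (share of an old cluster's members found in a new cluster), the thresholds `survival` and `split`, and the five labels used (survival, merge, split, death, birth).

```python
def transition_edges(old, new):
    """Weighted bipartite edges between two clusterings.

    old, new: dict mapping cluster label -> set of object ids.
    The weight of edge (a, b) is the share of a's members that appear in b.
    This weighting is an assumption made for illustration.
    """
    edges = {}
    for a, members_a in old.items():
        for b, members_b in new.items():
            shared = len(members_a & members_b)
            if shared:
                edges[(a, b)] = shared / len(members_a)
    return edges


def label_transitions(old, new, survival=0.5, split=0.25):
    """Label each old cluster and list new clusters with no strong origin.

    survival and split are hypothetical thresholds, not values from the paper.
    """
    edges = transition_edges(old, new)
    result = {}
    absorbed_by = {}      # new cluster -> old clusters it mostly absorbs
    matched_new = set()   # new clusters explained by some old cluster
    for a in old:
        out = {b: w for (x, b), w in edges.items() if x == a}
        best = max(out, key=out.get, default=None)
        pieces = sorted(b for b, w in out.items() if w >= split)
        if best is not None and out[best] > survival:
            # Most of a's members moved together into one new cluster.
            absorbed_by.setdefault(best, []).append(a)
            matched_new.add(best)
            result[a] = ("survival", [best])
        elif len(pieces) >= 2:
            # a's members were spread over several new clusters.
            matched_new.update(pieces)
            result[a] = ("split", pieces)
        else:
            result[a] = ("death", [])
    # Two or more old clusters surviving into the same new cluster: a merge.
    for b, sources in absorbed_by.items():
        if len(sources) >= 2:
            for a in sources:
                result[a] = ("merge", [b])
    births = sorted(set(new) - matched_new)
    return result, births


if __name__ == "__main__":
    old = {"A": {1, 2, 3, 4}, "B": {5, 6, 7, 8}, "C": {9, 10}}
    new = {"X": {1, 2, 3, 4, 5, 6, 7}, "W": {20, 21}}
    print(label_transitions(old, new))
```

Using set overlap as the edge weight keeps the sketch independent of any particular clustering algorithm: it only needs the cluster memberships at two time points.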

Keywords

Bipartite Graphs, Change Mining, Clustering, Monitoring, Transitions

References

  1. Bottcher, M., Hoppner, F., Spiliopoulou, M.: On exploiting the power of time in data mining. SIGKDD Explorations (10), 3–11 (2008)
  2. Hampel, F.: Some thoughts about classification. In: 8th Conference of the International Federation of Classification Societies, pp. 1–19. Springer, Poland (2002)
  3. Jain, A.K.: Data Clustering: 50 Years Beyond K-means. In: Daelemans, W., Goethals, B., Morik, K. (eds.) ECML PKDD 2008, Part I. LNCS (LNAI), vol. 5211, pp. 3–4. Springer, Heidelberg (2008)
  4. Jain, A.K., Murty, M.N., Flynn, P.J.: Data Clustering: A Review. ACM Comput. Surv. (31), 264–323 (1999)
  5. Ganti, V., Gehrke, J., Ramakrishnan, R.: A Framework for Measuring Changes in Data Characteristics. In: Proceedings of the 18th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 126–137. ACM Press, Pennsylvania (1999)
  6. Bartolini, I., Ciaccia, P., Ntoutsi, I., Patella, M., Theodoridis, Y.: The Panda framework for Comparing Patterns. Data Knowl. Eng. (68), 244–260 (2009)
  7. Chawathe, S.S., Garcia-Molina, H.: Meaningful Change Detection in Structured Data. In: Peckham, J. (ed.) Proceedings ACM SIGMOD International Conference on Management of Data, pp. 26–37. ACM Press, Arizona (1997)
  8. Spiliopoulou, M., Ntoutsi, I., Theodoridis, Y., Schult, R.: MONIC: modeling and monitoring cluster transitions. In: Eliassi-Rad, T., Ungar, L.H., Craven, M., Gunopulos, D. (eds.) ACM SIGKDD 2006, pp. 706–711. ACM, Philadelphia (2006)
  9. Falkowski, T., Bartelheimer, J., Spiliopoulou, M.: Mining and Visualizing the Evolution of Subgroups in Social Networks. In: IEEE / WIC / ACM International Conference on Web Intelligence, pp. 52–58. IEEE Computer Society, China (2006)
  10. Yang, H., Parthasarathy, S., Mehta, S.: A generalized framework for mining spatio-temporal patterns in scientific data. In: Grossman, R., Bayardo, R.J., Bennett, K.P. (eds.) Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 716–721. ACM, Illinois (2005)
  11. Baron, S., Spiliopoulou, M.: Monitoring Change in Mining Results. In: Kambayashi, Y., Winiwarter, W., Arikawa, M. (eds.) DaWaK 2001. LNCS, vol. 2114, p. 51. Springer, Heidelberg (2001)
  12. Baron, S., Spiliopoulou, M.: Monitoring the Evolution of Web Usage Patterns. In: Berendt, B., Hotho, A., Mladenič, D., van Someren, M., Spiliopoulou, M., Stumme, G. (eds.) EWMF 2003. LNCS (LNAI), vol. 3209, pp. 181–200. Springer, Heidelberg (2004)
  13. Lu, Y.-H., Huang, Y.: Mining data streams using clustering. In: Proceedings of the 4th International Conference on Machine Learning and Cybernetics, pp. 2079–2083. IEEE Computer Society, China (2005)
  14. Aggarwal, C.C.: On Change Diagnosis in Evolving Data Streams. IEEE Trans. Knowl. Data Eng. (17), 587–600 (2005)
  15. Chen, K., Liu, L.: Detecting the Change of Clustering Structure in Categorical Data Streams. In: Ghosh, J., Lambert, D., Skillicorn, D.B., Srivastava, J. (eds.) Proceedings of the 6th SIAM International Conference on Data Mining. SIAM, USA (2006)
  16. Aggarwal, C.C., Han, J., Wang, J., Yu, P.S.: A Framework for Change Diagnosis of Data Streams. In: Halevy, A.Y., Ives, Z.G., Doan, A. (eds.) Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, pp. 575–586. ACM, California (2003)
  17. O’Callaghan, L., Meyerson, A., Motwani, R., Mishra, N., Guha, S.: Streaming-Data Algorithms for High-Quality Clustering. In: Proceedings of the 18th International Conference on Data Engineering, p. 685. IEEE Computer Society, California (2002)
  18. Elnekave, S., Last, M., Maimon, O.: Incremental Clustering of Mobile Objects. In: ICDE Workshops (2007)
  19. Kalnis, P., Mamoulis, N., Bakiras, S.: On Discovering Moving Clusters in Spatio-temporal Data. In: Bauzer Medeiros, C., Egenhofer, M.J., Bertino, E. (eds.) SSTD 2005. LNCS, vol. 3633, pp. 364–381. Springer, Heidelberg (2005)
  20. Li, T., Ma, S., Ogihara, M.: Entropy-based criterion in categorical clustering. In: Proceedings of the 21st International Conference on Machine Learning, p. 65. ACM, New York (2004)
  21. Kaur, S., Bhatnagar, V., Mehta, S., Kapoor, S.: Concept Drift in Unlabeled Data Stream. Technical Report, University of Delhi (2009)
  22. Rousseeuw, P.J.: Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 53–65 (1987)
  23. Time Series Data Library, http://robjhyndman.com/TSDL/

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Márcia Oliveira (1)
  • João Gama (1)

  1. LIAAD, FEP, University of Porto, Porto, Portugal
