A Concept-Drifting Detection Algorithm for Categorical Evolving Data

  • Fuyuan Cao
  • Joshua Zhexue Huang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7819)

Abstract

In data stream analysis, detecting concept drift is an important problem for real-time decision making. In this paper, we propose a new method for detecting concept drift by measuring the difference between the distributions of two concepts. The difference is defined using the approximation accuracy of rough set theory, which can also be used to measure the speed at which concepts change. We propose a concept-drifting detection algorithm and analyze its complexity. Experimental results on a real data set with half a million records show that the proposed algorithm is both effective in discovering concept changes and efficient in processing large data sets.
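
The abstract does not give the exact definition of the difference measure, so the following is only an illustrative sketch of how rough set approximation accuracy could be used to compare two windows of categorical records. It relies on Pawlak's standard definition (size of the lower approximation divided by size of the upper approximation). The function names `approximation_accuracy` and `window_difference`, the per-attribute averaging, and the choice to treat the earlier window as the target set are assumptions made for illustration, not the authors' algorithm.

```python
from collections import defaultdict


def approximation_accuracy(universe, target, attr):
    """Pawlak approximation accuracy of `target` (a set of record indices)
    under the equivalence relation induced by a single attribute `attr`."""
    # Group record indices into equivalence classes by attribute value.
    classes = defaultdict(set)
    for i, record in enumerate(universe):
        classes[record[attr]].add(i)

    lower = 0  # records in classes fully contained in the target
    upper = 0  # records in classes that intersect the target
    for block in classes.values():
        if block <= target:
            lower += len(block)
        if block & target:
            upper += len(block)
    return lower / upper if upper else 1.0


def window_difference(prev_window, curr_window):
    """Hypothetical drift score in [0, 1] (an assumption, not the paper's
    measure): pool both windows, treat the previous window as the target
    set, and average its approximation accuracy over all attributes.
    0 means the windows are indistinguishable on every attribute;
    1 means they share no attribute values."""
    if not prev_window or not curr_window:
        raise ValueError("both windows must be non-empty")
    universe = list(prev_window) + list(curr_window)
    target = set(range(len(prev_window)))
    n_attrs = len(universe[0])
    accs = [approximation_accuracy(universe, target, a) for a in range(n_attrs)]
    return sum(accs) / len(accs)


same = [("a", "x"), ("b", "y")]
other = [("c", "z"), ("d", "w")]
print(window_difference(same, same))   # 0.0: identical windows
print(window_difference(same, other))  # 1.0: no shared values
```

In this sketch a score near 1 between consecutive windows would suggest a change of concept, and the sequence of scores over time gives a rough sense of how quickly the data are changing. Any threshold for declaring a drift would have to be chosen separately.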

Keywords

Categorical Data, Evolving, Concept-drifting

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Fuyuan Cao (1, 2)
  • Joshua Zhexue Huang (1)
  1. Shenzhen Key Laboratory of High Performance Data Mining, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
  2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, the School of Computer and Information Technology, Shanxi University, Taiyuan, China
