Mining Multi-label Concept-Drifting Data Streams Using Dynamic Classifier Ensemble

  • Wei Qu
  • Yang Zhang
  • Junping Zhu
  • Qiang Qiu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5828)

Abstract

The problem of mining single-label data streams has been extensively studied in recent years, but mining multi-label data streams has received far less attention. In this paper, we propose an improved binary relevance method that exploits dependence information among class labels, and a dynamic classifier ensemble approach for classifying multi-label concept-drifting data streams. Our classification algorithm uses a weighted majority voting strategy. An empirical study on both synthetic and real-life data sets shows that the proposed dynamic classifier ensemble with improved binary relevance outperforms both the dynamic classifier ensemble with binary relevance and the static classifier ensemble with binary relevance.
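
As a rough illustration of the ideas named in the abstract, the following Python sketch combines plain binary relevance (one independent yes/no predictor per label) with a weighted majority vote over several ensemble members. It is only a sketch under stated assumptions: the names `weighted_majority_vote`, `BRModel`, `member_a`, and `member_b`, the toy thresholds, and the weights are all hypothetical, and the abstract does not say how the paper computes its weights or how the improved binary relevance method uses label dependence, so neither is modelled here.

```python
from typing import Callable, List, Sequence

# Plain binary relevance: one independent 0/1 predictor per label.
LabelPredictor = Callable[[Sequence[float]], int]
BRModel = List[LabelPredictor]  # list index = label id


def weighted_majority_vote(
    ensemble: List[BRModel],
    weights: List[List[float]],
    x: Sequence[float],
) -> List[int]:
    """Predict a label set for x with one weighted majority vote per label.

    weights[k][j] is the weight given to ensemble member k on label j.
    In a dynamic ensemble these weights would be recomputed for each test
    instance; here they are simply passed in (an assumption of this sketch).
    """
    n_labels = len(ensemble[0])
    prediction = []
    for j in range(n_labels):
        # Total weight of the members that vote "label j is relevant".
        yes = sum(w[j] for model, w in zip(ensemble, weights) if model[j](x) == 1)
        total = sum(w[j] for w in weights)
        prediction.append(1 if yes > total / 2 else 0)
    return prediction


# Two toy members over two labels; the thresholds are made up for illustration.
member_a: BRModel = [lambda x: int(x[0] > 0.5), lambda x: int(x[1] > 0.5)]
member_b: BRModel = [lambda x: int(x[0] > 0.8), lambda x: int(x[1] > 0.2)]
ensemble = [member_a, member_b]

x = [0.6, 0.3]
trust_a_on_label_0 = weighted_majority_vote(ensemble, [[0.9, 0.3], [0.1, 0.7]], x)
trust_b_on_label_0 = weighted_majority_vote(ensemble, [[0.1, 0.3], [0.9, 0.7]], x)
print(trust_a_on_label_0, trust_b_on_label_0)
```

For the same instance, shifting the weight on the first label from one member to the other flips that label's prediction while the second label is unaffected, which is the per-label behaviour a weighted vote over binary relevance models is meant to give.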

Keywords

Multi-label; Data Stream; Concept Drift; Binary Relevance; Dynamic Classifier Ensemble

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Wei Qu (1)
  • Yang Zhang (1)
  • Junping Zhu (1)
  • Qiang Qiu (1)
  1. College of Information Engineering, Northwest A&F University, Yangling, P.R. China
