A Multi-partition Multi-chunk Ensemble Technique to Classify Concept-Drifting Data Streams

  • Mohammad M. Masud
  • Jing Gao
  • Latifur Khan
  • Jiawei Han
  • Bhavani Thuraisingham
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5476)


We propose a multi-partition, multi-chunk ensemble technique for classifying concept-drifting data streams. Existing ensemble techniques for classifying concept-drifting data streams follow a single-partition, single-chunk approach, in which a single data chunk is used to train one classifier. In our approach, we train a collection of v classifiers from r consecutive data chunks using v-fold partitioning of the data, and build an ensemble of such classifiers. By introducing this multi-partition, multi-chunk ensemble technique, we significantly reduce classification error compared to single-partition, single-chunk ensemble approaches. We justify the usefulness of our algorithm theoretically and demonstrate its effectiveness empirically against other state-of-the-art stream classification techniques on synthetic data and real botnet traffic.
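The training scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract specifies only that v classifiers are trained from r consecutive chunks using v-fold partitioning and then combined into an ensemble; everything else here is an assumption made for illustration: the toy `CentroidClassifier` base learner, the leave-one-fold-out reading of "v-fold partitioning", ranking candidates by their error on the latest labeled chunk, the `max_size` cap, and majority voting.

```python
import random
from collections import Counter


class CentroidClassifier:
    """Toy base learner (an assumption): predicts the class of the nearest centroid."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for xi, yi in zip(X, y):
            acc = sums.setdefault(yi, [0.0] * len(xi))
            for j, value in enumerate(xi):
                acc[j] += value
        self.centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        return self

    def predict(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[c]))
        return min(self.centroids, key=sq_dist)


def error_rate(clf, data):
    """Fraction of (features, label) pairs in data that clf misclassifies."""
    return sum(1 for x, y in data if clf.predict(x) != y) / len(data)


def train_partition_classifiers(chunks, v, rng):
    """Train v classifiers from r consecutive chunks (requires v >= 2).

    The chunks are pooled, shuffled, and split into v folds; classifier i
    is trained on every fold except fold i (assumed interpretation).
    """
    data = [example for chunk in chunks for example in chunk]
    rng.shuffle(data)
    folds = [data[i::v] for i in range(v)]
    classifiers = []
    for i in range(v):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        X, y = zip(*train)
        classifiers.append(CentroidClassifier().fit(X, y))
    return classifiers


def update_ensemble(ensemble, new_classifiers, latest_chunk, max_size):
    """Keep the max_size candidates with the lowest error on the latest chunk.

    The selection rule and the size cap are assumptions of this sketch.
    """
    candidates = ensemble + new_classifiers
    candidates.sort(key=lambda clf: error_rate(clf, latest_chunk))
    return candidates[:max_size]


def ensemble_predict(ensemble, x):
    """Classify x by unweighted majority vote (assumed combination rule)."""
    votes = Counter(clf.predict(x) for clf in ensemble)
    return votes.most_common(1)[0][0]
```

In use, each chunk would be a list of `(features, label)` pairs. As each new labeled chunk arrives, the last r chunks are passed to `train_partition_classifiers`, and the result is merged into the running ensemble with `update_ensemble`. Because every classifier sees data from several chunks and a different subset of the pooled examples, the ensemble members are trained on more data and are less correlated than in a one-classifier-per-chunk scheme.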





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Mohammad M. Masud (1)
  • Jing Gao (2)
  • Latifur Khan (1)
  • Jiawei Han (2)
  • Bhavani Thuraisingham (1)

  1. Department of Computer Science, University of Texas at Dallas, USA
  2. Department of Computer Science, University of Illinois at Urbana-Champaign, USA
