A Multi-partition Multi-chunk Ensemble Technique to Classify Concept-Drifting Data Streams

  • Mohammad M. Masud
  • Jing Gao
  • Latifur Khan
  • Jiawei Han
  • Bhavani Thuraisingham
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5476)

Abstract

We propose a multi-partition, multi-chunk ensemble classifier-based data mining technique to classify concept-drifting data streams. Existing ensemble techniques for classifying concept-drifting data streams follow a single-partition, single-chunk approach, in which each classifier is trained on a single data chunk. In our approach, we train a collection of v classifiers from r consecutive data chunks using v-fold partitioning of the data, and build an ensemble of such classifiers. This multi-partition, multi-chunk ensemble technique significantly reduces classification error compared to single-partition, single-chunk ensemble approaches. We justify the usefulness of our algorithm theoretically and demonstrate its effectiveness empirically against other state-of-the-art stream classification techniques on synthetic data and real botnet traffic.
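The training scheme summarized in the abstract (v classifiers trained from r consecutive chunks via v-fold partitioning, collected into an ensemble) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the nearest-centroid base learner, majority voting, the parameter name `max_models`, the class name `MultiPartitionEnsemble`, and the pruning rule (keep the `max_models` groups with the lowest error, using held-out fold error for the newest group and error on the newest chunk for older groups) are all hypothetical choices made for this sketch.

```python
import random
from collections import Counter


class NearestCentroid:
    """Hypothetical stand-in base learner; the paper's learner is not assumed here."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, value in enumerate(x):
                acc[i] += value
        self.centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}
        return self

    def predict(self, x):
        # Return the label whose centroid is closest in squared Euclidean distance.
        return min(self.centroids,
                   key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, self.centroids[lab])))


def error_rate(model, chunk):
    return sum(model.predict(x) != y for x, y in chunk) / len(chunk)


def train_multi_partition(chunks, v, seed=0):
    """Merge the given r chunks, split them into v folds, and train v classifiers,
    each on v-1 folds. Returns the classifiers and their mean held-out error."""
    data = [pair for chunk in chunks for pair in chunk]
    if v < 2 or len(data) < v:
        raise ValueError("need v >= 2 and at least v labelled examples")
    rng = random.Random(seed)
    rng.shuffle(data)
    folds = [data[i::v] for i in range(v)]
    models, held_out_errors = [], []
    for k in range(v):
        train = [p for j, fold in enumerate(folds) if j != k for p in fold]
        model = NearestCentroid().fit([x for x, _ in train], [lab for _, lab in train])
        models.append(model)
        held_out_errors.append(error_rate(model, folds[k]))  # fold k was left out
    return models, sum(held_out_errors) / v


class MultiPartitionEnsemble:
    def __init__(self, v=5, r=2, max_models=3):  # hypothetical defaults
        self.v, self.r, self.max_models = v, r, max_models
        self.window = []   # the most recent r chunks
        self.members = []  # each member is a group of v classifiers

    def update(self, chunk):
        self.window = (self.window + [chunk])[-self.r:]
        new_group, new_err = train_multi_partition(self.window, self.v)
        # Assumed pruning rule: older groups are scored on the newest chunk,
        # the new group by its held-out fold error (it has seen the newest chunk).
        scored = [(sum(error_rate(m, chunk) for m in g) / len(g), g) for g in self.members]
        scored.append((new_err, new_group))
        scored.sort(key=lambda pair: pair[0])
        self.members = [g for _, g in scored[:self.max_models]]

    def predict(self, x):
        # Simple majority vote over every classifier in every retained group.
        votes = Counter(m.predict(x) for g in self.members for m in g)
        return votes.most_common(1)[0][0]
```

Each call to `update` merges the latest r chunks, so every classifier in a new group is trained on (v-1)/v of r chunks rather than on a single chunk; this is the contrast with the single-partition, single-chunk approach that the abstract describes.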

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Mohammad M. Masud (1)
  • Jing Gao (2)
  • Latifur Khan (1)
  • Jiawei Han (2)
  • Bhavani Thuraisingham (1)
  1. Department of Computer Science, University of Texas at Dallas, USA
  2. Department of Computer Science, University of Illinois at Urbana-Champaign, USA