Online network traffic classification with incremental learning

  • Original Paper
  • Evolving Systems

Abstract

Conventional network traffic detection methods based on data mining cannot efficiently handle high-throughput traffic with concept drift. Data stream mining techniques can classify evolving data streams, but most of them require completely labeled data. This paper proposes an improved data stream mining algorithm for online network traffic classification that learns incrementally from both labeled and unlabeled flows. The algorithm combines incremental k-means with a self-training semi-supervised method to continuously update the classification model as new flow instances arrive. The experimental results show that the proposed algorithm can classify 325 thousand flow instances per second and achieves up to 91–94 % average accuracy, even when only 10 % of the input flows are labeled. It also maintains high accuracy in the presence of concept drift: although the Drift Detection Method detects drifts in the evaluated datasets, the proposed method with incremental learning achieves up to 91–94 % accuracy, compared to 60–69 % without incremental learning.
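As a rough illustration of the two ideas named in the abstract, the sketch below pairs an incremental (running-mean) centroid update with a simple self-training rule for unlabeled flows. It is not the authors' algorithm: the class name `IncrementalCentroidClassifier`, the `radius` threshold, the two-feature flows, and the labels `"web"` and `"p2p"` are all assumptions made for the example.

```python
import math
from collections import Counter


class IncrementalCentroidClassifier:
    """Toy online classifier: running-mean centroids plus self-training.

    Hypothetical sketch only; not the method evaluated in the paper.
    """

    def __init__(self, radius=1.0):
        self.radius = radius   # assumed distance threshold for a confident match
        self.centroids = []    # each entry: [centroid vector, count, Counter of labels]

    def _nearest(self, x):
        """Return (index, distance) of the closest centroid, or (None, inf) if empty."""
        best, best_d = None, math.inf
        for i, (c, _, _) in enumerate(self.centroids):
            d = math.dist(c, x)
            if d < best_d:
                best, best_d = i, d
        return best, best_d

    def predict(self, x):
        """Return the majority label of the nearest centroid (None before training)."""
        i, _ = self._nearest(x)
        if i is None:
            return None
        return self.centroids[i][2].most_common(1)[0][0]

    def learn(self, x, label=None):
        """Update the model with one flow; label=None marks an unlabeled flow."""
        i, d = self._nearest(x)
        if label is None:
            # Self-training: adopt the model's own prediction, but only when
            # the flow lies within `radius` of an existing centroid.
            if i is None or d > self.radius:
                return
            label = self.centroids[i][2].most_common(1)[0][0]
        if i is not None and d <= self.radius and self.predict(x) == label:
            c, n, labels = self.centroids[i]
            n += 1
            # Incremental mean update: c_new = c + (x - c) / n
            self.centroids[i][0] = [ci + (xi - ci) / n for ci, xi in zip(c, x)]
            self.centroids[i][1] = n
            labels[label] += 1
        else:
            # No compatible centroid nearby: start a new one for this label.
            self.centroids.append([list(x), 1, Counter({label: 1})])


model = IncrementalCentroidClassifier(radius=1.0)
model.learn([0.0, 0.0], "web")     # labeled flow
model.learn([5.0, 5.0], "p2p")     # labeled flow
model.learn([0.4, 0.0])            # unlabeled flow, absorbed by the "web" centroid
print(model.predict([0.3, 0.1]))   # prints "web"
```

Because each centroid stores only a running mean and a count, every update costs time proportional to the number of centroids and no past flows need to be kept, which is the property that makes this style of update suitable for streams. Unlabeled flows that fall outside `radius` of every centroid are ignored here, a deliberately conservative choice for the sketch.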


Author information

Corresponding author

Correspondence to M. N. Marsono.

About this article

Cite this article

Loo, H.R., Marsono, M.N. Online network traffic classification with incremental learning. Evolving Systems 7, 129–143 (2016). https://doi.org/10.1007/s12530-016-9152-x

  • DOI: https://doi.org/10.1007/s12530-016-9152-x
