Statistical network protocol identification with unknown pattern extraction

Abstract

Network traffic classification is an enabling technique for network security and management in both traditional networks and emerging networks such as the Internet of Things. As traditional port-based and payload-based methods have become less effective, considerable research attention has been devoted to an alternative approach based on flow-level and packet-level traffic characteristics. A variety of statistical classification schemes have been proposed in this context, but most of them embody an implicit assumption that all protocols are known in advance and well represented in the training data. This assumption is unrealistic because real-world networks constantly witness emerging traffic patterns and protocols that were previously unknown. In this paper, we revisit the problem by proposing a learning scheme with unknown pattern extraction for statistical protocol identification. The scheme is designed for a more realistic setting, in which we assume that the training data consists only of labeled samples from a limited number of protocols, and the goal is to identify these known patterns within an arbitrary traffic mixture of both known and unknown protocols. Our experiments based on real-world traffic show that the proposed scheme outperforms previous approaches by accurately identifying both known and unknown protocols.
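
The setting described in the abstract (labeled samples from a few known protocols, a traffic mixture that also contains unknown ones) can be illustrated with a generic cluster-then-label sketch. This is not the scheme proposed in the paper, whose details are not given here; it is a minimal illustration that assumes plain k-means, two hypothetical flow features (mean packet size and mean inter-arrival time), and made-up protocol labels. Clusters that contain no labeled flow are treated as unknown patterns.

```python
from collections import Counter

UNKNOWN = "unknown"

def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_point_init(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point that is farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; returns the cluster index of each point."""
    centroids = farthest_point_init(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign

def identify(labeled, unlabeled, k):
    """Cluster labeled and unlabeled flows together, then name each cluster
    by majority vote of its labeled members. Clusters without any labeled
    member are reported as UNKNOWN."""
    points = [features for features, _ in labeled] + list(unlabeled)
    assign = kmeans(points, k)
    votes = {}
    for (_, label), cluster in zip(labeled, assign):
        votes.setdefault(cluster, Counter())[label] += 1
    cluster_label = {
        j: votes[j].most_common(1)[0][0] if j in votes else UNKNOWN
        for j in range(k)
    }
    return [cluster_label[c] for c in assign[len(labeled):]]

# Hypothetical flows: (mean packet size in bytes, mean inter-arrival time in ms).
# Features are left unscaled here for brevity; real data would need normalization.
labeled = [((100, 10), "http"), ((105, 12), "http"),
           ((500, 50), "ssh"), ((502, 48), "ssh")]
unlabeled = [(102, 11), (498, 52), (900, 5), (905, 6), (903, 4)]

print(identify(labeled, unlabeled, k=3))
# -> ['http', 'ssh', 'unknown', 'unknown', 'unknown']
```

In this toy data the third group of flows has no labeled neighbor, so its cluster is flagged as unknown rather than being forced into one of the known protocols, which is the failure mode of a classifier trained only on known classes.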

Acknowledgements

This work is supported by NSFC Projects 61802080 and 61872102.

Author information

Corresponding author

Correspondence to Yu Wang.

About this article

Cite this article

Wang, Y., Xue, H., Liu, Y. et al. Statistical network protocol identification with unknown pattern extraction. Ann. Telecommun. 74, 473–482 (2019). https://doi.org/10.1007/s12243-019-00704-y

Keywords

  • Network security
  • Traffic classification
  • Machine learning
  • Constrained clustering