An Entropy Based Encrypted Traffic Classifier

  • Mohammad Saiful Islam Mamun (Email author)
  • Ali A. Ghorbani
  • Natalia Stakhanova
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9543)

Abstract

This paper proposes an approach to encrypted network traffic classification based on entropy calculation and machine learning. In addition to ordinary Shannon entropy, we examine entropy after encoding and BiEntropy, a weighted average of Shannon binary entropies. The objective is to identify application flows as part of encrypted traffic. To achieve this, we (i) calculate entropy-based features from the packet payload: the encoded or binary payload, and n-length words of the payload; (ii) apply a genetic-search feature selection algorithm to the extracted features, with a fitness function calculated from the True Positive Rate, the False Positive Rate, and the number of selected features; and (iii) propose a data-driven supervised machine learning model based on a Support Vector Machine (SVM) for automatic identification of encrypted traffic. To the best of our knowledge, this is the first attempt to tackle the problem of classifying encrypted traffic using extensive entropy-based features and machine learning techniques.

Keywords

Traffic classification, Entropy, Encoding

Notes

Acknowledgements

This work was funded by the Atlantic Canada Opportunities Agency (ACOA) through the Atlantic Innovation Fund (AIF), in cooperation with the IBM Security division.

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Mohammad Saiful Islam Mamun (1), Email author
  • Ali A. Ghorbani (1)
  • Natalia Stakhanova (1)
  1. Information Security Centre of Excellence (ISCX), University of New Brunswick, Fredericton, Canada