How Robust Can a Machine Learning Approach Be for Classifying Encrypted VoIP?

Journal of Network and Systems Management

Abstract

The classification of encrypted network traffic is an important issue for network management and security tasks such as quality of service and firewall enforcement. Traffic classification has become more challenging because traditional techniques, such as port-number-based classification or Deep Packet Inspection, are ineffective against Peer-to-Peer (P2P) Voice over Internet Protocol (VoIP) applications, which use non-standard ports and encryption. Traffic classification is also a particularly challenging application domain for machine learning (ML): solutions should ideally be both simple, and therefore efficient to deploy, and accurate. Recent advances in ML make it possible to decompose the original problem into a set of classifiers with non-overlapping behaviors, which provides further insight into the problem domain and increases the throughput of solutions. In this work, we investigate the robustness of an ML approach to classifying encrypted traffic, not only across different network traffic but also against evasion attacks. Our ML-based approach employs only statistical network traffic flow features, without using Internet Protocol addresses, source/destination ports, or payload information, to unveil encrypted VoIP applications in network traffic. By robust signatures, we mean that the signatures learned by training on one network remain valid when they are applied to traffic from entirely different locations, networks, and time periods, and also against evasion attacks. The results on different network traces, as well as on the evasion of a Skype classifier, demonstrate that the performance of the signatures is very promising, which implies that statistical information from the network layer, combined with ML, can achieve high classification accuracy and produce robust signatures.
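The abstract does not list the exact flow features, so the following is only an illustrative sketch under stated assumptions. It assumes the packets of one flow have already been grouped together (the 5-tuple may be used for grouping, but it never enters the feature vector) and computes per-direction packet counts, byte totals, and min/mean/max/standard deviation of packet lengths and inter-arrival times. The `Packet` record, the `flow_features` function, and the chosen feature set are hypothetical, not the authors' published feature list.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, List


# Hypothetical packet record: only timing, size and direction are kept.
# IP addresses, ports and payload are deliberately absent, mirroring the
# restriction described in the abstract.
@dataclass
class Packet:
    timestamp: float  # seconds since capture start
    length: int       # packet length in bytes
    forward: bool     # True if sent by the flow initiator


def _stats(values: List[float], prefix: str) -> Dict[str, float]:
    """Min/mean/max/population std of a list; all zeros if the list is empty."""
    if not values:
        return {f"{prefix}_{k}": 0.0 for k in ("min", "mean", "max", "std")}
    return {
        f"{prefix}_min": float(min(values)),
        f"{prefix}_mean": float(mean(values)),
        f"{prefix}_max": float(max(values)),
        f"{prefix}_std": float(pstdev(values)),
    }


def flow_features(packets: List[Packet]) -> Dict[str, float]:
    """Assumed statistical feature set for one flow, computed per direction."""
    pkts = sorted(packets, key=lambda p: p.timestamp)
    feats: Dict[str, float] = {
        "duration": pkts[-1].timestamp - pkts[0].timestamp if pkts else 0.0
    }
    for name, is_fwd in (("fwd", True), ("bwd", False)):
        side = [p for p in pkts if p.forward == is_fwd]
        lengths = [float(p.length) for p in side]
        # Inter-arrival times between consecutive packets in this direction.
        gaps = [b.timestamp - a.timestamp for a, b in zip(side, side[1:])]
        feats[f"{name}_pkts"] = float(len(side))
        feats[f"{name}_bytes"] = float(sum(lengths))
        feats.update(_stats(lengths, f"{name}_len"))
        feats.update(_stats(gaps, f"{name}_iat"))
    return feats


if __name__ == "__main__":
    flow = [Packet(0.00, 120, True), Packet(0.02, 118, False),
            Packet(0.04, 122, True), Packet(0.06, 121, False)]
    print(flow_features(flow))
```

Each flow then becomes a fixed-length numeric vector that could be passed to any supervised classifier. Because nothing in the vector depends on addresses, ports, or payload, the same extractor can in principle be applied unchanged to traces from other networks, which is the setting in which the abstract evaluates signature robustness.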

Notes

  1. Skype detection task and flow features.

Acknowledgments

This research is supported by a Natural Sciences and Engineering Research Council of Canada (NSERC) grant. This work was conducted as part of the Dalhousie NIMS Lab at http://projects.cs.dal.ca/projectx/.

Author information

Corresponding author

Correspondence to Riyad Alshammari.

About this article

Cite this article

Alshammari, R., Zincir-Heywood, A.N.: How Robust Can a Machine Learning Approach Be for Classifying Encrypted VoIP? J Netw Syst Manage 23, 830–869 (2015). https://doi.org/10.1007/s10922-014-9324-6

  • DOI: https://doi.org/10.1007/s10922-014-9324-6
