How Robust Can a Machine Learning Approach Be for Classifying Encrypted VoIP?

Alshammari, Riyad; Zincir-Heywood, A. Nur

doi:10.1007/s10922-014-9324-6


Published: 17 July 2014

Volume 23, pages 830–869, (2015)

Journal of Network and Systems Management



Abstract

Classifying encrypted network traffic is an important problem for network management and security tasks, including quality of service and firewall enforcement. The task is challenging because traditional techniques, such as port-number matching or Deep Packet Inspection, are ineffective against Peer-to-Peer Voice over Internet Protocol (VoIP) applications, which use non-standard ports and encryption. Traffic classification is also a particularly challenging application domain for machine learning (ML). Solutions should ideally be both simple, and therefore efficient to deploy, and accurate. Recent advances in ML make it possible to decompose the original problem into a set of classifiers with non-overlapping behaviors, which provides further insight into the problem domain and increases the throughput of solutions. In this work, we investigate the robustness of an ML approach to classifying encrypted traffic, both across different network traffic and against evasion attacks. Our ML-based approach employs only statistical network traffic flow features, without using Internet Protocol addresses, source/destination ports, or payload information, to unveil encrypted VoIP applications in network traffic. By robust signatures we mean that signatures learned by training on one network remain valid when applied to traffic from entirely different locations, networks, and time periods, and when subjected to evasion attacks. The results on different network traces, as well as on the evasion of a Skype classifier, show that the performance of the signatures is very promising. This implies that statistical information based on the network layer, combined with ML, can achieve high classification accuracy and produce robust signatures.
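To make the idea of statistical flow features concrete, the following is a minimal sketch of how per-flow statistics might be computed from packet sizes and timestamps alone, without touching IP addresses, ports, or payload. The `Packet` type, the `flow_features` function, and the feature names are hypothetical and chosen for illustration; they are not the feature set used in the article.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Packet:
    """Hypothetical packet record: no addresses, ports, or payload."""
    timestamp: float  # capture time in seconds
    size: int         # packet length in bytes
    forward: bool     # True if sent by the flow initiator


def _summary(values):
    """Min/mean/max/std of a list, or zeros when the list is empty."""
    if not values:
        return {"min": 0.0, "mean": 0.0, "max": 0.0, "std": 0.0}
    return {
        "min": float(min(values)),
        "mean": float(mean(values)),
        "max": float(max(values)),
        "std": float(pstdev(values)),
    }


def flow_features(packets):
    """Compute illustrative per-direction statistics for one flow."""
    pkts = sorted(packets, key=lambda p: p.timestamp)
    features = {}
    for name, want_forward in (("fwd", True), ("bwd", False)):
        side = [p for p in pkts if p.forward is want_forward]
        sizes = [p.size for p in side]
        # Inter-arrival times between consecutive packets in this direction.
        gaps = [b.timestamp - a.timestamp for a, b in zip(side, side[1:])]
        features[f"{name}_packets"] = len(side)
        features[f"{name}_bytes"] = sum(sizes)
        for stat, value in _summary(sizes).items():
            features[f"{name}_size_{stat}"] = value
        for stat, value in _summary(gaps).items():
            features[f"{name}_iat_{stat}"] = value
    features["duration"] = pkts[-1].timestamp - pkts[0].timestamp if pkts else 0.0
    return features


if __name__ == "__main__":
    demo = [Packet(0.0, 100, True), Packet(0.5, 200, False), Packet(1.0, 300, True)]
    for key, value in flow_features(demo).items():
        print(key, value)
```

A feature vector of this kind could then be passed to any supervised classifier. Because it contains no endpoint identifiers, a model trained on it cannot simply memorize addresses or ports from the training network, which is the property the robustness claim above depends on.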


Notes

Skype detection task and flow features.


Acknowledgments

This research is supported by a Natural Sciences and Engineering Research Council of Canada (NSERC) grant. This work was conducted as part of the Dalhousie NIMS Lab at http://projects.cs.dal.ca/projectx/.

Author information

Authors and Affiliations

College of Public Health and Health Informatics, King Saud Bin Abdulaziz University for Health Sciences, P.O. Box 22490, Riyadh, 11426, Kingdom of Saudi Arabia
Riyad Alshammari
Faculty of Computer Science, Dalhousie University, Halifax, NS, B3H 1W5, Canada
A. Nur Zincir-Heywood


Corresponding author

Correspondence to Riyad Alshammari.


About this article

Cite this article

Alshammari, R., Zincir-Heywood, A.N. How Robust Can a Machine Learning Approach Be for Classifying Encrypted VoIP? J Netw Syst Manage 23, 830–869 (2015). https://doi.org/10.1007/s10922-014-9324-6


Received: 06 May 2013
Revised: 02 June 2014
Accepted: 04 July 2014
Published: 17 July 2014
Issue Date: October 2015
DOI: https://doi.org/10.1007/s10922-014-9324-6
