k-NN Classification of Malware in HTTPS Traffic Using the Metric Space Approach

Part of the Lecture Notes in Computer Science book series (LNCS, volume 9650)

Abstract

In this paper, we present the detection of malware in HTTPS traffic using k-NN classification. We focus on the metric space approach for approximate k-NN searches over a dataset of sparse high-dimensional descriptors of network traffic. We show that classification based on approximate k-NN search using a metric index exhibits a false positive rate reduced by an order of magnitude compared to the state-of-the-art method, while keeping the classification fast enough.

Keywords

  • Similarity search
  • k-NN classification
  • Intrusion detection

Notes

  1. In our case, the descriptors are high-dimensional sparse vectors representing network traffic, and the distance function is the Euclidean distance.

  2. The exact cannot be published due to non-disclosure agreements.

  3. Specifically, a hash was considered malicious if the corresponding process was detected by at least 20 of the antivirus engines used by the virustotal.com service.

  4. virustotal.com.

  5. \(r_{\mathrm{up}}\) is the number of bytes sent from the client to the server, \(r_{\mathrm{down}}\) is the number of bytes received by the client from the server, \(r_{\mathrm{td}}\) is the duration of the connection (in milliseconds), and \(r_{\mathrm{ti}}\) is the time in seconds elapsed between the start of the current request and the start of the previous request of the same client.

  6. The experiments were run on 64-bit Windows Server 2008 R2 Standard with an Intel Xeon X5660 CPU (2.8 GHz, 12 cores supporting hyper-threading). The ECM classifier was trained on a virtual machine (VMware) using an 8-core 2.2 GHz CPU and 132 GB RAM. The Matlab library MinFunc was used.

  7. For a given query, the approximation error is computed as a normed overlap distance between the result returned by the approximate k-NN search and the correct result returned by the exact k-NN search.
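
Notes 1 and 7 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the dictionary-based sparse vectors, the linear-scan `exact_knn`, and all function names are assumptions made for illustration, and the error formula (one minus the fraction of the exact result that the approximate result recovers) is only one plausible reading of "normed overlap distance".

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between sparse vectors stored as {index: value} dicts."""
    keys = a.keys() | b.keys()  # only indices that are non-zero in a or b
    return math.sqrt(sum((a.get(i, 0.0) - b.get(i, 0.0)) ** 2 for i in keys))


def exact_knn(query, dataset, k):
    """Ids of the k nearest descriptors, found by a full linear scan.

    This is the exact baseline; a metric index would replace this scan
    with an approximate search that visits only part of the dataset.
    """
    nearest = heapq.nsmallest(
        k, dataset.items(), key=lambda item: euclidean(query, item[1])
    )
    return [obj_id for obj_id, _ in nearest]


def approximation_error(approx_ids, exact_ids):
    """Assumed definition: 1 - |approx ∩ exact| / |exact|, a value in [0, 1]."""
    exact = set(exact_ids)
    return 1.0 - len(exact & set(approx_ids)) / len(exact)


# Hypothetical toy data: four sparse descriptors and one query.
dataset = {
    "a": {0: 1.0},
    "b": {0: 2.0},
    "c": {5: 10.0},
    "d": {0: 1.0, 1: 0.5},
}
query = {0: 1.0}

exact = exact_knn(query, dataset, k=2)        # ["a", "d"]
approx = ["a", "b"]                           # pretend output of an approximate search
error = approximation_error(approx, exact)    # 0.5: one of two true neighbours missed
```

An error of 0 means the approximate search returned exactly the true k nearest neighbours, and an error of 1 means it returned none of them.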

Acknowledgments

This research has been supported by the Czech Science Foundation (GAČR) project 15-08916S.

Author information

Corresponding author

Correspondence to Přemysl Čech.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Lokoč, J., Kohout, J., Čech, P., Skopal, T., Pevný, T. (2016). k-NN Classification of Malware in HTTPS Traffic Using the Metric Space Approach. In: Chau, M., Wang, G., Chen, H. (eds) Intelligence and Security Informatics. PAISI 2016. Lecture Notes in Computer Science, vol 9650. Springer, Cham. https://doi.org/10.1007/978-3-319-31863-9_10

  • DOI: https://doi.org/10.1007/978-3-319-31863-9_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-31862-2

  • Online ISBN: 978-3-319-31863-9

  • eBook Packages: Computer Science, Computer Science (R0)