Reading Network Packets as a Natural Language for Intrusion Detection

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10779)

Abstract

Detecting unknown malicious traffic is a challenging task. Many behavior-based detection methods exploit the characteristics of drive-by download attacks or command-and-control (C&C) traffic. However, many previous methods are specialized for particular attack techniques, which restricts their adaptability. Moreover, they require feature vectors to be designed for each attack method. This paper proposes a generic detection method that neither depends on attack methods nor requires hand-crafted feature vectors. The method reads network packets as a natural language using Paragraph Vector, an unsupervised algorithm, and learns features automatically to detect malicious traffic. This paper conducts timeline analysis and cross-dataset validation on multiple datasets that contain traffic captured from Exploit Kits (EKs). The best F-measure is 0.98 in the timeline analysis and 0.97 on the other dataset. Finally, the results show that a linguistic approach using Paragraph Vector is effective on unseen traffic.
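To make "reading packets as a natural language" concrete, here is a minimal sketch. It assumes each packet has already been converted to a line of text (for example, dissected header fields) and treats each packet as a document whose tokens are "words". The tokenizer `packet_to_words` and its separator set are hypothetical, since the abstract does not describe the paper's tokenization. The model is PV-DBOW with negative sampling, one of the two Paragraph Vector variants, written in plain Python for readability. The paper may use the other variant, a library implementation, or different hyperparameters. `infer_vector` embeds an unseen packet by training only a fresh document vector while the word vectors stay frozen.

```python
import math
import random
import re


def packet_to_words(packet_text):
    """Hypothetical tokenizer (assumption): split a text rendering of a packet into lowercase words."""
    return [tok.lower() for tok in re.split(r"[\s/:=&?,;]+", packet_text) if tok]


def sigmoid(x):
    x = max(-30.0, min(30.0, x))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-x))


def _update_doc(doc_vec, words, out_vecs, vocab, rng, lr, negative, update_words):
    """One PV-DBOW pass: the document vector predicts each of its words,
    contrasted against randomly drawn negative words."""
    for word in words:
        if word not in out_vecs:
            continue  # words never seen in training are ignored
        negatives = [w for w in (rng.choice(vocab) for _ in range(negative)) if w != word]
        grad = [0.0] * len(doc_vec)
        for target, label in [(word, 1.0)] + [(w, 0.0) for w in negatives]:
            out = out_vecs[target]
            g = lr * (label - sigmoid(sum(d * o for d, o in zip(doc_vec, out))))
            for i in range(len(doc_vec)):
                grad[i] += g * out[i]
                if update_words:
                    out[i] += g * doc_vec[i]
        for i in range(len(doc_vec)):
            doc_vec[i] += grad[i]


def train_pv_dbow(docs, dim=16, epochs=50, lr=0.05, negative=3, seed=0):
    """Learn one vector per document (packet) plus output word vectors, without labels."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    out_vecs = {w: [0.0] * dim for w in vocab}  # zero init, as in word2vec negative sampling
    doc_vecs = [[(rng.random() - 0.5) / dim for _ in range(dim)] for _ in docs]
    for _ in range(epochs):
        for doc_vec, words in zip(doc_vecs, docs):
            _update_doc(doc_vec, words, out_vecs, vocab, rng, lr, negative, True)
    return doc_vecs, out_vecs, vocab


def infer_vector(words, out_vecs, vocab, epochs=50, lr=0.05, negative=3, seed=1):
    """Embed an unseen packet: train a fresh document vector with word vectors frozen."""
    rng = random.Random(seed)
    dim = len(next(iter(out_vecs.values())))
    doc_vec = [(rng.random() - 0.5) / dim for _ in range(dim)]
    for _ in range(epochs):
        _update_doc(doc_vec, words, out_vecs, vocab, rng, lr, negative, False)
    return doc_vec


if __name__ == "__main__":
    packets = [  # made-up example packets, not from the paper's datasets
        "GET /index.html HTTP/1.1 Host: example.com",
        "GET /images/logo.png HTTP/1.1 Host: example.com",
        "POST /gate.php?id=42 HTTP/1.1 Host: 203.0.113.7",
    ]
    docs = [packet_to_words(p) for p in packets]
    doc_vecs, out_vecs, vocab = train_pv_dbow(docs)
    unseen = packet_to_words("GET /about.html HTTP/1.1 Host: example.com")
    print(len(infer_vector(unseen, out_vecs, vocab)))  # 16
```

Freezing the word vectors during inference is what allows a trained model to produce vectors for packets it has never seen. That is the property relevant to the abstract's claim about unseen traffic.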

Keywords

Drive-by download, C&C, Neural network, Bag of Words, Word2vec, Paragraph Vector, Doc2vec, Support Vector Machine
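The keywords name a Support Vector Machine, and the abstract reports results as F-measure. The sketch below assumes that the learned packet vectors are classified by a linear SVM with labels +1 (malicious) and -1 (benign). These labels, the function names `train_linear_svm`, `predict` and `f_measure`, and the hyperparameters are illustrative assumptions. The SVM is trained with Pegasos, a stochastic sub-gradient solver for the primal hinge-loss objective, standing in for whatever SVM implementation the paper used. F-measure is the harmonic mean of precision and recall on the malicious class.

```python
import random


def train_linear_svm(xs, ys, lam=0.01, epochs=200, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss.
    xs are feature vectors (e.g. Paragraph Vector outputs); ys are labels in {-1, +1}
    (assumed convention: +1 = malicious, -1 = benign). A constant 1.0 feature stands
    in for the bias, so the bias is regularized too (a simplification)."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # regularization shrink
            if margin < 1.0:  # hinge loss is active: step toward y * x
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive (malicious) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy 2-D vectors standing in for learned packet vectors (not real data).
    xs = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
    ys = [1, 1, 1, -1, -1, -1]
    w = train_linear_svm(xs, ys)
    print(f_measure(ys, [predict(w, x) for x in xs]))  # 1.0
```

In a timeline analysis, the SVM would be fitted on vectors from earlier captures and `f_measure` computed on later ones. The toy data here is linearly separable, so the sketch classifies it perfectly.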

Notes

Acknowledgment

This work was supported by JSPS KAKENHI Grant Number 17K06455.

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. National Defense Academy, Yokosuka, Japan