
Machine Learning, Volume 108, Issue 8–9, pp 1353–1368

Joint detection of malicious domains and infected clients

  • Paul Prasse
  • René Knaebel
  • Lukáš Machlica
  • Tomáš Pevný
  • Tobias Scheffer
Part of the following topical collections:
  1. Special Issue of the ECML PKDD 2019 Journal Track

Abstract

Detection of malware-infected computers and detection of malicious web domains based on their encrypted HTTPS traffic are challenging problems, because only addresses, timestamps, and data volumes are observable. The detection problems are coupled, because infected clients tend to interact with malicious domains. Traffic data can be collected at a large scale, and antivirus tools can be used to identify infected clients in retrospect. Domains, by contrast, have to be labeled individually after forensic analysis. We explore transfer learning based on sluice networks; this allows the detection models to bootstrap each other. In a large-scale experimental study, we find that the model outperforms known reference models and detects previously unknown malware, previously unknown malware families, and previously unknown malicious domains.

Keywords

Machine learning, Neural networks, Computer security, Traffic data, HTTPS traffic

Notes

Acknowledgements

The work of Tomáš Pevný has been partially funded by the Czech Ministry of Education under the GACR project 18-21409S. We would like to thank Virustotal.com for their kind support.

Funding

Funding was provided by Cisco R&D.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Computer Science, University of Potsdam, Potsdam, Germany
  2. Cisco R&D, Prague, Czech Republic
  3. Department of Computer Science, Czech Technical University in Prague, Prague, Czech Republic
