Learning Representations for Log Data in Cybersecurity
Abstract
We introduce a framework for exploring and learning representations of log data generated by enterprise-grade security devices, with the goal of detecting advanced persistent threats (APTs) that span several weeks. The framework uses a divide-and-conquer strategy that combines behavioral analytics, time series modeling, and representation learning algorithms to model large volumes of data. In addition, because we have access to human-engineered features, we analyze how well a series of representation learning algorithms can complement those features across a variety of classification approaches. We demonstrate the approach with a novel dataset extracted from 3 billion log lines generated at an enterprise network boundary, with reported command and control communications. The results validate our approach, achieving an area under the ROC curve of 0.943 and 95 true positives among the top 100 ranked instances on the test dataset.
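The two figures reported above, area under the ROC curve and true positives among the top-ranked instances, can be computed from any list of labels and classifier scores. The following is a minimal sketch of those two metrics only, not the authors' evaluation code; the function names `roc_auc` and `true_positives_at_k` and the toy labels and scores are assumptions made for illustration and are not taken from the paper's dataset.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    Equals the probability that a randomly chosen positive is scored above a
    randomly chosen negative, with ties counted as one half.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    # Assign 1-based ranks in ascending score order, averaging tied ranks.
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(label for _, label in pairs[i:j])
        i = j

    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_statistic / (n_pos * n_neg)


def true_positives_at_k(labels: Sequence[int], scores: Sequence[float], k: int) -> int:
    """Count positives among the k highest-scored instances."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k])


if __name__ == "__main__":
    # Hypothetical toy data: 1 = malicious, 0 = benign.
    toy_labels = [1, 0, 1, 0, 0, 1]
    toy_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
    print(roc_auc(toy_labels, toy_scores))
    print(true_positives_at_k(toy_labels, toy_scores, k=3))
```

Ranking-based counts such as true positives in the top 100 are useful alongside the AUC when only a limited number of alerts can be reviewed, because they describe the quality of the highest-scored instances rather than of the whole ranking.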
Keywords
Representation learning, Deep learning, Feature discovery, Cybersecurity, Command and control detection, Malware detection