BlackEye: automatic IP blacklisting using machine learning from security logs
Blacklisting of malicious IP addresses is a primary technique commonly used for safeguarding mission-critical IT systems. The decision to blacklist an IP address requires careful examination of various aspects of packet traffic data as well as the address's behavioral history. Most current security monitoring for IP blacklisting relies heavily on the domain expertise of experienced specialists. Although there have been efforts to apply machine-learning (ML) techniques to this problem, we have yet to see a mature solution. To mitigate these challenges and to gain a better understanding of the problem, we have designed the BlackEye framework, in which we can apply various ML techniques and produce models for accurate blacklisting. From our analysis results, we learn that a multi-staged method that combines data cleansing with classification via logistic regression or random forest produces the best results. Our evaluation on real-world data shows that it can reduce incorrect blacklisting by nearly 90% when compared to the performance of experts. Moreover, our proposed model performed well in terms of time-to-blacklist, shortening the period during which a malicious IP address remains active by 27 days on average.
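To make the multi-staged idea concrete, the following is a minimal sketch of a two-stage pipeline: a cleansing step followed by a logistic regression classifier. It is not the BlackEye implementation. The feature names (`alert_rate`, `port_spread`), the toy records, the helper names, and the hyperparameters are all assumptions chosen for illustration, and the logistic regression is fitted with plain batch gradient descent so that the example needs only the standard library. The random forest alternative is not shown.

```python
import math

# Hypothetical per-IP records; field names and values are illustrative
# assumptions, not the features or data used in the paper.
records = [
    {"ip": "198.51.100.1", "alert_rate": 0.9, "port_spread": 0.8, "label": 1},
    {"ip": "198.51.100.2", "alert_rate": 0.8, "port_spread": 0.7, "label": 1},
    {"ip": "198.51.100.2", "alert_rate": 0.8, "port_spread": 0.7, "label": 1},  # duplicate
    {"ip": "203.0.113.5", "alert_rate": 0.1, "port_spread": 0.2, "label": 0},
    {"ip": "203.0.113.6", "alert_rate": 0.2, "port_spread": 0.1, "label": 0},
    {"ip": "203.0.113.7", "alert_rate": None, "port_spread": 0.3, "label": 0},  # missing value
]

FIELDS = ("ip", "alert_rate", "port_spread", "label")


def cleanse(rows):
    """Stage 1: drop rows with missing values and exact duplicate rows."""
    seen = set()
    cleaned_rows = []
    for row in rows:
        key = tuple(row.get(f) for f in FIELDS)
        if None in key or key in seen:
            continue
        seen.add(key)
        cleaned_rows.append(row)
    return cleaned_rows


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Stage 2: fit logistic regression by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x, threshold=0.5):
    """Return True if the model would flag the feature vector x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold


cleaned = cleanse(records)
X = [[r["alert_rate"], r["port_spread"]] for r in cleaned]
y = [r["label"] for r in cleaned]
w, b = train_logistic(X, y)

for r, xi in zip(cleaned, X):
    print(r["ip"], "blacklist" if predict(w, b, xi) else "allow")
```

In practice the cleansing stage would likely be more involved than deduplication and missing-value removal, and a library implementation of logistic regression or random forest would replace the hand-written training loop.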
Keywords: Blacklisting, Security logs, Machine learning, Linear regression
This research was supported by Kyungpook National University Research Fund, 2017.