Data Modelling for Predicting Exploits
Modern society is becoming increasingly reliant on secure computer systems. Predicting which vulnerabilities are more likely to be exploited by malicious actors is therefore an important task in helping to prevent cyber attacks. Researchers have tried making such predictions using machine learning. However, recent research has shown that the evaluation of such models requires special sampling of training and test sets, and that previous models would have had limited utility in real-world settings. This study further develops the results of that research by using its sampling technique for evaluation in combination with a novel data model. Moreover, contrary to recent research, we find that using open web data can help in making better predictions about exploits, and that zero-day exploits are detrimental to the predictive power of the model. Finally, we discovered that the initial days of vulnerability information are sufficient to make the best possible model. Given our findings, we suggest that more research should be devoted to developing refined techniques for building predictive models for exploits. Gaining more knowledge in this domain would not only help prevent cyber attacks but could also yield fruitful insights into the nature of exploit development.
Keywords: Exploits, Machine learning, Concept drift, Vulnerability management
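The abstract does not spell out the sampling technique it relies on. As a rough illustration of why sampling matters under concept drift, the sketch below shows one common time-aware setup: a chronological split in which the model only trains on vulnerabilities published before a cutoff date and is tested on later ones. This is an assumption made for illustration, not the procedure used in the study; the `Vulnerability` record, its fields, the `temporal_split` helper, and the sample data are all hypothetical.

```python
from datetime import date
from typing import List, NamedTuple, Tuple


class Vulnerability(NamedTuple):
    """Hypothetical record; field names are illustrative only."""
    vuln_id: str
    published: date
    exploited: bool


def temporal_split(
    records: List[Vulnerability], cutoff: date
) -> Tuple[List[Vulnerability], List[Vulnerability]]:
    """Split chronologically: train strictly before `cutoff`, test from `cutoff` on.

    Unlike a random split, no information published after the cutoff
    can leak into the training set.
    """
    ordered = sorted(records, key=lambda r: r.published)
    train = [r for r in ordered if r.published < cutoff]
    test = [r for r in ordered if r.published >= cutoff]
    return train, test


# Made-up sample data, not taken from any real vulnerability feed.
records = [
    Vulnerability("EXAMPLE-3", date(2016, 5, 20), False),
    Vulnerability("EXAMPLE-1", date(2015, 3, 1), True),
    Vulnerability("EXAMPLE-4", date(2016, 9, 2), True),
    Vulnerability("EXAMPLE-2", date(2015, 11, 12), False),
]

cutoff = date(2016, 1, 1)
train, test = temporal_split(records, cutoff)
print([r.vuln_id for r in train])
print([r.vuln_id for r in test])
```

A random split of the same records could place a 2016 vulnerability in the training set and a 2015 one in the test set, which would let the model see the future relative to what it is asked to predict.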
The research leading to these results has been partially supported by the Swedish Civil Contingencies Agency (MSB) through the project “RICS” and by the European Community’s Horizon 2020 Framework Programme through the UNITED-GRID project under grant agreement 773717.
We would also like to thank Staffan Truvé and Michel Edkrantz at Recorded Future for inspiration, access to data and the environment to perform the current study.