Up-High to Down-Low: Applying Machine Learning to an Exploit Database

  • Yisroel Mirsky
  • Noam Gross
  • Asaf Shabtai
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9522)


Today, machine learning is applied primarily to low-level features such as machine code and measurable behaviors. Public exploit databases are a valuable asset for exploit type classification, but they contain only meta-data (high-level or abstract data) about the exploits. Because classification depends on the raw measurements found in the field, these databases have been overlooked. In this study, we offer two usages for these high-level datasets and evaluate their performance. The first usage is classification using meta-data as a bridge (supervised), and the second is the study of exploits’ relations using clustering and Self-Organizing Maps (unsupervised). Both offer insights into exploit detection and can be used as a means to better define exploit classes.


Exploit database · Machine learning · Supervised · Unsupervised · Pattern abstraction · Data mining



This research was supported by the Ministry of Science and Technology, Israel.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Department of Information Systems Engineering, Ben-Gurion University, Beersheba, Israel
