Advanced Detection Tool for PDF Threats

  • Quentin Jerome
  • Samuel Marchal
  • Radu State
  • Thomas Engel
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8247)


In this paper we introduce ADEPT, an efficient application for detecting malicious PDF files. With targeted attacks on the rise in recent years, exploring new detection and mitigation paradigms has become necessary. Malicious PDF files that exploit vulnerabilities in well-known PDF readers have become a popular vector for targeted attacks, yet few efficient approaches exist to counter them. Although simple in theory, parsing and then analyzing such files is resource-intensive and may even be impossible because of obfuscation and reader-specific artifacts. Our paper describes a new approach for detecting such malicious payloads that leverages machine learning techniques and an efficient feature selection mechanism to detect anomalies rapidly. We assess our approach on a large selection of malicious files and report the experimental performance results for the developed prototype.


PDF files, Malware detection, Machine learning



The authors would like to thank Prof. Dr. Pavel Laskov for the support and dataset provided for our experiments. Special thanks also go to the VirusTotal team for giving us access to several datasets.



Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Quentin Jerome (1)
  • Samuel Marchal (1)
  • Radu State (1)
  • Thomas Engel (1)
  1. SnT - University of Luxembourg, Luxembourg, Luxembourg
