An Evasion Resilient Approach to the Detection of Malicious PDF Files

Information Systems Security and Privacy (ICISSP 2015)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 576)

Abstract

Malicious PDF files still constitute a serious threat to system security. New reader vulnerabilities have been discovered, and research has shown that current state-of-the-art approaches can be easily bypassed by exploiting weaknesses caused by erroneous parsing or incomplete information extraction. In this work, we present a novel machine learning system for the detection of malicious PDF files. We have developed a static approach that leverages information extracted from both the structure and the content of PDF files, which improves the robustness of the system against evasion attacks. Experimental results show that our system outperforms all publicly available state-of-the-art tools. We also report a significant improvement in performance at detecting reverse mimicry attacks, which are able to completely evade systems that only extract information from the PDF file structure. Finally, we claim that, to avoid targeted attacks, a more careful design of machine learning based detectors is needed.
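
To make the idea of static feature extraction concrete, the following is a minimal sketch and not the authors' implementation. It scans the raw bytes of a PDF and counts a few name keywords, indirect objects, and streams. The keyword list and the function name `extract_features` are assumptions chosen for illustration; the actual feature set of the paper is not given on this page.

```python
import re

# Hypothetical keyword list for illustration only; not the paper's feature set.
KEYWORDS = (b"/JavaScript", b"/JS", b"/OpenAction", b"/EmbeddedFile", b"/ObjStm")


def extract_features(pdf_bytes: bytes) -> dict:
    """Count simple structural markers in the raw bytes of a PDF file."""
    features = {}
    for kw in KEYWORDS:
        # Require a non-alphanumeric character (or end of data) after the
        # keyword, so that a short name does not match a longer one.
        pattern = re.escape(kw) + rb"(?![A-Za-z0-9])"
        features[kw.decode()] = len(re.findall(pattern, pdf_bytes))
    # Indirect object headers look like "12 0 obj".
    features["n_objects"] = len(re.findall(rb"\b\d+\s+\d+\s+obj\b", pdf_bytes))
    # Count "stream" tokens; the word boundary excludes "endstream".
    features["n_streams"] = len(re.findall(rb"\bstream\b", pdf_bytes))
    return features
```

A raw byte scan of this kind is exactly what the abstract warns can be bypassed: it misses keywords hidden inside compressed streams or written with hex-escaped names, which is why a parser that also inspects content is needed.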

Notes

  1. http://eternal-todo.com/tools/peepdf-pdf-analysis-tool.

  2. http://esec-lab.sogeti.com/pages/origami.

  3. http://wepawet.iseclab.org/index.php.

  4. http://htmlunit.sourceforge.net.

  5. http://www.mozilla.org/rhino.

  6. http://www.pdflabs.com/tools/pdftk-the-pdf-toolkit.

  7. http://pdfrate.com/.

  8. Streams containing other objects.

  9. A new type of cross-reference table introduced by recent PDF specifications.

  10. The seed value has been set to the default value indicated here: http://weka.sourceforge.net/doc.dev/weka/clusterers/SimpleKMeans.html.

  11. A scan that is stopped if it finds anomalies in the files. This definition is valid for PeePDF; in Origami, such a scan is defined as standard mode.

  12. If W is in its percentage form, it must be divided by 100 first.

  13. http://contagiodump.blogspot.it.

  14. Since Wepawet and PDFRate are online services, we could not train these systems with our own samples.

  15. For EXE Embedding we exploited the CVE-2010-1240 vulnerability, and for PDF Embedding and Javascript Injection we exploited CVE-2009-0927.

Acknowledgement

This work is supported by the Regional Administration of Sardinia, Italy, within the project “Advanced and secure sharing of multimedia data over social networks in the future Internet” (CUP F71J11000690002). Davide Maiorca gratefully acknowledges Sardinia Regional Government for the financial support of his PhD scholarship (P.O.R. Sardegna F.S.E. Operational Programme of the Autonomous Region of Sardinia, European Social Fund 2007–2013 - Axis IV Human Resources, Objective l.3, Line of Activity l.3.1.).

Corresponding author

Correspondence to Davide Maiorca.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Maiorca, D., Ariu, D., Corona, I., Giacinto, G. (2015). An Evasion Resilient Approach to the Detection of Malicious PDF Files. In: Camp, O., Weippl, E., Bidan, C., Aïmeur, E. (eds) Information Systems Security and Privacy. ICISSP 2015. Communications in Computer and Information Science, vol 576. Springer, Cham. https://doi.org/10.1007/978-3-319-27668-7_5

  • DOI: https://doi.org/10.1007/978-3-319-27668-7_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27667-0

  • Online ISBN: 978-3-319-27668-7

  • eBook Packages: Computer Science, Computer Science (R0)
