PDF-Malware Detection: A Survey and Taxonomy of Current Techniques

Cyber Threat Intelligence

Part of the book series: Advances in Information Security (ADIS, volume 70)

Abstract

Portable Document Format, more commonly known as PDF, has become a standard for document exchange and dissemination over the last 20 years, thanks to its portability and widespread adoption. The flexibility and power of this format are leveraged not only by benign users but also by attackers. They exploit various types of vulnerabilities and circumvent security restrictions, and in doing so they have turned PDF into one of the leading vectors for spreading malicious code. Modern threat intelligence platforms need to learn how to identify new attacks automatically. For this reason, a fundamental task is to analyze the content of malicious PDF files and extract the main features that characterize the malware's identity and behavior. This paper surveys the state of the art in systems for detecting malicious PDF files. It organizes them in a taxonomy that considers separately the approaches used and the data analyzed to detect the presence of malicious code.
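The feature-extraction step mentioned above can be illustrated with a small static sketch. The code counts occurrences of PDF name tokens that are often associated with active or embedded content, such as /JavaScript, /OpenAction and /Launch, in the raw bytes of a file. The keyword list, the function names `extract_keyword_features` and `decode_name`, and the chosen feature set are illustrative assumptions, not the method of any system surveyed here. The sketch inspects only uncompressed bytes, so names hidden inside compressed streams are missed. It does decode `#xx` hex escapes inside names, which are a known way to obfuscate keywords (for example, /J#61vaScript).

```python
import re
from collections import Counter

# Illustrative keyword list (an assumption, not taken from the chapter):
# PDF names frequently associated with active or embedded content.
SUSPICIOUS_KEYWORDS = (
    b"/JavaScript", b"/JS", b"/OpenAction", b"/AA", b"/Launch",
    b"/EmbeddedFile", b"/ObjStm", b"/AcroForm", b"/RichMedia",
)

# A PDF name is '/' followed by characters up to the next whitespace
# (including NUL) or delimiter character.
NAME_TOKEN = re.compile(rb"/[^\s\x00/\[\]<>(){}%]+")
# Inside a name, '#' followed by two hex digits encodes a single byte.
HEX_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")


def decode_name(name: bytes) -> bytes:
    """Replace #xx escapes, so that /J#61vaScript becomes /JavaScript."""
    return HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), name)


def extract_keyword_features(data: bytes) -> dict:
    """Return keyword counts plus a few structural counts for raw PDF bytes.

    Only the bytes as given are scanned: Flate-compressed streams and
    object streams are NOT decompressed by this sketch.
    """
    raw_names = NAME_TOKEN.findall(data)
    counts = Counter(decode_name(n) for n in raw_names)

    # One count per keyword, after hex-escape normalization.
    features = {kw.decode("ascii"): counts[kw] for kw in SUSPICIOUS_KEYWORDS}
    # Object and stream openers; \b keeps 'endobj'/'endstream' from matching.
    features["obj"] = len(re.findall(rb"\bobj\b", data))
    features["stream"] = len(re.findall(rb"\bstream\b", data))
    # Names written with #xx escapes, a common keyword-obfuscation trick.
    features["hex_escaped_names"] = sum(
        1 for n in raw_names if HEX_ESCAPE.search(n)
    )
    return features
```

To use it, apply `extract_keyword_features` to the bytes returned by `open(path, "rb").read()` for a file of interest. The resulting dictionary could then serve as one feature vector for a downstream classifier. Counts like these are easy for an attacker who controls the file's structure to inflate or hide, so they are best treated as one signal among several.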


Notes

  1. This feature is actually reader-dependent. For example, the Google Chrome PDF reader executes embedded JavaScript code within a Google Native Client sandbox.

  2. https://developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey.

  3. https://github.com/smthmlk/phoneypdf.

  4. https://github.com/jesparza/peepdf.

  5. http://esec-lab.sogeti.com/pages/origami.html.


Acknowledgements

This work has been partially supported by a grant from the Presidency of the Italian Council of Ministers, and by the CINI Cybersecurity National Laboratory within the project FilieraSicura: Securing the Supply Chain of Domestic Critical Infrastructures from Cyber Attacks (www.filierasicura.it), funded by CISCO Systems Inc. and Leonardo SpA.

Author information

Correspondence to Leonardo Querzoni.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this chapter

Cite this chapter

Elingiusti, M., Aniello, L., Querzoni, L., Baldoni, R. (2018). PDF-Malware Detection: A Survey and Taxonomy of Current Techniques. In: Dehghantanha, A., Conti, M., Dargahi, T. (eds) Cyber Threat Intelligence. Advances in Information Security, vol 70. Springer, Cham. https://doi.org/10.1007/978-3-319-73951-9_9

  • DOI: https://doi.org/10.1007/978-3-319-73951-9_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-73950-2

  • Online ISBN: 978-3-319-73951-9

  • eBook Packages: Computer Science, Computer Science (R0)
