PDF-Malware Detection: A Survey and Taxonomy of Current Techniques

Elingiusti, Michele; Aniello, Leonardo; Querzoni, Leonardo; Baldoni, Roberto

doi:10.1007/978-3-319-73951-9_9

Part of the book series: Advances in Information Security (ADIS, volume 70)

Abstract

Portable Document Format, more commonly known as PDF, has become over the last 20 years a standard for document exchange and dissemination due to its portable nature and widespread adoption. The flexibility and power of this format are leveraged not only by benign users but also by hackers, who have been working to exploit various types of vulnerabilities and overcome security restrictions, turning the PDF format into one of the leading vectors for spreading malicious code. Analyzing the content of malicious PDF files to extract the main features that characterize the malware's identity and behavior is a fundamental task for modern threat intelligence platforms that need to learn how to automatically identify new attacks. This paper surveys the state of the art in systems for detecting malicious PDF files and organizes them in a taxonomy that separately considers the approaches used and the data analyzed to detect the presence of malicious code.

Notes

1. This feature is actually reader-dependent. As an example, the Google Chrome PDF reader executes embedded JavaScript code within a Google Native Client sandbox.
2. https://developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey.
3. https://github.com/smthmlk/phoneypdf.
4. https://github.com/jesparza/peepdf.
5. http://esec-lab.sogeti.com/pages/origami.html.

Acknowledgements

The present work has been partially supported by a grant of the Italian Presidency of Ministry Council, and by the CINI Cybersecurity National Laboratory within the project FilieraSicura: Securing the Supply Chain of Domestic Critical Infrastructures from Cyber Attacks (www.filierasicura.it), funded by CISCO Systems Inc. and Leonardo SpA.

Author information

Authors and Affiliations

CIS - Sapienza University of Rome, Rome, Italy
Michele Elingiusti, Leonardo Aniello, Leonardo Querzoni & Roberto Baldoni

Corresponding author

Correspondence to Leonardo Querzoni.

Editor information

Editors and Affiliations

Department of Computer Science, University of Sheffield, Sheffield, United Kingdom
Ali Dehghantanha
Department of Mathematics, University of Padua, Padua, Italy
Mauro Conti
Department of Computer Science, University of Salford, Manchester, United Kingdom
Tooska Dargahi

About this chapter

Cite this chapter

Elingiusti, M., Aniello, L., Querzoni, L., Baldoni, R. (2018). PDF-Malware Detection: A Survey and Taxonomy of Current Techniques. In: Dehghantanha, A., Conti, M., Dargahi, T. (eds) Cyber Threat Intelligence. Advances in Information Security, vol 70. Springer, Cham. https://doi.org/10.1007/978-3-319-73951-9_9

DOI: https://doi.org/10.1007/978-3-319-73951-9_9
Published: 24 April 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73950-2
Online ISBN: 978-3-319-73951-9
eBook Packages: Computer Science, Computer Science (R0)
