Abstract
Portable Document Format, more commonly known as PDF, has become over the last 20 years a standard for document exchange and dissemination due to its portable nature and widespread adoption. The flexibility and power of this format are leveraged not only by benign users but also by hackers, who have been working to exploit various types of vulnerabilities and overcome security restrictions, turning the PDF format into one of the leading vectors for spreading malicious code. Analyzing the content of malicious PDF files to extract the main features that characterize the malware's identity and behavior is a fundamental task for modern threat intelligence platforms that need to learn how to automatically identify new attacks. This paper surveys the state of the art in systems for detecting malicious PDF files and organizes them into a taxonomy that separately considers the approaches used and the data analyzed to detect the presence of malicious code.
Notes
- 1.
This feature is actually reader-dependent. For example, the Google Chrome PDF reader executes embedded JavaScript code within a Google Native Client sandbox.
Acknowledgements
This work has been partially supported by a grant of the Italian Presidency of the Council of Ministers, and by the CINI Cybersecurity National Laboratory within the project FilieraSicura: Securing the Supply Chain of Domestic Critical Infrastructures from Cyber Attacks (www.filierasicura.it), funded by CISCO Systems Inc. and Leonardo SpA.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this chapter
Cite this chapter
Elingiusti, M., Aniello, L., Querzoni, L., Baldoni, R. (2018). PDF-Malware Detection: A Survey and Taxonomy of Current Techniques. In: Dehghantanha, A., Conti, M., Dargahi, T. (eds) Cyber Threat Intelligence. Advances in Information Security, vol 70. Springer, Cham. https://doi.org/10.1007/978-3-319-73951-9_9
DOI: https://doi.org/10.1007/978-3-319-73951-9_9
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73950-2
Online ISBN: 978-3-319-73951-9
eBook Packages: Computer Science, Computer Science (R0)