An Evasion Resilient Approach to the Detection of Malicious PDF Files

Maiorca, Davide; Ariu, Davide; Corona, Igino; Giacinto, Giorgio

doi:10.1007/978-3-319-27668-7_5

Part of the book series: Communications in Computer and Information Science (CCIS, volume 576)

Included in the following conference series:

International Conference on Information Systems Security and Privacy

Abstract

Malicious PDF files still constitute a serious threat to system security. New reader vulnerabilities have been discovered, and research has shown that current state-of-the-art approaches can be easily bypassed by exploiting weaknesses caused by erroneous parsing or incomplete information extraction. In this work, we present a novel machine learning system for the detection of malicious PDF files. We have developed a static approach that leverages information extracted from both the structure and the content of PDF files, which improves the system's robustness against evasion attacks. Experimental results show that our system outperforms all publicly available state-of-the-art tools. We also report a significant improvement in the detection of reverse mimicry attacks, which can completely evade systems that only extract information from the PDF file structure. Finally, we argue that, to avoid targeted attacks, a more careful design of machine-learning-based detectors is needed.
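
The idea of combining structure-based and content-based information can be illustrated with a minimal Python sketch. It is not the system described in the paper: the keyword list, the feature names, and the helper functions are all hypothetical, chosen only to show how counts taken from the file structure and simple measurements taken from stream content could be merged into one feature vector. It also scans raw bytes with regular expressions, which is exactly the kind of shallow parsing that can be evaded, so a real detector would need a proper PDF parser.

```python
import re

# Hypothetical keyword list for illustration only; not the paper's feature set.
STRUCTURAL_KEYWORDS = [
    b"/Page", b"/JavaScript", b"/JS",
    b"/OpenAction", b"/EmbeddedFile", b"/ObjStm",
]

def structural_features(data: bytes) -> dict:
    """Count occurrences of each PDF name keyword in the raw bytes."""
    feats = {}
    for kw in STRUCTURAL_KEYWORDS:
        # Reject matches that continue with a letter or digit, so that
        # "/Page" is not counted inside "/Pages".
        pattern = re.escape(kw) + rb"(?![A-Za-z0-9])"
        feats[kw.decode()] = len(re.findall(pattern, data))
    return feats

def content_features(data: bytes) -> dict:
    """Measure simple properties of objects and (uncompressed) streams."""
    streams = re.findall(
        rb"(?<!end)stream\r?\n(.*?)\r?\nendstream", data, re.DOTALL
    )
    return {
        "n_objects": len(re.findall(rb"\b\d+\s+\d+\s+obj\b", data)),
        "n_streams": len(streams),
        "total_stream_bytes": sum(len(s) for s in streams),
        # Assumed toy indicator: does any stream contain a call to eval()?
        "has_eval": int(any(b"eval(" in s for s in streams)),
    }

def feature_vector(data: bytes) -> list:
    """Merge both feature groups into one vector with a fixed name order."""
    feats = {**structural_features(data), **content_features(data)}
    return [feats[name] for name in sorted(feats)]

# A tiny hand-written PDF-like byte string used as example input.
SAMPLE = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R /OpenAction 3 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Type /Pages /Kids [4 0 R] /Count 1 >>\nendobj\n"
    b"3 0 obj\n<< /S /JavaScript /JS 5 0 R >>\nendobj\n"
    b"4 0 obj\n<< /Type /Page /Parent 2 0 R >>\nendobj\n"
    b"5 0 obj\n<< /Length 14 >>\nstream\neval('x = 1');\nendstream\nendobj\n"
    b"%%EOF\n"
)
```

Calling `feature_vector(SAMPLE)` returns a list of ten integers that could be passed to any standard classifier. A detector that used only the structural group would ignore what the streams actually contain, which is the gap that the content group is meant to cover in this sketch.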

Notes

1. http://eternal-todo.com/tools/peepdf-pdf-analysis-tool.
2. http://esec-lab.sogeti.com/pages/origami.
3. http://wepawet.iseclab.org/index.php.
4. http://htmlunit.sourceforge.net.
5. http://www.mozilla.org/rhino.
6. http://www.pdflabs.com/tools/pdftk-the-pdf-toolkit.
7. http://pdfrate.com/.
8. Streams containing other objects.
9. A new type of cross-reference table introduced by recent versions of the PDF specification.
10. The seed value has been set to the default value indicated here: http://weka.sourceforge.net/doc.dev/weka/clusterers/SimpleKMeans.html.
11. A scan that stops if it finds anomalies in the file. This definition applies to PeePDF; in Origami, such a scan is called standard mode.
12. If W is in its percentage form, it must be divided by 100 first.
13. http://contagiodump.blogspot.it.
14. Because Wepawet and PDFRate are online services, we could not train these systems with our own samples.
15. For EXE Embedding we exploited the CVE-2010-1240 vulnerability, and for PDF Embedding and JavaScript Injection we exploited CVE-2009-0927.

Acknowledgement

This work is supported by the Regional Administration of Sardinia, Italy, within the project “Advanced and secure sharing of multimedia data over social networks in the future Internet” (CUP F71J11000690002). Davide Maiorca gratefully acknowledges Sardinia Regional Government for the financial support of his PhD scholarship (P.O.R. Sardegna F.S.E. Operational Programme of the Autonomous Region of Sardinia, European Social Fund 2007–2013 - Axis IV Human Resources, Objective l.3, Line of Activity l.3.1.).

Author information

Authors and Affiliations

University of Cagliari, Piazza d’Armi, 09123, Cagliari, Italy
Davide Maiorca, Davide Ariu, Igino Corona & Giorgio Giacinto

Corresponding author

Correspondence to Davide Maiorca.

Editor information

Editors and Affiliations

MODESTE/ESEO, Angers, France
Olivier Camp
SBA Research, Wien, Austria
Edgar Weippl
Supélec, Cesson, France
Christophe Bidan
Université de Montréal, Montreal, Québec, Canada
Esma Aïmeur

About this paper

Cite this paper

Maiorca, D., Ariu, D., Corona, I., Giacinto, G. (2015). An Evasion Resilient Approach to the Detection of Malicious PDF Files. In: Camp, O., Weippl, E., Bidan, C., Aïmeur, E. (eds) Information Systems Security and Privacy. ICISSP 2015. Communications in Computer and Information Science, vol 576. Springer, Cham. https://doi.org/10.1007/978-3-319-27668-7_5

DOI: https://doi.org/10.1007/978-3-319-27668-7_5
Published: 01 January 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27667-0
Online ISBN: 978-3-319-27668-7
eBook Packages: Computer Science, Computer Science (R0)
