Abstract
Malicious PDF files still constitute a serious threat to systems security. New reader vulnerabilities have been discovered, and research has shown that current state-of-the-art approaches can be easily bypassed by exploiting weaknesses caused by erroneous parsing or incomplete information extraction. In this work, we present a novel machine learning system for the detection of malicious PDF files. We have developed a static approach that leverages information extracted from both the structure and the content of PDF files, which improves the system's robustness against evasion attacks. Experimental results show that our system outperforms all publicly available state-of-the-art tools. We also report a significant improvement in performance at detecting reverse mimicry attacks, which can completely evade systems that only extract information from the PDF file structure. Finally, we claim that a more careful design of machine-learning-based detectors is needed to withstand targeted attacks.
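The abstract describes a detector that draws on both the structure and the content of a PDF file, and points to parsing weaknesses as an evasion vector. The following is a minimal sketch under stated assumptions, not the authors' feature set or code: it counts selected PDF name keywords after decoding the `#xx` hex escapes that PDF names allow (so `/Java#53cript` is counted as `/JavaScript`, which naive string matching would miss), then appends content indicators assumed to come from a separate analyzer. The keyword list, the indicator names, and the functions `structural_counts` and `feature_vector` are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical keyword list for illustration only (not the paper's feature set).
STRUCTURAL_KEYWORDS = ["/JavaScript", "/JS", "/OpenAction", "/AA",
                       "/EmbeddedFile", "/Launch", "/ObjStm", "/XRef"]

# Hypothetical content indicators, assumed to be produced by a separate
# content analyzer; the names are placeholders.
CONTENT_FEATURES = ["js_objects", "malformed_objects", "known_exploit_hits"]

# A PDF name token: '/' followed by regular characters. It ends at whitespace
# (including NUL) or at a delimiter: ( ) < > [ ] { } / %
_NAME_TOKEN = re.compile(rb"/[^\s\x00()<>\[\]{}/%]*")
# Inside a name, '#' plus two hex digits encodes one byte,
# so b"/Java#53cript" denotes the same name as b"/JavaScript".
_NAME_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")


def structural_counts(raw: bytes) -> Counter:
    """Count name tokens in raw PDF bytes after decoding #xx escapes."""
    counts = Counter()
    for token in _NAME_TOKEN.findall(raw):
        decoded = _NAME_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), token)
        counts[decoded.decode("latin-1")] += 1
    return counts


def feature_vector(raw: bytes, content: dict) -> list:
    """Concatenate structural keyword counts with content indicators.

    Indicators missing from `content` default to 0.
    """
    counts = structural_counts(raw)
    structural = [float(counts[k]) for k in STRUCTURAL_KEYWORDS]
    content_part = [float(content.get(name, 0)) for name in CONTENT_FEATURES]
    return structural + content_part


if __name__ == "__main__":
    sample = (b"1 0 obj << /Type /Catalog /OpenAction 2 0 R >> endobj "
              b"2 0 obj << /S /Java#53cript /JS (app.alert(1)) >> endobj")
    print(feature_vector(sample, {"js_objects": 1}))
```

Scanning raw bytes with a regular expression keeps the sketch short, but it also picks up `/` sequences inside strings and compressed streams; a real extractor would tokenize objects with a proper parser, which is precisely where the abstract locates the evasion risk.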
Notes
- 8. Streams containing other objects.
- 9. A new type of cross-reference table introduced by recent PDF specifications.
- 10. The seed value was set to the default value indicated here: http://weka.sourceforge.net/doc.dev/weka/clusterers/SimpleKMeans.html.
- 11. A scan that stops as soon as it finds anomalies in the file. This definition applies to PeePDF; in Origami, such a scan is called standard mode.
- 12. If W is expressed as a percentage, it must first be divided by 100.
- 14. Since Wepawet and PDFRate are online services, we could not train them on our own samples.
- 15. For EXE Embedding we exploited the CVE-2010-1240 vulnerability; for PDF Embedding and JavaScript Injection we exploited CVE-2009-0927.
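Note 10 refers to the seed of Weka's SimpleKMeans clusterer. Because k-means starts from randomly chosen centroids, different seeds can lead to different clusterings, and fixing the seed makes runs reproducible. The sketch below is a plain Lloyd's-algorithm k-means (the method of MacQueen cited in the references), not Weka's implementation; the function name `kmeans`, the toy data and the seed value are hypothetical.

```python
import math
import random


def kmeans(points, k, seed, max_iter=100):
    """Lloyd's k-means; initial centroids are k distinct points drawn with `seed`."""
    rng = random.Random(seed)  # a fixed seed makes the initialization reproducible
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster: converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(xs) / len(members) for xs in zip(*members)]
    return centroids, assignment


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    # Same seed and same data give an identical result on every run.
    assert kmeans(data, 2, seed=1) == kmeans(data, 2, seed=1)
    print(kmeans(data, 2, seed=1)[1])
```

Seeding a private `random.Random` instance, rather than the global generator, keeps the clustering reproducible even if other code draws random numbers in between.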
References
Symantec: Internet Security Threat Report. 2013 Trends. Symantec (2014)
Buchanan, E., Roemer, R., Savage, S., Shacham, H.: Return-oriented programming: exploitation without code injection. In: Black Hat 2008 (2008)
Ratanaworabhan, P., Livshits, B., Zorn, B.: Nozzle: a defense against heap-spraying code injection attacks. In: Proceedings of the 18th Conference on USENIX Security Symposium (2009)
Bania, P.: JIT spraying and mitigations. CoRR abs/1009.1038 (2010)
Adobe: Adobe Supplement to ISO 32000. Adobe (2008)
Esparza, J.M.: Obfuscation and (non-)detection of malicious pdf files. In: S21Sec e-crime (2011)
Cova, M., Kruegel, C., Vigna, G.: Detection and analysis of drive-by-download attacks and malicious javascript code. In: Proceedings of the 19th International Conference on World Wide Web (2010)
Laskov, P., Šrndić, N.: Static detection of malicious javascript-bearing pdf documents. In: Proceedings of the 27th Annual Computer Security Applications Conference (2011)
Tzermias, Z., Sykiotakis, G., Polychronakis, M., Markatos, E.P.: Combining static and dynamic analysis for the detection of malicious documents. In: Proceedings of the 4th European Workshop on System Security (2011)
Maiorca, D., Giacinto, G., Corona, I.: A pattern recognition system for malicious pdf files detection. In: Proceedings of the 8th International Conference on Machine Learning and Data Mining in Pattern Recognition (2012)
Smutz, C., Stavrou, A.: Malicious pdf detection using metadata and structural features. In: Proceedings of the 28th Annual Computer Security Applications Conference (2012)
Šrndić, N., Laskov, P.: Detection of malicious pdf files based on hierarchical document structure. In: Proceedings of the 20th Annual Network and Distributed System Security Symposium (2013)
Maiorca, D., Corona, I., Giacinto, G.: Looking at the bag is not enough to find the bomb: an evasion of structural methods for malicious pdf files detection. In: Proceedings of the 8th ACM SIGSAC Symposium on Information, Computer and Communications Security (2013)
Šrndić, N., Laskov, P.: Practical evasion of a learning-based classifier: a case study. In: Proceedings of the 2014 IEEE Symposium on Security and Privacy, SP 2014, pp. 197–211. IEEE Computer Society, Washington, D.C. (2014)
Corona, I., Maiorca, D., Ariu, D., Giacinto, G.: Lux0r: detection of malicious pdf-embedded javascript code through discriminant analysis of API references. In: Proceedings of the 7th ACM Workshop on Artificial Intelligence and Security (AISec). Scottsdale, Arizona, USA (2014)
Liu, D., Wang, H., Stavrou, A.: Detecting malicious javascript in pdf through document instrumentation. In: Proceedings of the 44th Annual International Conference on Dependable Systems and Networks (2014)
Maass, M., Scherlis, W.L., Aldrich, J.: In-nimbo sandboxing. In: Proceedings of the 2014 Symposium and Bootcamp on the Science of Security, HotSoS 2014. ACM, New York, pp. 1:1–1:12 (2014)
Maiorca, D., Ariu, D., Corona, I., Giacinto, G.: A structural and content-based approach for a precise and robust detection of malicious pdf files. In: Proceedings of the 1st International Conference on Information Systems Security and Privacy (ICISSP 2015), pp. 27–36. INSTICC (2015)
Adobe: PDF Reference. Adobe Portable Document Format Version 1.7. Adobe (2006)
Li, W.J., Stolfo, S., Stavrou, A., Androulaki, E., Keromytis, A.D.: A study of malcode-bearing documents. In: Proceedings of the 4th International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment (2007)
Shafiq, M.Z., Khayam, S.A., Farooq, M.: Embedded malware detection using markov n-grams. In: Proceedings of the 5th International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment (2008)
Tabish, S.M., Shafiq, M.Z., Farooq, M.: Malware detection using statistical analysis of byte-level file content. In: Proceedings of the ACM SIGKDD Workshop on CyberSecurity and Intelligence Informatics (2009)
Rieck, K., Krueger, T., Dewald, A.: Cujo: efficient detection and prevention of drive-by-download attacks. In: Proceedings of the 26th Annual Computer Security Applications Conference (2010)
Curtsinger, C., Livshits, B., Zorn, B., Seifert, C.: Zozzle: fast and precise in-browser javascript malware detection. In: Proceedings of the 20th USENIX Conference on Security (2011)
Canali, D., Cova, M., Vigna, G., Kruegel, C.: Prophiler: a fast filter for the large-scale detection of malicious web pages. In: Proceedings of the 20th International Conference on World Wide Web (2011)
Engelberth, M., Willems, C., Holz, T.: Detecting malicious documents with combined static and dynamic analysis. In: Virus Bulletin (2009)
Willems, C., Holz, T., Freiling, F.: Toward automated dynamic malware analysis using cwsandbox. IEEE Secur. Priv. 5, 32–39 (2007)
Rieck, K., Holz, T., Willems, C., Düssel, P., Laskov, P.: Learning and classification of malware behavior. In: Proceedings of the 5th International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment (2008)
Snow, K.Z., Krishnan, S., Monrose, F., Provos, N.: Shellos: enabling fast detection and forensic analysis of code injection attacks. In: Proceedings of the 20th USENIX Conference on Security (2011)
Nissim, N., Cohen, A., Glezer, C., Elovici, Y.: Detection of malicious PDF files and directions for enhancements: a state-of-the art survey. Comput. Secur. 48, 246–266 (2015)
MacQueen, J.B.: Some methods for classification and analysis of multivariate observations. In: Le Cam, L.M., Neyman, J. (eds.) Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281–297. University of California Press (1967)
Quinlan, J.R.: Learning decision tree classifiers. ACM Comput. Surv. 28, 71–72 (1996)
Freund, Y., Schapire, R.E.: A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55(1), 119–139 (1997). doi:10.1006/jcss.1997.1504
Biggio, B., Corona, I., Maiorca, D., Nelson, B., Šrndić, N., Laskov, P., Giacinto, G., Roli, F.: Evasion attacks against machine learning at test time. In: Blockeel, H., Kersting, K., Nijssen, S., Železný, F. (eds.) ECML PKDD 2013, Part III. LNCS, vol. 8190, pp. 387–402. Springer, Heidelberg (2013)
Baldi, P., Brunak, S., Chauvin, Y., Andersen, C.A.F., Nielsen, H.: Assessing the accuracy of prediction algorithms for classification: an overview. Bioinformatics 16, 412–424 (2000)
Biggio, B., Fumera, G., Roli, F.: Security evaluation of pattern classifiers under attack. IEEE Trans. Knowl. Data Eng. 26, 984–996 (2014)
Biggio, B., Corona, I., Nelson, B., Rubinstein, B., Maiorca, D., Fumera, G., Giacinto, G., Roli, F.: Security evaluation of support vector machines in adversarial environments. In: Ma, Y., Guo, G. (eds.) Support Vector Machines Applications, pp. 105–153. Springer, Heidelberg (2014)
Biggio, B., Nelson, B., Laskov, P.: Poisoning attacks against support vector machines. In: Langford, J., Pineau, J. (eds.) 29th International Conference on Machine Learning (ICML). Omnipress (2012)
Biggio, B., Fumera, G., Roli, F.: Multiple classifier systems for robust classifier design in adversarial environments. Int. J. Mach. Learn. Cybernet. 1, 27–41 (2010)
Biggio, B., Rieck, K., Ariu, D., Wressnegger, C., Corona, I., Giacinto, G., Roli, F.: Poisoning behavioral malware clustering. In: Proceedings of the 2014 Workshop on Artificial Intelligence and Security, AISec 2014. ACM, New York, pp. 27–36 (2014)
Biggio, B., Pillai, I., Bulò, S.R., Ariu, D., Pelillo, M., Roli, F.: Is data clustering in adversarial settings secure? In: Proceedings of the 2013 ACM Workshop on Artificial Intelligence and Security, AISec 2013, ACM, New York, pp. 87–98 (2013)
Acknowledgement
This work is supported by the Regional Administration of Sardinia, Italy, within the project “Advanced and secure sharing of multimedia data over social networks in the future Internet” (CUP F71J11000690002). Davide Maiorca gratefully acknowledges the Sardinia Regional Government for the financial support of his PhD scholarship (P.O.R. Sardegna F.S.E. Operational Programme of the Autonomous Region of Sardinia, European Social Fund 2007–2013, Axis IV Human Resources, Objective l.3, Line of Activity l.3.1.).
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Maiorca, D., Ariu, D., Corona, I., Giacinto, G. (2015). An Evasion Resilient Approach to the Detection of Malicious PDF Files. In: Camp, O., Weippl, E., Bidan, C., Aïmeur, E. (eds) Information Systems Security and Privacy. ICISSP 2015. Communications in Computer and Information Science, vol 576. Springer, Cham. https://doi.org/10.1007/978-3-319-27668-7_5
DOI: https://doi.org/10.1007/978-3-319-27668-7_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27667-0
Online ISBN: 978-3-319-27668-7
eBook Packages: Computer Science, Computer Science (R0)