File Guard: automatic format-based media file sanitization

Zhang, Tao; Lee, Wang Hao; Gao, Mingyuan; Zhou, Jianying

doi:10.1007/s10207-019-00440-3


A black-box approach against vulnerability exploitation

regular contribution
Published: 06 June 2019

Volume 18, pages 701–713 (2019)

International Journal of Information Security

Tao Zhang (ORCID: orcid.org/0000-0002-4156-8116)¹, Wang Hao Lee², Mingyuan Gao² & Jianying Zhou³


Abstract

This paper proposes File Guard, a format-based file sanitization mechanism that aims to prevent software vulnerabilities from being triggered by input files. Based on our experiments and on statistics from Common Vulnerabilities and Exposures (CVE), we observed that most software vulnerabilities are exploited through malformed input files that violate their corresponding format standards. Hence, repairing an input file according to its format standard is effective in preventing such vulnerabilities from being exploited. In this work, we focus on media file types such as image, audio, and video. File Guard automatically sanitizes input files of known formats, preventing them from triggering potential vulnerabilities in a target system (or a group of systems) with minimal data loss. Existing intrusion prevention systems and anti-virus tools typically compare an input file or executable against ones known to be malicious and eliminate such data or software before it enters the protected system. Instead of blocking a malicious or damaged input file, our mechanism repairs the data and thus provides more user-friendly protection. File Guard can ensure that the input file conforms to its standard format and therefore cannot be used to exploit vulnerabilities, even zero-day ones.
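Because File Guard handles input files of known formats, a sanitizer of this kind must first recognize the format of each file. The following is a minimal Python sketch, assuming detection by the standard leading signature bytes (GIF87a/GIF89a, the 8-byte PNG signature, and the `ftyp` box that normally opens an MP4 file) and a caller-supplied table of per-format sanitizers. The names `detect_format` and `sanitize`, and the choice to reject unrecognized input, are hypothetical and do not describe the paper's implementation.

```python
from typing import Callable, Dict, Optional

# Leading signatures from the public format specifications.
GIF_SIGNATURES = (b"GIF87a", b"GIF89a")
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def detect_format(data: bytes) -> Optional[str]:
    """Return a coarse format label for `data`, or None if it is not recognized."""
    if data[:6] in GIF_SIGNATURES:
        return "gif"
    if data[:8] == PNG_SIGNATURE:
        return "png"
    # MP4 (ISO base media) files normally start with an 'ftyp' box:
    # a 4-byte big-endian size followed by the 4-byte type.
    if len(data) >= 8 and data[4:8] == b"ftyp":
        return "mp4"
    return None


def sanitize(data: bytes, sanitizers: Dict[str, Callable[[bytes], bytes]]) -> bytes:
    """Route a file to the (hypothetical) sanitizer registered for its format."""
    fmt = detect_format(data)
    if fmt is None or fmt not in sanitizers:
        # Assumed policy: unrecognized formats are rejected rather than passed through.
        raise ValueError("unsupported or unrecognized file format")
    return sanitizers[fmt](data)
```

Rejecting unknown input is only one possible policy; a deployment could instead pass such files through unchanged and rely on other defenses.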




Notes

Most vulnerabilities of this kind were discovered by STAMP.

References

International Organization for Standardization (ISO). https://www.iso.org/
Common Vulnerabilities and Exposures (CVE). http://cve.mitre.org/
Lee, W.H., Ramanujam, M.S., Krishnan, S.P.T.: On designing an efficient distributed black-box fuzzing system for mobile devices. In: ASIACCS (2015)
Costa, M., Castro, M., Zhou, L., Zhang, L., Peinado, M.: Bouncer: securing software by blocking bad input. In: SOSP (2007)
Wang, X., Li, Z., Xu, J., Reiter, M.K., Kil, C., Choi, J.Y.: Packet vaccine: black-box exploit detection and signature generation. In: CCS (2006)
Newsome, J., Brumley, D., Song, D.X.: Vulnerability-specific execution filtering for exploit prevention on commodity software. In: NDSS (2006)
Costa, M., Crowcroft, J., Castro, M., Rowstron, A.I.T., Zhou, L., Zhang, L., Barham, P.: Vigilante: end-to-end containment of internet worms. In: SOSP (2005)
Tahan, G., Shabtai, A., Elovici, Y.: Automatic extraction of signatures for malware. US Patent 8,353,040 (Jan. 8, 2013)
Zhichun, L., Chen, Y., Lanjia, W., Fu, Z.: Method and apparatus to facilitate generating worm-detection signatures using data packet field lengths. US Patent App. 11/958,760 (Dec. 18, 2007)
Thompson, R.J., Mosher, G.A.: Software vulnerability exploitation shield. US Patent 8,898,787 (Nov. 25, 2014)
Chen, Y., Zhichun, L., Xia, G., Liu, B.: Matching with a large vulnerability signature ruleset for high performance network defense. US Patent 8,522,348 (Aug. 27, 2013)
Craioveanu, C., Lin, Y., Ferrie, P., Dang, B.: Proactive exploit detection. US Patent 8,402,541 (Mar. 19, 2013)
Newsome, J., Song, D.X.: Dynamic taint analysis for automatic detection, analysis, and signature generation of exploits on commodity software. In: NDSS (2005)
Brumley, D., Wang, H., Jha, S., Song, D.X.: Creating vulnerability signatures using weakest preconditions. In: CSF (2007)
Long, F., Sidiroglou-Douskos, S., Kim, D., Rinard, M.C.: Sound input filter generation for integer overflow errors. In: POPL (2014)
Sun, H., Zhang, X., Su, C., Zeng, Q.: Efficient dynamic tracking technique for detecting integer-overflow-to-buffer-overflow vulnerability. In: ASIACCS (2015)
Zhang, Y., Sun, X., Deng, Y., Cheng, L., Zeng, S., Fu, Y., Feng, D.: Improving accuracy of static integer overflow detection in binary. In: RAID (2015)
Krügel, C., Vigna, G.: Anomaly detection of web-based attacks. In: CCS (2003)
Robertson, W.K., Vigna, G., Krügel, C., Kemmerer, R.A.: Using generalization and characterization techniques in the anomaly-based detection of web attacks. In: NDSS (2006)
Perdisci, R., Lee, W., Feamster, N.: Behavioral clustering of http-based malware and signature generation using malicious network traces. In: NSDI (2010)
Valeur, F., Mutz, D., Vigna, G.: A learning-based approach to the detection of SQL attacks. In: DIMVA (2005)
Rinard, M.C.: Living in the comfort zone. In: OOPSLA (2007)
Long, F., Ganesh, V., Carbin, M., Sidiroglou, S., Rinard, M.C.: Automatic input rectification. In: ICSE (2012)
Cowan, C.: StackGuard: automatic adaptive detection and prevention of buffer-overflow attacks. In: USENIX (1998)
Condit, J., Harren, M., McPeak, S., Necula, G.C., Weimer, W.: CCured in the real world. In: PLDI (2003)
Austin, T.M., Breach, S.E., Sohi, G.S.: Efficient detection of all pointer and array access errors. In: PLDI (1994)
Jim, T., Morrisett, J.G., Grossman, D., Hicks, M.W., Cheney, J., Wang, Y.: Cyclone: a safe dialect of C. In: USENIX (2002)
Jones, R.W.M., Kelly, P.H.J.: Backwards-compatible bounds checking for arrays and pointers in C programs. In: AADEBUG (1997)
Ruwase, O., Lam, M.S.: A practical dynamic buffer overflow detector. In: NDSS (2004)
Tucek, J., Newsome, J., Lu, S., Huang, C., Xanthos, S., Brumley, D., Zhou, Y., Song, D.X.: Sweeper: a lightweight end-to-end system for defending against fast worms. In: EuroSys (2007)
Perkins, J.H., Kim, S., Larsen, S., Amarasinghe, S.P., Bachrach, J., Carbin, M., Pacheco, C., Sherwood, F., Sidiroglou, S., Sullivan, G., Wong, W., Zibin, Y., Ernst, M.D., Rinard, M.C.: Automatically patching errors in deployed software. In: SOSP (2009)
Long, F., Sidiroglou-Douskos, S., Rinard, M.C.: Automatic runtime error repair and containment via recovery shepherding. In: PLDI (2014)
Carzaniga, A., Gorla, A., Mattavelli, A., Perino, N., Pezzè, M.: Automatic recovery from runtime failures. In: ICSE (2013)
Gailly, J.-l., Adler, M.: Zlib compression library (2004)


Author information

Authors and Affiliations

Department of Information Engineering, The Chinese University of Hong Kong, Hong Kong, China
Tao Zhang
Institute for Infocomm Research (I2R), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
Wang Hao Lee & Mingyuan Gao
Information Systems Technology and Design Pillar, Singapore University of Technology and Design (SUTD), Singapore, Singapore
Jianying Zhou


Corresponding author

Correspondence to Tao Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Vulnerability analysis

Here we present the extracted constraints and a detailed vulnerability analysis for the experiments in Sect. 5.

A.0.1 GIF sanitization

The vulnerabilities triggered by the test GIF files can be categorized into two types.

GIF files with inconsistent image sizes or image positions may trigger the first type of vulnerability. Such files violate a data consistency constraint, which makes the target software write to unallocated memory. Malicious GIF files of this type contain animated images, i.e., multiple images in one GIF file. The GIF format has one global control extension (GCE) with a field indicating the size of the canvas within which all the images should be displayed. GIF also stores image information for each image, including its size and position. If an image's size and position place it outside the canvas, software with this vulnerability writes to unallocated memory and crashes. File Guard removes this vulnerability by enlarging the canvas size specified in the GCE field.
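The canvas repair described above amounts to a bounding-box computation. The following minimal Python sketch assumes that the canvas size and each image's position and size have already been parsed from the file; `Frame` and `enlarge_canvas` are hypothetical names, and parsing and rewriting the file are omitted.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Frame:
    # Position and size of one image within the animation.
    left: int
    top: int
    width: int
    height: int


def enlarge_canvas(canvas: Tuple[int, int], frames: List[Frame]) -> Tuple[int, int]:
    """Return a (width, height) canvas large enough to contain every frame.

    A frame falls outside the canvas when left + width or top + height
    exceeds the canvas dimensions; the canvas is grown to cover it.
    """
    width, height = canvas
    for f in frames:
        width = max(width, f.left + f.width)
        height = max(height, f.top + f.height)
    # GIF records canvas dimensions as unsigned 16-bit values.
    if width > 0xFFFF or height > 0xFFFF:
        raise ValueError("frames cannot fit in a 16-bit canvas")
    return width, height
```

Enlarging the canvas, instead of cropping the offending image, keeps all of the original pixel data, which matches the goal of minimal data loss.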
All other GIF files in the test data may trigger the second type of vulnerability, which is caused by violations of data consistency or critical atom constraints. This type of vulnerability makes the software read from unallocated memory. In the test GIF files, two fields may be set to an undefined value or left without a value: the image data for each pixel and the color index.

Undefined image data. A shifted atom delimiter in the GCE makes the software treat the image data in a GIF file as meta-data, so such malicious GIF files effectively lack image data. GIF files with a shifted atom delimiter in the GCE and files with no image data after the color map therefore have the same effect on software that renders the image: the image size is nonzero, but no image data are present. When the targeted software attempts to display the image, it may read image data from unallocated memory.

Undefined color index. When vulnerable software attempts to display the image in a GIF file that has no color map or an undefined background color index, it may read color information from unallocated memory, which can crash the software. File Guard checks whether the GIF file contains undefined references (such as the color index) or missing data. If an undefined reference exists, the sanitizer replaces it with one that is defined in the file. If missing data are detected, the sanitizer fills in the corresponding amount of dummy data.
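The two repairs can be sketched on already-decoded data. This minimal Python sketch assumes that the color map, the background color index, and the pixel indices have been extracted (LZW decoding and re-encoding are omitted), and all names are hypothetical. A file with no color map defines no color to fall back on, so the sketch assumes a small fallback palette in that case and uses the background color as the dummy data; both choices are assumptions, not details from the paper.

```python
from typing import List, Optional, Tuple

Color = Tuple[int, int, int]
# Hypothetical fallback palette, used only when the file defines no color map.
FALLBACK_PALETTE: List[Color] = [(0, 0, 0), (255, 255, 255)]


def repair_indexed_image(
    width: int,
    height: int,
    palette: Optional[List[Color]],
    pixels: List[int],
    background_index: int,
) -> Tuple[List[Color], List[int], int]:
    """Return (palette, pixels, background_index) with every color reference
    defined and at least width * height pixel indices present."""
    if not palette:
        palette = list(FALLBACK_PALETTE)
    size = len(palette)
    # Replace undefined color references with index 0, which is always defined.
    if not 0 <= background_index < size:
        background_index = 0
    repaired = [p if 0 <= p < size else 0 for p in pixels]
    # Fill missing image data with dummy pixels in the background color.
    missing = width * height - len(repaired)
    if missing > 0:
        repaired.extend([background_index] * missing)
    return palette, repaired, background_index
```

For example, `repair_indexed_image(2, 2, None, [5, 1], 7)` yields the fallback palette, the pixel indices `[0, 1, 0, 0]`, and background index `0`.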

A.0.2 PNG sanitization

The vulnerabilities triggered by malicious PNG files are similar to those triggered by GIF files. In our test data, we found only vulnerabilities that make the software read from unallocated memory. All the PNG vulnerabilities listed in Table 1 are caused by undefined references in the PNG file. In the test PNG files, three fields may be set to an undefined value or left without a value: the image data for each pixel, the color index, and the code index for encoded data.

Undefined image data. IDAT chunks hold the image data of a PNG file. PNG files with a zero-length IDAT chunk, no IDAT chunk, or an image size larger than the amount of image data pass undefined image data to the software. As with GIF files with undefined image data, such PNG files make the software read from unallocated memory.
Undefined color index. PLTE chunks hold the color information of a PNG file in an index-value structure (the index is implicitly given by the position of each color entry). PNG files with no PLTE chunk pass an undefined color index and undefined color information to the software. As with GIF files with an undefined color index, such PNG files make the software read from unallocated memory.
Undefined code index in encoded data. In PNG files, the image data are compressed losslessly with DEFLATE, which uses Huffman codes. If an undefined Huffman code appears in the data, the software rendering the PNG image may attempt to read from unallocated memory.

File Guard sanitizes these PNG files in the same way as GIF files: undefined references are replaced with defined ones, and missing data are filled with dummy data.
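The three PNG conditions above can be detected from the chunk structure before any repair. The following Python sketch is an illustrative detector, not File Guard's implementation. `check_png` is a hypothetical name, chunk CRCs are not verified, the data-size check covers only non-interlaced images, and the PLTE check is limited to indexed-color images (color type 3), where the PNG specification requires a PLTE chunk. It relies on `zlib.decompress` raising `zlib.error` on malformed DEFLATE data such as invalid Huffman codes or a truncated stream.

```python
import struct
import zlib
from typing import List

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Samples per pixel for each PNG color type, as defined for the IHDR chunk.
CHANNELS = {0: 1, 2: 3, 3: 1, 4: 2, 6: 4}


def check_png(data: bytes) -> List[str]:
    """Report conditions that would leave image data or colors undefined."""
    if data[:8] != PNG_SIGNATURE:
        return ["bad signature"]
    chunks, pos = [], 8
    # Each chunk: 4-byte length, 4-byte type, data, 4-byte CRC (CRC not checked).
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        chunks.append((ctype, data[pos + 8:pos + 8 + length]))
        pos += 12 + length
    ihdr = next((body for ctype, body in chunks if ctype == b"IHDR"), None)
    if ihdr is None or len(ihdr) != 13:
        return ["missing or malformed IHDR"]
    width, height, depth, color_type = struct.unpack(">IIBB", ihdr[:10])
    interlace = ihdr[12]
    problems = []
    # Indexed-color images need a PLTE chunk for their color indices to resolve.
    if color_type == 3 and not any(ctype == b"PLTE" for ctype, _ in chunks):
        problems.append("indexed image without PLTE")
    idat = b"".join(body for ctype, body in chunks if ctype == b"IDAT")
    if not idat:
        problems.append("no image data (missing or empty IDAT)")
        return problems
    try:
        raw = zlib.decompress(idat)
    except zlib.error:
        problems.append("invalid compressed data")
        return problems
    if interlace == 0:
        # One filter byte per row plus the row's packed pixel bytes.
        row_bytes = (width * CHANNELS.get(color_type, 1) * depth + 7) // 8
        if len(raw) < height * (1 + row_bytes):
            problems.append("image size larger than image data")
    return problems
```

An empty result means only that none of these specific conditions was found; it is not a full validation of the file.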

A.0.3 MPEG4 sanitization

The vulnerability in MPEG4 files whose frame sizes are inconsistent with the amount of video data is similar to that of GIF or PNG files with undefined image data: such MPEG4 files make the software read image data from unallocated memory and crash. These MPEG4 files are sanitized in the same way as GIF or PNG files with this vulnerability.
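MP4 files are organized as boxes (atoms), each beginning with a 32-bit big-endian size and a four-character type; a size of 1 means a 64-bit size follows, and a size of 0 means the box extends to the end of the file. A basic data-consistency check is that every declared size agrees with the bytes actually present. The Python sketch below walks only the top-level boxes; `check_boxes` is a hypothetical name, and comparing per-frame sizes from the sample size (`stsz`) table with the media data, as the paragraph above describes, would be a further step not shown here.

```python
import struct
from typing import List, Tuple


def check_boxes(data: bytes) -> Tuple[List[Tuple[str, int]], List[str]]:
    """Walk top-level boxes; return (boxes, problems) for inconsistent sizes."""
    boxes: List[Tuple[str, int]] = []
    problems: List[str] = []
    pos = 0
    while pos < len(data):
        if pos + 8 > len(data):
            problems.append(f"truncated box header at offset {pos}")
            break
        size, btype = struct.unpack(">I4s", data[pos:pos + 8])
        header = 8
        if size == 1:  # 64-bit largesize follows the type field
            if pos + 16 > len(data):
                problems.append(f"truncated largesize at offset {pos}")
                break
            size = struct.unpack(">Q", data[pos + 8:pos + 16])[0]
            header = 16
        elif size == 0:  # box extends to the end of the file
            size = len(data) - pos
        name = btype.decode("latin-1")
        if size < header:
            problems.append(f"{name}: size {size} smaller than its header")
            break
        if pos + size > len(data):
            problems.append(f"{name}: declares {size} bytes, only {len(data) - pos} present")
            break
        boxes.append((name, size))
        pos += size
    return boxes, problems
```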

The vulnerability in MPEG4 files with inappropriate meta-data has more complicated consequences for the software. The meta-data field in question is a mode indicator, and each of its values is associated with a different set of parameters. A wrong parameter may make the software either read from or write to unallocated memory and crash. File Guard checks the parameters and changes the value of this meta-data field to an appropriate one.
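The paper does not name the mode indicator or its parameter sets, so the following is a purely hypothetical Python sketch of the repair described above. A table maps each mode value to the parameters it requires, and an inconsistent mode is replaced with one whose required parameters match those present in the file. `MODE_PARAMS`, its contents, and `repair_mode` are illustrative only.

```python
from typing import Dict, FrozenSet, Optional

# Hypothetical table: the parameters each mode value requires.
MODE_PARAMS: Dict[int, FrozenSet[str]] = {
    0: frozenset({"width", "height"}),
    1: frozenset({"width", "height", "scale"}),
    2: frozenset({"width", "height", "scale", "offset"}),
}


def repair_mode(mode: int, params: Dict[str, int]) -> Optional[int]:
    """Keep `mode` if its required parameters match those present;
    otherwise return the lowest mode that matches, or None if none does."""
    present = frozenset(params)
    if MODE_PARAMS.get(mode) == present:
        return mode
    for candidate, required in sorted(MODE_PARAMS.items()):
        if required == present:
            return candidate
    return None
```

Returning `None` when no mode is consistent leaves the decision (for example, rejecting the file) to the caller.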


About this article

Cite this article

Zhang, T., Lee, W.H., Gao, M. et al. File Guard: automatic format-based media file sanitization. Int. J. Inf. Secur. 18, 701–713 (2019). https://doi.org/10.1007/s10207-019-00440-3

Issue Date: December 2019
DOI: https://doi.org/10.1007/s10207-019-00440-3
