Development of an Automatic Document Malware Analysis System
Malware attacks that use document files such as PDF and HWP have been increasing rapidly. In particular, there has been a sharp rise in social engineering infections in which document-based malware is delivered through Web/SNS postings on political or cultural issues, or through spam mail that pretends to come from a work colleague. The threat of document malware is expected to grow because most PC users routinely open document files and commercial vaccine programs detect this type of malware at a relatively low rate. This paper therefore proposes an automatic document malware analysis system that performs static and dynamic analysis of document files such as PDF and HWP and reports the results. The static analysis identifies and extracts the scripts and shell code that generate the malicious behavior, and it detects obfuscated code and the use of functions with reported vulnerabilities. The dynamic analysis monitors behavior at the kernel level and generates a log, which is then compared with malicious behavior rules to detect suspicious documents. In a performance test on actual document malware samples, the system showed high detection performance.
Keywords: Document malware, Automatic analysis system
1 Introduction
Malware attacks that use document files, such as Advanced Persistent Threat (APT) attacks and spam mail, have been increasing rapidly. Most of these attacks rely on social engineering: they use Web/SNS postings on political and cultural issues to induce users to download the malware, or they send spam mail that pretends to come from a work colleague and carries document malware as an attachment [1, 2]. Because most PC users routinely open document files, they are more vulnerable to document-based malware than to conventional PE (Portable Executable) malware. Moreover, commercial vaccine programs rely on signature-based detection, which has a low detection rate for document malware, so the threat of document malware is expected to keep increasing [3, 4].
Therefore, this paper proposes an automatic document malware analysis system that performs static and dynamic analysis of document files such as PDF and HWP and reports the results. The static analysis identifies and extracts the scripts and shell code that generate the malicious behavior. It also detects obfuscated code and the use of functions with reported vulnerabilities. The dynamic analysis monitors behavior at the kernel level and generates a log. The log is then compared with malicious behavior rules to detect suspicious documents. In a performance test on actual document malware samples, the system showed high detection performance.
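The static analysis step described above can be illustrated with a small sketch for PDF input. It scans the raw bytes for script-related name tokens, for JavaScript functions with publicly reported vulnerabilities, and for long runs of `%uXXXX` escapes that often encode shell code. The token lists, the name `static_scan`, and the run-length threshold are assumptions chosen for illustration, not the rules of the proposed system. The sketch also does not decompress PDF streams, and HWP files would need a separate parser.

```python
import re

# Hypothetical indicator lists, for illustration only.
SCRIPT_TOKENS = [b"/JavaScript", b"/JS", b"/OpenAction", b"/Launch"]
SUSPICIOUS_FUNCTIONS = [b"eval", b"unescape", b"String.fromCharCode",
                        b"util.printf", b"Collab.getIcon"]
# Eight or more consecutive %uXXXX escapes (assumed threshold).
UNICODE_ESCAPE_RUN = re.compile(rb"(?:%u[0-9a-fA-F]{4}){8,}")


def static_scan(data: bytes) -> dict:
    """Return the indicators found in the raw bytes of a document."""
    return {
        # Name tokens that show the document carries or triggers a script.
        "script_tokens": [t.decode() for t in SCRIPT_TOKENS if t in data],
        # Functions often used for obfuscation or tied to reported bugs.
        "suspicious_functions": [f.decode() for f in SUSPICIOUS_FUNCTIONS
                                 if f in data],
        # First 24 bytes of each escape run, as a preview of possible shell code.
        "possible_shellcode": [m.group(0)[:24].decode()
                               for m in UNICODE_ESCAPE_RUN.finditer(data)],
    }


if __name__ == "__main__":
    sample = (b"%PDF-1.4\n1 0 obj << /OpenAction 2 0 R >> endobj\n"
              b"2 0 obj << /S /JavaScript /JS (var s = unescape('"
              + b"%u9090" * 10 + b"'); eval(s);) >> endobj\n")
    print(static_scan(sample))
```

Plain substring matching keeps the sketch short. A practical analyzer would first parse the object structure and decode filtered streams, since scripts are rarely stored in clear text.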
2 Related Studies
Wepawet extracts the script/shell code contained in a PDF file and provides behavior data for the extracted code. For any file generated during the analysis, it shows the result of scanning with commercial vaccine programs. However, Wepawet is limited in that it analyzes only PDF-format document malware.
3 System Design
As shown in Fig. 5, when a Web user or an external system requests an analysis, the analysis management module saves the request data in the management DB so that the system can perform the static/dynamic analyses.
The static/dynamic analysis modules run in a virtual environment that consists of many GuestOS systems. Each GuestOS analyzes one input document file, so having many GuestOSs enables simultaneous analysis of multiple files.
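A minimal sketch of how analysis requests might be recorded in a management DB follows, assuming an SQLite store. The table name `analysis_request`, its columns, and the `pending`/`running` status values are hypothetical; the paper does not specify the schema or the database product.

```python
import sqlite3


def open_management_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the (hypothetical) management DB and create the request table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS analysis_request (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               file_name TEXT NOT NULL,
               requester TEXT NOT NULL,
               status TEXT NOT NULL DEFAULT 'pending'
           )"""
    )
    return conn


def submit_request(conn: sqlite3.Connection, file_name: str, requester: str) -> int:
    """Save an analysis request from a Web user or an external system."""
    cur = conn.execute(
        "INSERT INTO analysis_request (file_name, requester) VALUES (?, ?)",
        (file_name, requester),
    )
    conn.commit()
    return cur.lastrowid


def next_pending(conn: sqlite3.Connection):
    """Return the oldest pending request as (id, file_name) and mark it running."""
    row = conn.execute(
        "SELECT id, file_name FROM analysis_request "
        "WHERE status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    conn.execute("UPDATE analysis_request SET status = 'running' WHERE id = ?",
                 (row[0],))
    conn.commit()
    return row
```

Storing requests in a table rather than handing them directly to an analyzer lets the Web front end and external systems submit files even when every analysis machine is busy.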
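The way several GuestOS instances allow files to be analyzed at the same time can be sketched as a pool of workers that each borrow an idle guest. The function `analyze_in_guest` is a hypothetical placeholder for starting a virtual machine and collecting its results, and the guest identifiers are invented for illustration.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def analyze_in_guest(guest_id: str, file_name: str) -> dict:
    """Placeholder for running the static/dynamic analysis inside one guest."""
    return {"guest": guest_id, "file": file_name, "status": "done"}


def analyze_all(file_names, guest_ids):
    """Analyze files concurrently, using each guest for one file at a time.

    guest_ids must contain at least one guest.
    """
    free_guests = queue.Queue()
    for guest in guest_ids:
        free_guests.put(guest)

    def run(file_name):
        guest = free_guests.get()      # blocks until some guest is idle
        try:
            return analyze_in_guest(guest, file_name)
        finally:
            free_guests.put(guest)     # hand the guest back to the pool

    # One worker per guest, so at most len(guest_ids) files run at once.
    with ThreadPoolExecutor(max_workers=len(guest_ids)) as pool:
        return list(pool.map(run, file_names))


if __name__ == "__main__":
    for result in analyze_all(["a.pdf", "b.hwp", "c.pdf"], ["guest-1", "guest-2"]):
        print(result)
```

In a real deployment each guest would normally be reverted to a clean snapshot before it is returned to the pool, so that one sample cannot influence the next analysis.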
4 System Implementation
Figure 9 shows the detailed static/dynamic analysis results for a document file. Users can check the scripts/shell code extracted by the static analysis, and whether code obfuscation or functions with reported vulnerabilities were used. For the dynamic analysis, users can check the result of the behavior analysis. The extracted behaviors are compared with the malicious behavior rules to determine the level of maliciousness. The malicious behavior rules are divided into file, registry, process, network, and memory categories, and rules can be added or edited.
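The comparison of logged behaviors with the malicious behavior rules can be sketched as follows. The five categories come from the text above; the rule patterns, the scores, the thresholds in `level`, and the log format of `(category, detail)` pairs are all assumptions made for illustration and do not reproduce the rule set of the system.

```python
from dataclasses import dataclass

CATEGORIES = ("file", "registry", "process", "network", "memory")


@dataclass(frozen=True)
class Rule:
    category: str   # one of CATEGORIES
    pattern: str    # lowercase substring looked for in the event detail
    score: int      # weight added when the rule matches


# Hypothetical example rules; the list can be extended or edited.
RULES = [
    Rule("file", "\\system32\\", 3),
    Rule("registry", "\\currentversion\\run", 4),
    Rule("process", "cmd.exe", 3),
    Rule("network", "connect", 2),
]


def score_log(events, rules=RULES):
    """Sum the scores of all rules matched by (category, detail) events."""
    total = 0
    matched = []
    for category, detail in events:
        for rule in rules:
            if rule.category == category and rule.pattern in detail.lower():
                total += rule.score
                matched.append(rule)
    return total, matched


def level(total: int) -> str:
    """Map a total score to a maliciousness level (assumed thresholds)."""
    if total >= 7:
        return "high"
    if total >= 3:
        return "medium"
    return "low"


if __name__ == "__main__":
    log = [
        ("file", r"C:\Windows\System32\evil.dll"),
        ("registry", r"HKCU\Software\Microsoft\Windows\CurrentVersion\Run\x"),
        ("network", "connect 10.0.0.1:80"),
    ]
    total, matched = score_log(log)
    print(total, level(total), [r.pattern for r in matched])
```

Keeping the rules as plain data, separate from the matching code, reflects the point that rules can be added or edited without changing the analyzer itself.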
5 Performance Test
[Table: number of samples and number of detected malware]
6 Conclusion
This paper proposed an automatic document malware analysis system that automatically analyzes document files. The static analysis extracted the scripts/shell code from a document file and detected any obfuscation or use of functions with reported vulnerabilities. The dynamic analysis monitored behaviors and judged maliciousness against the malicious behavior rules to detect document files suspected of being malicious. Testing the system on actual document malware samples showed high detection performance.
Although obtaining new samples is very important for increasing the detection rate of document-based malware, there is no efficient sample collection channel in Korea. In the future, a Web-based document file analysis service for general users, similar to Wepawet, will be needed to secure new samples.
This research was supported by the KCC (Korea Communications Commission), Korea, under the R&D program supervised by the KCA (Korea Communications Agency) (KCA-2012-(10912-06001)).
- 1. Park CS (2010) An email vaccine cloud system for detecting Malcode-Bearing documents. J KMS 13(5):754–762
- 2. Han KS, Shin YH, Im EG (2010) A study of spam spread malware analysis and countermeasure framework. J SE 7(4):363–383
- 3. BoanNews (2012) http://www.boannews.com/media/view.asp?idx=31322&kind=1
- 4. Fratantonio Y, Kruegel C, Vigna G (2011) Shellzer: a tool for the dynamic analysis of malicious shellcode. In: Proceedings of the international symposium on RAID, pp 61–80
- 5. Ulrich B, Imam H, Davide B, Engin K, Christopher K (2009) Insights into current malware behavior. In: 2nd USENIX workshop on LEET
- 6. CWSandbox: Behavior-based Malware Analysis. http://mwanalysis.org/