We use this section to explain the general concept of customized malware, which presents a completely new angle in the battle over sandbox evasion. In our context, customized malware can identify a specific target system in both heterogeneous and homogeneous environments. We first explain the attack scenario, which follows two phases: the first phase places identifiers into the target system, and the second phase uses them. We then describe how to leave unique and characteristic features in the target system. To prototype the attack, we implement customized malware that checks the execution environment and searches for certain marks/traces. Finally, we test these samples against commercial appliances to see whether stealthy evasion is possible in practice (Fig. 1).
3.1 Attack Scenario
We first explain the threat model that we follow in our work and the assumptions that we make about the capabilities of sandboxes. We start with an overview of how sandboxes are used in practice, and then detail the setting of our new concept of customized malware.
To defend against the ever-increasing number of malware attacks, malware security appliances have become an integral part of many organizations’ security strategies. Seeing their prevalence, malware authors put significant effort into detecting (and evading) such sandboxes. For example, malware can deploy detection routines for VM-based or emulation-based sandboxes and change its behavior if a sandbox environment is found. Similarly, malware that evades sandbox analysis based on the lack of user interaction has been seen. In addition, recent academic works propose further, more systematic sandbox detection strategies [42, 60]. To cope with the problem, stealthier or even bare-metal sandboxes [7, 34] have arisen, and security vendors have entered an ongoing arms race to become resilient against sandbox evasion techniques.
While it seems that this evasion arms race will continue for a while, in this paper we call attention to the next-stage problem of target-specific malware. That is, we envision customized malware that is tailored towards infecting one particular system. To this end, we assume that the customized malware first places an implant into the target system using an out-of-band channel (e.g., triggered via email). Later on, when executed, the customized malware first gathers information about the execution environment, matches these features against the previous implants of the target system, and triggers its malicious activities only if the features match. Regardless of the efforts put into hiding the presence of a sandbox, as long as the sandbox is not an exact copy of the target system, the customized malware will not reveal its normal behavior. That is, our work is fundamentally different from previous approaches in that our key idea is to identify the target system instead of separating sandboxes from normal systems.
Individual malware campaigns have already been observed to follow a similar idea, e.g., by computing MD5 hashes over directories, assuming a certain SID (a unique per-system identifier in Windows), and by comparing specific and current paths to decrypt themselves [6, 29]. Chengyu et al. show some system-unique properties that can be used to obfuscate malware samples [23]. However, these features are difficult to exfiltrate remotely (before infection) and thus impose costs on the attacker or even render such evasions infeasible. We hence revisit this attack pattern and augment it with new techniques. Most importantly, we propose to place unique marks on the target system prior to infection instead of merely using existing ones.
The novel concept of customized malware follows a two-phase approach, starting with the reconnaissance phase and then using its results in the intrusion phase. This attack scenario is common for targeted attacks such as Advanced Persistent Threats (APTs), in which the attackers aim to infiltrate specific systems after spying on their victims first. In our context, we require that the reconnaissance target and the attack target are the same (specific user). As we will show, this assumption can be easily satisfied, e.g., when both reconnaissance phase and infection phase share the same communication channel (e.g., email or HTTP). We will describe the two phases in the following:
Reconnaissance Phase: In this phase, adversaries first implant certain marks on the target system. The Internet has become a necessary technology for daily business, including email communication and web browsing. In fact, obtaining the email address of an attack target is already a vital step for targeted attacks. To perform the reconnaissance, we assume that a targeted email successfully tricks a user into clicking the URL of an attacker-controlled reconnaissance web site. This web site then uses a web-based reconnaissance procedure that implants features on the host, e.g., via email or via web sites. That is, if target users click on an attacker-provided URL in an email, a unique identifier is implanted on the target system. Such an implant can be stealthy, e.g., by leaving certain marks/traces in the browser or system. Later on, customized malware can recognize the specific target system based on these implants.
During the reconnaissance phase, we assume one of two things about sandboxes. Either they do not click on URLs provided in emails (as this would otherwise cause bad side effects, such as automated unsubscriptions from mailing lists), or, if they do follow such links, the intrusion and reconnaissance phases are not run within the same execution unit of the sandbox. This follows the reasoning that sandboxes have to differentiate the behavior of different inputs (emails, malware files, etc.) and restore their snapshot after analyzing a particular input. In fact, this relaxation allows us to place implants without even requiring the user to click on URLs. We observed that it is possible to carefully craft emails that automatically load external content from attacker-controlled URLs (e.g., loading embedded external images, as enabled by default in Apple Mail), which then can be used to place implants. In an attempt to collect implants, sandboxes could also automatically open emails and access the URLs, but again, if this happens on a freshly restored snapshot, both phases would still not be linked. We will motivate this further in Sect. 4.
Intrusion Phase: After the reconnaissance phase, adversaries have sufficient characteristic information to reidentify a specific target. The purpose of the intrusion phase is thus to create stealthy customized malware that only executes on the target system, while not triggering suspicious activity on other systems and thereby also evading anti-malware appliances. As we will show, we can even hide the actual malicious payload by encrypting/decrypting it with the characteristic information. That is, even a manual analyst who lacks the true characteristics of the actual target system could not reverse engineer the malicious payload.
To assess how stealthy the customized malware is, we assume that the crafted customized malware is tested by any (combination of) appliance(s) and sandbox(es). We further assume that adversaries have no insight into which (if any) sandboxes are deployed, which also makes it difficult to implant features. Although some expert knowledge might be helpful to guess which activity in the intrusion phase might raise an alert in the sandbox, an attacker might use black-box tests to identify viable strategies that survive sandbox checks.
Heterogeneous vs. Homogeneous Target Environments: Targeted attacks are usually easier to perform in heterogeneous environments than in homogeneous environments, as systems in the heterogeneous setting can be easily distinguished from each other. Yet we propose a methodology for customized malware that also performs well if the target system has identically configured clones. For example, organizations that deploy identical preconfigured systems to their end users to minimize maintenance and license costs and to ease security management create such a homogeneous environment. High-end security appliances adapt to such settings in that their sandbox operates on exact copies of the actual production systems of the organization they aim to protect. In such a setting, traditional fingerprinting methods may fail to distinguish target systems from any other system of the organization, including the sandbox.
3.2 Feature Implantation
We will now describe and evaluate our implant-based methodology to develop customized malware that implicitly evades sandboxes. The key idea of the proposed implants is to add unique and characteristic marks into the target system, such that the customized malware can recognize them during the infection stage. An important detail is that the implants should look benign to the target system (and to the sandbox when looking them up), while the customized malware has to be able to query implants. We now consider four non-invasive methods to realize such implants.
Browser History: Web browsers typically record histories of web accesses to ease lookups of visited sites in the future. By sending a unique (not necessarily attacker-controlled) URL to the target and leading them to access it, the URL will be recorded in the history of the target system. Since adversaries only need to implant the URL in the access history, the web site does not have to be malicious, so that no network appliances will be able to detect such implants. As the implant just has to be unique, it could even be a legitimate web site with a unique identifier added within the URL (e.g., google.com/12345). Another way to abuse browser history is to use the access date as an implant, if browsers record the last access date per URL.
Browser Cache: Most web browsers cache web content to speed up subsequent visits. By luring the target system into clicking a unique URL, one can place identifiers that are stored in the cache of the target system. The implant could be benign but unique URLs or resources (images, CSS files, etc.). As caches might be refreshed or deleted after a certain time and may thus not last long, attackers would likely aim to shorten the time between the reconnaissance phase and the intrusion phase.
HTTP Cookie: Cookies are a well-known technique for tracking browsers. They are stored on the user’s computer as a file and hold stateful information that is specific to the pair of a client and a specific web site. Cookies are usually saved when a new web site is loaded and can be queried by the web server or accessed by the client computer. By now, most legitimate web sites use cookies, so their usage is not suspicious. Although they can be destroyed when the current web browser is closed, the lifetime of cookies is configurable (unless overridden by manual browser configurations). Attackers can also use cookies to implant an identifier in a stealthy manner using attacker-controlled web sites. This way, malware authors could search for an implant in cookie databases of browsers.
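The point about configurable lifetime can be illustrated with ordinary, benign server-side cookie handling. The following is a minimal sketch using Python's standard `http.cookies` module; the function name `build_set_cookie` and the cookie name and value are hypothetical and chosen only for illustration. A cookie without `Max-Age` (or `Expires`) is a session cookie that the browser discards on close, whereas a cookie with `Max-Age` persists for the stated number of seconds.

```python
from http.cookies import SimpleCookie


def build_set_cookie(name, value, max_age_seconds=None):
    """Return the value of a Set-Cookie header (hypothetical helper).

    If max_age_seconds is None, no lifetime attribute is emitted and the
    browser treats the cookie as a session cookie.
    """
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = "/"
    if max_age_seconds is not None:
        # Max-Age is the cookie lifetime in seconds.
        cookie[name]["max-age"] = max_age_seconds
    return cookie[name].OutputString()


# Session cookie: removed when the browser is closed.
session_header = build_set_cookie("visitor", "abc123")

# Persistent cookie: kept for 30 days unless the user or browser policy clears it.
persistent_header = build_set_cookie("visitor", "abc123", 30 * 24 * 3600)
```

Browser-side settings (for example, clearing cookies on exit) can still override the lifetime requested by the server, which matches the caveat in the text.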
Supercookies (Evercookies) are similar to cookies in that they are stored on the user’s computer. They are usually harder to delete from the device, as they are realized using multiple storage mechanisms [31]. For example, a supercookie can implant a user-specific identifier using a Flash Local Shared Object. Alternatively, web servers can abuse a security improvement function like HSTS or HPKP for supercookies [9]. The data size can be larger than normal cookies, showing that supercookies are suited to track devices and place implants.
DNS Stub Resolver Cache: The Domain Name System (DNS) is widely used to resolve domains to IP addresses, e.g., when accessing web sites. On Windows, the Windows DNS stub resolver is used by default to query domains and cache results according to their lifetime, as specified in the Time-to-Live (TTL) value in DNS responses. Thus, by sending a specific URL to the target and leading them to access it, the domain will be cached by the DNS stub resolver. In fact, the domain does not have to lead to a malicious web site; it only has to feature a sufficiently long cache duration (i.e., TTL) to bridge the time between reconnaissance and infection. Malware authors could then check if the resolver cached a particular domain, e.g., using common DNS cache snooping techniques.
Besides implanting features, malware authors may also need to learn whether the target has accessed a specific web site, which is trickier. The obvious solution is using an attacker-controlled URL. Alternatively, one could use unique benign web sites that state the date of the most recent visit, deploy a publicly visible visitor counter, or even abuse a timing-based side channel to infer whether a certain page has been visited. By preparing an attacker-controlled URL, malware authors can also find out which web browser the target is using, which makes it easier to search for such implants. Therefore, malware authors may carefully send an attacker-controlled URL and effectively implant identifiers into the target system.
3.3 Customized Malware
We envision that customized malware first gathers information about the execution environment and then matches these features against the implants placed previously during the reconnaissance phase. In the most basic form, malware authors could simply check if the implant exists, and if so, follow a binary decision to either unpack the malicious payload or not. However, a manual analyst could then still reverse engineer the customized malware, invert the decision not to unpack, and thereby obtain the malicious payload. In fact, malware sandboxes and appliances already have similar functionalities to scan malware binaries for malicious payloads, or to execute several branches (multi-path execution [41]).
To strengthen this naïve approach, malware authors can go beyond checking for the existence of an implant. They can retrieve a value from the implant, which can then be used as the decryption key for malicious payloads. This way, neither a human analyst nor multi-path execution could decrypt the malicious payload without access to the implant value. Embedding such implant values is trivial for cookies, caches, and browser history, where adversaries just need to configure the implant URLs in the reconnaissance phase accordingly. For DNS caches, storing implant values is a bit more involved, but also possible. For example, the value of a cached AAAA record (which stores an IPv6 address) could be used as a 128-bit AES key.
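The size argument behind the AAAA example can be checked directly: an IPv6 address is exactly 16 bytes, i.e., 128 bits, so any 128-bit value has a textual IPv6 representation and vice versa. The following minimal sketch uses only Python's standard `ipaddress` module to show this round trip; the helper names are hypothetical, and the sample bytes are an arbitrary test pattern rather than anything taken from the evaluation.

```python
import ipaddress


def address_to_bytes(address_text):
    """Return the 16 raw bytes encoded by an IPv6 address string."""
    return ipaddress.IPv6Address(address_text).packed


def bytes_to_address(raw):
    """Return the canonical IPv6 text form of exactly 16 raw bytes."""
    if len(raw) != 16:
        raise ValueError("an IPv6 address holds exactly 16 bytes (128 bits)")
    return str(ipaddress.IPv6Address(raw))


# Arbitrary 16-byte test pattern (0x00 .. 0x0f), for illustration only.
sample = bytes(range(16))
as_text = bytes_to_address(sample)
round_trip = address_to_bytes(as_text)

# An address from the IPv6 documentation prefix 2001:db8::/32.
doc_bytes = address_to_bytes("2001:db8::1")
```

The round trip is lossless because the textual form is only a presentation of the same 128 bits.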
3.4 Malware Security Appliance Evasion
Seeing that one can implant identifiers through web-based techniques and implement customized malware, we now test if such implants can be used to evade commercial appliances.
Implementation: We first prepared a legitimate web site, which prototypes an attacker-controlled URL and gives an attacker the highest flexibility. We accessed our web site from a user machine that is used on a daily basis at a real organization (henceforth simply “target”). To compare the results, we accessed the web site from three different web browsers (Chrome, Firefox, and IE) and implanted a URL in the target system’s browser history, cache, cookie, and DNS stub resolver cache. We then implemented Windows 32-bit PE programs written in C/C# that use the Windows API, commands, and custom functions to search for URLs that perfectly match our web site. We implemented samples that open the browser history by searching for preferences for Chrome, sessionstore.js for Firefox, and %HISTORY% for IE. Our web site has an image file embedded in the top page, and therefore the samples search for such cached items in entries for Firefox and %TEMPORARY INTERNET FILES% for IE. We furthermore implemented samples that read a cookie from Cookies for Chrome and cookies.sqlite for Firefox. To inspect the stub resolver’s DNS cache, our prototype uses the Windows ipconfig utility and the undocumented Windows API DnsGetCacheDataTable.
Evaluation Setup: We first executed the samples on the target host and verified that all implants could be found. We then submitted the samples to three popular appliances from well-known vendors (Footnote 1), henceforth simply appliances A, B, and C. We gained access to various sandbox configurations (Windows 10, Windows 7, Windows XP, 32/64 bit, different service packs, etc.) of the appliances, totaling nine distinct sandboxes. Since the appliances did not allow for network communication (which supports our threat model that even collecting implants might be impractical for most sandboxes), we investigated the analysis reports that the appliances produced after execution. Although the appliances were not cloned from actual user machines, this methodology would similarly work in a fully homogeneous environment.
Table 1. Security alerts reported from the three appliances including nine sandboxes (Windows 10, Windows 7, and Windows XP) which executed samples that searched for implants of browser history, cache, cookie, and DNS cache.
Table 2. Security alerts reported from the three appliances including five sandboxes (Windows 10 and Windows 7) which executed samples that searched for implants of browser history, cache, and DNS cache and decrypted the malicious payload when previous implants were found.
Evaluation Results: When manually inspecting the analysis reports, not surprisingly, the implants were not found in any of the sandboxes. After verifying the evasion capabilities, we then checked if our implant checks triggered any security alerts from the appliances. We summarize the results in Table 1. The first column contains the implant technique, and the last three columns show the result per vendor. We first implemented test samples that are executable on systems that are backwards-compatible to earlier Windows versions, down to Windows XP. We had a look at the reports produced by the appliances and found that a security alert about Hardware access was reported for every single sample. The alert was reported from Windows XP, by one of the nine sandboxes which belonged to appliance A. For samples that check HTTP cookies, we obtained another alert about Browser access from three sandboxes, which belonged to appliance A. The sample that checks the DNS stub resolver cache raised an alert about Hardware access from two sandboxes (including Windows XP) which belonged to appliance A, and an alert about a Windows command (ipconfig) from three sandboxes which belonged to appliance B.
For further analysis, we implemented test samples that not only searched for implants, but also encapsulated an encrypted version of a malicious payload that all sandboxes would detect if it was not hidden. Specifically, we chose to use malware that was seen in a real attack campaign targeting organizations in Japan and Taiwan [4]. We submitted the malware sample to the appliances and verified that it indeed was detected as such by all sandboxes. We then wrapped the malware in the customized malware, protected by a decryption routine that would only trigger if implants were found. Technically, the sample searches for the implant, and when the URL is found, decrypts and executes the actual malware. In fact, adversaries could include the decryption key in implants, such that even multi-path execution or manual analysts would fail to obtain the packed malicious payload. In our evaluation, the implants are the same as in the previous experiment and are searched for in the browser history (for Chrome, Firefox, and IE), browser cache (for Firefox and IE), and DNS stub resolver cache. Seeing that one of the three appliances reported alerts when accessing cookies, we excluded the HTTP cookie from further analysis. Assuming that the target host operates Windows 7 or newer, we implemented these samples using libraries that do not work on earlier Windows versions, leaving us with five sandboxes of three appliances in our test setting. With these newer libraries, only a single sandbox raised an alert when our sample tried to inspect the DNS stub resolver using ipconfig. To counter this alert, we used DnsGetCacheDataTable, which did not raise any alert. In summary, as Table 2 shows, our updated implant search did not trigger any alerts. We manually inspected the analysis reports of all sandboxes and verified that none of the sandboxes decrypted the malware sample, meaning that our implant checking mechanism worked as expected.
Evaluation Summary: To summarize, an attacker can implant several identifiers into the target-specific system using web-based techniques. The implanted features can be used to implement customized malware that can stealthily evade malware security appliances. Adversaries with insider knowledge on anti-malware appliances, or having an oracle that tells whether their customized malware is detected as such, can tweak their implant mechanism such that the evasion is stealthy and remains undetected.