1 Introduction

In recent years, AI has been increasingly adopted as part of cyber attack methods. On the defender’s side, AI has been successfully applied in intrusion detection systems and is widely deployed for network filtering, phishing protection, and botnet control. However, the enhancement of malware capabilities with the help of AI methods is a relatively recent development.

This article presents the results of a literature survey mapping the state of AI-powered malware. The salient aims of this survey are to map AI-enhanced attacks carried out by malware, to identify malware types that conceal themselves from detection using AI techniques, to get a better understanding of the maturity of those attacks, and to identify the algorithms and methods involved in those attacks (Fig. 1 and Table 1).

Fig. 1. Uses of AI in malware.

Table 1. Table of acronyms

2 Literature Review on AI-Powered Malware

2.1 Literature Search

To assess the state of the art in AI-supported malware, we performed a literature search using the Google Scholar database of scientific publications. We defined the search criteria as follows. Search keywords were malware, artificial intelligence, and machine learning, combined with offensive, adversarial, attack, network security, and information security. The resulting articles were checked against inclusion criteria. The resulting article set was then snowballed backward and forward [36]. We limited the backward snowballing range by cutting off snowballing for articles older than 2010. Eligible forms of publication were scientific articles, conference presentations, pre-prints, and technical reports. For inclusion, articles needed to contain descriptions of malware functionality based on machine learning or AI. Both survey articles and articles describing demonstrators or specific malware were included. Our final set comprised 37 articles.

After collecting the articles, we classified them into categories reflecting the specific malware functionality enhanced with AI techniques. Our findings are summarized below.

2.2 Findings

Among the deployed technologies are authentication factor extraction, generation of phishing and malware domain names, adaptive generation of phishing e-mails, direct attacks against malware detection (code obfuscation, model poisoning), and attacks against intrusion detection (generative traffic imitation as well as AI model poisoning). In addition, we found publications describing the successful parsing and controlling of graphical user interfaces (GUIs). Finally, self-learning malware aimed at sabotage of or through cyber-physical systems was found. In particular, malware detection evasion and information exfiltration through covert channels have recently been used in AI-powered malware.

The establishment of covert channels is an established practice for malware distribution, command and control of malware agents, and information exfiltration. Such covert channels intend to bypass intrusion detection, malware detection, and anomaly detection systems.

Table 2. Surveys and taxonomies

2.3 Surveys

Our search found 13 survey articles that fully or partially present knowledge about AI-enhanced malware (see Table 2): ten surveys, two taxonomic articles, and one anecdotal collection of AI attack use cases.

The surveys focus on different perspectives of the offensive use of AI against information security in malware:

  • Surveys that summarize the use of AI-enhanced malware for different purposes: probing, scanning, spoofing, misdirection, execution, or bypass;

  • Summaries of methods and algorithms used for direct attacks against a defender’s AI and ML systems, e.g., evasion attacks, model poisoning, adversarial samples;

  • Surveys of malware improvements concerning exfiltration, code permutation, automation, and reverse engineering with AI;

  • Surveys on generative networks used for attack and defense;

  • Survey on stegomalware, where AI is used to hide malware in images;

  • Several surveys that taxonomize offensive AI in malware into categories: intelligence, evasion, target selection, attack automation, generating malware, hiding malware, combining attack techniques, adjusting features, and automating attacks at high speed.

2.4 AI-Enabled Attacks on Authentication Factors

Four articles described attacks against authentication factors on mobile devices. The devices’ sensors (microphone, accelerometer) were used in combination with AI models to extract PINs, passwords, and unlock patterns. The articles are listed in Table 3. We found two categories of AI weaponization against authentication factors:

  • Prediction of PINs and passwords using accelerometer sensors in phones and wearables;

  • Analysis of phone microphone recordings to infer PINs and credit card numbers from touch tones.
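The accelerometer-based attacks above can be illustrated with a minimal sketch. Everything here is hypothetical: a real attack trains a model on labelled sensor traces, whereas this toy nearest-centroid classifier simply assumes that pressing each digit on a 3×3 keypad produces a characteristic (x, y) tilt.

```python
# Illustrative sketch (hypothetical data and layout): inferring keypad
# digits from accelerometer tilt readings with a nearest-centroid rule.
import math

# Hypothetical mean tilt per digit, derived from a 3x3 keypad layout:
# digit 1 is top-left at (-1, 1), digit 5 is the center at (0, 0), etc.
CENTROIDS = {d: ((d - 1) % 3 - 1.0, 1.0 - (d - 1) // 3) for d in range(1, 10)}

def infer_digit(tilt_xy):
    """Return the digit whose tilt centroid is nearest the sample."""
    x, y = tilt_xy
    return min(CENTROIDS, key=lambda d: math.dist((x, y), CENTROIDS[d]))

def infer_pin(samples):
    """Map a sequence of tilt samples to a candidate PIN string."""
    return "".join(str(infer_digit(s)) for s in samples)
```

Published demonstrators replace the hand-made centroids with classifiers trained on real sensor data, but the inference step is structurally the same: map a sensor window to the most likely key press.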

Table 3. Password extraction or prediction

2.5 Techniques for Hiding Malware Code from Detection

AI is frequently used for hiding malware code from detection. The eleven articles listed in Table 4 show these approaches:

  1. Hiding malware code as payload inside AI models fulfilling other functions, e.g., neural networks for face recognition;

  2. Code perturbation for detection evasion, automated with learning algorithms and prediction;

  3. Code generation with generative adversarial networks that black-box-test filters for successful evasion;

  4. Attacks against AI-based malware detection through the learning function (presentation of malicious samples, model poisoning, gradient attacks);

  5. Sandbox detection in order to evade analysis in sandboxed environments.
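The first technique, hiding a payload inside a model, can be sketched in a few lines. This is an illustrative assumption of how such embedding could work, not a reconstruction of any surveyed tool: payload bytes overwrite the least-significant mantissa byte of float32 weights, which changes each weight only marginally and therefore barely affects model accuracy.

```python
# Illustrative sketch: hiding payload bytes in the low-order mantissa
# bytes of float32 neural-network weights (hypothetical embedding scheme).
import struct

def embed(weights, payload):
    """Overwrite the low byte of one float32 weight per payload byte."""
    out = []
    for w, b in zip(weights, payload):
        raw = bytearray(struct.pack("<f", w))
        raw[0] = b                      # low mantissa byte, little-endian
        out.append(struct.unpack("<f", bytes(raw))[0])
    return out + weights[len(payload):]

def extract(weights, n):
    """Recover the first n payload bytes from the stego weights."""
    return bytes(struct.pack("<f", w)[0] for w in weights[:n])
```

A detector scanning for known byte signatures sees only a plausible weight file; the payload is reassembled at run time by the extraction routine.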

Table 4. Code detection evasion

2.6 Evading Network Traffic Detection

Four articles describe hiding malware’s communication traffic (see Table 5). AI, and specifically unsupervised learning, is deployed against intrusion detection systems. Demonstrators described in the articles hide probing and infiltration traffic as well as command and control traffic. One noteworthy article deploys swarm intelligence in order to coordinate botnet agents without a centralized command server.
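The black-box evasion idea behind these demonstrators can be caricatured as follows. The detector rule, threshold, and flow features below are all invented for illustration; the surveyed systems use learned models and richer feature sets, but the loop structure, perturb the traffic until the detector no longer flags it, is the same.

```python
# Illustrative sketch (toy detector, hypothetical features): black-box
# evasion by padding packet sizes and adding timing jitter until a
# simple anomaly score drops below the detector's threshold.
def detector(flow):
    """Toy IDS rule: flags small, fast, beacon-like flows."""
    score = 1000.0 / flow["mean_pkt_size"] + 1.0 / flow["mean_interval"]
    return score > 5.0          # True means flagged as malicious

def evade(flow, max_steps=50):
    """Greedily pad sizes and slow intervals until the flow passes."""
    for _ in range(max_steps):
        if not detector(flow):
            return flow
        flow = {"mean_pkt_size": flow["mean_pkt_size"] * 1.2,
                "mean_interval": flow["mean_interval"] * 1.2}
    return flow                 # best effort if the budget runs out
```

The cost of evasion is visible in the sketch: the evasive flow is slower and heavier than the original, which is exactly the trade-off the surveyed demonstrators try to minimize with learned generators.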

Table 5. Evasion of network intrusion detection

2.7 Other AI Deployment

Table 6 lists miscellaneous applications of AI in the malware context. We found six articles describing enhanced capabilities in the areas of phishing, application control, and sabotage. AI is used for creating phishing domain names that evade detection by anti-phishing systems. One spear phishing demonstrator extracts social media sentiments using AI and turns them into phishing e-mail text, learning which topics are currently likely to provoke the strongest reaction from the targets.
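The detection-evading domain generation mentioned above amounts to a generate-and-test loop. The detector, substitution table, and brand name below are all invented for illustration; real systems pair learned generators with real anti-phishing classifiers, but the feedback loop is the same: mutate a candidate until it slips past the filter.

```python
# Illustrative sketch (hypothetical blocklist and substitutions):
# generate-and-test loop for phishing domain names.
SUBS = {"a": ["4", "q"], "e": ["3"], "o": ["0"], "l": ["1"]}

def detector(domain):
    """Toy anti-phishing check: blocks names containing the brand."""
    return "example" in domain

def generate(brand="example", tld=".com"):
    """Try single-character typo/leet substitutions until one evades."""
    for i, ch in enumerate(brand):
        for sub in SUBS.get(ch, []):
            cand = brand[:i] + sub + brand[i + 1:] + tld
            if not detector(cand):
                return cand
    return None                 # no evasive candidate found
```

Because the detector is queried as a black box, the same loop works against any filter that returns an accept/reject verdict, which is why the surveyed work emphasizes detectors that are robust to such probing.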

An interesting application of image recognition is malware that uses AI to interpret graphical user interface elements in order to determine which GUI elements it can control to execute functionality.

Finally, undetectable sabotage in cyber-physical systems has been demonstrated in two cases: i) a surgical robot which, injected with malware, can learn how to modify its actions to resemble normal actions while harming patients; ii) a demonstration of how AI can learn to manipulate smart house technology in ways that are hard to notice. Such AI-empowered sabotage is envisioned to be used against variable targets, dramatically reducing the per-target preparation effort of cyber sabotage.

Table 6. Miscellaneous AI applications in malware

3 Discussion of Findings

The presented survey investigated the use of artificial intelligence (AI) techniques and of machine learning (ML) for the improvement of malware capabilities. We found surveys and literature that describe a variety of deployments of AI in the malware context:

  • Direct sabotage of defending AI or ML algorithms;

  • Detection evasion through intelligent code perturbation techniques;

  • Detection evasion through learning of traffic patterns in case of scanning systems, communication or connection to command and control infrastructures;

  • Black-box-techniques bypassing intrusion detection using generative networks and unsupervised learning;

  • Direct attacks predicting passwords and PIN codes;

  • Automatic interpretation of user interfaces for application control;

  • Self-learning system behavior for undetected automated cyber-physical sabotage;

  • Botnet coordination with swarm intelligence, removing the need for command and control servers;

  • Sandbox detection and evasion with neural networks;

  • Hiding malware within images or neural networks.

We conclude that AI deployed to either improve or hide malware poses a considerable threat to malware detection. Code obfuscation, code behavior adaptation, and learned evasion of communication detection can potentially bypass existing malware detection techniques.

Offensive deployment of AI within malware improves malware performance through methods such as target selection, extraction of authentication factors, automated and fast generation of highly efficient phishing messages, and swarm-coordinated action planning.

We consider AI-enhanced malware to be a serious risk for information security, which should be thoroughly investigated.