A Comparison of Adversarial Learning Techniques for Malware Detection

Machine learning has proven to be a useful tool for automated malware detection, but machine learning models have also been shown to be vulnerable to adversarial attacks. This article addresses the problem of generating adversarial malware samples, specifically malicious Windows Portable Executable files. We summarize and compare work that has focused on adversarial machine learning for malware detection. We use gradient-based, evolutionary algorithm-based, and reinforcement learning-based methods to generate adversarial samples, and then test the generated samples against selected antivirus products. We compare the selected methods in terms of accuracy and practical applicability. The results show that applying optimized modifications to previously detected malware can lead to incorrect classification of the file as benign. The results also show that generated malware samples can be successfully used against detection models other than those used to generate them and that using combinations of generators can create new samples that evade detection. Experiments show that the Gym-malware generator, which uses a reinforcement learning approach, has the greatest practical potential. This generator achieved an average sample generation time of 5.73 seconds and the highest average evasion rate of 44.11%. Using the Gym-malware generator in combination with itself improved the evasion rate to 58.35%.


Introduction
With the rapid development of information technology, computer systems have become increasingly important in the daily lives of people. Unfortunately, the rapid development of these technologies is accompanied by a similarly rapid increase in cyberattacks.
Malicious software (malware) is one of the most significant security threats today, comprising several different categories of malicious code, such as viruses, trojans, worms, spyware, and ransomware. To protect computers and the Internet from malware, early detection is necessary. However, this is problematic, as a large amount of new malicious code is generated every day [1]. Since it is not possible to analyze each sample individually, automatic mechanisms are required to detect malware.
Antivirus companies often rely mainly on signature-based detection techniques [2] for malware detection. Signatures are specific patterns that allow for the recognition of malicious files. For example, they can be a byte sequence, a file hash, or a string. When inspecting a file, the antivirus system compares its content with the signatures of already-known malware stored in the database. If a match is found, the file is reported as malware. Signature-based detection methods are fast and effective in detecting known malware. However, malware authors can modify their code to change the signature of the program, thereby avoiding detection. Some malware can hide in the system using various obfuscation techniques [3], such as encryption, oligomorphic, polymorphic, metamorphic, stealth, and packing methods, to make the detection process more difficult.
Machine learning (ML) models are commonly used today in various fields. Their application can be found, for example, in technologies such as self-driving cars, weather forecasting, face recognition, or language translation systems. Machine learning has also proved to be a useful tool for automatic malware detection [4]. Unlike the signature-based method, it is capable of detecting previously unknown or obfuscated malware. However, it can be difficult to explain why the model classifies a certain file as malicious or benign [5], which can cause hidden vulnerabilities that attackers can exploit.
Machine learning models are vulnerable to adversarial attacks [6]. Attackers craft adversarial examples, which are inputs deliberately designed to cause a machine learning model to make a mistake in its predictions. Adversarial machine learning is a field that deals with attacks on machine learning algorithms and defenses against such attacks.
Malware detection is thus a battle between defenders and malware authors, in which each side attempts to devise new and effective ways to outwit the other. Each detection method has its own advantages and disadvantages. In various scenarios, one method may be more successful than another. Thus, the creation of an effective malware detection method is a very challenging task, and new research and methods are necessary.
The main contribution of this paper is a comparison of works that focus on adversarial machine learning in the area of malware detection. Specifically,
• we applied some existing methods in the field of adversarial learning to selected malware detection systems;
• we combined these methods to create more sophisticated adversarial generators capable of bypassing top-tier antivirus (AV) products;
• we evaluated the single and combined generators in terms of accuracy and usability in practice.
The rest of the paper is organized as follows: In Section 2, we describe state-of-the-art techniques used to generate adversarial examples. Section 3 provides an overview of the publications focused on creating adversarial portable executable malware samples. Section 4 describes the experiments performed and the metrics used for evaluation. Section 5 presents and discusses the experimental results, and Section 6 concludes this work.

Background
In this section, we describe the different methods used to create adversarial examples. We also introduce and describe selected attacks for experimentation.

Methods for Creating Adversarial Examples
In this section, we describe various methods to create adversarial examples.

Gradient-based Approaches
Gradient-based methods are a popular approach to generate adversarial examples. These methods work by computing the gradient of a loss function with respect to the input data. This gradient is then used to iteratively modify the input so as to increase the loss. The Fast Gradient Sign Method [7] and the Jacobian-based Saliency Map Approach [8] are two popular gradient-based methods used for malware generation. Given a trained model $f$ and an input example $x$, gradient-based methods generate an adversarial example $x' = x + \delta$ by adding a small perturbation $\delta$ to the input that maximizes the loss function $L(f(x'), y)$, where $y$ is the true label of the input.
The perturbation $\delta$ is calculated as follows:
$$\delta = \epsilon \cdot \operatorname{sign}\left(\nabla_x L(f(x), y)\right),$$
where $\epsilon$ is a small constant that controls the size of the perturbation, and the sign function is used to ensure that the perturbation has the same sign as the gradient, allowing efficient computation and ensuring that the perturbation always increases the loss. The gradient of the loss function with respect to the input, $\nabla_x L(f(x), y)$, is computed by backpropagation through the model $f$. This gradient gives the direction in which the loss function increases the most for a small change in the input and is used to determine the direction of the perturbation.
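The sign-based perturbation can be illustrated on a toy model. The following is a minimal sketch, assuming a logistic-regression classifier with hand-picked weights in place of a real malware detector; the names `predict`, `input_gradient`, and `fgsm_perturbation` and all numeric values are hypothetical. It also ignores that PE bytes are discrete and that only certain regions of a file may be modified.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Toy model f(x): probability that x belongs to class 1."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def loss(w, b, x, y):
    """Binary cross-entropy L(f(x), y) for a single example."""
    p = predict(w, b, x)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def input_gradient(w, b, x, y):
    """Gradient of the loss with respect to the input x.

    For logistic regression with cross-entropy loss this is (f(x) - y) * w.
    """
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]


def sign(v):
    return (v > 0) - (v < 0)


def fgsm_perturbation(w, b, x, y, eps):
    """delta = eps * sign(grad_x L(f(x), y))."""
    return [eps * sign(g) for g in input_gradient(w, b, x, y)]


# Hypothetical model parameters, input, and true label (assumptions).
w, b = [2.0, -1.0, 0.5], 0.1
x, y = [0.5, 0.2, -0.3], 1

delta = fgsm_perturbation(w, b, x, y, eps=0.1)
x_adv = [xi + di for xi, di in zip(x, delta)]

print("loss before:", loss(w, b, x, y))
print("loss after: ", loss(w, b, x_adv, y))
```

Each coordinate of the input moves by exactly $\epsilon$ in the direction that increases the loss, so the loss on the perturbed input is higher than on the original one.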
Gradient-based attacks are performed using the append or insertion method for perturbations generated using the gradient of the cost function. When using the append method, the data (payload) is appended at the end of the file. When the insertion method is used, the payload is inserted into the slack region of a section, where the physical size is greater than the virtual size.
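The difference between the two placement strategies can be sketched as follows. This is an illustrative example only: the `Section` record and its field names are hypothetical stand-ins for values that would normally be read from the PE section table, and the byte buffer is synthetic rather than a real executable.

```python
from typing import List, NamedTuple, Tuple


class Section(NamedTuple):
    """Hypothetical summary of one PE section table entry."""
    name: str
    virtual_size: int   # size of the section when loaded in memory
    raw_size: int       # physical size of the section on disk
    raw_offset: int     # file offset where the section data starts


def slack_regions(sections: List[Section]) -> List[Tuple[int, int]]:
    """Return [start, end) file offsets of the slack space of each section."""
    regions = []
    for s in sections:
        if s.raw_size > s.virtual_size:
            regions.append((s.raw_offset + s.virtual_size,
                            s.raw_offset + s.raw_size))
    return regions


def append_payload(data: bytes, payload: bytes) -> bytes:
    """Append method: the payload is placed after the end of the file."""
    return data + payload


def insert_into_slack(data: bytes, payload: bytes,
                      regions: List[Tuple[int, int]]) -> bytes:
    """Insertion method: overwrite slack bytes; the file size is unchanged."""
    out = bytearray(data)
    pos = 0
    for start, end in regions:
        chunk = payload[pos:pos + (end - start)]
        out[start:start + len(chunk)] = chunk
        pos += len(chunk)
    return bytes(out)


# Synthetic example: only .text has physical size > virtual size.
sections = [Section(".text", 0x180, 0x200, 0x400),
            Section(".data", 0x300, 0x200, 0x600)]
data = bytes(0x800)
payload = b"\xAA" * 0x100

regions = slack_regions(sections)
slack_version = insert_into_slack(data, payload, regions)
append_version = append_payload(data, payload)
```

In this sketch the insertion method can place at most as many bytes as the slack regions hold, whereas the append method grows the file by the full payload length.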

Generative Adversarial Network-based Approaches
Generative adversarial networks (GANs) were developed and presented by Goodfellow et al. [9] in 2014. A GAN is a system consisting of two neural networks, a generator and a discriminator, which compete against each other. The goal of the generator is to create examples that are indistinguishable from the real examples in the training set, thus fooling the discriminator. In contrast, the objective of the discriminator is to distinguish the false examples produced by the generator from the real examples that come from the training data set, thus preventing it from being fooled by the generator. The generator learns from the feedback it receives from the discriminator's classification [10].
These two neural networks are trained simultaneously. The generator is constantly improving its ability to generate realistic samples, so the discriminator must continually improve its ability to distinguish between real and generated samples. This mutual competition forces both networks to continuously improve through the training process. Once this training process is completed, the generator can be used to generate new samples that are indistinguishable from the real samples [11].
Denote the generator as $G$ and the discriminator as $D$. As described in [9], networks $G$ and $D$ play the following two-player minimax game with value function $V(G, D)$:
$$\min_G \max_D V(G, D) = \mathbb{E}_x[\log D(x)] + \mathbb{E}_z[\log(1 - D(G(z)))],$$
which $G$ tries to minimize, while $D$ tries to maximize. Here, $D(x)$ is the discriminator's estimate of the probability that the original data $x$ is real, $G(z)$ is the generator's output when it receives noise $z$ as input, $D(G(z))$ is the discriminator's estimate of the probability that a synthetic sample $G(z)$ is real, $\mathbb{E}_x$ is the expected value over all real data instances, and $\mathbb{E}_z$ is the expected value over all random inputs to the generator (in effect, over all generated fake instances).
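To make the roles of the two terms concrete, the value function can be estimated on a handful of samples. The snippet below is a rough sketch under strong simplifying assumptions: the data are one-dimensional numbers, and `G`, `undecided_D`, and `sharp_D` are hypothetical fixed functions rather than trained networks.

```python
import math


def value_function(D, G, real_samples, noise_samples):
    """Sample estimate of V(G, D) = E_x[log D(x)] + E_z[log(1 - D(G(z)))]."""
    real_term = sum(math.log(D(x)) for x in real_samples) / len(real_samples)
    fake_term = sum(math.log(1.0 - D(G(z))) for z in noise_samples) / len(noise_samples)
    return real_term + fake_term


# Hypothetical toy data: real samples lie around +1.
real_samples = [1.0, 1.2, 0.8, 1.1]
noise_samples = [0.0, 0.5, -0.5, 0.25]


def G(z):
    """Toy generator that maps noise to fake samples around -1."""
    return z - 1.0


def undecided_D(x):
    """Discriminator that cannot tell real from fake."""
    return 0.5


def sharp_D(x):
    """Discriminator that separates the two toy distributions well."""
    return 0.9 if x > 0 else 0.1


v_undecided = value_function(undecided_D, G, real_samples, noise_samples)
v_sharp = value_function(sharp_D, G, real_samples, noise_samples)
print(v_undecided, v_sharp)
```

A discriminator that separates real from generated samples attains a higher value than one that always outputs 0.5, which is why $D$ maximizes $V$ while $G$ tries to push it down by producing samples that $D$ cannot tell apart from real data.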

Reinforcement Learning-based Approaches
Reinforcement learning (RL) is a type of machine learning technique along with supervised and unsupervised machine learning. In supervised machine learning, the model is trained using a training set of labeled examples. Based on the given inputs and expected outputs, the model creates a mapping equation that the model can use to predict the labels of the inputs in the future.
In unsupervised machine learning, the model is trained only on inputs without labels. The model divides the input data into classes that have similar properties, and during prediction, the inputs are labeled based on the similarity of their properties to one of the classes. Unlike supervised and unsupervised machine learning, reinforcement learning algorithms learn by interacting with an environment and getting feedback in the form of rewards or penalties rather than relying on pre-labeled instances.
A reinforcement learning model consists of two main parts: an agent and an environment. The agent learns to perform a task through repeated interactions with the environment through trial and error. In addition to the agent and the environment, the reinforcement learning model has four main subelements: a policy, a reward signal, a value function and, optionally, an environment model. The policy defines the behavior of the agent at a particular time. In other words, it is a strategy that the agent uses to determine the next action based on the current state to achieve the highest reward. The reward signal is the feedback from the environment to the agent, indicating the success or failure of the agent's action in a given state. At each time step, the agent is in some state and sends the selected action as its output to the environment, which then returns a new state to the agent along with a reward signal. While the reward signal shows what is beneficial in the present, the value function describes what is beneficial in the long term. The value function provides an estimate of the expected cumulative reward from the current state of the environment in the future. The agent's objective is to maximize the total reward. The environment model mimics the behavior of the environment, making it possible to predict future states and rewards. This is an optional part of the system [12].
Reinforcement learning is defined as repeated interactions between an agent and an environment. Individual interactions (signal exchanges) are performed in time steps. At time step $t$, the environment is in state $s_t \in S$, where $S$ is the set of all possible states in the environment. The agent receives state $s_t$ and then chooses action $a_t \in A$ based on policy $\pi$, where $A$ is the set of all possible actions defined by the environment. After the environment receives information about the chosen action $a_t$ from the agent, it calculates the reward $r_t = R(s_t, a_t, s_{t+1})$ and sends it to the agent in the form of feedback. At the same time, the environment transitions to a new state $s_{t+1}$. When this cycle is complete, we say that one time step has passed. This cycle can repeat forever or end when it reaches a terminal state or a maximum time step $t = T$. We call the triplet of signals $(s_t, a_t, r_t)$ an experience. The time elapsed between $t = 0$ and the termination of the environment is called an episode. A trajectory is a sequence of experiences during an episode, $\tau = (s_0, a_0, r_0), (s_1, a_1, r_1), \ldots$ [12].
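The interaction cycle described above can be written as a short loop. The following sketch uses a deliberately trivial, hypothetical environment (`ToyEnvironment`, whose state is an integer score that the agent tries to drive to zero) and fixed policies instead of a learned one; it only illustrates how states, actions, rewards, episodes, and trajectories relate.

```python
from typing import Callable, List, Tuple

Experience = Tuple[int, int, float]  # (s_t, a_t, r_t)


class ToyEnvironment:
    """Hypothetical environment with actions A = {0, 1}.

    Action 1 lowers the integer state by one, action 0 leaves it unchanged.
    The episode ends when the state reaches 0 (terminal state) or after
    max_steps time steps (t = T).
    """

    def __init__(self, start: int = 3, max_steps: int = 10):
        self.start = start
        self.max_steps = max_steps
        self.state = start
        self.t = 0

    def reset(self) -> int:
        self.state = self.start
        self.t = 0
        return self.state

    def step(self, action: int) -> Tuple[int, float, bool]:
        next_state = self.state - action
        self.t += 1
        # r_t = R(s_t, a_t, s_{t+1}): bonus on reaching the goal, small cost otherwise.
        reward = 10.0 if next_state <= 0 else -1.0
        done = next_state <= 0 or self.t >= self.max_steps
        self.state = next_state
        return next_state, reward, done


def run_episode(env: ToyEnvironment, policy: Callable[[int], int]) -> List[Experience]:
    """Run one episode and return its trajectory of experiences."""
    trajectory: List[Experience] = []
    state = env.reset()
    done = False
    while not done:
        action = policy(state)                       # a_t chosen by policy pi
        next_state, reward, done = env.step(action)  # environment returns s_{t+1}, r_t
        trajectory.append((state, action, reward))
        state = next_state
    return trajectory


trajectory = run_episode(ToyEnvironment(start=3), policy=lambda s: 1)
total_reward = sum(r for _, _, r in trajectory)
print(trajectory, total_reward)
```

A learning agent would additionally update its policy from the collected experiences; that part is omitted here.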
In the field of creating adversarial malware samples, interactions between the agent and the environment occur in discrete time steps. The agent has a set of operations available for modifying PE files while maintaining the functionality of the malware. The goal of the agent is to perform a sequence of operations on the malware to prevent its detection.

Evolutionary Algorithm-based Approaches
Evolutionary algorithms (EA) are a useful tool for solving optimization problems. They are based on the Darwinian principle of evolution and attempt to mimic these processes. The search for the best or at least satisfactory solution to a problem takes the form of competition between gradually developing solutions within the population. Variants of EA include, for example, evolutionary strategies, evolutionary programming, genetic algorithms, and genetic programming. All of these variants share the same principle of operation but differ in their implementation.
When solving a problem, it is necessary first to define the representation of candidate solutions. These candidate solutions are called individuals or phenotypes. Since the phenotype can have a complex structure, an encoding is used to represent the individuals in an appropriate way, which is called a chromosome or genotype. Next, an initial population of individuals is created, with each individual representing a coded solution. Then, each member of the population is evaluated using a fitness function that numerically expresses the quality of the solution. The individuals with the best score are then selected and used to create a new generation. This is done using the crossover operator, which usually takes pairs of chromosomes and exchanges information between them to create new offspring. This is followed by the mutation operator, which changes a small portion of the offspring so that it is no longer just a mixture of parental genes. This introduces new genetic material into the new generation. This entire cycle (fitness evaluation, selection, crossover, mutation) is repeated until the termination condition is reached.
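The cycle of fitness evaluation, selection, crossover, and mutation can be sketched on a toy problem. The example below is an assumption-laden illustration, not one of the attacks studied here: chromosomes are bit lists, the fitness function simply counts ones, selection keeps the best-scoring individuals, and the single best individual is copied unchanged into the next generation (elitism, a choice made for this sketch). All parameter values are arbitrary.

```python
import random


def fitness(chromosome):
    """Toy fitness: number of ones in the bit list (higher is better)."""
    return sum(chromosome)


def crossover(parent_a, parent_b, rng):
    """One-point crossover: prefix of one parent, suffix of the other."""
    point = rng.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:]


def mutate(chromosome, rate, rng):
    """Flip each bit independently with a small probability."""
    return [1 - gene if rng.random() < rate else gene for gene in chromosome]


def evolve(pop_size=20, length=16, generations=30, n_parents=10,
           mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        # Fitness evaluation and selection of the best individuals.
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        if history[-1] == length:  # termination: optimum found
            break
        parents = population[:n_parents]
        children = []
        while len(children) < pop_size - 1:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), mutation_rate, rng))
        # Elitism (assumption of this sketch): keep the current best unchanged.
        population = [population[0]] + children
    best = max(population, key=fitness)
    return best, history


best, history = evolve()
print(fitness(best), history)
```

Because the best individual is always carried over, the best fitness per generation never decreases in this sketch.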

Selected Attacks
To generate samples of malicious software, we utilized three distinct techniques: gradient-based techniques, evolutionary algorithm-based techniques, and reinforcement learning-based techniques. Specifically, we selected Partial DOS [13] and Full DOS [14] attacks from gradient-based techniques. From the evolutionary algorithm-based techniques, we chose GAMMA padding [15] and GAMMA section-injection [15] attacks. Finally, for reinforcement learning techniques, we selected the Gym-malware [16] attack. In this section, we describe each of these attacks in detail.
Partial DOS and Full DOS attacks focus on modifying bytes in the DOS header of a portable executable (PE) file. The DOS header contains only two important pieces of information. The initial 2 bytes represent the magic number (MZ), while the final 4 bytes at offset 0x3C show the location of the PE signature in the NT headers. Partial DOS attacks modify only bytes within the range of 0x02 to 0x3B, inclusive. Full DOS attacks expand this range to include all bytes up to the PE signature. The position of this signature varies between files, but it is always stored in the 4-byte field at offset 0x3C.
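The byte ranges targeted by the two attacks can be computed directly from the header. The helper below is a sketch rather than the attacks' reference implementation; the function name `dos_attack_offsets` is hypothetical, and it assumes that the Full DOS range skips the 4-byte pointer at 0x3C so that the loader can still locate the PE signature.

```python
import struct


def dos_attack_offsets(data: bytes):
    """Return (partial, full) lists of editable file offsets in the DOS header."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ magic number")
    # Little-endian 32-bit pointer to the PE signature, stored at offset 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found at the stored offset")
    partial = list(range(0x02, 0x3C))               # 0x02 .. 0x3B inclusive
    # Assumption: keep bytes 0x3C .. 0x3F (the pointer) intact.
    full = partial + list(range(0x40, pe_offset))   # up to the PE signature
    return partial, full


# Synthetic minimal header: MZ, zero padding, pointer 0x80, DOS stub, PE signature.
header = (b"MZ" + bytes(0x3A) + struct.pack("<I", 0x80)
          + bytes(0x40) + b"PE\x00\x00")
partial, full = dos_attack_offsets(header)
print(len(partial), len(full))
```

For this synthetic header, the Partial DOS range covers 58 bytes, and the Full DOS range adds the 64 bytes between the pointer and the PE signature.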
GAMMA padding and GAMMA section-injection attacks are based on inserting parts extracted from benign files into malware files. These are black-box attacks. GAMMA attacks are formalized as a constrained optimization problem. The goal is to minimize the probability of detection but also to minimize the size of the injected content. This optimization problem is solved using a genetic optimizer. First, a random matrix is created that represents the initial population of manipulation vectors. The algorithm then iterates in three steps: selection, crossover, and mutation. During selection, the objective function is evaluated and the N best candidate manipulation vectors from the current population and the population created in the previous iteration are selected. This is followed by the crossover function, where the candidates from the previous step are modified by mixing the values of pairs of randomly selected candidate vectors and a new set of N candidates is returned. The last operation is a mutation, which randomly changes the elements of each vector with a low probability. At each iteration, N queries are made to the target model to evaluate the objective function on new candidates and keep the best candidate population. When the maximum number of queries is reached or no further improvement in the objective function value is observed, the best manipulation vector from the current population is returned. The resulting adversarial malware sample is obtained by applying the optimal manipulation vector to the input malware sample through the manipulation operator.
The Gym-malware attack is based on reinforcement learning. The environment consists of a sample of malware and the attack target, which is an anti-malware engine. At each step, the agent receives feedback that is composed of a reward value and a vector of features that summarize the state of the environment. Based on the feedback, the agent selects mutations from a set of actions, such as adding a function to the import address table that will never be used; manipulating existing section names; creating new sections that will not be used; adding bytes to extra space at the end of sections; creating a new entry point that immediately jumps to the original entry point; manipulating debug information; packing or unpacking the file; modifying the header checksum; adding bytes to the end of the PE file. This process is repeated in several rounds, and rounds can be prematurely terminated if the agent bypasses the anti-malware engine before 10 mutations are completed.

Related Work
In this section, the publications on modern methods to create adversarial examples are summarized. The section is divided into several parts, depending on the area to which the method belongs, with all publications compared based on the attacker's knowledge and strategy in Table 1.

Evolutionary Algorithm-based Attacks
In [17], Castro et al. designed and implemented the AIMED system to generate adversarial examples using genetic programming. The system enables the automatic finding of optimized modifications that are applied to previously detected malware and lead to its incorrect evaluation by the malware classifier. It is ensured that all generated adversarial examples are valid PE files. The system implements genetic operations such as selection, crossover, and mutation.
Table 1: Publications compared by the attacker's knowledge and strategy.
Reference  Year  Name             Knowledge  Strategy
[17]       2019  AIMED            black-box  evolutionary algorithm
[18]       2020  MDEA             black-box  evolutionary algorithm
[15]       2021  GAMMA            black-box  evolutionary algorithm
[16]       2018  Gym-malware      black-box  reinforcement learning
[19]       2019  DQEAF            black-box  reinforcement learning
[20]       2020  MAB-malware      black-box  reinforcement learning
[21]       2021  AIMED-RL         black-box  reinforcement learning
[22]       2018  AMB              white-box  gradient
[23]       2018  -                white-box  gradient
[13]       2019  -                white-box  gradient
[24]       2018  -                white-box  gradient
[14]       2020  RAMEN            white-box  gradient
[25]       2017  MalGAN           black-box  GAN
[26]       2019  Improved-MalGAN  black-box  GAN
[27]       2020  GAPGAN           black-box  GAN
Wang and Miikkulainen proposed an adversarial malware detection model named MDEA [18]. This model combines a convolutional neural network to classify raw data from malicious binaries and evolutionary optimization to modify detected malware. The action space consists of 10 different methods to modify binary programs. The genetic algorithm evolves different action sequences by selecting actions from the action space until the generated adversarial malware can bypass the target malware detectors. After the successful discovery of action sequences, these sequences are applied to the corresponding malware samples and create a new training set for the detection model. Unlike AIMED, malware samples generated by MDEA are not tested for functionality.
Demetrio et al. introduced a black-box attack framework called GAMMA [15]. The black-box attack is the most challenging case since the attacker knows nothing about the target classifier besides the final prediction label. GAMMA attacks are based on the principle of injecting harmless content extracted from goodware into malicious files. Harmless content is inserted into some newly created section or at the end of the file, while the functionality of the file is preserved. The attack is formalized as a constrained optimization problem that minimizes the probability of detection and also penalizes the size of the injected content through a specific penalty term.

Reinforcement Learning-based Attacks
Anderson et al. focused on automating the manipulation of malicious PE files in [16]. The goal is to modify the original malicious PE file so that it is no longer detected as malicious and, at the same time, its format and functionality are not violated. They proposed an attack known as Gym-malware. This is a black-box attack based on reinforcement learning. The authors defined the RL agent's action space as a set of binary manipulation actions. Over time, the agent learns which combinations of actions make malware more likely to bypass antivirus systems.
Song et al. proposed a MAB-Malware framework [20] based on reinforcement learning to generate adversarial PE malware examples. The action selection problem is modeled as a multi-armed bandit problem. The results showed that the MAB-Malware framework achieves an evasion rate of 74% to 97% against machine learning detectors (EMBER [28] and MalConv [29]) and an evasion rate of 32% to 48% against commercial antivirus (AV). Furthermore, they also showed that the transferability of adversarial attacks between ML-based classifiers (i.e., adversarial examples generated against one classifier can be used successfully against another) is greater than 80%, and the transferability of attacks between pure ML and commercial AV is only up to 7%.
Fang et al. proposed a framework named DQEAF [19] that uses reinforcement learning to evade antimalware engines. DQEAF is similar in methodology to Gym-malware but has many benefits and a higher evasion success rate of adversarial examples compared to it. DQEAF uses a subset of modifications used in Gym-malware and ensures that all modifications will not cause corruption in the modified malware files. DQEAF is able to reduce instability caused by higher dimensions by representing executable files with only 513 features, which is much lower than that in Gym-malware. DQEAF takes priority into account when replaying past transitions. This helps to replay higher-value transitions more frequently, and thus optimize RL networks.
Another reinforcement learning approach is presented in [21], in which Labaca-Castro et al. presented the AIMED-RL adversarial attack framework. This attack can generate adversarial examples that lead machine learning models to misclassify malicious files without compromising their functionality. The authors demonstrated the importance of a penalty technique and introduced a new penalization for the reward function with the aim of increasing the diversity of the generated sequences of modifications while minimizing the number of modifications. The results showed that the agents with penalty outperform the agents without penalty in terms of both the best and the average evasion rates.

Gradient-based Attacks
Kolosnjaji et al. in [22] introduced a gradient-based white-box attack to generate adversarial malware binaries against MalConv. A white-box scenario occurs when the attacker gets access to the system and may examine its internal configuration or training datasets. The basic idea of the attack is to manipulate some bytes of each malicious software to maximize the likelihood that the input samples are classified as benign. To ensure that malicious binary functionality is maintained, only padding bytes at the end of the file are considered. The results show that the evasion rate is linearly correlated with the length of the injected sequence and that, despite the fact that less than 1% of the bytes are modified, the modified binary evades the target network with high probability and the precision of MalConv can be reduced by more than 50%.
In [23], Kreuk et al. presented an improved gradient-based white-box attack method against MalConv. The authors proposed two methods for inserting a sequence of bytes into a file; the payload is inserted either at the end of the file or into slack regions. Unlike [22], the evasion rate of [23] is invariant to the length of the injected sequence.
Demetrio et al. in [13] presented a variant of the gradient-based white-box attack that is similar to [22]. The main difference is that [22] injects adversarial bytes at the end of the PE file, while this attack is limited to changing bytes within the disk operating system (DOS) header of the PE file. The results show that a change of a few bytes is sufficient to evade MalConv with high probability.
Suciu et al. in [24] describe the FGM append and the FGM slack attacks and compare their effectiveness against MalConv. Experimental results show that the FGM slack attack achieves better results than the FGM append attack with fewer modified bytes.
Demetrio et al. in [14] propose RAMEN, a general adversarial attack framework against PE malware detectors. This framework generalizes and includes previous attacks against machine learning models, as well as three new attacks based on manipulations of the PE file format that preserve its functionality. The first attack is a full DOS attack, which edits all the available bytes in the DOS header. The second attack is called Extend, which enlarges the DOS header, thus enabling manipulation of the extra DOS bytes. The third is the Shift attack, which shifts the content of the first section, creating additional space for the adversarial payload.

Generative Adversarial Network-based Attacks
Improved-MalGAN, an improved version of MalGAN [25], was presented in [26]. The authors discuss the problems of MalGAN and try to address them. For example, the original MalGAN uses multiple malware samples for training, whereas Improved-MalGAN uses only one malware sample. Furthermore, the original MalGAN trains the generator and the malware detector with the same application programming interface (API) call list, while Improved-MalGAN trains them with different API call lists.
Yuan et al. in [27] introduced GAPGAN, a GAN-based black-box adversarial attack framework. GAPGAN allows end-to-end black-box attacks at the byte level against deep learning-based malware binary detection. In this approach, a generator and a discriminator are trained concurrently. The generator is used to generate adversarial payloads that are appended to the end of the original data to craft an adversarial malware sample while ensuring the preservation of its functionality. The discriminator tries to imitate the black-box malware detector to recognize both the original benign samples and the adversarial samples generated. When the training process is completed, the trained generator is able to generate an adversarial sample in less than 20 milliseconds. Experiments show that GAPGAN is capable of achieving a 100% attack success rate against the MalConv malware detector by only inserting adversarial payloads with the size of 2.5% of the total length of the input malware samples.

Experiments
This section describes the setup and procedure for each experiment. First, the hardware configuration is introduced, followed by a description of the datasets used. Next, the setup of the different algorithms used to generate adversarial samples is described. Finally, the experiments performed are described.

Experimental Setup
The experiments are carried out on the NVIDIA DGX Station A100 server. This server is equipped with an AMD EPYC 7742 processor with a base frequency of 2.25 GHz, 64 cores and 512 GiB of RAM. We also use a virtual machine with Windows 11 operating system and another virtual machine with Kali Linux operating system for testing and analysis.
We use two datasets for our experiments. The first dataset contains a total of 3,625 harmless executable files, which were collected from a newly installed Windows 11 system. The second dataset contains 3,625 malicious executable files obtained from the VirusShare repository [30]. All the files in both datasets are PE files.

Attack settings
In total, we compare five adversarial attack strategies. Partial DOS and Full DOS attacks are performed in a white-box setting against the MalConv detector with the maximum number of iterations set to 50. GAMMA padding and GAMMA section-injection attacks are performed in a black-box setting against the MalConv detector with the maximum number of queries set to 500, the regularization parameter set to $10^{-5}$, and a total of 100 .data sections extracted from benign programs used as injection content. The last attack we use is the Gym-malware attack with its default settings, performed in a black-box setting against the GBDT detector. The Gym-malware model was trained on a dataset of 3,000 malicious samples and a validation set of 1,000 files.

Evaluation Metrics
We use several metrics to evaluate the experiments. A key metric in the area of adversarial machine learning is the evasion rate (ER), which is the proportion of malware files misclassified by the target malware classifier and can be calculated as follows:
$$ER = \frac{\mathit{misclassified}}{\mathit{total}}, \tag{1}$$
where $\mathit{misclassified}$ is the number of malware samples misclassified as benign and $\mathit{total}$ is the total number of files submitted to the target classifier after discarding files that were already incorrectly predicted before modification. The evasion rate mentioned above is a universal metric that can be used to evaluate both single attacks and combinations of attacks. In both cases, we are interested in the percentage of malware that escaped detection by the antivirus program. Additionally, we use the following metrics to evaluate the combination of attacks.
The first two metrics that we chose to evaluate the success of the combination are the absolute improvement and the relative improvement in the evasion rate when using the second attack in the combination compared to the first attack. Absolute improvement (AI) can be described by the following formula:
$$AI = ER_C - ER_1, \tag{2}$$
where $ER_C$ is the total evasion rate when using a combination of methods, and $ER_1$ is the evasion rate after using the first attack in the combination alone. The result is the increase in evasion rate, in percentage points, between the first and second attack in the combination. For example, if the evasion rate after the first attack in the combination is 0.01 and after the second attack is 0.1, then the absolute improvement is 0.09, which means that the second attack improved the overall evasion rate by 9 percentage points. Similarly, the relative improvement (RI) can be expressed using the formula:
$$RI = \frac{ER_C - ER_1}{ER_1}, \tag{3}$$
where the meaning of the variables is the same as in the previous formula (2). However, in the previous case, the result expressed an increase over all samples tested. In the case of relative improvement, the increase is expressed relative to the set of samples that escaped the antivirus program after the first attack. For example, if the evasion rate after the first attack of the combination is 0.01 and after the second attack it is 0.1, then the relative improvement is 9, i.e., the second attack improved the evasion rate of the first attack by 900%.
Next, we need to compare the combination of attacks with performing the attacks separately to see if the combination of attacks adds any value. To do this, we use a simple comparison of the evasion rate of the combination of attacks with the evasion rate of the attacks performed separately. We call it evasion rate comparison (ERC), and it is calculated as follows:
$$ERC = ER_C - \max(ER_1, ER_2), \tag{4}$$
where $ER_C$ is the evasion rate of the combination attack, while $ER_1$ and $ER_2$ are the evasion rates of the first and second attacks, respectively. If the result is positive, it means that the combination of attacks performed better than either of the two attacks performed alone. If the result is negative or zero, it means that the execution of the attack combination was pointless because one of the attacks that were part of the combination performed better than or equal to the combination in terms of evasion rate.
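The four metrics are simple enough to express as small functions. The sketch below reproduces the worked example from the text (an evasion rate of 0.01 after the first attack and 0.1 for the combination); the function names, the sample counts, and the evasion rate of 0.05 assumed for the second attack alone are hypothetical and used only for illustration.

```python
def evasion_rate(misclassified: int, total: int) -> float:
    """ER: share of submitted malware samples classified as benign."""
    return misclassified / total


def absolute_improvement(er_c: float, er_1: float) -> float:
    """AI: gain of the combination over the first attack alone."""
    return er_c - er_1


def relative_improvement(er_c: float, er_1: float) -> float:
    """RI: the same gain, relative to the first attack's evasion rate."""
    return (er_c - er_1) / er_1


def evasion_rate_comparison(er_c: float, er_1: float, er_2: float) -> float:
    """ERC: gain of the combination over the better of the single attacks."""
    return er_c - max(er_1, er_2)


# Hypothetical counts: 53 of 530 samples evade detection.
er_example = evasion_rate(53, 530)

er_1, er_c = 0.01, 0.1   # values from the worked example in the text
er_2 = 0.05              # assumed evasion rate of the second attack alone

ai = absolute_improvement(er_c, er_1)
ri = relative_improvement(er_c, er_1)
erc = evasion_rate_comparison(er_c, er_1, er_2)
print(round(ai, 4), round(ri, 4), round(erc, 4))
```

With these inputs the combination is worthwhile under the ERC criterion, since its evasion rate exceeds that of the better single attack.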

Experiments Description
We present four experiments that explore different characteristics of the adversarial attack methods mentioned above.

Sample Generation Time
In the first experiment, we measure the time taken to generate individual samples with each of the selected algorithms. These results, along with other data collected during the experiments, can help compare the effectiveness of the various generators and determine the most effective method to generate adversarial malware samples.
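A measurement of this kind could be set up as in the following sketch. It assumes that each generator can be wrapped as a callable taking the original file bytes and returning the modified bytes; `time_generator`, `summarize`, and the `generate` callable are hypothetical names, not part of any of the evaluated tools. Any training phase is deliberately left outside the timed region.

```python
import statistics
import time
from typing import Callable, Iterable, List, Tuple


def time_generator(generate: Callable[[bytes], bytes],
                   samples: Iterable[bytes]) -> List[float]:
    """Return the wall-clock time in seconds spent generating each sample."""
    durations = []
    for sample in samples:
        start = time.perf_counter()
        generate(sample)  # the modified file is discarded; only the time matters here
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation); needs at least two measurements."""
    return statistics.mean(durations), statistics.stdev(durations)
```

Here `time.perf_counter` is used because it is a monotonic, high-resolution clock intended for measuring short durations.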

Sample Size
In the second experiment, we investigate how the size of the original malware samples changes by applying various adversarial malware generators. Generally, the attacker's goal is to minimize the increase in the size of the generated adversarial files to make it harder to distinguish them from the original malware samples.
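The size comparison itself is simple bookkeeping, as the following sketch shows. It assumes the file sizes in bytes have already been collected into two dictionaries keyed by sample name; the names `size_deltas` and `average_delta` are illustrative only.

```python
from statistics import mean
from typing import Dict


def size_deltas(original_sizes: Dict[str, int],
                modified_sizes: Dict[str, int]) -> Dict[str, int]:
    """Modified-minus-original size in bytes for every sample present in both maps."""
    return {
        name: modified_sizes[name] - original_sizes[name]
        for name in original_sizes
        if name in modified_sizes
    }


def average_delta(deltas: Dict[str, int]) -> float:
    """Average size change; negative values mean the files shrank on average."""
    return mean(deltas.values())
```

In practice, the sizes could be gathered with `os.path.getsize` on the original and modified files.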

Bypassing commercial AV products
In the third experiment, we analyze the effectiveness of the created adversarial malware samples against real-world AV detectors. Based on a comparative study [31], 10 AV programs were selected for experimentation, and their names were intentionally anonymized in the following results. Note that in the subsequent results, only nine AVs are listed, as two of the selected AVs reported the same results.
The modified malware files from the different adversarial algorithms are submitted to the VirusTotal server [32] to obtain the evasion rate for each adversarial malware generator. To avoid bias in the results, we only analyzed malware samples that were correctly classified by all selected AV products before modification. We also discarded samples for which we were unable to obtain a file analysis from VirusTotal, e.g., due to broken behavior of the modified examples. In total, we use a set of 530 genuine malware samples along with one modified version per malware generator.
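The filtering and evasion-rate computation described above can be sketched as follows. The sketch assumes the scan verdicts have already been retrieved and stored as nested dictionaries of the form `{sample_id: {av_name: detected}}`; it does not call VirusTotal, and the function name and data layout are our own assumptions.

```python
from typing import Dict

Verdicts = Dict[str, Dict[str, bool]]  # sample id -> AV name -> detected?


def evasion_rates(before: Verdicts, after: Verdicts) -> Dict[str, float]:
    """Per-AV fraction of eligible samples that are no longer detected after modification."""
    av_names = sorted({av for verdicts in before.values() for av in verdicts})
    eligible = [
        sample
        for sample, verdicts in before.items()
        # keep only samples that every AV detected before modification ...
        if all(verdicts.get(av, False) for av in av_names)
        # ... and for which a complete analysis of the modified file exists
        and sample in after
        and all(av in after[sample] for av in av_names)
    ]
    if not eligible:
        raise ValueError("no sample satisfies the selection criteria")
    return {
        av: sum(not after[sample][av] for sample in eligible) / len(eligible)
        for av in av_names
    }
```

Restricting the denominator to samples that every AV detected beforehand ensures that a missed detection after modification can be attributed to the modification itself.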

Combination of Multiple Techniques
In the last experiment, we test the effectiveness of using a combination of methods to generate malware samples [33]. The goal of this experiment is to test whether using multiple adversarial example generators per malware sample would significantly increase the malware evasion rate. An overview of the experiment is shown in Figure 1. First, the original malware samples are processed by the first generator. These modified samples are then tested against a real AV detector that is not part of the generator. The result is a set of samples divided into two sets. The first set consists of evasive examples that successfully evaded the given malware detector, and this set is no longer processed. In

Results
This section presents the recorded results of the individual experiments described in Section 4.

Sample generation time
Firstly, we look at the results of the sample generation time. Average times in seconds and standard deviations for each attack are listed in Table 2 and shown in the box plot in Figure 2.
From Figure 2, we can see that Gym-malware took the least time to generate adversarial examples, less than 6 seconds on average, with some outliers below 100 seconds. It should be noted that Gym-malware requires a preceding training phase that is not taken into account in this experiment. In contrast, the Full DOS attack took the longest to create adversarial examples, with an average duration of more than 160 seconds. The remaining three methods achieved similar sample generation times of around 100 seconds, although the results of the GAMMA section-injection attack contain several outliers with exceptionally long durations of more than 300 seconds. However, the measured times may be affected by the settings of the individual algorithms.

Sample Size
Secondly, we present the results of the sample size experiment. Partial DOS and Full DOS attacks are based on changing bytes in the DOS header. Thus, modifying the malware samples by these methods does not alter the size of the resulting adversarial examples. On the other hand, the remaining tested methods can change the initial file size. The general results of this experiment are shown in Table 3.
The GAMMA padding and section-injection attacks are based on inserting parts extracted from benign files into the malware file. In the case of the GAMMA padding attack, the file size increased on average by 223,605 bytes and in the case of the GAMMA section-injection attack, by 1,940,352 bytes. As we can see from Table 3, the adversarial examples generated by the GAMMA section-injection attack exhibit significant differences in the final file sizes.
The Gym-malware attack uses various types of file manipulations. The file size can be reduced, increased, or unchanged based on the chosen modification. On average, the file size was reduced by 149,273 bytes. The observed file size reduction was probably due to the authors' implementation of file manipulations, as they used the LIEF library [34], which significantly alters the structure of the initial malware file.

Bypassing commercial AV products
Thirdly, we list the effectiveness of generated adversarial samples on commercially available AV programs. The results are shown in Table 4.
Each column represents one of the selected AV programs and each row represents one of the algorithms used to generate adversarial samples. The values in the table represent the achieved evasion rates, expressed as a percentage, for the corresponding algorithm and AV products. The Gym-malware attack achieved the highest evasion rates among all selected AV programs, successfully bypassing the top AVs 19.02% to 67.23% of the time. The second-best results were recorded by the GAMMA section-injection attack, which recorded evasion rates between 1.23% and 43.62%. In contrast, the GAMMA padding attack achieved the worst results, failing to mislead any detector tested in more than 1.5% of cases. Full and Partial DOS attacks scored slightly better than the GAMMA padding attack, with the Full DOS marginally outperforming the Partial DOS attack.

Combination of Multiple Techniques
Based on the results from the previous experiment, we chose the three most successful adversarial example generators and tested all nine possible combinations. Namely, we used Gym-malware, GAMMA section-injection, and Full DOS adversarial malware generators. This section contains two types of tables. The first type of table lists the measured minimum, average, and maximum values for particular AVs across the nine combinations of the three selected generators. The second type of table lists the measured minimum, average, and maximum values for a particular combination of generators across the nine AVs. The First Generator column contains the first generator used, and the Second Generator column contains the second generator used as described in Section 4.4.4.
First, we present the results of the evasion rate metric. For all AVs, we examine the minimum, average, and maximum of the results of all combinations of generators tested in the experiment. These results can be found in Table 5. For the minimum values of the evasion rate, we can see that none of the antivirus programs reached a detection rate of 100%. On the other hand, all these values are relatively low compared to the average and maximum values, which tells us that some of the nine generator combinations were not very successful. The average evasion rates range from about 18% to 49%, and the maximum values range from 30% to 78%. The best result was achieved by combinations of generators against AV7, where the average evasion rate is around 49%, while the least successful was against AV2, where the average evasion rate is around 18%.
Table 6 shows the results of the evasion rate achieved by each combination of generators. Here, we can see that the most successful combination was the one in which the Gym-malware generator was used twice in a row. This achieved a minimum evasion rate of around 30% for all AVs. The average value for this combination is around 58%, and for at least one AV we achieved an evasion rate of around 78% with this combination. On the other hand, the worst combination in terms of evasion rate is the one in which the Full DOS generator was used twice. In this case, we have an average evasion rate of about 2%, while for all other combinations, this value exceeds 10%, and for some even more significantly. The maximum value of the evasion rate for this combination is about 5%, for the others we have at least about 44%.
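The two table types described above amount to aggregating the same grid of results along different axes. The following sketch enumerates the nine ordered generator pairs and computes the minimum, average, and maximum per AV and per combination; the data layout `{(first, second): {av_name: rate}}` and the function names are assumptions made for illustration.

```python
from itertools import product
from statistics import mean
from typing import Dict, Iterable, List, Tuple

GENERATORS = ["Gym-malware", "GAMMA section-injection", "Full DOS"]

Pair = Tuple[str, str]
Rates = Dict[Pair, Dict[str, float]]  # (first, second) -> AV name -> rate


def ordered_pairs(generators: List[str]) -> List[Pair]:
    """All (first, second) pairs, including a generator combined with itself."""
    return list(product(generators, repeat=2))


def min_avg_max(values: Iterable[float]) -> Tuple[float, float, float]:
    values = list(values)
    return min(values), mean(values), max(values)


def summarize_by_av(rates: Rates) -> Dict[str, Tuple[float, float, float]]:
    """First table type: one row per AV, aggregated over all combinations."""
    avs = sorted({av for per_av in rates.values() for av in per_av})
    return {av: min_avg_max(per_av[av] for per_av in rates.values()) for av in avs}


def summarize_by_combination(rates: Rates) -> Dict[Pair, Tuple[float, float, float]]:
    """Second table type: one row per combination, aggregated over all AVs."""
    return {pair: min_avg_max(per_av.values()) for pair, per_av in rates.items()}
```

Because order matters and a generator may be paired with itself, three generators yield 3 × 3 = 9 combinations.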

Evasion Rate
To determine the effectiveness of using generator combinations instead of individual generators alone, we can compare these results with the values from the previous experiment. However, we have additional metrics for this, which are described in Section 4.3. We analyze the results of these metrics in the following parts of this section.

Absolute Improvement
Next, we evaluate the metric that we identified as absolute improvement in Section 4.3. In short, it is the difference, in percentage points, between the evasion rate achieved by combining both generators and the evasion rate achieved by using only the first generator.
First, we focus on Table 7, which shows the absolute improvement values for each AV in the nine combinations of generators. Here we can see that for some AVs we were unable to improve the evasion rate by applying the second generator. Specifically, this refers to the minimum value for AV4, AV5, and AV9. Table 8 helps us to identify the relevant generators. We can see that the only non-improving combinations are the ones in which we use the same generator twice, namely, the Full DOS generator or the GAMMA section-injection generator. This may indicate that the use of these combinations is not entirely effective. For the average values in Table 7, we can see that they do not differ significantly, in all cases between about 8% and 15%. For the maximum values, we have a slightly higher range, approximately 22% to 61%. This means that if we use a second generator, we will improve the evasion rate by about 10% on average after using the first generator.
More interesting, however, is Table 8, which shows the absolute improvement for each generator combination against all AVs. Here, we can see that the use of Full DOS as the second generator does not result in a significant absolute improvement in the evasion rate for the minimum, average, and maximum values. This means that using Full DOS as the second generator in a combination is the least effective. On the other hand, we can see that we get the best absolute improvement by using the Gym-malware method as the second generator in the combination.
In Table 8, we can observe another interesting fact. As we have already mentioned, the use of two identical generators in combination is not very effective. However, this statement does not apply to the Gym-malware generator. In contrast, if we use the Gym-malware generator as the first generator in the combination, then the best choice for the second generator seems to be the Gym-malware generator again. Full DOS and GAMMA section-injection generators have minimal absolute improvements in the role of the second generator. These results show that the Gym-malware generator is successful in both cases, used alone and in combination with itself.
The best absolute improvement values are achieved when we choose Full DOS as the first generator and Gym-malware as the second. This results in an absolute improvement in the evasion rate in all AVs of at least around 22%, on average 37% and up to a maximum of 61%. On the other hand, the worst results in terms of absolute evasion rate improvement are obtained when we choose Full DOS as both generators. In this case, we obtain a minimum absolute improvement of 0%, an average of 0.3% and a maximum of 1.1% across all antivirus programs. We can conclude that the Full DOS generator is likely to be the least successful generator, both when used alone and in combination.

Relative Improvement
We follow the absolute improvement results with a relative improvement in the evasion rate. Analogously to the improvement in the absolute evasion rate, we can see the results of relative improvement in Table 9. We can see that AV7 performed the best on average in terms of relative improvement of the evasion rate. Conversely, AV4 performed the worst, where in some cases the second generator managed to increase the number of successfully modified samples (which evaded detection) by more than 67 times.
Table 10 confirms that the Full DOS generator, which was used as the second generator in combination, does not increase the evasion rate significantly. On the other hand, if this generator is chosen as the first generator in the combination, the GAMMA section-injection or Gym-malware generator can increase the number of samples that evade detection by up to 67 times. This shows that the Full DOS generator is not a very strong generator on its own.
Regarding the Gym-malware generator, we can see that when it is used as the first generator in a combination, the second generator does not significantly increase the result, even when Gym-malware is used again as the second generator. However, the Gym-malware generator is very successful on its own as it achieves a significantly higher evasion rate than other generators. Therefore, a relative improvement in the evasion rate from 39% to 56% can cause a drastic increase in the total evasion rate of the combination (up to 20%). We can also see that when the Gym-malware generator is used as a second generator in other combinations, there are huge relative improvements over the first generator. Again, we can see that Gym-malware is very effective when used in any combination of generators.
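The interplay between the two metrics can be made explicit with a short calculation. Reading the definitions from Section 4.3 as AI = ER_C - ER_1 and RI = (ER_C - ER_1) / ER_1, the absolute improvement is the relative improvement scaled by the first generator's own evasion rate. The numeric values below are hypothetical and chosen only to illustrate the effect; they are not taken from the tables.

```latex
\begin{align*}
RI = \frac{ER_C - ER_1}{ER_1}
  \quad &\Longrightarrow \quad AI = ER_C - ER_1 = RI \cdot ER_1 \\
\text{strong first generator: } ER_1 = 0.40,\ RI = 0.5
  \quad &\Longrightarrow \quad AI = 0.5 \cdot 0.40 = 0.20 \\
\text{weak first generator: } ER_1 = 0.01,\ RI = 9
  \quad &\Longrightarrow \quad AI = 9 \cdot 0.01 = 0.09
\end{align*}
```

A modest relative improvement on top of a strong first generator can therefore add more evasive samples in absolute terms than a very large relative improvement on top of a weak one.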

Evasion Rate Comparison
The last metric examined, evasion rate comparison, helps us answer the question of whether it is better to use individual generators separately or to use them in combination. A negative value of this metric indicates that we would achieve a better evasion rate by using the better of the two generators individually (better in terms of evasion rate). In contrast, a positive value indicates that we recorded a better result using the combination of both generators.
Looking at the minimum values in Table 11, we can see that for almost all AVs there was a combination of generators that was not effective, as indicated by the negative values in this column. The exception is AV2, where we achieved a better evasion rate for all combinations of generators used. However, the average and maximum values show that in most cases the combination is more effective than the better of the two generators used separately. Only AV4 has a negative average value, which means that a separate use of a single generator is a better option against this AV. On the other hand, when attacking AV2 it is better to use a combination of generators, as it achieved the highest average evasion rate comparison value.
Table 12 shows that combining generators using Full DOS as the first generator and either GAMMA section-injection or Gym-malware as the second generator yields negative results, as evidenced by both the minimum and average values being negative. Based on the results from the previous parts of this section, we can say that it is more advantageous to use GAMMA section-injection and Gym-malware generators separately in such cases. We can also see other negative values in the minimum value column for the combination of GAMMA section-injection and Gym-malware generators used in this order. Nonetheless, the average value of the combination is positive, indicating that it is beneficial to use this combination on average. For the remaining combinations, we can conclude that using them results in a better evasion rate than using the better of the two generators separately.

Conclusion
In this paper, we explored the use of adversarial learning techniques in malware detection.
Our goal was to apply existing methods for generating adversarial malware samples, test their effectiveness against selected malware detectors, and compare the evasion rates achieved and the practical applicability of these methods.
For our experiments, we chose five adversarial malware sample generators: Partial DOS, Full DOS, GAMMA padding, GAMMA section-injection, and Gym-malware. This selection represents a spectrum of adversarial techniques based on gradients, evolutionary algorithms, and reinforcement learning. These adversarial malware generators were evaluated on nine commercially available antivirus products.
To validate and compare the different characteristics and properties of the methods used, we performed four experiments. These included tracking the time taken to generate samples, measuring changes in sample size after applying adversarial modifications, testing effectiveness against antivirus programs, and evaluating combinations of generators.
The results indicate that making optimized modifications to previously detected malware can cause the classifier to misclassify the file and label it as benign. Furthermore, the study confirmed that generated malware samples could be used successfully against detection models other than those used to generate them. Using combination attacks, a significant percentage of new samples were created that could evade detection by antivirus programs.
Experiments showed that the Gym-malware generator, which uses a reinforcement learning approach, has the greatest practical potential. This generator produced malware samples in the shortest time, with an average sample generation time of 5.73 seconds. The Gym-malware generator also achieved the highest evasion rate against all selected antivirus products, with the highest average evasion rate of 44.11% against nine AVs. Furthermore, the Gym-malware generator was effective when combined with another generator, especially with itself, where it achieved the highest average evasion rate of 58.35%. Additionally, this generator could significantly improve the performance of other generators, with absolute and relative improvements ranging from 36.04% to 37.42% and from 741.93% to 4027.99%, respectively.
For future experiments, we propose to study in more detail how the sample generation time is affected by the size of the input genuine malware and to investigate the correlation between the generation time and the evasion rate of the resulting adversarial examples. Further experiments could be done in the area of combining generators, where more than two generators could be combined to achieve even higher evasion rates. Our work highlights the importance of developing new techniques to detect malware and identify adversarial attacks. More research is needed in this area to successfully combat these novel threats and attacks.
Center for Informatics" and by the Grant Agency of the CTU in Prague, grant No. SGS23/211/OHK3/3T/18 funded by the MEYS of the Czech Republic.

Fig. 2: Time required to generate a sample for each sample generator.

Table 1: Summary of State-of-the-Art Adversarial Attacks against PE Malware Detection. Columns: Paper, Year, Attack framework, Knowledge, Attack Strategy.

Table 2: Average sample generation time for each sample generator.

Table 3: Changes to the sample size of generated adversarial samples for each generator.

Table 4: Evasion rate (in %) of generated adversarial samples achieved against AV products on the VirusTotal server.

Table 5: Evasion rate (in %) for each AV using all combinations of generators.

Table 6: Evasion rate (in %) for each generator combination against all AVs.

Table 7: Absolute improvement (in %) for each AV using all combinations of generators.

Table 8: Absolute improvement (in %) for each generator combination against all AVs.

Table 9: Relative improvement (in %) for each AV using all combinations of generators.

Table 10: Relative improvement (in %) for each generator combination against all AVs.

Table 11: Evasion rate comparison (in %) for each AV using all combinations of generators.

Table 12: Evasion rate comparison (in %) for each generator combination against all AVs.