1 Introduction

The rise of deep learning has led to rapid advances in multimedia forensics. Algorithms based on deep neural networks are able to automatically learn forensic traces, detect complex forgeries, and localize falsified content with increasing accuracy. At the same time, deep learning has expanded the capabilities of anti-forensic attackers. New anti-forensic attacks have emerged, including those discussed in Chap. 14 based on adversarial examples, and those based on generative adversarial networks (GANs).

In this chapter, we discuss the emerging threat posed by GAN-based anti-forensic attacks. GANs are a powerful machine learning framework that can be used to create realistic, but completely synthetic data. Researchers have recently shown that anti-forensic attacks can be built by using GANs to create synthetic forensic traces. While only a small number of GAN-based anti-forensic attacks currently exist, results show that these early attacks both fool forensic algorithms effectively and introduce very little distortion into attacked images. Furthermore, by using the GAN framework to create new anti-forensic attacks, attackers can dramatically reduce the time required to create a new attack.

For simplicity, we will assume in this chapter that GAN-based anti-forensic attacks are launched against images. This also aligns with the current state of research, since all existing attacks target images. However, the ideas and methodologies presented in this chapter can be generalized to a variety of media modalities including video and audio. We expect that new GAN-based anti-forensic attacks targeting these modalities will arise in the near future.

This chapter begins with Sect. 17.2, which provides a brief background on GANs. Section 17.3 provides background on anti-forensic attacks, including how they are traditionally designed as well as their shortcomings. In Sect. 17.4, we discuss how GANs are used to create anti-forensic attacks. This gives a high-level overview of the components of an anti-forensic GAN, differences between attacks based on GANs and adversarial examples, as well as an overview of existing GAN-based anti-forensic attacks. Section 17.5 provides more details about how these attacks are trained, including different techniques that can be employed based on the attacker’s knowledge of and access to the forensic algorithm under attack. We discuss known shortcomings of anti-forensic GANs and future research directions in Sect. 17.6.

2 Background on GANs

Generative adversarial networks (GANs) are a machine learning framework used to train generative models (Goodfellow et al. 2014). In their standard form, GANs consist of two components: a generator G and a discriminator D. The generator and the discriminator can be viewed as competitors in a two-player game. The generator aims to learn to produce deceptively realistic synthetic data that mimics the real data, while the discriminator aims to learn to distinguish the synthetic data from the real data. The two parties are trained alternately, each attempting to improve its performance and defeat the other, until the synthetic data is indistinguishable from the real data.

The generator is a generative model that creates synthetic data \(\mathbf {x}'\) by mapping an input \(\mathbf {z}\) from a latent space to an output in the space of real data \(\mathbf {x}\). Ideally, the generator produces synthetic data \(\mathbf {x}'\) with the same distribution as the real data \(\mathbf {x}\), such that the two cannot be differentiated. The discriminator is a discriminative model that assigns a scalar score to each input. The discriminator can output any value between 0 and 1, with 1 representing real data and 0 representing synthetic data. During the training phase, the discriminator aims to maximize the probability of assigning the correct label to both real data \(\mathbf {x}\) and synthetic data \(\mathbf {x}'\), while the generator aims to reduce the difference between the discriminator’s decision on the synthetic data and 1 (i.e., the decision that it is real data). Training a GAN is equivalent to finding the solution of the min-max problem

$$\begin{aligned} \min _{G}\max _{D} \mathbb {E}_{\mathbf {x}\sim f_X(\mathbf {x})}[\log D(\mathbf {x})]+\mathbb {E}_{\mathbf {z}\sim f_Z(\mathbf {z})}[\log (1- D(G(\mathbf {z})))] \end{aligned}$$
(17.1)

where \(f_X(\mathbf {x})\) represents the distribution of real data, \(f_Z(\mathbf {z})\) represents the distribution of the input noise, and \(\mathbb {E}\) denotes the expected value operator.
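To make the training objective in (17.1) concrete, the following sketch shows one alternating training step in PyTorch. It is a minimal illustration only: the generator G, discriminator D, optimizers, real-image batch, and latent dimension are placeholder assumptions rather than components of any specific published GAN.

import torch
import torch.nn.functional as F

def gan_training_step(G, D, opt_G, opt_D, real_x, latent_dim=100):
    # Assumes D outputs a probability in [0, 1] for each input (shape: batch x 1).
    batch = real_x.size(0)

    # Discriminator update: maximize log D(x) + log(1 - D(G(z))).
    z = torch.randn(batch, latent_dim)
    fake_x = G(z).detach()  # block gradients from flowing into G
    d_loss = F.binary_cross_entropy(D(real_x), torch.ones(batch, 1)) + \
             F.binary_cross_entropy(D(fake_x), torch.zeros(batch, 1))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # Generator update: drive D(G(z)) toward 1 (the common non-saturating
    # surrogate for minimizing log(1 - D(G(z))) in (17.1)).
    z = torch.randn(batch, latent_dim)
    g_loss = F.binary_cross_entropy(D(G(z)), torch.ones(batch, 1))
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
    return d_loss.item(), g_loss.item()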

2.1 GANs for Image Synthesis

While GANs can be used to synthesize many types of data, one of their most common uses is to synthesize or modify images. The development of several specific GANs has provided important insights and guidelines for the design of GANs for images. For example, while the GAN framework does not specify the specific form of the generator or discriminator, the development of DCGAN showed that deep convolutional generators and discriminators yield substantial benefits when performing image synthesis (Radford et al. 2015). Additionally, this work suggested constraints on the architectures of the generator and discriminator that can result in stable training. The creation of Conditional GANs (CGANs) showed the benefit of using information such as labels as auxiliary inputs to both the generator and the discriminator (Mirza and Osindero 2014). This can help improve the visual quality and control the appearance of synthetic images generated by a GAN. InfoGAN showed that the latent space can be structured to both select and control the generation of images (Chen et al. 2016). Pix2Pix showed that GANs can learn to translate from one image space to another (Isola et al. 2018), while StackGAN demonstrated that text can be used to guide the generation of images via a series of CGANs (Zhang et al. 2017).

Since their first appearance in the literature, the development of GANs for image synthesis has proceeded at an increasingly rapid pace. Research has been performed to improve the architectures of the generator and discriminator (Zhang et al. 2019; Brock et al. 2018; Karras et al. 2017; Zhu et al. 2017; Karras et al. 2020; Choi et al. 2020), improve the training procedure (Arjovsky et al. 2017; Mao et al. 2017), and build publicly available datasets to aid the research community (Yu et al. 2016; Liu et al. 2015; Karras et al. 2019; Caesar et al. 2018). Because of these efforts, GANs have been used to achieve state-of-the-art performance on various computer vision tasks, such as super-resolution (Ledig et al. 2017), photo inpainting (Pathak et al. 2016), photo blending (Wu et al. 2019), image-to-image translation (Isola et al. 2018; Zhu et al. 2017; Jin et al. 2017), text-to-image translation (Zhang et al. 2017), semantic-image-to-photo translation (Park et al. 2019), face aging (Antipov et al. 2017), generating human faces (Karras et al. 2017; Choi et al. 2018, 2020; Karras et al. 2019, 2020), generating 2-D (Jin et al. 2017; Karras et al. 2017; Brock et al. 2018) and 3-D objects (Wu et al. 2016), video prediction (Vondrick et al. 2016), and many more applications.

3 Brief Overview of Relevant Anti-Forensic Attacks

In this section, we provide a brief discussion of what anti-forensic attacks are and how they are designed.

3.1 What Are Anti-Forensic Attacks

Anti-forensic attacks are countermeasures that a media forger can use to fool forensic algorithms. They operate by falsifying the traces that forensic algorithms rely upon to make classification or detection decisions. A forensic classifier that is targeted by an anti-forensic attack is referred to as a victim classifier.

Just as there are many different forensic algorithms, there are similarly a wide variety of anti-forensic attacks (Stamm et al. 2013; Böhme and Kirchner 2013). Many of them, however, can be broadly grouped into two categories:

  • Attacks designed to disguise evidence of manipulation and editing. Attacks have been developed to fool forensic algorithms designed to detect a variety of manipulations such as resampling (Kirchner and Bohme 2008; Gloe et al. 2007), JPEG compression (Stamm and Liu 2011; Stamm et al. 2010; Fan et al. 2013, 2014; Comesana-Alfaro and Pérez-González 2013; Pasquini and Boato 2013; Chu et al. 2015), median filtering (Fontani and Barni 2012; Wu et al. 2013; Dang-Nguyen et al. 2013; Fan et al. 2015), contrast enhancement (Cao et al. 2010), and unsharp masking (Laijie et al. 2013), as well as algorithms that detect forgeries by identifying inconsistencies in lateral chromatic aberration (Mayer and Stamm 2015), identify copy-move forgeries (Amerini et al. 2013; Costanzo et al. 2014), and detect video frame deletion (Stamm et al. 2012a, b; Kang et al. 2016).

  • Attacks designed to falsify the source of a multimedia file. Attacks have been developed to falsify demosaicing traces associated with an image’s source camera model (Kirchner and Böhme 2009; Chen et al. 2017), as well as PRNU traces that can be linked to a specific device (Lukas et al. 2005; Gloe et al. 2007; Goljan et al. 2010; Barni and Tondi 2013; Dirik et al. 2014; Karaküçük and Dirik 2015).

To understand how anti-forensic attacks operate, it is helpful to first consider a simple model of how forensic algorithms operate. Forensic algorithms extract forensic traces from an image, then associate those traces with a particular forensic class. That class can be associated with an editing operation or manipulation that an image has undergone, the image’s source (i.e., its camera model, its source device, its distribution channel, etc.), or some other property that a forensic investigator wishes to ascertain. It is important to note that when performing manipulation detection, even unaltered images contain forensic traces. While they do not contain traces associated with editing, splicing, or falsification, they do contain traces associated with an image being generated by a digital camera and not undergoing subsequent processing.

Anti-forensic attacks create synthetic forensic traces within an image that are designed to fool a forensic classifier. This holds true even for attacks designed to remove evidence of editing. These attacks synthesize traces in manipulated images that are associated with unaltered images. Most anti-forensic attacks are targeted, i.e., they are designed to trick the forensic classifier into associating an image with a particular target forensic class. For example, attacks against manipulation detectors typically specify the class of ‘unaltered’ images as the target class. Attacks designed to fool source identification algorithms synthesize traces associated with a target source that the image did not truly originate from. Additionally, a small number of anti-forensic attacks are untargeted. This means that the attack does not care which class the forensic algorithm associates the image with, so long as it is not the true class. Untargeted attacks are more commonly used to fool source identification algorithms, since an untargeted attack still tricks the forensic algorithm into believing that an image came from an untrue source. They are typically not used against manipulation detectors, since a forensic investigator will still decide that the image is inauthentic even if their algorithm mistakes manipulation A for manipulation B.

3.2 Anti-Forensic Attack Objectives and Requirements

When creating an anti-forensic attack, an attacker must consider several design objectives. For an attack to be successful, it must satisfy the following design requirements:

  • Requirement 1: Fool the victim forensic algorithm. An anti-forensic attack should cause the victim algorithm to classify an image as belonging to the attacker’s target class. If the attack is designed to disguise manipulation or content falsification, then the target class is the class of unaltered images. If the attack is designed to falsify an image’s source, then the target class is the desired fake source chosen by the attacker, such as a particular source camera.

  • Requirement 2: Introduce no visually perceptible artifacts. An attack should not introduce visually distinct telltale signs that a human can easily identify. If this occurs, a human will quickly dismiss an attacked image as fake even if it fools a forensic algorithm. This is particularly important if the image is to be used as part of a misinformation campaign, since images that are not plausibly real will be quickly flagged.

  • Requirement 3: Maintain high visual quality of the attacked image. Similar to Requirement 2, if an attack fools a detector but significantly distorts an image, it is not useful to an information attacker. Some distortion is allowable, since ideally no one other than the attacker will be able to compare the attacked image to its unattacked counterpart. Additionally, this requirement is put in place to ensure that an attack does not undo desired manipulations that an attacker has previously performed on an image. For example, localized color corrections and contrast adjustments may be required to make a falsified image appear visually plausible. If an anti-forensic attack reverses these intentional manipulations, then the attack is no longer successful.

In addition to these requirements, there are several other highly desirable properties for an attack to possess. These include

  • Desired Goal 1: Be rapidly deployable in practical scenarios. Ideally, an attack should be able to be launched quickly and efficiently, thus enabling rapid attacks or attacks at scale. It should be flexible enough to attack images of arbitrary size, not only images of a fixed, predetermined size. Additionally, it should not require prior knowledge of which region of an image will be analyzed, such as a specific image block or block grid (it is typically unrealistic to assume that an investigator will only examine certain fixed image locations).

  • Desired Goal 2: Achieve attack transferability. An attack achieves transferability if it is able to fool other victim classifiers that it has not been explicitly designed or trained to fool. This is important because an investigator can typically utilize multiple different forensic algorithms to perform a particular task. For example, several different algorithms exist for identifying an image’s source camera model. An attack designed to fool camera model identification algorithm A is maximally effective if it also transfers to fool camera model identification algorithms B and C.

While it is not necessary that an attack satisfy these additional goals, attacks that do are significantly more effective in realistic scenarios.

3.3 Traditional Anti-Forensic Attack Design Procedure and Shortcomings

While anti-forensic attacks target different forensic algorithms, at a high level the majority of them operate in a similar manner. First, an attack builds or estimates a model of the forensic trace that it wishes to falsify. This could be associated with the class of unaltered images in the case that the attack is designed to disguise evidence of manipulation or falsification, or it could be associated with a particular image source. Next, this model is used to guide a technique that synthesizes forensic traces associated with this target class. For example, anti-forensic attacks to falsify an image’s source camera model synthesize demosaicing traces or other forensic traces associated with a target camera model. Alternatively, techniques designed to remove evidence of editing such as multiple JPEG compression or median filtering synthesize traces that manipulation detectors associate with unaltered images.

Traditionally, creating a new anti-forensic attack has required a human expert to first design an explicit model of the target trace that the attack wishes to falsify. Next, the expert must create an algorithm capable of synthesizing this target trace such that it matches their model. While this approach has led to the creation of several successful anti-forensic attacks, it is likely not scalable enough to keep pace with the rapid development of new forensic algorithms. It is very difficult and time consuming for humans to construct explicit models of forensic traces that are accurate enough for an attack to be successful. Furthermore, the forensic community has widely adopted the use of machine learning and deep learning approaches to learn sophisticated implicit models directly from data. In this case, it may be intractable for human experts to explicitly model a target trace. Even if a model can be successfully constructed, a new algorithm must be designed to synthesize each trace under attack. Again, this is also a challenging and time-consuming process.

In order to respond to the rapid advancement of forensic technologies, adversaries would like to develop some automated means of creating new anti-forensic attacks. Tools from deep learning such as convolutional neural networks have allowed forensics researchers to successfully automate how forensic traces are learned. Similarly, intelligent adversaries have begun to look to deep learning for approaches to automate the creation of new attacks. As a result, new attacks based upon GANs have begun to emerge.

3.4 Anti-Forensic Attacks on Parametric Forensic Models

With the advances in editing software and apps, it is very easy for people to manipulate images however they want. To make an image deceptively convincing to human eyes, post-processing operations such as resampling, blending, and denoising are often applied to the falsified images. In recent years, deep-learning-based algorithms have also been used to create novel content. In the hands of an attacker, these techniques can be used for malicious purposes. Previous research shows that manipulations and attacks leave traces that can be modeled or characterized by forensic algorithms. While these traces may be invisible to human eyes, forensic algorithms can use them to detect forged images or distinguish them from real images.

Anti-forensic attacks are countermeasures that attempt to fool targeted forensic algorithms. Some anti-forensic attacks aim to remove the forensic traces left by manipulation operations or attacks that forensic algorithms analyze. Others synthesize fake traces in order to cause forensic algorithms to make a wrong decision. For example, an anti-forensic attack can remove traces left by resampling and cause a forensic algorithm designed to characterize resampling traces to classify the anti-forensically attacked image as “unaltered”. An anti-forensic attack can also synthesize fake forensic traces associated with a particular target camera model B into an image, causing a camera model identification algorithm to believe the image was captured by camera model B when it was actually captured by camera model A. Anti-forensic attacks can also hide forensic traces to obfuscate an image’s true origin. Anti-forensic attacks are expected to introduce no visible distortion, since visible distortion often flags an image as “fake” to investigators. However, anti-forensic attacks typically do not account for countermeasures against themselves, and focus only on fooling the targeted forensic algorithms.

Additionally, anti-forensic attacks are often designed to falsify particular forensic traces. A common methodology, similar to the one described in the previous section, is typically used to create anti-forensic attacks. First, a human expert designs a parametric model of the target forensic trace that they wish to falsify during the attack. Next, an algorithm is created to introduce a synthetic trace into the attacked image so that it matches this model. Often this is accomplished by creating a generative noise model that is used to introduce specially designed noise either directly into the image or into some transform of the image. Many anti-forensic attacks have been developed following this methodology, such as attacks that remove traces left by median filtering, JPEG compression, or resampling, or that synthesize the demosaicing traces of a target camera model.

This approach to creating attacks has several important drawbacks. First, it is both challenging and time consuming for human experts to create accurate parametric models of forensic traces. In some cases it is extremely difficult or even infeasible to create models accurate enough to fool state-of-the-art detectors. Furthermore, a model of one forensic trace typically cannot be reused to attack a different forensic trace. This creates an important scalability challenge for the attacker.

3.5 Anti-Forensic Attacks on Deep Neural Networks

An even greater challenge arises from the widespread adoption of deep-learning-based forensic algorithms such as convolutional neural networks (CNNs). The approach described above cannot be used to attack forensic CNNs since they don’t utilize an explicit feature set or model of a forensic trace. Anti-forensic attacks targeting deep-learning-based forensic algorithms are now drawing increasing attention from both academia and industry.

While deep learning has achieved state-of-the-art performance on many machine learning tasks, including computer vision and multimedia forensics, researchers have found that non-ideal properties of neural networks can be exploited to cause misclassification of an input image. Several explanations have been posited for the non-ideal properties of neural networks that adversarial examples exploit, such as imperfections caused by the locally-linear nature of neural networks (Goodfellow et al. 2014) or a misalignment between the features used by humans and those learned by neural networks when performing the same classification task (Ilyas et al. 2019). These findings facilitated the development of adversarial example generation algorithms.

Adversarial example generation algorithms operate by adding adversarial perturbations to an input image. The adversarial perturbations are noise patterns produced by optimizing a certain distance metric, usually an \(L_p\) norm (Goodfellow et al. 2014; Madry et al. 2017; Kurakin et al. 2016; Carlini and Wagner 2017). The adversarial perturbations aim to push the image’s representation in the latent feature space just across the decision boundary of the desired class (i.e., a targeted attack) or push the data away from its true class (i.e., an untargeted attack). Forensic researchers have found that adversarial example generation algorithms can be used as anti-forensic attacks on forensic algorithms. In 2017, Güera et al. showed that camera model identification CNNs (Güera et al. 2017) can be fooled by the Fast Gradient Sign Method (FGSM) (Goodfellow et al. 2014) and the Jacobian-based Saliency Map Attack (JSMA) (Papernot et al. 2016).
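As an illustration of how such a perturbation can be computed, the sketch below implements a basic untargeted FGSM step against a forensic CNN. The classifier, the image tensor, the true label, and the step size epsilon are placeholders; published attacks often use iterative or targeted variants of this idea.

import torch
import torch.nn.functional as F

def fgsm_untargeted(classifier, image, true_label, epsilon=0.01):
    # image: tensor of shape (1, C, H, W) with values in [0, 1]
    # true_label: tensor of shape (1,) holding the image's true class index
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(classifier(image), true_label)
    loss.backward()
    # Take one step in the sign of the gradient to increase the loss for
    # the true class, then clamp back to the valid pixel range.
    perturbed = image + epsilon * image.grad.sign()
    return perturbed.clamp(0.0, 1.0).detach()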

There are several drawbacks to using adversarial example generation algorithms as anti-forensic attacks. To ensure that the CNN is successfully fooled, the adversarial perturbations added to the image may be strong and result in fuzzy or distorted output images. While researchers can control the visual quality of anti-forensically attacked images to a certain degree using various techniques, maintaining high visual quality remains a challenge for further study. Another drawback is that the adversarial perturbations must be optimized for each particular input image. As a result, producing a large volume of anti-forensically attacked images can be time consuming. Additionally, the adversarial perturbations are not invariant to detection block alignment.

4 Using GANs to Make Anti-Forensic Attacks

Recently, researchers have begun adapting the GAN framework to create a new class of anti-forensic attacks. Though research into these new GAN-based attacks is still in its early stages, recent results have shown that these attacks can overcome many of the problems associated with both traditional anti-forensic attacks and attacks based on adversarial examples. In this section, we will describe at a high level how GAN-based anti-forensic attacks are constructed. Additionally, we will discuss the advantages of utilizing these types of attacks.

4.1 How GANs Are Used to Construct Anti-Forensic Attacks

GAN-based anti-forensic attacks are designed to fool forensic algorithms built upon neural networks. At a high level, they operate by using an adversarially trained generator to synthesize a target set of forensic traces in an image. Training occurs first, before the attack is launched. Once the anti-forensic generator has been trained, an image can be attacked simply by passing it through the generator. The generator does not need to be re-trained for each image; however, it must be re-trained if a new target class is selected for the attack.

Researchers have shown that GANs can be adapted to both automate and generalize the anti-forensic attack design methodology described in Sect. 17.3 of this chapter. The first step in the traditional approach to creating an anti-forensic attack involves the creation of a model of the target forensic trace. While it is difficult or potentially impossible to build an explicit model of the traces learned by forensic neural networks, GANs can exploit the model that has already been learned by the forensic neural network. This is done by adversarially training the generator against a pre-trained version of the victim classifier C. The victim classifier can either take the place of the discriminator in the traditional GAN formulation, or it can be used in conjunction with a traditional discriminator. Currently, there is not a clear consensus on whether a discriminator is necessary or beneficial when creating GAN-based anti-forensic attacks. Additionally, in some scenarios the attacker may not have direct access to the victim classifier. If this occurs, the victim classifier cannot be directly integrated into the adversarial training process. Instead, alternate means of training the generator must be used. These are discussed in Sect. 17.5.

When the generator is adversarially trained against the victim classifier, it learns the model of the target forensic trace implicitly used by the victim classifier. Because of this, the attacker does not need to explicitly design a model of the target trace, as would typically be done in the traditional approach to creating an anti-forensic attack. This dramatically reduces the time required to construct an attack. Furthermore, it increases the accuracy of the trace model used by the attack, since it is directly matched to the victim classifier.

Additionally, GAN-based attacks significantly reduce the effort required for the second step of the traditional approach to creating an anti-forensic attack, i.e., creating a technique to synthesize the target trace. This is because the generator automatically learns to synthesize the target trace through adversarial training. While existing GAN-based anti-forensic attacks utilize different generator architectures, the generator still remains somewhat generic in that it is a deep convolutional neural network. CNNs can be quickly and easily implemented using deep learning frameworks such as TensorFlow or PyTorch, making it easy for an attacker to experimentally optimize the design of their generator.

Currently, there are no well-established guidelines for designing anti-forensic generators. However, research by Zhang et al. has shown that when an upsampling component is utilized in a GAN’s generator, it leaves behind “checkerboard” artifacts in the synthesized image (Zhang et al. 2019). As a result, it is likely important to avoid the use of upsampling in the generator so that visual or statistically detectable artifacts that may act as giveaways are not introduced into an attacked image. This is reinforced by the fact that many existing attacks utilize fully convolutional generators that do not include any pooling or upsampling layers.

Convolutional generators also possess the useful property that once they are trained, they can be used to attack an image of any size. Fully convolutional CNNs are able to accept images of any size as an input and produce output images of the same size. As a result, these generators can be deployed in realistic scenarios in which the size of the image under attack may not be known a priori, or may vary when attacking multiple images. They also provide an advantage in that synthesized forensic traces are distributed throughout the attacked image and do not depend on a blocking grid. As a result, the attacker does not require advanced knowledge of which image region or block an investigator will analyze. This is a distinct advantage over anti-forensic attacks based upon adversarial examples.
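To illustrate this property, the sketch below defines a small fully convolutional generator with no pooling or upsampling layers, so its output always has the same spatial dimensions as its input. The layer widths, depth, and residual formulation are arbitrary illustrative choices, not the architecture of any particular published attack.

import torch
import torch.nn as nn

class FullyConvGenerator(nn.Module):
    # Stride-1 convolutions with padding preserve height and width, so a
    # single trained generator can attack images of any size.
    def __init__(self, channels=3, width=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, width, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(width, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        # Predict a residual and add it to the input so the attacked image
        # stays close to the original unless training pushes it away.
        return x + self.body(x)

G = FullyConvGenerator()
attacked_small = G(torch.rand(1, 3, 128, 128))    # output: 1 x 3 x 128 x 128
attacked_large = G(torch.rand(1, 3, 720, 1280))   # output: 1 x 3 x 720 x 1280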

Research also suggests that a well-constructed generator can be re-trained to synthesize different forensic traces than the ones which it was initially designed for. For example, the MISLGAN attack was initially designed to falsify an image’s origin by synthesizing source camera model traces linked with a different camera model. Recent work has also used this generator architecture to construct a GAN-based attack that removes evidence of multiple editing operations.

4.2 Overview of Existing GAN-Based Attacks

Recently, several GAN-based anti-forensic attacks have been developed to falsify information about an image’s source and authenticity. Here we provide a brief overview of existing GAN-based attacks.

In 2018, Kim et al. proposed a GAN-based attack to remove median filtering traces from an image (Kim et al. 2017). This GAN was built using a generator and a discriminator. The generator was trained to remove the median filtering traces, and the discriminator was trained to distinguish between the restored images and unaltered images. The authors introduced a loss computed from a pre-trained VGG network to improve the visual quality of attacked images. This attack was able to remove median filtering traces from gray-scale images and fool both traditional forensic algorithms that use hand-designed features and CNN-based detectors.

Chen et al. proposed a GAN-based attack named MISLGAN in 2018 to falsify an image’s source camera model (Chen et al. 2018). In this work, the authors assumed that the attacker has access to the forensic CNN that the investigator will use to identify an image’s source camera model. Their attack modifies an image’s forensic traces such that the forensic CNN misclassifies the attacked image as having been captured by a target camera model. MISLGAN consists of three major components: a generator, a discriminator, and the pre-trained forensic camera model identification CNN. The generator was adversarially trained against both the discriminator and the camera model CNN for each target camera model. The trained generator was shown to be able to falsify camera model traces in any color image with a high attack success rate (i.e., the percentage of attacked images classified as the target camera model). This attack was demonstrated to be effective even when the images under attack were captured by camera models not used during training. Additionally, the attacked images maintained high visual quality, even when compared side by side with the original unaltered images.

In 2019, the authors of MISLGAN extended this work to create a black box attack against forensic source camera model identification CNNs (Chen et al. 2019). To accomplish this, the authors integrated a substitute network to approximate the camera model identification CNNs under attack. By training MISLGAN against a substitute network, this attack can successfully fool state-of-the-art camera model CNNs even in black-box scenarios, and outperformed adversarial example generation algorithms such as FGSM (Goodfellow et al. 2014). This GAN-based attack has been adapted to remove traces left by editing operations and to create transferable anti-forensic attacks (Zhao et al. 2021). Additionally, by improving the generator’s architecture, Zhao et al. were able to falsify traces left in synthetic images by the GAN generation process (Zhao and Stamm 2021). This attack was shown to fool existing synthetic image detectors targeted at detecting both GAN-generated faces and objects.

In 2021, Cozzolino et al. proposed a GAN-based attack, SpoC (Cozzolino et al. 2021), to falsify the camera model information of a GAN-synthesized image. SpoC consists of a generator, a discriminator, and a pre-trained embedder. The embedder is used to learn reference vectors in the latent space for real images or for images from the target camera models. This attack can successfully fool GAN-synthesized image detectors or camera model identification CNNs, and outperformed adversarial example generation algorithms such as FGSM (Goodfellow et al. 2014) and PGD (Madry et al. 2017). The authors also showed that the attack is robust to a certain level of JPEG compression.

In 2021, Wu et al. proposed a GAN-based attack, JRA-GAN, to restore JPEG compressed images (Wu et al. 2021). The authors used a GAN to restore the high frequency information in the DCT domain and remove the blocking artifacts caused by JPEG compression in order to fool JPEG compression detection algorithms. The authors showed that the attack can successfully restore gray-scale JPEG compressed images and fool both traditional JPEG detection algorithms that use hand-designed features and CNN-based detectors.

4.3 Differences Between GAN-Based Anti-Forensic Attacks and Adversarial Examples

While attacks built from GANs and from adversarial examples both use deep learning to create anti-forensic countermeasures, it is important to note that the GAN-based attacks described in this chapter are fundamentally different from adversarial examples.

As discussed in Chap. 16, adversarial examples operate by exploiting non-ideal properties of a victim classifier. There are several explanations that have been considered by the research community as to why adversarial examples exist and are successful. These include “blind spots” caused by non-ideal training or overfitting, a mismatch between features used by humans and those learned by neural networks when performing classification (Ilyas et al. 2019), and inadvertent effects caused by the locally-linear nature of neural networks (Goodfellow et al. 2014). In all of these explanations, adversarial examples cause a classifier to misclassify due to unintended behavior.

By contrast, GAN-based anti-forensic attacks do not try to trick the victim classifier into misclassifying an image by exploiting its non-ideal properties. Instead, they replace the forensic traces present in an image with a synthetic set of traces that accurately match the victim classifier’s model of the target trace. In this sense, they attempt to synthesize traces that the victim classifier correctly uses to perform tasks such as manipulation detection and source identification. This is possible because forensic traces, such as those linked to editing operations or an image’s source camera, are typically not visible to the human eye. As a result, GANs are able to create new forensic traces in an image without altering its content or visual appearance.

Additionally, adversarial example attacks are customized to each image under attack. Often, this is done through an iterative training or search process that must be repeated each time a new image is attacked. It is worth noting that GAN-based techniques for generating adversarial examples have recently been proposed in the computer vision and machine learning literature (Poursaeed et al. 2018; Xiao et al. 2018). These attacks are distinct from the GAN-based attacks discussed in this chapter, specifically because they generate adversarial examples and not synthetic versions of features that should accurately be associated with the target class. For these attacks, the GAN framework is instead adapted to search for an adversarial example that can be made from a particular image, in the same manner that iterative search algorithms are used.

4.4 Advantages of GAN-Based Anti-Forensic Attacks

There are several advantages to using GAN-based anti-forensic attacks over both traditional human-designed approaches and those based on adversarial examples. We briefly summarize these below. Open problems and weaknesses of GAN-based anti-forensic attacks are further discussed in Sect. 17.6.

  • Rapid and automatic attack creation: As discussed earlier, GANs both automate and generalize the traditional approach to designing anti-forensic attacks. As a result, it is much easier and quicker to create new GAN-based attacks than through traditional human-designed approaches.

  • Low visual distortion: In practice, existing GAN-based attacks tend to introduce little-to-no visually detectable distortion. While this condition is not guaranteed, visual distortion can be effectively controlled during training by carefully balancing the weight placed on the generator’s loss term controlling visual quality. This is discussed in more detail in Sect. 17.5. By contrast, several adversarial example attacks can leave behind visually detectable noise-like artifacts (Güera et al. 2017).

  • Attack only requires training once: After training, a GAN-based attack can be launched on any image as long as the target class remains constant. This differs from attacks based on adversarial examples, which must create a custom set of modifications for each image, often using an iterative algorithm. Additionally, no image-specific parameters need to be learned, unlike for several traditional anti-forensic attacks.

  • Scalable deployment in realistic scenarios: The attack requires only passing an image through a pre-trained convolutional generator. As a result, it launches very rapidly and can be deployed at scale.

  • Ability to attack images of arbitrary size: Because the attack is launched via a convolutional generator, it can be applied to images of any size. By contrast, attacks based on adversarial examples only produce attacked images of the same size as the input to the victim CNN, which is typically a small patch and not a full-sized image.

  • Does not require advanced knowledge of the analysis region: One way to adapt adversarial attacks to larger images is to break the image into blocks, then attack each block separately. This however requires advanced knowledge of the particular blocking grid that will be used during forensic analysis. If the blocking grid used by the attacker and the detector do not align, or if the investigator analyzes an image multiple times with different blocking grids, adversarial example attacks are unlikely to be successful.

5 Training Anti-Forensic GANs

In this section, we give an overview of how to train an anti-forensic generator. We begin by discussing the different terms of the loss function used during training. We show how each term of the loss function is formulated in the perfect knowledge scenario (i.e., when a white box attack is launched). After this, we show how to modify the loss function to create black box attacks in limited knowledge scenarios, as well as transferable attacks in the zero knowledge scenario.

5.1 Overview of Adversarial Training

During adversarial training, the generator G used in a GAN-based attack should be incentivized to produce visually convincing anti-forensically attacked images that can fool the victim forensic classifier C, as well as a discriminator D if one is used. These goals are achieved by properly formulating the generator’s loss function \(\mathcal {L}_G\). A typical loss function for adversarially training an anti-forensic generator consists of three major terms: the classification loss \(\mathcal {L}_C\), the adversarial loss \(\mathcal {L}_A\) (if a discriminator is used), and the perceptual loss \(\mathcal {L}_P\),

$$\begin{aligned} \mathcal {L}_{G}=\alpha \mathcal {L}_P+ \beta \mathcal {L}_C+ \gamma \mathcal {L}_A, \end{aligned}$$
(17.2)

where \(\alpha \), \(\beta \), \(\gamma \) are weights to balance the performance of the attack and the visual quality of the attacked images.

Like all anti-forensic attacks, the generator’s primary goal is to produce output images that fool a particular forensic algorithm. This is done by introducing a term into the generator’s loss function that we describe as the forensic classification loss, or classification loss for short. The classification loss is the key element guiding the generator to learn the forensic traces of the target class. Depending on the attacker’s access to and knowledge of the forensic classifier, particular strategies may need to be adopted during training. We will elaborate on this later in this section. Typically, the classification loss is defined as 0 if the output of the generator is classified by the forensic classifier as belonging to the target class t and 1 if the output is classified as any other class.

If the generator is trained to fool a discriminator as well, then the adversarial loss provided by the discriminator’s feedback is incorporated into the total loss function. Typically, the adversarial loss is 0 if the generated anti-forensically attacked image fools the discriminator and 1 if not.

At the same time, the anti-forensic generator should introduce minimal distortion into the attacked image. Some amount of distortion is acceptable, since in an anti-forensic setting, the unattacked image will not be presented to the investigator. However, these distortions should not undo or significantly alter any modifications that were introduced by the attacker before the image is passed through the anti-forensic generator. To control the visual quality of anti-forensically attacked images, the perceptual loss measures the pixel-wise difference between the image under attack and the anti-forensically attacked image produced by the generator. Minimizing the perceptual loss during the adversarial training process helps to ensure that attacked images produced by the generator contain minimal, visually imperceptible distortions.

When constructing an anti-forensic attack, it is typically advantageous to exploit as much knowledge of the algorithm under attack as possible. This holds especially true for GAN-based attacks, which directly integrate a forensic classifier into the attack’s training. Since gradient information from a victim forensic classifier is needed to compute the classification loss, different strategies must be adopted to train the anti-forensic generator depending on the attacker’s knowledge about the forensic classifier under attack. As a result, it is necessary to provide more detail about the different knowledge levels that an attacker may have of the victim classifier before we are able to completely formulate the loss function. We note that the knowledge level typically only influences the classification loss. The perceptual loss and adversarial loss remain unchanged for all knowledge levels.

5.2 Knowledge Levels of the Victim Classifier

As mentioned previously, we refer to the forensic classifier under attack as the victim classifier C. Depending on the attacker’s knowledge level of the victim classifier, it is common to categorize the knowledge scenarios into the perfect knowledge (i.e., white box) scenario, the limited knowledge (i.e., black box) scenario, and the zero knowledge scenario.

In the perfect knowledge scenario, the attacker has full access to the victim classifier or an exact identical copy of the victim classifier. This is an ideal situation for the attacker. The attacker can launch a white box attack in which they train directly against the victim classifier that they wish to fool.

In many other situations, however, the attacker has only partial knowledge or even zero knowledge of the victim classifier. Partial knowledge scenarios may include a variety of specific settings in real life. A common aspect is that the attacker does not have full access to or control over the victim classifier. However, the attacker is allowed to probe the victim classifier as a black box. For example, they can query the victim classifier using an input, then observe the value of its output. Because of this, the partial knowledge scenario is also referred to as the black box scenario. This scenario is more challenging, but potentially more realistic, for the attacker. For example, a piece of forensic software or an online service may be encrypted to reduce the risk of attack. As a result, its users would only be able to provide input images and observe the results that the system outputs.

An even more challenging situation is when the attacker has zero knowledge of the victim classifier. Specifically, the attacker not only has no access to the victim classifier, but also cannot observe it by any means. This is also a realistic situation for the attacker. For example, an investigator may develop private forensic algorithms and keep this information classified for security purposes. In this case, we assume that the attacker knows what the victim classifier’s goal is (i.e., manipulation detection, source camera model identification, etc.). A broader concept of zero knowledge may include the attacker not knowing whether a victim classifier exists, or what the specific goals of the victim classifier are (i.e., the attacker does not know anything about how an image will be analyzed). Currently, however, this is beyond the scope of existing anti-forensics research.

In both the black box and the zero knowledge scenarios, the attacker cannot use the victim classifier C to directly train the attack. The attacker needs to modify the classification loss in (17.2) such that a new classification loss is provided by other classifiers that can be accessed during attack training. We note that the perceptual loss and the adversarial loss usually remain the same for all knowledge scenarios, or may need only trivial adjustments in specific cases. The main change, however, is the formulation of the classification loss.

5.3 White Box Attacks

We start by discussing the formulation of each loss term in the perfect knowledge scenario, i.e., when creating a white box attack. Each term in the generator’s loss is defined below, and a brief code sketch combining these terms follows the list. The perceptual loss and the adversarial loss will remain the same for all subsequent knowledge scenarios.

  • Perceptual Loss: The perceptual loss \(\mathcal {L}_P\) can be formulated in many different ways. The mean squared error (\(L_2\) loss) and the mean absolute difference (\(L_1\) loss) between an image before and after the attack are the most commonly used formulations, i.e.,

    $$\begin{aligned} \mathcal {L}_P = \ \tfrac{1}{N} \Vert I-G(I) \Vert _p, \end{aligned}$$
    (17.3)

    where N is the number of pixels in the reference image I and the anti-forensically attacked image G(I), and \(\Vert \cdot \Vert _p\) denotes the p-norm with p equal to 1 or 2. Empirically, the \(L_1\) loss usually yields better visual quality for GAN-based anti-forensic attacks. This is potentially because the \(L_1\) loss penalizes small pixel differences comparatively more than the \(L_2\) loss, which places greater emphasis on penalizing large pixel differences.

  • Adversarial Loss: The adversarial loss \(\mathcal {L}_A\) is used to ensure that the anti-forensically attacked image can fool the discriminator D. Ideally, the discriminator’s output for an anti-forensically attacked image is 1 when the attacked image fools the discriminator, and 0 otherwise. Therefore, the adversarial loss is typically formulated as the sigmoid cross-entropy between the discriminator’s output for the generated anti-forensically attacked image and 1.

    $$\begin{aligned} \mathcal {L}_A=\log (1- D(G(I))), \end{aligned}$$
    (17.4)

    As mentioned previously, a separate discriminator is not always used when creating a GAN-based anti-forensic attack. If this is the case, then the adversarial loss term can be discarded.

  • Classification Loss: In the perfect knowledge scenario, since the victim classifier C is fully accessible by the attacker, the victim classifier can be directly used to calculate the forensic classification loss \(\mathcal {L}_C\). Typically this is done by computing the softmax cross-entropy between the classifier’s output and the target class t, i.e.,

    $$\begin{aligned} \mathcal {L}_C = - \log ( C ( G (I) )_t ) \end{aligned}$$
    (17.5)

    where \(C(G(I))_t\) is the softmax output of the victim classifier at location t.
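The sketch below combines the three loss terms above into the generator loss of Eq. (17.2) for the white box case. The victim classifier C, discriminator D, target class index, and weights alpha, beta, and gamma are all placeholders chosen by the attacker, and C is assumed here to output raw logits; this is an illustrative sketch rather than the implementation of any particular published attack.

import torch
import torch.nn.functional as F

def white_box_generator_loss(G, D, C, image, target_class,
                             alpha=1.0, beta=0.1, gamma=0.01):
    attacked = G(image)

    # Perceptual loss (17.3): mean absolute (L1) pixel difference.
    l_p = torch.mean(torch.abs(image - attacked))

    # Adversarial loss (17.4): sigmoid cross-entropy pushing D's output
    # for the attacked image toward 1 ("real").
    d_out = D(attacked)
    l_a = F.binary_cross_entropy(d_out, torch.ones_like(d_out))

    # Classification loss (17.5): softmax cross-entropy toward the target
    # class t, computed through the fully accessible victim classifier C
    # (assumed to return logits).
    target = torch.full((attacked.size(0),), target_class, dtype=torch.long)
    l_c = F.cross_entropy(C(attacked), target)

    # Weighted sum as in (17.2).
    return alpha * l_p + beta * l_c + gamma * l_a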

5.4 Black Box Scenario

The perfect knowledge scenario is often used to evaluate the baseline performance of an anti-forensic attack. However, it is frequently unrealistic to assume that an attacker would have full access to the victim classifier. More commonly, the victim classifier may be presented as a black box to the attacker. Specifically, the attacker does not have full control over the victim classifier, nor do they have enough knowledge of its architecture to construct a perfect replica. As a result, the victim classifier cannot be used to produce the classification loss as formulated in the perfect knowledge scenario. The victim classifier can, however, be probed by the attacker as a black box and its output observed. This is done by providing it input images, then recording how the victim classifier classifies those images. The attacker can use this information by modifying the classification loss to exploit it.

One approach is to build a substitute network \(C_s\) to mimic the behavior of the victim classifier C. The substitute network is a forensic classifier built and fully controlled by the attacker. Ideally, if the substitute network is trained properly, it perfectly mimics the decisions made by the victim classifier. The substitute network can then be used to adversarially train the generator instead of the victim classifier by reformulating the classification loss as

$$\begin{aligned} \mathcal {L}_{C} = - \log ( C_s( G (I) )_t ) \end{aligned}$$
(17.6)

where \(C_s(G(I))_t\) is the softmax output of the substitute network \(C_s\) at location t.

When training a substitute network, it is important that it is trained to reproduce the forensic decisions produced by the victim classifier even if these decisions are incorrect. By doing this, the substitute network can approximate the latent space in which the victim classifier encodes forensic traces. For an attack to be successful, it is important to match this space as accurately as possible. Additionally, there are no strict rules guiding the design of the substitute network’s architecture. Research suggests that as long as the substitute network is deep and expressive enough to reproduce the victim classifier’s output, a successful black box attack can be launched. For example, Chen et al. demonstrated that successful black box attacks against source camera model identification algorithms can be trained using different substitute networks built primarily from residual blocks and dense blocks (Chen et al. 2019).
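The sketch below illustrates one way a substitute network might be fit to the victim classifier's observed decisions before being plugged into Eq. (17.6). The query function, probe dataset, training schedule, and substitute architecture are assumed placeholders chosen by the attacker, not a prescribed procedure from the literature.

import torch
import torch.nn.functional as F

def train_substitute(substitute, optimizer, query_victim, probe_images, epochs=10):
    # query_victim: black-box callable returning the victim classifier's
    # predicted class index (a long tensor) for one probe image.
    labels = [query_victim(img) for img in probe_images]
    for _ in range(epochs):
        for img, label in zip(probe_images, labels):
            logits = substitute(img.unsqueeze(0))
            # Fit the victim's decisions, even when they are wrong, so the
            # substitute approximates the victim's decision boundaries.
            loss = F.cross_entropy(logits, label.view(1))
            optimizer.zero_grad(); loss.backward(); optimizer.step()
    return substitute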

5.5 Zero Knowledge

In the zero knowledge scenario, the attacker has no access to the victim classifier C. Furthermore, the attacker cannot observe or probe the victim classifier as in the black box scenario. As a result, the substitute network approach described in the black box scenario does not translate to the zero knowledge scenario. In this circumstance, the synthetic forensic traces created by the attack must be transferable enough to fool the victim classifier.

Creating transferable attacks is challenging and remains an open research area. One recently proposed method of achieving attack transferability is to adversarially train the anti-forensic generator against an ensemble of forensic classifiers. Each forensic classifier \(C_m\) in the ensemble is built and pre-trained by the attacker for a forensic classification task and acts as a surrogate for the victim classifier. For example, if the attacker’s goal is to fool a manipulation detection CNN, then each surrogate classifier in the ensemble should be trained to perform manipulation detection as well. These surrogate classifiers, however, should be as diverse as possible. Diversity can be achieved by varying the architecture of the surrogate classifiers, the classes that they distinguish between, or other factors.

Together, these surrogate classifiers can be used to replace the victim classifier and compute a single classification loss in a similar fashion as in the white box scenario. The final classification loss is formulated as a weighted sum of the individual classification losses pertaining to each surrogate classifier in the ensemble, such that

$$\begin{aligned} \mathcal {L}_{C}= \sum _{m=1}^{M} \beta _m \mathcal {L}_{C_m} \end{aligned}$$
(17.7)

where M is the number of surrogate classifiers in the ensemble, \( \mathcal {L}_{C_m}\) is the individual classification loss of the \(m\mathrm{th}\) surrogate classifier, and \(\beta _m\) is the weight of the \(m\mathrm{th}\) individual classification loss.
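A minimal sketch of the ensemble classification loss in (17.7) is given below, assuming the attacker has already pre-trained a list of surrogate classifiers. The surrogate models, the weights beta_m, and the assumption that each surrogate outputs logits are the attacker's choices and not part of any specific published attack.

import torch
import torch.nn.functional as F

def ensemble_classification_loss(surrogates, weights, attacked_images, target_class):
    # surrogates: list of M pre-trained surrogate classifiers C_m (logit outputs)
    # weights:    list of M scalar weights beta_m
    target = torch.full((attacked_images.size(0),), target_class, dtype=torch.long)
    loss = 0.0
    for C_m, beta_m in zip(surrogates, weights):
        # Same softmax cross-entropy toward the target class as in the white
        # box case, computed against each surrogate and weighted by beta_m.
        loss = loss + beta_m * F.cross_entropy(C_m(attacked_images), target)
    return loss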

While there exists no strict mathematical justification for why this approach achieves transferability, the intuition is that each surrogate classifier in the ensemble learns to partition the forensic feature space into separate regions for the target and other classes. By defining the classification loss in this fashion, the generator is trained to synthesize forensic features that lie in the intersection of these regions. If a diverse set of classifiers is used to form the ensemble, this intersection will likely lie inside the decision region that other classifiers (such as the victim classifier) associate with the target class.

6 Known Problems with GAN-Based Attacks & Future Directions

While GAN-based anti-forensic attacks pose a serious threat to forensic algorithms, this research is still in its very early stages. There are several known shortcomings of these new attacks, as well as open research questions.

One important issue facing GAN-based anti-forensic attacks is transferability. While these attacks are strongest in the white box scenario where the attacker has full access to the victim classifier, this scenario is the least realistic. More likely, the attacker knows only the general goal of the victim classifier (i.e., manipulation detection, source identification, etc.) and possibly the set of classes this classifier is trained to differentiate between. This corresponds to the zero knowledge scenario described in Sect. 17.5.5. In this case, the attacker must rely entirely on attack transferability to launch a successful attack.

Recent research has shown, however, that achieving attack transferability against forensic classifiers is a difficult task (Barni et al. 2019; Zhao and Stamm 2020). For example, small implementation differences between two forensic CNNs with the same architecture can prevent a GAN-based white box attack trained against one CNN from transferring to the other nearly identical CNN (Zhao and Stamm 2020). These differences can include the data used to train each CNN or how each CNN’s classes are defined, i.e., distinguishing unmanipulated versus manipulated (binary) or unmanipulated versus each possible manipulation (multi-class). For GAN-based attacks to work in realistic scenarios, new techniques must be created for achieving transferable attacks. While the ensemble-based training strategy discussed in Sect. 17.5.5 is a new way to create transferable attacks, there is still much work to be done. Additionally, it is possible that there are theoretical limits on attack transferability. As of yet, these limits are unknown, but future research on this topic may help reveal them.

Another important topic that research has not yet addressed is the types of forensic algorithms that GAN-based attacks are able to attack. Currently, anti-forensic attacks based on both GANs and adversarial examples are targeted against forensic classifiers. However, many forensic problems are more sophisticated than simple classification. For example, state-of-the-art techniques to detect locally falsified or spliced content use sophisticated approaches built upon Siamese networks, not simple classifiers (Huh et al. 2018; Mayer and Stamm 2020, 2020). Siamese networks are able to compare forensic traces in two local regions of an image and produce a measure of how similar or different these traces are, rather than simply providing a class decision. Similarly, state-of-the-art algorithms built to localize fake or manipulated content also utilize Siamese networks during training or localization (Huh et al. 2018; Mayer and Stamm 2020, 2020; Cozzolino and Verdoliva 2018). Attacking these forensic algorithms is much more challenging than attacking comparatively simple CNN-based classifiers. It remains to be seen if and how GAN-based anti-forensic attacks can be constructed to fool these algorithms.

At the other end of the spectrum, little-to-no work currently exists aimed specifically at detecting or defending against GAN-based anti-forensic attacks. Past research has shown that traditional anti-forensic attacks often leave behind their own forensically detectable traces (Stamm et al. 2013; Piva 2013; Milani et al. 2012; Barni et al. 2018). Currently, it is unknown if these new GAN-based attacks leave behind their own traces, or what these traces may be. Research into adversarial robustness may potentially provide some protection against GAN-based anti-forensic attacks (Wang et al. 2020; Hinton et al. 2015). However, because GAN-based attacks operate differently than adversarial examples, it is unclear if these techniques will be successful. New techniques may need to be constructed to allow forensic algorithms to correctly operate in the presence of a GAN-based anti-forensic attack. Clearly, much more research is needed to provide protection against these emerging threats.