1 Introduction

In recent years, advances in computer technology and graphics processing units (GPUs) have driven the emergence of many artificial intelligence (AI) technologies, especially in machine learning (ML). As a crucial branch of ML, deep learning (DL) has dominated research on large-scale data problems because it can extract valuable information from massive data. Since it was proposed in 2006, DL has become one of the most salient areas of academic research (Schmidhuber 2015). Because training a large-scale DL model requires high-performance computing, using a GPU instead of a central processing unit (CPU) is an effective way to accelerate DL training and reduce its complexity. DL models have achieved impressive performance in image processing, speech recognition, and many other applications across society and industry. DL has appealed to a wide range of researchers since AlexNet was presented at the Conference on Neural Information Processing Systems (NIPS) in 2012: it not only employs a convolutional neural network (CNN) but also fully exploits emerging GPU technologies (Krizhevsky et al. 2012). As with training a large-scale network such as AlexNet, many large-scale problems such as image processing and audio generation must be solved in real-world applications, and ML and DL algorithms have increasingly shown the ability to solve such problems more effectively than other methods (Navidan et al. 2021).

Supervised learning and unsupervised learning form a major classification of ML according to whether labeled data are used. Supervised learning uses clearly labeled data as input, so predictions can be judged correct or incorrect. Unsupervised learning, in contrast, requires no data labeling and has no target attributes (Hatcher and Yu 2018). The mainstream supervised learning tasks are classification and regression, while clustering, dimensionality reduction, and density estimation are generally considered unsupervised learning. When training such models, collecting labels is sometimes costly and problematic. Much of the underlying data in the real world is not easily accessible, and it is difficult to obtain relevant data for a specific problem, so traditional ML models sometimes cannot meet the challenges posed by data volume and parameters. Amid label imbalance, data acquisition difficulties, and parameter challenges, deep generative models can generate new data that fit the probability distribution of the input data without changing the nature of the original data. This can significantly improve algorithm performance and supply sufficient synthetic data.

Classical generative models are usually based on maximum likelihood estimation, Markov chains, and approximate inference (Pan et al. 2019), such as the variational auto-encoder (VAE) (Kingma and Welling 2014) and the deep belief network (DBN) (Hinton et al. 2006), which extends the restricted Boltzmann machine. Most deep generative networks focus mainly on generation ability and do less well at discrimination, largely because there are no simple discriminant rules or loss functions that cover suitable discrimination criteria. Since considering the behavior of either side alone is not sufficient, generative adversarial networks (GANs), an unsupervised learning model built on a generator and a discriminator, were proposed by Goodfellow et al. (2014). GANs provide a concise framework based on the idea of a two-player game and aim to generate more convincing samples through adversarial learning. Another point to emphasize is that, compared with other generative models such as the DBN, a GAN neither requires complex Markov chains nor relies on variational lower bounds, and if the discriminator is well trained, the generator can in principle learn the distribution of the training samples. Based on these advantages, GANs have been successfully applied to image generation (Brock et al. 2019; Wang et al. 2019), image translation (Isola et al. 2017; Kim et al. 2017), video prediction (Liu et al. 2019), and text-to-image generation (Zhang et al. 2017, 2019). However, large-scale optimization problems remain to be solved. To produce higher-quality samples, researchers have designed increasingly complex network structures and parameter settings for GANs. Gradient descent is sometimes not an effective strategy for reaching the Nash equilibrium, and large-scale, high-dimensional computation increases the instability of the model, which makes GANs more prone to vanishing gradients and mode collapse. Thus, within the simple adversarial framework, GANs still face challenges such as large-scale network parameters and complex network structures, which can prevent the model from reaching its optimization potential. In practice, this may make the images or texts generated by GANs unusable.

Evolutionary computation (EC) is considered effective for solving large-scale optimization problems and NP-hard problems. Inspired by natural evolution, it aims to resolve optimization problems using human knowledge and the principles of biological evolution (Chen et al. 2016). In addition to its use in modeling, optimization, and design, EC has increasingly been used in recent years for various DL tasks to speed up the training of DL models and overcome parametric challenges, such as the automatic exploration of neural network architectures and parameters (Yao and Liu 1996). In a GAN, inappropriate parameter settings can degrade performance or even fail to produce any reasonable results, and manual exploration makes training even harder. If EC is involved throughout GAN training, the evolutionary process can automatically retain better-performing parameters and structures for targeted training, achieving more complex adversarial training goals. Moreover, to reach the Nash equilibrium faster and more reliably, EC can apply selection pressure to the adversarial training process while avoiding local optima, giving the GAN better asymptotic consistency. When a GAN is applied to specific real-world scenarios, such as image style transfer, super-resolution, image completion, and denoising, EC enables the GAN to avoid many problems caused by the random distribution of noise vectors and difficulties in loss function design. In particular, to cope with mode collapse, survival-of-the-fittest fitness evaluation can set a benchmark before or during training, and iterative search can further enhance the generalization ability of the model. Evolutionary operators such as selection, crossover, and mutation can steer gradient descent toward the global optimum, avoiding the vanishing gradient problem to some extent. In addition, an evolutionary paradigm can overcome the inherent limitations of a single adversarial training objective and preserve the best offspring under different training objectives, which promotes the diversity of the generated samples and contributes to the advancement of GANs. Therefore, studying how EC influences GANs is valuable both theoretically and practically.

This survey describes GANs, EC, and their combination in detail. The methods most frequently used in the surveyed studies are the genetic algorithm (GA), the differential evolution algorithm (DE), the co-evolution algorithm (CEA), the evolutionary strategy (ES), and other algorithms such as particle swarm optimization (PSO) and evolutionary ensemble learning (EEL). Other scholars have optimized GANs with EC according to the number of objectives in the optimization problem; the details of these studies are presented in Sect. 4. Starting from the evolutionary GAN (E-GAN), in which EC and GANs were first combined in 2018, this paper addresses the research implications of EC-guided GANs. The main research questions in this survey are as follows:

  • RQ1: How is EC applied to the original GAN, and what is the specific performance of the relevant improvements based on the E-GAN and other GAN models driven by it? Also, for some specific ECs related to ML, how has GAN been combined with them and applied?

  • RQ2: Most of the studies surveyed in this paper adopt the basic ideas of GA, while fewer studies combine DE or ES with GANs, so DE and ES deserve separate discussion given their promising development. PSO, the non-dominated sorting genetic algorithm II (NSGAII), and similar algorithms appear throughout the later sections, and most of the models combining them with GANs appear in the application areas, so they are not discussed separately. How do the classic evolutionary processes of DE and ES specifically drive the evolution of GANs and enhance their stability?

  • RQ3: How does neuroevolution, an important application of EC in neural networks, automatically guide a GAN in the exploration of parameters and network structure?

  • RQ4: Because EC is often applied to optimization problems and considering that there is sometimes more than one training objective for a GAN, the training of a GAN can be treated as an optimization problem. How the evolved GAN can be combined with the optimization problem to exhibit its unique characteristics is considered from two aspects: first, how to use EC to solve the optimization objective of a GAN; second, how to introduce the adversarial training process of a GAN into the evolution process of EC so that evolutionary algorithms (EAs) can better solve other multi-objective or many-objective optimization problems.

  • RQ5: How is the combination of a GAN and EC useful in tasks with realistic implications?

In order to make it easier for the reader to visualize the article structure, Table 1 shows which RQs are answered in each section. Since each section covers the papers associated with the RQ it answers, we list only the corresponding sections rather than individual papers. After searching for the keywords "Generative Adversarial Networks" and "Evolutionary Computation" in Google Scholar, we screened and sorted the relevant studies. The annual distribution of the literature is shown in Fig. 1.

Table 1 Correspondence of RQs with sections in this paper
Fig. 1 Literature distribution

The structure of this paper is organized as follows: Sect. 2 introduces the definition of GANs, evaluation metrics, and the main problems. The related EC models are presented in Sect. 3. Section 4 presents the evolution and variants of GANs based on EC, as well as their application and analysis in optimization problems and real-world issues. Based on this summary and analysis, the future directions of GANs combined with EC are discussed in Sect. 5. Finally, Sect. 6 concludes this survey.

2 GANs

Table 2 GANs notation

2.1 Definition

All the notation used in this section is listed in Table 2. GANs evolved from the idea of the auto-encoder (AE), which includes two processes: encoding and decoding. Encoding learns features from the original input data; decoding reconstructs the original input data from the learned features to obtain new data, which can be regarded as a new sample. When Gaussian noise is introduced in the encoding process, the AE becomes a VAE. However, no matter how similar the sample produced by VAE decoding is to the original, there is always some error between the generated sample and the original sample. A traditional generative model of this kind is illustrated in Fig. 2.

Fig. 2 Traditional generation model architecture

The GAN was first proposed by Goodfellow et al. (2014). A GAN consists of two neural networks: a generator (denoted G) and a discriminator (denoted D). Its working principle is that the generator and the discriminator constantly play a zero-sum game, the ultimate goal being that the samples generated by the generator are very similar to the original input data (Gui et al. 2020). First, G uses a generation mechanism (usually Gaussian noise passed through the network) to generate a series of samples, called "fake samples", from a pre-defined latent space (Patel et al. 2018). The fake samples and real samples are then passed to D, which judges how realistic each fake sample looks. D outputs 0 or 1 according to the authenticity of the sample, where "0" represents "fake" and "1" represents "real". Adversarial training means that G tries to make D identify the samples it generates as real, while D tries to identify the samples generated by G as fake. At the end of training, D and G may reach a Nash equilibrium. Figure 3 presents the structure of the GAN.

Fig. 3 GAN’s architecture

2.2 GAN’s training

In Eq. (1), z is the random noise vector, that is, the latent input to the generator G, and x is the real data. Noise samples are drawn from \(z \sim P_z\) as the input of the generator, where \(P_z\) can be a uniform or a normal distribution. After a sample passes through G, the output G(z) obeys a distribution \(P_g\). Through adversarial training, this distribution should approach the real data distribution \(P_{data}\) as closely as possible (Ghosh et al. 2020). At the same time, the discriminator network can be regarded as a binary classifier that distinguishes real data samples \(x \sim P_{data}(x)\) from generated samples \(G(z) \sim P_g (G(z))\). In the original GAN, this training process ultimately needs to achieve the following objective:

$$\begin{aligned} \mathop {min}\limits _{G}\mathop {max}\limits _{D} V(D, G) = {E_{x \sim P_{data}(x)}} [log(D(x))] + {E_{z \sim P_z}} [log(1-D(G(z)))] \end{aligned}$$
(1)

The first term, log(D(x)), represents D's judgment on real data, and the second term, \(log(1-D(G(z)))\), represents its judgment on synthesized data. The training goal of D is to assign the correct labels to training samples with the greatest probability, while the training goal of G is to minimize \(log(1-D(G(z)))\), that is, to maximize the loss of D. During training, the network parameters are updated by alternately fixing one of G and D and updating the other. Finally, G can estimate the distribution of the sample data. Through this minimax game, G and D are alternately optimized to train the required generative and discriminative networks until a Nash equilibrium is reached (Ruthotto and Haber 2021). Algorithm 1 displays the training process of the original GAN. In lines 2–6, D is trained for k steps on a batch of m samples, and the optimal D after training is given by Eq. (2).

$$\begin{aligned} D^*(x) = \frac{P_{data}(x)}{P_g(x) + P_{data}(x)} \end{aligned}$$
(2)

As shown in lines 7–8, G is trained with D fixed. In theory, when G is trained to the optimum, the distribution \(P_g(x)\) approaches \(P_{data}(x)\), and \(D^*(x)\) in Eq. (2) becomes 1/2. When D is optimal, \(D^*(x)\) can be substituted into Eq. (1), and the loss function of G becomes Eq. (3):

$$\begin{aligned} G^* = 2D_{JS}(P_{data}(x)||P_g(x)) - 2log2 \end{aligned}$$
(3)

At this point, the JS divergence between the generated data distribution and the real data distribution has been minimized. An important reason for the success of GANs is that this formulation overcomes the asymmetry of the KL divergence and its inability to act as a proper distance (Sampath et al. 2021). The gradient descent strategy used in these two processes can be any stochastic gradient method, such as the Adam optimizer or momentum.

Algorithm 1 Training algorithm of original GANs
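To make the alternating procedure concrete, the following is a minimal PyTorch sketch of the training loop in Algorithm 1 on toy two-dimensional data; the network sizes, k, batch size, and optimizer settings are illustrative assumptions rather than the configuration used in the cited papers.

import torch
import torch.nn as nn

latent_dim, data_dim, k = 8, 2, 1          # k: number of discriminator steps per generator step
G = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()
m = 64                                     # minibatch size

def sample_real(n):                        # stand-in for sampling x ~ P_data
    return torch.randn(n, data_dim) + 3.0

for iteration in range(2000):
    for _ in range(k):                     # lines 2-6: update D with G fixed
        x, z = sample_real(m), torch.randn(m, latent_dim)
        loss_D = bce(D(x), torch.ones(m, 1)) + bce(D(G(z).detach()), torch.zeros(m, 1))
        opt_D.zero_grad(); loss_D.backward(); opt_D.step()
    z = torch.randn(m, latent_dim)         # lines 7-8: update G with D fixed
    loss_G = torch.log(1.0 - D(G(z)) + 1e-8).mean()   # minimize log(1 - D(G(z)));
                                                      # the non-saturating -log D(G(z)) is a common substitute
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()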

2.3 Common problems

Although training problems have been reduced to a certain extent through the efforts of scholars, they still affect training results. The most common problems are mode collapse and vanishing gradients, which are described below.

2.3.1 Mode collapse

Mode collapse is the most common problem in training GANs. Intuitively, although training may converge, the results cover only part of the given dataset; that is, by the final stage of training the generator continuously generates samples of only certain categories. For example, after training a GAN on the handwritten digit dataset MNIST (LeCun et al. 1998), only one of the ten Arabic numerals may be generated, or only one style of face may be generated in face-synthesis experiments.

Since most current deep neural networks can only represent continuous mappings, while some datasets have discontinuous distributions with isolated regions, problems arise. If the support of the target probability measure has multiple connected branches and GAN training yields a continuous mapping, the range of that mapping may concentrate on a single connected branch, which is mode collapse (Thanh-Tung and Tran 2020). Figure 4 exhibits an example in which we trained the original GAN on the MNIST dataset under the PyTorch framework. When training reaches the 200th generation, mode collapse begins to emerge. After the 300th generation, the problem fully emerges, and most or even all of the generated digits are 1.

Fig. 4 Mode collapse in GANs

If the mode collapse problem in GANs is not alleviated, the model cannot achieve the desired objectives, so more relevant research needs to be conducted in the future.

2.3.2 Gradient vanishing

The intrinsic probability distribution of the dataset and the implicit probability distribution defined by the generator are low-dimensional manifolds in a high-dimensional data space, and they have almost no overlap (Arjovsky and Bottou 2017). Moreover, in deep neural networks, the parameter gradients are larger near the output layer, while parameters far from the output layer receive gradients close to 0 and can only be learned at a very small rate. This creates a vicious circle: the values of nodes close to the output layer are computed forward through earlier layers that learn slowly, so those earlier layers may not learn meaningful features, which effectively makes the inputs to the later layers random; the later layers are then learning on nearly random data, and even with a faster learning rate they cannot necessarily learn useful features. Therefore, gradient vanishing is more likely to occur in deep neural networks (Høye et al. 2021).

As a deep network model, the original GAN trained with the JS divergence is very prone to gradient vanishing. The reason is that the generated data and the real data distributions barely overlap, so the Nash equilibrium between the generator and the discriminator is easily broken and the discriminator overfits. When the discriminator's loss reaches 0, it has been trained so successfully that it completely distinguishes real data from generated data, and the generator then stops improving: the quality of the generated samples no longer increases, training stagnates, and the gradient vanishes. In other words, gradient vanishing in GANs usually occurs in the generator and is one of the sources of instability in GAN training (Fig. 5).

Fig. 5 An example of gradient vanishing in a training run of the original GAN

Gradient vanishing can be avoided by modifying the loss function of GANs (Barnett 2018). Similarly, careful selection of activation functions and regularization can also effectively alleviate it (Wang et al. 2020). Researchers should be vigilant when the loss remains constant during GAN training.

2.4 Evaluation

With the development of computer vision, most existing GANs are widely used in image processing, where the quality and diversity of generated images are two important aspects worth considering (Borji 2019). What is generally needed are indicators that can evaluate quality and diversity comprehensively. Here we introduce the four indicators most commonly used in the previous literature: the Inception Score (IS), the Fréchet Inception Distance (FID), the Maximum Mean Discrepancy (MMD), and the Sliced Wasserstein Distance (SWD).

2.4.1 IS

The Inception Score (IS), introduced by Salimans et al. (2016), passes the generated samples into a pre-trained Inception Network (Szegedy et al. 2016), extracts features from them, and uses the classifier for classification. If the quality of a generated sample is good, it should be highly recognizable, which is reflected in the classification result: it can be classified more confidently, so the conditional class probability P(y|x) is expected to be peaked. Expressed in terms of entropy, the lower the entropy of P(y|x), the better the quality of the generated sample. Diversity is captured by the marginal P(y): the greater the entropy of P(y), the better the diversity of the generated images. The IS combines these two aspects through a KL divergence, and it should be emphasized that the larger the IS value, the better the quality and diversity of the generated samples (Chong and Forsyth 2020). The formula is as follows:

$$\begin{aligned} IS(G) = exp(E_{x \sim P_g}D_{KL}(p(y|x)||p(y))) \end{aligned}$$
(4)

where P(y|x) is the conditional probability distribution of the input data x estimated by the pre-trained network, and P(y) is the marginal distribution. However, P(y) and P(y|x) are not independent of each other. When mode collapse occurs, x and the generated y become almost unrelated, so the IS cannot effectively reflect mode collapse. In addition, the IS considers only \(P_g\) rather than \(P_{data}\), so it focuses on the clarity and diversity of the generated samples rather than on the real data distribution; it cannot reflect the distance between real data and generated samples.
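As a concrete illustration (not the official implementation), the IS of Eq. (4) can be computed from the class probabilities p(y|x) produced by a pre-trained classifier roughly as follows; splitting the samples into groups and averaging, as is common in practice, is omitted here.

import numpy as np

def inception_score(p_yx, eps=1e-12):
    # p_yx: array of shape (num_samples, num_classes), each row a softmax output summing to 1
    p_y = p_yx.mean(axis=0, keepdims=True)                 # marginal distribution p(y)
    kl = (p_yx * (np.log(p_yx + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))

# Sharp, diverse predictions give a score near the number of classes; a collapsed model gives ~1.
print(inception_score(np.eye(10)[np.random.randint(0, 10, 500)]))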

2.4.2 FID

The Fréchet Inception Distance (FID) (Heusel et al. 2017) is superior to the IS (Xu et al. 2018). Like the IS, the FID uses a well-trained Inception Network, but unlike the IS, a hidden layer of the network is used to obtain the feature space of the input samples, which can be modeled as a continuous multivariate Gaussian. This process considers both the samples produced by the generator and the real dataset: the Inception Network extracts features of the input samples, a Gaussian model is fitted to this high-level feature space, and the mean and covariance of the features of the generated data and the real data are computed (Nunn et al. 2021). The FID between these two distributions can then be calculated by the following formula; note that a low FID means high quality and diversity of the generated samples.

$$\begin{aligned} FID(x,g) = ||\mu _x - \mu _g||_2^2 + Tr\left(\Sigma _x + \Sigma _g - 2\sqrt{\Sigma _x\Sigma _g}\right) \end{aligned}$$
(5)

where \((\mu _x, \Sigma _x)\) and \((\mu _g, \Sigma _g)\) are the mean and covariance of the real data and generated data features, respectively. Compared with the IS, the FID is more robust to noise, but it still cannot detect overfitting on large-scale datasets, and a feature-based method can only evaluate the presence or absence of features, not their relative spatial positions. The FID is often used as a supplement to the IS in papers about GANs, especially for diversity and mode collapse problems. The FID has better evaluation performance, but it shares some defects with the IS, such as being unsuitable for datasets with large internal differences and being unable to distinguish overfitting (Obukhov and Krasnyanskiy 2020).
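A rough sketch of Eq. (5) is shown below, assuming feats_real and feats_gen are feature matrices already extracted from a pre-trained Inception network; the extraction step and the usual practical safeguards are omitted.

import numpy as np
from scipy.linalg import sqrtm

def fid(feats_real, feats_gen):
    # feats_real, feats_gen: arrays of shape (N, d) of Inception features
    mu_x, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    sigma_x = np.cov(feats_real, rowvar=False)
    sigma_g = np.cov(feats_gen, rowvar=False)
    covmean = sqrtm(sigma_x @ sigma_g)
    if np.iscomplexobj(covmean):            # numerical noise can produce tiny imaginary parts
        covmean = covmean.real
    return float(np.sum((mu_x - mu_g) ** 2) + np.trace(sigma_x + sigma_g - 2 * covmean))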

2.4.3 MMD

The maximum mean discrepancy (MMD) (Gretton et al. 2012) is an image quality evaluation metric that measures the distance between two different but related distributions. For two samples with different distributions, a continuous function f is needed in the sample space to find the mean of the samples with different distributions on f. The difference between the two means corresponds to the mean discrepancy of the two distributions on f. The MMD is obtained by choosing one of all continuous functions f such that the mean discrepancy can be maximized. The mathematical expression is as follows:

$$\begin{aligned} MMD(X,Y)=\left\Vert \frac{1}{n}\sum _{i=1}^n\Phi (x_i)-\frac{1}{m}\sum _{i=1}^m\Phi (y_i) \right\Vert \end{aligned}$$
(6)

where the dataset \(X=[x_1, x_2,\ldots , x_n]\) follows the distribution p, the dataset \(Y=[y_1, y_2,\ldots , y_m]\) follows the distribution q, and \(\Phi\) denotes the mapping from the original space to a Hilbert space.
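In practice the mapping \(\Phi\) is usually induced implicitly by a kernel; the sketch below estimates the (biased) MMD of Eq. (6) with a Gaussian kernel, where the bandwidth value is an illustrative assumption.

import numpy as np

def mmd_rbf(X, Y, bandwidth=1.0):
    # X: (n, d) samples from p, Y: (m, d) samples from q
    def kernel(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
        return np.exp(-d2 / (2 * bandwidth ** 2))
    mmd2 = kernel(X, X).mean() + kernel(Y, Y).mean() - 2 * kernel(X, Y).mean()
    return float(np.sqrt(max(mmd2, 0.0)))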

2.4.4 SWD

The Sliced Wasserstein Distance (SWD) (Arjovsky et al. 2017) is used to evaluate high-resolution GANs. It estimates the Wasserstein-1 distance between real and generated images, computed as a statistical approximation over local image patches extracted from a Laplacian pyramid.

$$\begin{aligned} SWD(P_{real},P_{gen}) = \mathop {inf}\limits _{\chi \in \Pi (P_{real},P_{gen})} E_{(x_1,x_2)\sim \chi }[\Vert x_1-x_2 \Vert ] \end{aligned}$$
(7)

where \(P_{real}\) denotes the real data distribution, \(P_{gen}\) denotes the generated data distribution, and \(\Pi (P_{real},P_{gen})\) denotes the set of joint distributions whose marginals are \(P_{real}\) and \(P_{gen}\). For a joint distribution \(\chi\), a real sample \(x_1\) and a generated sample \(x_2\) are drawn, the expected distance between them is computed, and the infimum of this expectation over all \(\chi\) gives the SWD.

2.4.5 Other methods

In addition to the above common indicators for evaluating GANs, another important aspect is the loss function. Understanding the loss function helps in analyzing and choosing subsequent optimization tools (such as gradient descent) (Kodama 2018). In response to a series of problems in the traditional GAN model, many researchers traced them back to the loss function; WGAN (Arjovsky et al. 2017), WGAN-GP (Gulrajani et al. 2017), LSGAN (Mao et al. 2017), and BEGAN (Berthelot et al. 2017) have all been proposed to improve GANs in this way.

LSGAN replaces the loss function of the original GAN with a least squares loss, and a smoother, non-saturating loss is used for the discriminator, which alleviates the problems of unstable GAN training, poor image quality, and insufficient diversity. Adopting least squares makes the distribution of generated samples as close to the decision boundary as possible. The loss function is defined as follows (Mao et al. 2017):

$$\begin{aligned} \begin{aligned} \mathop {min}\limits _{D}V_{LSGAN}(D)&= \frac{1}{2}E_{x \sim P_{data}(x)}[(D(x)-b)^2] + \frac{1}{2}E_{z \sim P_z(z)}[(D(G(z))-a)^2]\\ \mathop {min}\limits _{G}V_{LSGAN}(G)&= \frac{1}{2}E_{z \sim P_z(z)}[(D(G(z))-c)^2] \end{aligned} \end{aligned}$$
(8)

While minimizing this objective function, LSGAN also minimizes the Pearson divergence (Anas et al. 2020), which makes the learning process more stable.

WGAN largely resolves the instability of GAN training, basically solves the mode collapse problem, and helps ensure the quality and diversity of generated samples. Specifically, when there is almost no overlap between the distributions of real and generated data, the gradient vanishes. To avoid this problem, WGAN proposes the Wasserstein distance, also called the Earth-Mover (EM) distance (Chandna et al. 2019), as a new loss function. The formula is as follows:

$$\begin{aligned} W(P_{data},P_g) = \mathop {inf}\limits _{\gamma \sim \Pi (P_{data},P_g)}E_{(x,y) \sim \gamma }[||x-y||] \end{aligned}$$
(9)

For the joint distributions \(\gamma\) whose marginals are \(P_{data}\) and \(P_g\), a real sample x and a generated sample y can be obtained by sampling \((x,y) \sim \gamma\), and their distance can be calculated. The expected value of this distance under the joint distribution is then computed, and the Wasserstein distance is defined as the infimum over all such joint distributions. Its advantage is that even when the two distributions do not overlap, the Wasserstein distance can still reflect how far apart they are (Lei 2020). However, the discriminator must satisfy a Lipschitz continuity condition: its weights are restricted so that its gradient is bounded, usually by a constant K. Under this continuity constraint, the new loss function in WGAN is as follows:

$$\begin{aligned} \mathop {min}\limits _{G}\mathop {max}\limits _{D}V(D,G) = E_{x \sim P_{data}(x) }[D(x)] - E_{z \sim P_z(z)}[D(G(z))] \end{aligned}$$
(10)

Compared with the original GAN, the new loss function in WGAN contains no log function. To address the concern that the way WGAN enforces the Lipschitz condition may limit the fitting ability of the network, WGAN-GP improves the continuity constraint by adopting a gradient penalty (GP). The improvement introduces a regularization term \(\lambda E_{x \sim \chi }[(\Vert \nabla _x D(x) \Vert _2 -1)^2]\) into the WGAN loss, which is the GP in WGAN-GP, namely a gradient constraint. This constraint keeps the L2 norm of the critic's gradient with respect to its input close to 1. Experiments show that the gradient penalty can significantly improve training speed and solve the slow convergence of the original WGAN (Jin et al. 2020).
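The following PyTorch sketch illustrates the gradient penalty idea for a critic D on flattened sample vectors; lambda_gp = 10 and the interpolation scheme follow common practice but are assumptions of this sketch rather than a reproduction of the cited implementation.

import torch

def gradient_penalty(D, real, fake, lambda_gp=10.0):
    # real, fake: tensors of shape (N, d); D: a critic without a final sigmoid
    eps = torch.rand(real.size(0), 1)                          # per-sample interpolation weight
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    grads = torch.autograd.grad(outputs=D(x_hat).sum(), inputs=x_hat, create_graph=True)[0]
    return lambda_gp * ((grads.norm(2, dim=1) - 1) ** 2).mean()

# A typical critic objective would then be:
#   loss_D = D(fake).mean() - D(real).mean() + gradient_penalty(D, real, fake)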

Previous GANs and their variants aim to make the data generated by the generator as close as possible to the real data distribution, so researchers designed various loss functions with that goal. BEGAN replaces this approach: rather than directly estimating the gap between \(P_g\) and \(P_{data}\), it estimates the distance between the error distributions. The authors argue that as long as the error distributions are similar, the data distributions can also be considered similar (Luo et al. 2020). Its discriminator acts as an auto-encoder: the input is a sample and the output is the reconstructed sample. The underlying idea is that if distribution A is similar to distribution B, and B is similar to C, then A is similar to C, where A corresponds to the training data x, B to the image D(x) obtained after D encodes and decodes x, and C to the result D(G(z)) when the output of G is fed to D. The convergence of the resulting model is significantly improved.

3 Evolutionary computation (EC)

Many important and intricate computational problems in the real world can be cast as optimization problems, and these problems often become difficult to optimize due to practical constraints; many of them are NP-hard. Traditional polynomial-time optimization methods cannot cope with such complex, hard-to-optimize NP-hard problems. Therefore, EC has a wide range of influence in this regard.

EC (Fogel 1995) is an emerging family of algorithms inspired by the survival-of-the-fittest principle and genetics. An EC algorithm usually includes initialization, mutation, crossover, and selection. Each iteration is an evolutionary step in which the quality of the population is continuously improved, and the optimal solution is gradually approached through repeated iterations (Eiben and Smith 2015). In recent years, EC has been widely used in DL and has shown satisfactory performance in parameter optimization and neural architecture search. Recent papers combining EC with DL summarize the integration of the two mechanisms, which has proven to have high potential for solving real-world problems (Bharti et al. 2020; Bernard and Leprévost 2019).

In the surveyed literature, algorithms such as GA, DE, CEA, ES, PSO, NSGAII, and EEL are the ones most frequently combined with GANs, so we focus on these algorithms.

3.1 Genetic algorithms (GAs)

As one of the most widely used algorithms in EC, GAs are based on natural selection and genetics (Mirjalili et al. 2019). A classical GA first encodes the parameters and generates a number of individuals to form the initial population. Each individual can be a one-dimensional or multi-dimensional vector, frequently represented by a binary string called a chromosome, and each position in the chromosome represents a gene. A fitness function is designed as the standard for judging the performance of each individual; individuals with good performance are selected as parents with a certain probability, and selection, crossover, and mutation operators are applied to produce a new population and search for the optimal solution. The algorithm then iterates until a termination condition is met. The flow chart of GAs is displayed in Fig. 6. Many scholars have since studied GAs extensively and proposed various improved algorithms to enhance convergence speed and accuracy.

Fig. 6 GAs’ flow chart
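A minimal GA sketch of this loop is given below; the binary encoding, tournament selection, one-point crossover, and bit-flip mutation follow the generic description above, and the toy fitness function (counting ones) is purely illustrative.

import random

def ga(pop_size=30, length=20, generations=50, pc=0.9, pm=0.02):
    fitness = lambda ind: sum(ind)                             # toy "OneMax" objective
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        select = lambda: max(random.sample(pop, 2), key=fitness)   # tournament selection
        children = []
        while len(children) < pop_size:
            p1, p2 = select(), select()
            if random.random() < pc:                           # one-point crossover
                cut = random.randint(1, length - 1)
                p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            # bit-flip mutation on both offspring
            children += [[1 - g if random.random() < pm else g for g in ind] for ind in (p1, p2)]
        pop = children[:pop_size]
    return max(pop, key=fitness)

print(ga())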

3.2 Differential evolution (DE)

As a classical EA, DE was proposed in 1997 by Rainer Storn and Kenneth Price on the basis of evolutionary ideas such as GAs (Price et al. 2006); it is essentially a population-based optimization algorithm over continuous variables used to find the global optimum in a multidimensional space. DE is similar to GAs in that both generate initial populations randomly and use the fitness of each individual as the selection criterion, and the main process includes three steps: mutation, crossover, and selection. In contrast, DE's mutant vector is generated from difference vectors between parent individuals and is then crossed with a parent vector to generate a trial vector, which competes directly with its parent for selection. The resulting approximation is often more effective than that of a GA.
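A compact sketch of the classical DE/rand/1/bin scheme is shown below on a toy continuous objective; the population size, scale factor F, and crossover rate CR are typical illustrative values.

import numpy as np

def de(obj, dim=10, pop_size=20, F=0.5, CR=0.9, generations=100, bounds=(-5, 5)):
    rng = np.random.default_rng(0)
    pop = rng.uniform(*bounds, size=(pop_size, dim))
    fit = np.array([obj(x) for x in pop])
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = pop[rng.choice([j for j in range(pop_size) if j != i], 3, replace=False)]
            mutant = a + F * (b - c)                       # difference vector between parents drives mutation
            cross = rng.random(dim) < CR
            cross[rng.integers(dim)] = True                # ensure at least one gene comes from the mutant
            trial = np.where(cross, mutant, pop[i])
            ft = obj(trial)
            if ft < fit[i]:                                # greedy one-to-one selection against the parent
                pop[i], fit[i] = trial, ft
    return pop[fit.argmin()], fit.min()

best_x, best_f = de(lambda x: float(np.sum(x ** 2)))       # sphere function as a toy objective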

Therefore, as a specific EA that differs from GAs, the combination of DE with GANs is worth discussing; this answers one part of RQ2. The above features of DE can be used to optimize the input noise vector of a GAN to address the mapping problem, which acts indirectly on the generator. Meanwhile, in image processing, evolving the generator's input images can effectively alleviate mode collapse. This independent evolutionary process improves the generator's input and reduces the training pressure on the model, which matters at the level of practical applications. A further discussion of the combination of DE and GANs is given in Sect. 4.2.

3.3 Co-evolution algorithms (CEAs)

CEAs are an important branch of EC in which two different species evolve simultaneously in a competitive or cooperative manner. The population in a CEA consists of multiple sub-populations, each with its own evolutionary process. During evolution, individuals gradually improve through competition or cooperation. In competition, fitness evaluation and selection gradually eliminate weaker individuals so that better ones are retained; in cooperation, information exchange and sharing among individuals drive the whole population toward better solutions (Ma et al. 2018).

CEAs are often applied to large-scale optimization or combinatorial optimization problems, especially in the field of machine learning. For example, considering GANs with large-scale parameters, the optimal combination of parameters can be searched relatively quickly through the competition and cooperation of multiple sub-populations. Therefore, the discussion on CEAs in GANs is necessary.

3.4 Evolutionary strategy (ES)

ES is mainly used to solve black-box optimization problems involving parameters. In repeated iterations, a Gaussian perturbation with zero mean and an adapted variance is used to search, producing new individuals while retaining better ones (Salimans et al. 2017); it is a gradient-free stochastic optimization algorithm. During evolution, two kinds of information are inherited: one records the mean value of the positions, and the other records the mutation strength around this mean, so the inherited information is numerical. Such an optimization strategy can play an important role in the later GAN work involving automatic adjustment of neural network parameters. Unlike in GAs, the genes in ES are real values rather than coded forms, and the operations in the mutation process also differ slightly.

ES can generally be divided into the following two categories according to the specific values (Altamirano et al. 2015):

  • \((1+1)-ES\): Under this strategy, there is only one parent in each iteration, and the parent produces one child. The child's fitness is compared with the parent's; if it is better, it is retained as the parent of the next iteration, otherwise it is discarded. The corresponding distribution parameters are adjusted accordingly.

  • \((\mu ,\lambda )-ES\): In this strategy, the parent population produces \(\lambda\) new solutions in each iteration. The better solutions, determined by comparison against the parent population, become the parent population of the next iteration; the others are discarded, and the corresponding distribution parameters are adjusted.

According to the selection method, ES can thus be divided into \((\mu ,\lambda )-ES\) and \((\mu + \lambda )-ES\); the two are essentially the same except for whether the parents themselves compete for survival in the next generation. Of the two, the latter is mostly applied to multi-objective optimization problems.
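A small \((\mu + \lambda )\)-ES sketch is given below, in which each individual carries its own mutation strength that is itself adapted; the specific self-adaptation rule and the toy objective are illustrative assumptions.

import numpy as np

def es(obj, dim=5, mu=5, lam=20, generations=100):
    rng = np.random.default_rng(0)
    parents = [(rng.normal(size=dim), 0.5) for _ in range(mu)]          # (genes, sigma) pairs
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            x, sigma = parents[rng.integers(mu)]
            sigma = sigma * np.exp(rng.normal(scale=1 / np.sqrt(dim)))  # self-adapt the step size
            offspring.append((x + sigma * rng.normal(size=dim), sigma))
        pool = parents + offspring                                      # (mu + lambda) selection pool
        parents = sorted(pool, key=lambda ind: obj(ind[0]))[:mu]
    return parents[0]

best_genes, best_sigma = es(lambda x: float(np.sum(x ** 2)))            # sphere function as a toy objective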

3.5 Other algorithms

The algorithms built on evolutionary ideas have many variants; as a classical algorithm, PSO offers clear advantages for the optimization of GANs. The idea of PSO originates from simulating the foraging behavior of a flock of birds: the solution domain of the optimization problem is treated as the flight space of the birds, and through collaboration and information sharing among individuals in the swarm, the movement of the whole flock gradually becomes orderly until the optimal solution is found. The advantage of PSO is that it is simple to implement and has few parameters (Jain et al. 2022). Currently, researchers usually exploit PSO's strong search capability for neural network architecture optimization, image generation tasks, or applications in power systems. PSO plays an important role in the following sections.
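For reference, a basic global-best PSO sketch is shown below; the inertia weight and acceleration coefficients are standard textbook values, and the sphere function is a toy stand-in for a real objective such as a GAN validation score.

import numpy as np

def pso(obj, dim=10, swarm=20, iters=100, w=0.7, c1=1.5, c2=1.5, bounds=(-5, 5)):
    rng = np.random.default_rng(0)
    x = rng.uniform(*bounds, size=(swarm, dim))
    v = np.zeros_like(x)
    pbest, pbest_f = x.copy(), np.array([obj(p) for p in x])   # personal bests
    gbest = pbest[pbest_f.argmin()]                            # global best
    for _ in range(iters):
        r1, r2 = rng.random((swarm, dim)), rng.random((swarm, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)   # velocity update
        x = x + v
        f = np.array([obj(p) for p in x])
        better = f < pbest_f
        pbest[better], pbest_f[better] = x[better], f[better]
        gbest = pbest[pbest_f.argmin()]
    return gbest, pbest_f.min()

print(pso(lambda p: float(np.sum(p ** 2))))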

EEL is the application of EC to ensemble learning, combining evolutionary learning and ensemble learning (Obo et al. 2016). In ML, the generalization ability or robustness of a single learner is often poor, so many studies combine multiple learners under certain strategies to form an ensemble model and select a set of optimal sub-learners to improve the efficiency of the learning system while guaranteeing its performance. However, conflicts such as that between the diversity and robustness of the different learners easily arise, and they can generally be resolved by adjusting the learner structures or parameters, which is clearly an optimization problem. Considering that EC is widely applicable, robust, and capable of global optimization, some scholars combine ensemble learning with EC to form EEL.

To solve MOPs, researchers proposed NSGAII based on Pareto dominance relations. Unlike a GA, NSGAII stratifies the population according to the dominance relations among individuals before executing selection, while its crossover and mutation operators differ little from those of a GA. As an improvement over NSGA, NSGAII has a significantly wider range of applications. NSGAII adds a fast non-dominated sorting algorithm, which reduces the complexity of computing the non-dominated ranks. An elitist strategy is introduced to expand the sampling space and improve the accuracy of the optimization results. Meanwhile, the crowding distance and the crowding comparison operator not only overcome the need to manually specify the sharing parameter in NSGA but also serve as a comparison criterion between individuals, so that individuals in the quasi-Pareto region can spread uniformly over the whole Pareto front, thus ensuring the diversity of the population (Deb et al. 2002). Its selection scheme is widely used to solve later GAN model optimization problems.

In the summary tables in each of the later sections, we list the EA used by each study. Here, we count the EAs used by the studies covered in this paper. Table 3 shows the correspondence between the EAs and the number of papers. The largest number of papers used NSGAII, followed by CEA and GA; DE and ES are discussed in later sections. Although we do not discuss all the algorithms involved in greater depth, it is necessary to give the reader an idea of their main applications. In addition, many studies used more than one EA.

Table 3 Correspondence of different EAs with the number of literature

3.6 Evolutionary thought in optimization problems

In optimization problems, decision variables, objective functions, and constraints are the three major factors. According to the number of objective functions, optimization problems can be divided into single-objective, multi-objective, and many-objective optimization problems. Single-objective problems are the simplest and can be solved with ordinary optimization methods; multi-objective problems optimize two or three objective functions, and many-objective problems optimize more than three objectives (Liang et al. 2020). Especially in the latter two, EAs maintain the diversity of individuals while ensuring a global search for the optimal solutions.

One point to note is that, owing to the complexity of training GANs, their combination with single-objective optimization problems is very limited in practical applications, and single-objective approaches are rarely adopted in GAN-related work. We therefore only need to know the classification of optimization problems, and problems involving GANs require case-by-case analysis. Accordingly, only multi-objective and many-objective optimization problems are reviewed in this survey.

3.6.1 Multi-objective optimization problems (MOPs)

In MOPs, the objectives conflict with one another, so they cannot all reach their optimum at the same time; improving one objective often degrades others, and a single optimal solution rarely exists. Instead, a set of optimal solutions is obtained, namely the Pareto optimal solutions, which are optimal in the overall sense (Deb 2011). The images of these solutions in the objective space are called the Pareto Front (PF). MOPs can be defined as (Deb and Blank 2021):

$$\begin{aligned} minF(x) = [f_1(x), f_2(x),\ldots , f_m(x)]^T, x \in \Omega \end{aligned}$$
(11)

where m is the number of objective functions and \(\Omega\) is the decision space. x is an n-dimensional decision variable, that is, \(x=[x_1,x_2,\ldots ,x_n] \in \Omega\), representing a candidate solution to the problem. The vector of m objective functions \(F: \Omega \rightarrow R^m\) maps the n-dimensional decision space to the m-dimensional objective space. In optimization, the goal is to find a set of solutions that balance all objectives well and approach the PF as closely as possible, while keeping the solutions as diverse as possible. At present, algorithms used to solve MOPs can be divided into three categories: algorithms based on Pareto dominance, algorithms based on performance indicators, and algorithms based on decomposition.
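At the core of the Pareto-dominance-based algorithms is the dominance test; a naive sketch for minimization is shown below, where the four example objective vectors are purely illustrative.

import numpy as np

def dominates(a, b):
    # a dominates b if it is no worse in every objective and strictly better in at least one
    return bool(np.all(a <= b) and np.any(a < b))

def pareto_front(F):
    # F: array of shape (n_solutions, n_objectives); returns indices of the first non-dominated front
    return [i for i, fi in enumerate(F)
            if not any(dominates(fj, fi) for j, fj in enumerate(F) if j != i)]

F = np.array([[1.0, 4.0], [2.0, 2.0], [3.0, 3.0], [4.0, 1.0]])
print(pareto_front(F))   # [0, 1, 3]; solution 2 is dominated by solution 1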

3.6.2 Many-objective optimization problems (MaOPs)

Practical problems in many fields can be modeled as optimization problems with four or more objectives, called MaOPs. Because of the curse of dimensionality, solving MaOPs can be more difficult than solving MOPs. The main reasons are as follows:

  • As the number of objectives increases, the number of Pareto optimal solutions grows exponentially, but existing dominance-based selection methods cannot efficiently select a truly representative solution set from such a huge Pareto optimal set.

  • The increase in the number of objectives leads to an exponential growth of the objective space, and the curse of dimensionality gradually emerges.

The purpose of solving MaOPs is the same as that of solving MOPs: to find a set of compromise solutions across the objectives, from which decision-makers can choose one or more as the final solution. Owing to the powerful search and optimization ability of EAs, EC is well suited to such high-dimensional problems and helps avoid falling into local optima. The EAs used for this class of problems include one additional category beyond those listed for MOPs, namely algorithms based on improved Pareto dominance relations. At present, multi-objective and many-objective evolutionary optimization methods show great advantages in the automatic design of deep neural networks; we mainly describe them in Sect. 4.

4 The evolution of E-GANs and their application

RQ1 to RQ5 are addressed one by one in this section. Each component of a GAN can be combined with EC for optimization and applied to practical problems, including the structure, parameters, and loss functions of the generator and discriminator. This section mainly reviews the development and application of the evolutionary GAN (E-GAN), covering how researchers have improved the components of GANs, and then summarizes its significance and ability to solve real-world problems.

4.1 E-GAN and its different variants

This subsection answers RQ1 raised in the introduction. There have already been many attempts to apply EC to DL. For example, Zhou et al. (2021) put forward an EC-based method to make deep neural networks (DNNs) shallower at the block level. While using multi-objective optimization to reduce the number of blocks, a new prior-knowledge integration strategy was proposed to improve the exploration ability of the evolutionary search, and knowledge distillation was adopted to improve performance. Experiments showed that DNN inference speed improved significantly with the new method. Junior and Yen (2019) proposed PSOCNN, a PSO-based method to automatically search the structure of deep CNNs, adding a new direct encoding strategy and velocity operator to the original PSO. Experiments showed that the method could quickly find a good neural network architecture. With the continuous development of technology, there are many other applications. In particular, combining the recently emerged GANs with EC to improve the network and solve large-scale optimization problems is the focus of many researchers.

4.1.1 E-GAN

Many DL models can be treated as black boxes, and one of the most prominent problems in training deep networks is the uncertainty of reaching a globally optimal solution. Most current papers build a generative adversarial network algorithm on evolutionary ideas and then train and test the proposed algorithm following certain experimental procedures. However, before E-GAN was proposed, some researchers had already used EAs to analyze the convergence of GANs. Mandal et al. (2017) observed that, over time, a GAN remains in a dynamic state because the two networks constantly update their outputs in response to each other. Moreover, because the loss surfaces of the two networks are multi-modal, gradient descent leaves GANs with a high probability of getting stuck in a local minimum, even when the weights are trained for an unbounded amount of time. Compared with gradient-based training, EAs, and the differential evolution algorithm in particular, can explicitly transform the loss function to operate in the space of probability density functions of the generator (or the discriminator). Finally, it was proved mathematically that the proposed scheme can, in theory, make the generative adversarial network converge. Therefore, it is theoretically feasible to design a hybrid gradient-based and evolutionary algorithm to train GANs or other networks. This builds on the earlier theoretical analysis and two advantages of GANs:

  • The generator can generate samples in parallel.

  • There is almost no constraint in the design of the generator.

Because a GAN's generator can be changed almost arbitrarily, Wang et al. (2019) combined the relevant characteristics of EAs and presented E-GAN. Since existing GANs and their variants are prone to instability and mode collapse, Wang et al. applied the idea of a population to the generator: mutation operations on different generator individuals correspond to different objective functions, and each individual is updated according to its mutation. A single discriminator serves as the environment that evolves the generator population. In addition, an evaluation mechanism was proposed that ranks the samples produced by each generator according to fitness, so that the best generator is retained according to quality and diversity criteria, and the best offspring produced under different training objectives are always preserved. The basic framework of E-GAN is shown in Fig. 7.

Fig. 7 E-GAN’s framework

It should be noted that the training of E-GAN is a dynamic process, and fitness values can only be compared within each iteration: the generator and discriminator are trained alternately, and the fitness of a generator can only be evaluated by the corresponding discriminator of that generation. E-GAN overcomes the limitation of using only one objective function as the evaluation criterion in the original GAN and integrates the advantages of different objectives, so that the most competitive generator is retained. In the experimental stage, three metrics, IS, FID, and MMD, are used to evaluate the quality of the generated images. Experiments show that E-GAN performs well in terms of architecture stability and spatial continuity (Wang et al. 2019) across a variety of metrics and datasets, which effectively alleviates mode collapse and provides guidance for combining EAs with GANs.

The three mutation operators used in E-GAN correspond to three different objective functions. The first mutation corresponds to the minimax objective of the original GAN, which amounts to minimizing the JS divergence between the real and generated distributions; its main problem is the vanishing gradient of the generator.

$$\begin{aligned} V_G^{minmax} = \frac{1}{2}E_{z \sim P_z}[log(1-D(G(z)))] \end{aligned}$$
(12)

The second is the heuristic mutation objective used in DCGAN (Radford et al. 2016), which does not saturate and thus avoids the vanishing gradient problem. However, it pushes the generator and discriminator distributions apart, which can easily lead to instability and quality fluctuations during training.

$$\begin{aligned} V_G^{heuristic} = -\frac{1}{2}E_{z \sim P_z}[log(D(G(z)))] \end{aligned}$$
(13)

The third is the least squares mutation objective from LSGAN (Mao et al. 2017), which avoids vanishing gradients and mode collapse to some extent. Even when the discriminator confidently classifies the generated images as fake, this objective does not saturate.

$$\begin{aligned} V_G^{least-square} = E_{z \sim P_z}[(D(G(z))-1)^2] \end{aligned}$$
(14)
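Eqs. (12)-(14) can be written directly as generator losses; the PyTorch sketch below assumes d_fake holds D(G(z)) with a sigmoid output in (0, 1), and omits E-GAN's evaluation and selection machinery.

import torch

def minimax_mutation(d_fake, eps=1e-8):
    return 0.5 * torch.log(1.0 - d_fake + eps).mean()           # Eq. (12)

def heuristic_mutation(d_fake, eps=1e-8):
    return -0.5 * torch.log(d_fake + eps).mean()                # Eq. (13)

def least_squares_mutation(d_fake):
    return ((d_fake - 1.0) ** 2).mean()                         # Eq. (14)

# In each E-GAN generation, every parent generator is updated once with each mutation,
# and the resulting children are then ranked by the fitness function.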

E-GAN was the first application of EC to GANs and opened the door to combining the two. Since then, many variants and different algorithms have been combined with GANs; the following subsections introduce the variants based on E-GAN.

4.1.2 Enhanced E-GAN

On the basis of E-GAN, Mu et al. (2020) proposed the enhanced evolutionary GAN. The output of E-GAN's discriminator is a single scalar; although it may combine several evaluation criteria into one sum, a single quantity cannot reflect the authenticity of a generated image as well as multiple measures can. In other words, a single scalar cannot convey enough information, which may lead to problems such as vanishing gradients and mode collapse. The model proposed by Mu et al. treats the discriminator output as an authenticity distribution rather than a single scalar and measures the authenticity of the generated samples from multiple perspectives, providing more effective guidance for the generator. In other words, the entire generator population evolves in a dynamic environment. Compared with E-GAN, the fitness function is also changed: a least squares fitness function is used to measure individual generation performance. The least squares fitness function does not assign an excessive cost to generated fake samples, which avoids mode collapse to some extent. Unlike E-GAN, a logarithmic gradient function is used to encourage the generator to produce more diverse offspring when measuring the diversity of generated images. Image generation experiments on four datasets showed that the generated image quality is better than that of E-GAN, and better FID and SWD scores are also obtained compared with E-GAN, LSGAN, and WGAN.

4.1.3 Co-evolutionary GANs

Costa et al. (2019a, 2019b) proposed the co-evolution of generative adversarial networks (COEGAN), which adds a competitive CEA to the contest between generators and discriminators, increasing the diversity of the GAN genome space and letting discriminators and generators evolve within their own sub-populations. In COEGAN, the genome is represented as a set of genes that are directly mapped into a phenotype composed of a sequence of DNN layers, with each gene representing a linear, convolutional, or transposed convolutional layer. The fitness function of the generator is the FID, and the fitness of the discriminator is its objective function, so a more appropriate fitness function still needs to be found for discriminator training. The cycling problems common in co-evolutionary models also need to be considered, and more experiments should be carried out with larger populations and larger genotype constraints. The effectiveness of the model was verified on the MNIST and Fashion-MNIST (Xiao et al. 2017) datasets; the algorithm improved the stability of network training and automatically discovered network topologies. Schmiedlechner et al. (2018) proposed Lipizzaner, a co-evolutionary framework for training GANs in a distributed manner that scales over a spatial grid topology. On the basis of co-evolution, through fitness evaluation, mutation, and selection with local interaction, individuals in the generator and discriminator populations interact with each other or via random sampling. The authors distribute the generator and discriminator populations over a two-dimensional toroidal grid, where each cell contains one generator-discriminator pair; the generator in each cell can be evaluated by all discriminators in its neighborhood, and likewise for the discriminator. Figure 8 displays a simple topology with a neighborhood of size 5 in a \(3*3\) grid. This not only injects genetic diversity into adversarial optimization but also effectively mitigates the pathologies of GAN training.

Fig. 8

Topological structure of a neighborhood of size 5 on a \(3\times 3\) grid. The mixture weight vector \(\omega ^k\) is optimized with an EA in each grid cell. The corresponding hyper-parameters, such as the learning rate of each network, are updated based on the neighborhood

Compared with previous GANs, the authors combine distributed computing with DL and improve GANs through distribution and EC, which not only enhances time efficiency but also alleviates the complexity of GAN training. Distributed competitive co-evolution is therefore a promising research direction: it maps the populations onto a spatial structure, achieves co-evolution through the interaction between populations, and is conducive to maintaining population diversity.
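For readers unfamiliar with the grid layout, the following minimal sketch shows how the five-cell neighborhood of Fig. 8 can be enumerated; the wrap-around (toroidal) indexing and the von Neumann neighborhood are assumed details of the illustration, not guaranteed to match Lipizzaner's exact implementation.

```python
# Minimal sketch: collect the five-cell von Neumann neighborhood of each cell on a
# wrap-around grid, so each generator can be evaluated against the discriminators
# of its neighboring cells.
def neighborhood(row, col, rows=3, cols=3):
    """Return the cell itself plus its north/south/west/east neighbors (size 5)."""
    cells = [(row, col),
             ((row - 1) % rows, col), ((row + 1) % rows, col),
             (row, (col - 1) % cols), (row, (col + 1) % cols)]
    # On very small grids the wrap-around can revisit a cell, so deduplicate.
    return list(dict.fromkeys(cells))

# Example: the neighborhood of cell (0, 0) on the 3x3 grid of Fig. 8.
print(neighborhood(0, 0))  # [(0, 0), (2, 0), (1, 0), (0, 2), (0, 1)]
```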

Selection and communication as ways of regulating diversity have attracted attention in recent years. Building on the previous work, Toutouh et al. (2020) conducted an ablation analysis of Lipizzaner’s training. They studied the impact on performance of two algorithmic components that affect diversity during co-evolution: performance-based selection or replacement within each sub-population, and the transfer of solutions between overlapping neighborhoods. Experiments on four types of GANs indicated the importance of communication between cells. Woldu (2020) first trained a co-evolutionary W-GAN by combining the ideas of co-evolution and W-GAN and obtained very promising results. He focused on the importance of diversity, based on the following advantages of Lipizzaner:

  • Lipizzaner can withstand the test of different popular architecture variants.

  • It can be used with many loss functions.

  • Many parameters can be used for parallel training in a distributed manner.

A distributed hierarchical hybrid evolutionary computation framework, Lipizzaner 2.0, was proposed, which builds a more comprehensive, easy-to-use, and extensible Lipizzaner with multiple configurations. At the same time, Lipizzaner is modified to allow the gradual division of node layers through a layer-partition function, and a visual end-to-end user interface is constructed, which increases the modularity and flexibility of algorithm execution.

Considering the superior performance of Lipizzaner, Toutouh et al. (2019) combined Lipizzaner and E-GAN and proposed a method for training GANs called Mustangs, which is essentially a co-evolutionary method that can run on cloud computing architectures. Compared with distributed co-evolutionary GANs, the proposed algorithm overcomes Lipizzaner’s restriction to a single loss function on the two-dimensional grid: it adopts the three objective-function mutations of E-GAN and selects one mutation with equal probability at each training step, which also improves time efficiency. Population diversity is improved from two aspects: the mutation operation over different objective functions and the distributed training of the two-dimensional grid population by an EA. Experimental studies on the MNIST and CelebA datasets showed that Mustangs achieves high performance. In addition, unlike previous GANs, the proposed method can be implemented dynamically on a distributed architecture with low computational cost.
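The mutation choice itself is simple to express. The sketch below is an illustration under assumptions: the exact loss formulas follow the three E-GAN mutations (minimax, heuristic, least-squares) as commonly written, and may differ in detail from the Mustangs implementation.

```python
# Minimal sketch of a Mustangs-style mutation choice: one of the three E-GAN
# generator objectives is picked with equal probability at each training step.
import random
import torch

def minimax_loss(d_fake):        # original GAN generator objective
    return torch.mean(torch.log(1.0 - torch.sigmoid(d_fake) + 1e-8))

def heuristic_loss(d_fake):      # non-saturating objective
    return -torch.mean(torch.log(torch.sigmoid(d_fake) + 1e-8))

def least_squares_loss(d_fake):  # LSGAN-style objective
    return torch.mean((d_fake - 1.0) ** 2)

def pick_mutation():
    # Uniform probability over the three objectives, as described above.
    return random.choice([minimax_loss, heuristic_loss, least_squares_loss])
```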

As another important application of co-evolution, Kucharavy et al. (2020) pointed out that the problems in GANs are very similar to the co-evolutionary mechanism of pathogens and host cells in real biological systems. Generally, potential pathogens are trapped by the human immune system, but some pathogens are strong enough to cause infection. Considering this similarity, the authors proposed a more robust algorithm to train GANs. Experiments showed that more stable and clearer images could be achieved with less computing power. By analogy with the speed at which RNA viruses evolve and the speed at which the immune system generates antibodies, the training process can be regarded as a co-evolutionary process between virus and immune system, in which the adaptive evolution of the pathogens is countered by the immune system’s inhibition of them. In the evolution of the generator and discriminator, a Weibull distribution from previous studies is used as the fitness function. In the future, the adversarial mechanism of GANs could be introduced into biological research, or other advanced ideas from biology could be applied to the field of GANs.

4.1.4 CDE-GAN

Although the dynamic environment is considered, the authors of the enhanced E-GAN did not bridge the generator population and the discriminator population. Chen et al. (2020) proposed a cooperative dual evolution-based generative adversarial network (CDE-GAN). They divided the adversarial optimization problem into two sub-problems, generation and discrimination, where the generator population and the discriminator population evolve according to their own EAs, called the E-Generator and E-Discriminator, respectively. The algorithm exploits complementary characteristics by introducing complementary mutation, namely dual mutation diversity, into the training of the E-Generator and E-Discriminator to help the model distribute probability mass fairly across the generated data. In addition, during cooperation, the authors proposed a soft mechanism to bridge the E-Generator and E-Discriminator and ensure their balance, so that the adversarial training is more stable and efficient. Results on one synthetic dataset (a two-dimensional mixture of eight Gaussians) and three real image datasets (CIFAR-10, LSUN, and CelebA) showed that the cooperative dual-evolution algorithm has great advantages and potential. The basic structure of CDE-GAN is shown in Fig. 9.

Fig. 9

CDE-GAN’s structure

4.1.5 AGGAN

E-GAN still tends to fall into local optima, and GANs are prone to class imbalance, so it is particularly important to capture the distribution of minority classes. The usual way to address the difficulty of learning the precise distribution of under-represented classes is to add more samples of these classes, namely oversampling. However, because this class of samples is especially emphasized, oversampling often produces overfitting, and because the problem is defined in a high-dimensional data space, introducing new data inevitably affects sample quality. Hao et al. (2020) therefore proposed an annealing genetic GAN (AGGAN) based on a GA and an annealing algorithm, which can reproduce the class distribution using only a small number of data samples. Using the mechanisms of SA and the GA, the generator can use different training strategies to produce different offspring and retain the best one. At the same time, the Metropolis criterion is introduced to decide whether to update the retained best offspring, which keeps the algorithm away from local optima.

In the overall training process, AGGAN resembles E-GAN in using different mutation strategies to produce different generator individuals; unlike E-GAN, however, the principle of SA is introduced when deciding which generator is retained. As shown in Fig. 10, if the fitness value of \(G_{best}\) is greater than that of the previous generation \(G\), it is accepted as \(G_{new}\) with probability 1; if the fitness value of \(G_{best}\) is lower than that of the previous generation \(G\), it is accepted as \(G_{new}\) with probability \(q\), where \(q\) is determined by the temperature in SA and the fitness difference between the two individuals. The temperature gradually decreases from the initial value \(T\) according to the annealing coefficient. The other training steps are similar to those of E-GAN.
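A minimal sketch of this acceptance rule, assuming a standard Metropolis formulation (the exact form of \(q\) in AGGAN may differ):

```python
# Minimal sketch: Metropolis-style acceptance of the best offspring generator.
# An improvement is always kept; a worse offspring is still kept with probability q,
# which shrinks with the fitness gap and with the (annealed) temperature.
import math
import random

def accept(fitness_best, fitness_parent, temperature):
    if fitness_best >= fitness_parent:
        return True                                # accept with probability 1
    q = math.exp((fitness_best - fitness_parent) / max(temperature, 1e-12))
    return random.random() < q                     # accept with probability q

# The temperature is annealed each generation, e.g. T <- alpha * T with alpha < 1,
# so worse offspring are accepted less and less often as training proceeds.
```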

Fig. 10

AGGAN’s training

The authors deliberately selected a limited number of data samples from the image dataset to verify the effectiveness of the proposed algorithm. Mathematical theory and experimental analysis showed that AGGAN can reproduce the data distribution from a small number of class samples and alleviate the problem of imbalanced class distributions in GANs.

4.1.6 AEGAN

In the original GANs and their later variants, and even in E-GAN, all features extracted by the DCNN are weighted equally, so an attention mechanism is introduced. Wu et al. (2021) proposed an attentive evolutionary generative adversarial network (AEGAN), motivated by the instability of existing GANs, the difficulty of modeling long-range dependencies, and low statistical efficiency. The authors embedded a normalized self-attention mechanism in the generator and discriminator (shown in Fig. 11), so that the network automatically assigns weights according to the importance of features when generating samples, revealing the relationships between related image regions. In particular, a gradient penalty mechanism is introduced in the discriminator; its role is to ensure that when the input sample changes slightly, the score given by the discriminator does not change too much. Using the improved EA and the attention mechanism, the generator can automatically generate images and assign weights. Different from E-GAN, the three objective functions used by the mutation operator are hinge mutation (Miyato et al. 2018), heuristic mutation (Radford et al. 2016), and least-squares mutation (Mao et al. 2017). A large number of image-synthesis experiments on the CIFAR-10, CelebA, and LSUN datasets verified the performance of AEGAN and its improved training stability and statistical efficiency.

Fig. 11

AEGAN’s framework

4.1.7 CEGAN

E-GAN only uses different objective functions for the mutation part. Li et al. (2021) improved the E-GAN algorithm by adding a crossover component and proposed a framework that includes a crossover operator. Combining the proposed crossover operator with E-GAN yields the crossover evolutionary generative adversarial network (CE-GAN), whose framework is shown in Fig. 12. At the same time, a knowledge distillation (KD) strategy, called C-filtered knowledge distillation crossover, is introduced into CE-GAN; the basic idea is to let the mutated individuals generated by different objective functions learn from each other's outputs so as to imitate the best output and produce better offspring. Experiments on real datasets showed that CE-GAN improves both time efficiency and image quality.

Fig. 12

CEGAN’s framework

4.1.8 PEGANs

In order to generate higher-quality samples, Xue et al. (2022) improved the E-GAN framework by employing a self-attention module to remedy the shortcomings of convolutional operations and proposed phased evolutionary generative adversarial networks (PEGANs). During training, the generator produces a number of offspring that use different objective functions as update strategies, and the discriminator D plays against multiple generators simultaneously. After a specified number of iterations in each round, the fitness of the offspring is evaluated and the best-performing generator offspring is retained for the next round of evolution. On this basis, the phased EA effectively makes the discriminator more stable and dynamically adjusts the adversarial strategy by employing different objective functions at different training stages. Experiments on two datasets show that PEGANs improve training stability and are competitive in generating high-quality samples.

The framework of PEGANs is shown in Fig. 13. First, a parent G is initialized; then as many offspring as there are objective functions are generated, each offspring using a different mutation operator as its objective function. During training, D plays against the multiple generated individuals. After every n steps of adversarial training, the generator offspring are evaluated, the advantages and disadvantages of the different mutation operators are assessed, and the superior offspring is retained and used as the parent for the next round of iteration. In this way, G can dynamically adjust its training strategy according to the needs of different environments during training.

Fig. 13

PEGANs’ framework

4.1.9 The comparison and analysis of above methods

It can be seen that, in the above improvements based on E-GAN, in order to overcome E-GAN's limitation of leaving the discriminator unchanged, much effort has been applied to the discriminator: in addition to the generator population, many models also include an additional set of discriminators. Moreover, some methods alleviate the common problems of E-GAN by changing the type of EA or introducing different strategies. Table 4 summarizes the number of generators and discriminators in the above E-GAN variants, the evaluation criteria, the datasets, and the network architectures used, and its last column shows the improvements of each method relative to E-GAN, which also points out directions for future research.

It can be seen from Table 4 below that, apart from E-GAN itself, almost all improvements use multiple discriminators in the model; E-GAN is thus only applicable to small populations, which limits the evolution direction of some individuals. Lipizzaner and Mustangs, with their spatial characteristics, successfully make the model evolve with larger populations and use co-evolutionary methods to process individuals on the basis of COEGAN. What the two have in common is that they both train the populations in a distributed manner to enhance time efficiency. As an improvement to Lipizzaner, Mustangs also demonstrates the underlying flexibility of the Lipizzaner design and architecture: influenced by E-GAN, Mustangs overcomes the limitation of a single loss function, i.e., during the evolutionary training of GANs, the system can flexibly choose the objective function on the two-dimensional grid with a certain probability. In E-GAN, three objective functions are employed as mutation operations without considering a crossover operator; the crossover operator in CEGAN makes the evolution of GANs more complete. Moreover, the attention mechanism and SA are used as additional strategies in combination with E-GAN, especially in AEGAN and PEGANs, which effectively enhances performance. FID and IS are the most commonly used of all metrics for evaluating GAN performance. No matter how the model changes, evaluating these metrics in the context of an EA provides automatic insight into the solutions generated by the method.

Because DCGAN provides a relatively standard architecture design for the generator, it is the most commonly used network architecture. In addition, from the fifth column of the table, we can see that all E-GAN variants are applied to image processing-related tasks. Among all datasets, the evolutionary GANs most often use MNIST, CIFAR-10, CelebA, and LSUN, which are popular real-world datasets. MNIST is a handwritten digit recognition dataset proposed by the National Institute of Standards and Technology (LeCun 1998); it is the most commonly adopted dataset in classification tasks, and because its images have only one channel, the training cost is relatively small. CIFAR-10 covers real images such as airplanes, cars, birds, and cats and is more complex than MNIST, which contains only gray images: its images are color RGB images with three channels and a larger size than MNIST (Krizhevsky and Hinton 2010). The large-scale CelebFaces Attributes dataset (CelebA) is a dataset of face attributes, and LSUN is a bedroom-scene dataset constructed using DL with humans in the loop (Yu et al. 2015). The selection of these datasets indicates that GANs based on EAs have great potential and advantages in the field of image processing. At this point, the relevant answers to RQ1 have been presented.

Table 4 Comparison of different E-GAN variants

4.2 GANs with DE

4.2.1 Conditioning the generator of GAN using DE

The problem of input noise vectors for the GAN generator deserves closer investigation: after training, the generator is able to map a certain class of random noise vectors to images. Based on DE and a GA, Saradagi and G (2021) first grouped the input noise vectors using objective functions and then mapped each group to a specific class of the generator, thus addressing this mapping behavior of GANs. Their idea is to construct an adjustable generator that separates the input noise vectors without using class labels so that they can be mapped to specific classes. The study makes better use of the generator by introducing a reference image so that the generator produces images of the desired class, and the mean absolute error is used to guide the generator to some extent. Through continuous evolutionary iterations, the desired generator can be produced as an output. The novelty of this study is that, after extensive training, the generator can be extracted individually, and then DE and the GA can map the input to a specific class. After training on CIFAR-10, the evolved generator is able to produce class-specific reference images. Figure 14 below shows a brief flow of the group mapping of noise vectors to generate reference images.

Fig. 14

Framework and running process

4.2.2 DEGAN

Since natural scenes have different edge ratios, learning rich edge information is crucial in computer vision tasks. Traditional GAN-based edge detection methods perform poorly due to mode collapse. In order to capture as much rich edge information as possible, the learning of GANs can be subjected to evolutionary optimization. Given the ability of DE to move an algorithm away from local optima, Zheng et al. (2019) proposed a differential evolution-based generative adversarial network (DEGAN) to enable GANs to perform richer edge detection. The method introduces a modified DE into the network structure of WGAN to refine the generator’s inputs. In addition to the original noise input, DE acts as an independent evolutionary process that iteratively optimizes the original image until the best image individual is produced as another input to the generator. During this DE-guided evolution of image individuals, the fitness function provided by the discriminator evaluates the individuals in the selection phase. The main flowchart of DEGAN is shown in Fig. 15. It can be noticed that the processing of individuals still uses the basic genetic operators of crossover, mutation, and selection, except that the individuals in the population are images. Such iterations can detect high-quality image edges. Experimental results on the BSDS500 and NYUD benchmarks show that DEGAN quickly achieves excellent, high-quality edge detection performance.
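The following sketch illustrates the general idea of DE-refined image individuals scored by the discriminator. It is a generic DE/rand/1/bin loop under assumed parameter names, not the authors' exact variant.

```python
# Minimal sketch: evolve a population of image individuals whose fitness is the
# discriminator score, so the best image can serve as an extra input to the generator.
import numpy as np

def de_refine_images(population, discriminator_score, n_gen=20, F=0.5, CR=0.9, rng=None):
    """population: array of shape (N, H, W, C); discriminator_score: image -> float."""
    rng = rng or np.random.default_rng()
    pop = population.copy()
    fit = np.array([discriminator_score(x) for x in pop])
    n = len(pop)
    for _ in range(n_gen):
        for i in range(n):
            a, b, c = rng.choice([j for j in range(n) if j != i], 3, replace=False)
            mutant = pop[a] + F * (pop[b] - pop[c])        # differential mutation
            mask = rng.random(pop[i].shape) < CR           # binomial crossover
            trial = np.where(mask, mutant, pop[i])
            f_trial = discriminator_score(trial)
            if f_trial > fit[i]:                           # greedy selection
                pop[i], fit[i] = trial, f_trial
    return pop[np.argmax(fit)]                             # best image individual
```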

Fig. 15

Framework and running process

4.2.3 CSAGAN-DE

Imbalanced data classification is an important applied research topic, and over-sampling methods are often used to address it. Jiawei et al. (2023) proposed a conditional self-attention generative adversarial network with differential evolution (CSAGAN-DE) to address the problem that traditional oversampling methods may change the distribution of the original data. CSAGAN-DE aims to improve the quality of the generated minority data and thereby improve minority-class classification performance. First, a new over-sampling method, CSAGAN, is proposed, which improves generation quality by learning the minority-data distribution. Then, in CSAGAN-DE, the minority data are input into CSAGAN, and new minority data with an approximately matching distribution are created; DE is used to automatically determine the over-sampling ratio, i.e., the number of generated minority samples that yields satisfactory classification performance.

4.2.4 The comparison and analysis of above methods

Currently, DE, one of the better-known EAs, shows almost the same capability as GAs when applied to GANs. Table 5 below shows some details of the three studies mentioned above. Since each of these models uses a single generator and a single discriminator, we do not add these two columns. In the first study, the loss function used is very simple, namely the mean of the absolute differences between images, and the whole process is performed within a single trained generator. The generators in the second and third studies are likewise single, but their evaluation metrics are more diverse. One difference lies in what DE acts on: in the first study it acts on the noise vectors, in the second study on the associated edge-detection images, which serve as another input to the generator, while in the third study it is mainly used to automatically optimize the over-sampling ratio.

Because of the uncertainty of the noise distribution, evolving the noise vector is a very novel idea: by optimizing the input to the generator, researchers can achieve the effect they want. Evolving images is a more conventional idea, but using the GAN's discriminator to evaluate the fitness function makes the DE iterations more promising. Also, since DE can effectively avoid local optima, it can be considered for parameter optimization in the field of data classification. Neural architecture search for GANs can be considered in the future to enhance their training and generation capabilities. For applications in computer vision, improving the parallelism of the model, and thus speeding up training for more complex computations, is also a direction worth considering.

Table 5 Comparison of different DE-based GANs

4.3 GANs with ES

ES is an optimization technique that has existed for decades (Zhong et al. 2020). Compared with standard reinforcement learning, it performs well on modern reinforcement learning benchmarks and overcomes many of the inconveniences of reinforcement learning. ES has the following advantages:

  • The implementation of ES does not require backpropagation.

  • It is easy to scale in a distributed environment and is not affected by reward sparsity.

  • There are fewer hyper-parameters.

ES is therefore usually applied to the optimization of large-scale neural networks. Facing the complexity of searching the parameters of large-scale network models, ES has strong search ability in high-dimensional search spaces, which can guide the algorithm toward better regions. Its evolutionary approach generally samples from a population of individuals and lets the successful individuals guide the distribution of subsequent generations. Meanwhile, given the specificity and complexity of the latent space of GANs, ES can also be applied to optimize this latent space before training. However, the mathematical details of this biologically inspired approach are abstract, so ES can be regarded as a black-box stochastic optimization technique. Based on the above description, this section answers RQ2 and discusses how ES is applied to network training for GANs.

4.3.1 ES-GAN and NSR-ES-GAN

In order to address the problem of mode collapse in GANs, Jabr (2018) adopted ES to train GANs, mainly using a simplified version of the natural evolution strategy (NES). The improvement of NES over the original ES is that the evolving population is represented as a search distribution, and a search gradient is then introduced to update this distribution. Based on this theory, the author first proposed an ES-based GAN, referred to as ES-GAN. As a black-box optimization algorithm with gradient estimation, ES can serve as an alternative to SGD by updating the search distribution along the estimated gradient (Wierstra et al. 2014). Jabr then proposed a novelty-seeking-reward evolution strategies GAN, referred to as NSR-ES-GAN, which builds on novelty search ideas from reinforcement learning. Novelty search is a different type of search: instead of optimizing the original objective, it searches for behaviors that are novel in the search space, so that the result moves toward novelty and complexity. Experiments showed that the proposed methods advance GANs and attempt to resolve the problem of mode collapse.
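A minimal sketch of a simplified NES-style update of the kind described above; this is a textbook estimator with assumed hyper-parameters, not the thesis' exact formulation.

```python
# Minimal sketch: a natural-evolution-strategies step used as a gradient-free
# alternative to SGD. `fitness` evaluates a flat parameter vector and returns a score.
import numpy as np

def nes_step(theta, fitness, sigma=0.1, lr=0.01, pop_size=50, rng=None):
    rng = rng or np.random.default_rng()
    eps = rng.standard_normal((pop_size, theta.size))    # samples from the search distribution
    rewards = np.array([fitness(theta + sigma * e) for e in eps])
    rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)  # normalize rewards
    grad = eps.T @ rewards / (pop_size * sigma)           # search-gradient estimate
    return theta + lr * grad                              # ascend the estimated gradient
```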

4.3.2 EvolGAN

Usually, a random sample is drawn from the latent space, the high-dimensional space from which samples are generated, and this latent space is unique to each model. The input noise vector therefore ultimately determines the output of the generator, and exploring the latent space can help researchers solve and adapt to various practical problems. Roziere et al. (2020) proposed evolutionary GAN (EvolGAN). Different from the E-GAN proposed by Wang et al., which evolves the generator population, the proposed algorithm evolves individuals of the GAN in the latent space when training on small or difficult datasets. EvolGAN does not modify the training stage of GANs but focuses on optimizing the input noise vector z. In previous GAN methods, the noise z is generated randomly, whereas in EvolGAN z is a parameter optimized freely according to a quality criterion Q, estimated with the quality-assessment tools Koncept512 or AVA. The basic method is, given the generative model \(z \rightarrow G(z)\), to use the classic \((1+1)\)-ES with a uniform mutation rate to optimize z with Koncept512 as the criterion. Experiments showed that the EA outperformed a random search strategy not only in speed but also in quality and diversity.
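The core of EvolGAN can be illustrated in a few lines. In this sketch, `quality` stands in for Koncept512/AVA, and the Gaussian mutation is an assumption for readability, since the paper uses a uniform mutation rate.

```python
# Minimal sketch: (1+1)-ES over the latent code z only, guided by an image-quality score.
import numpy as np

def evolve_latent(G, quality, dim=128, n_iter=200, sigma=0.3, rng=None):
    rng = rng or np.random.default_rng()
    z = rng.standard_normal(dim)
    best_q = quality(G(z))
    for _ in range(n_iter):
        z_new = z + sigma * rng.standard_normal(dim)   # mutate the latent code, not the network
        q_new = quality(G(z_new))
        if q_new >= best_q:                            # keep the better of parent and offspring
            z, best_q = z_new, q_new
    return z                                           # latent code of the best-rated image
```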

4.3.3 Black-box ripper

A so-called black-box neural model is one that takes data samples as input and directly outputs predictions. However, recent studies have shown that many details of black-box models can be recovered by a number of techniques. Barbalau et al. (2020) therefore proposed a generative evolutionary framework based on a teacher-student network, which includes a black-box teacher network, a student network, a generator, and an evolutionary strategy. The main task of this framework is to steal a black-box neural model. First, random samples generated in the latent space are passed through the generator and the teacher network, and ES is used to optimize their encodings in the latent space, applying selection and mutation operations to obtain small batches of data. Second, zero-shot knowledge distillation (ZSKD) is used to minimize the cross-entropy between the teacher and student networks, so that the knowledge of the teacher is distilled into the student, with gradients updated only in the student network. The generative model employs a VAE or a GAN, as shown in Fig. 16. An important detail is that the teacher model is treated as a black box through which no gradients can propagate. The generative model is fixed and independent, and it is trained on a proxy dataset without access to the real dataset. The goal of the whole task is to maximize the accuracy of the student model. Experiments were carried out on several popular datasets, and comparisons of this framework with state-of-the-art data-free knowledge distillation and model-stealing methods showed that, even though the teacher network is treated as a black box, the framework helps reduce the distribution gap between the proxy and the true dataset.

Fig. 16

Framework and running process

4.3.4 Pruning of GANs with ES

In general, as another branch of EC, ES depends entirely on random mutation as its variation operator, so it is well suited to combinatorial optimization problems. Based on this advantage, Junior and Yen (2021) adopted a \((1+\lambda)\)-ES, in which one candidate solution generates \(\lambda\) offspring, and proposed a network pruning algorithm combining ES and multi-criteria decision making (MCDM). The pruning operates only on the generator to eliminate its redundant parameters by continuously removing convolution filters from the original generator. More importantly, considering that the loss function and the computational complexity of the generator are two conflicting objectives, the authors regard pruning as a MOP and use multi-objective optimization to balance the two effectively. The evolutionary selection operator uses a geometric method from the field of MCDM: in the two-dimensional objective space spanned by the number of FLOPs and the Wasserstein distance, the MMD solution, also known as the knee solution, is identified, and MMD is used to select the best candidate solution in the current generation. This multi-objective formulation balances multiple objective functions effectively without tuning any trade-off parameters. Experimental results on generating chest images of healthy people and images of malignant skin lesions showed that the proposed pruning-based GAN model can reduce FLOPs by about 70% compared with the unpruned original model while maintaining consistent performance.
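A minimal sketch of the \((1+\lambda)\)-ES pruning loop with a knee-style survivor choice; the normalized Manhattan-distance criterion used here is an assumption standing in for the MMD-based MCDM rule, and the objective names are illustrative.

```python
# Minimal sketch: each individual is a binary mask over the generator's filters,
# scored on two conflicting objectives (loss and FLOPs); the survivor is the
# candidate closest (normalized Manhattan distance) to the ideal point.
import numpy as np

def es_prune(mask, objectives, n_gen=30, lam=8, p_flip=0.05, rng=None):
    """mask: boolean array (True = keep filter); objectives: mask -> (loss, flops)."""
    rng = rng or np.random.default_rng()
    parent, parent_obj = mask.copy(), np.array(objectives(mask))
    for _ in range(n_gen):
        kids = []
        for _ in range(lam):
            child = parent ^ (rng.random(parent.shape) < p_flip)   # flip a few filters
            kids.append((child, np.array(objectives(child))))
        cand = kids + [(parent, parent_obj)]
        objs = np.array([o for _, o in cand])
        lo, hi = objs.min(0), objs.max(0)
        norm = (objs - lo) / (hi - lo + 1e-12)
        best = int(np.argmin(norm.sum(1)))                         # knee-like MCDM choice
        parent, parent_obj = cand[best]
    return parent
```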

4.3.5 The comparison and analysis of above methods

Table 6 below presents the details of the combination of GANs and ES in the above papers. Most models use a single generator and a single discriminator, which is related to the fact that most studies use ES to optimize the latent space. Because of the uncertainty introduced into the distribution by the incoming noise before GAN training, exploring the complexity of the latent space of GANs is a direction worthy of study. Many researchers have explored it, but much work remains to be done.

Different from the literature proposing E-GAN and its variants, the fourth and fifth columns of Table 6 show that the chosen evaluation metrics and datasets are more diverse. Some scholars also employ evaluation metrics they propose themselves, and the choice of metrics undoubtedly affects the reported performance. When a new model is proposed, a synthetic dataset such as a Gaussian mixture can be selected for initial testing and evaluation; if the performance is promising, the model should then be verified on real datasets. The last of these studies applied GANs with ES to medical image processing, which is very forward-looking for medical image diagnosis. In the future, it will be very useful to apply advanced DL technology to assist or replace humans in making judgments or giving suggestions.

Table 6 Comparison of different ES-based GANs

4.4 GANs combined with neuroevolution and automatic parameter and structure optimization

In the real world, there are many cases of learning a real distribution and generating data that approximate it. A typical example is generating near-realistic images from a given concept, and NN-based generative models are designed for this scenario. The number of layers in a neural network is crucial to accuracy in various fields. Although increasing the number of layers can improve model training results in tasks such as image generation, the automatic design of the network structure is a problem worthy of further discussion. Neuroevolution (NE) gives good results in the automatic design of shallow networks, which helps address RQ3.

NE is the application of EAs to neural networks, and it has recently been proposed as a strategy to train and evolve GANs. GANs with NE can not only obtain effective models, high-quality results, and enhanced stability, but also avoid the shortcomings of manually searching network parameters and architectures, since useful structures can be found automatically. The optimization of neural networks is usually done manually through experiments and fine-tuning, an empirical process defined by experts. However, NE algorithms combined with neural networks enable automatic design and optimization. NeuroEvolution of Augmenting Topologies (NEAT) (Stanley and Miikkulainen 2004) is a well-known method for evolving neural network structures and internal parameters. Later, scholars presented DeepNEAT (Tirumala et al. 2016), which is based on a larger search space and can be applied to deep neural networks; this provides another approach for later researchers to train GANs. Considering the training balance during iteration between the network structures of the generator and discriminator (such as topology, hyper-parameters, and optimization method) and their internal parameters, this section elaborates in detail how neuroevolution is applied to GANs and summarizes recent literature on the automatic search and optimization of network parameters and architectures in GANs. These are also helpful for understanding RQ3.

4.4.1 Neuroevolution of GANs

Neural networks can be fully exploited in the context of GANs. The number of network layers, the connection patterns, and the internal characteristics of each layer (such as the activation function and the number of output units) can all serve as subjects of evolution; the optimizer type, learning rate, batch size, and number of iterations used during training can also be generated automatically. Costa et al. (2020) introduced the idea of adopting neuroevolution and co-evolution to train GANs. They reviewed the most advanced EAs applied to GANs, explored the applicability of EC-related concepts in the context of GANs, and identified the components that could evolve and actively participate in EAs. Employing various methods to resolve the problems in GANs is extremely important, and with the advent of the computational intelligence era, GANs combined with the diverse strategies of EC open up a large space of possible searches.

4.4.2 Pareto GAN and its variation

In the optimization process of neural networks, the following aspects are usually the directions for researchers to optimize:

  (1) The structure of generators and discriminators, including (a) the number of hidden layers, (b) the activation functions used, and (c) the weight parameters of each layer.

  (2) The initial probability distribution of the latent space.

  (3) The frequency of updating generators and discriminators.

  (4) The loss functions used to evaluate models.

  (5) The batch size and the number of iterations.

  (6) The gradient-based optimization technique and its hyper-parameters.

EAs can often be used for the automatic exploration and optimization of components (1)–(4).
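To make these components concrete, the following minimal sketch shows how components (1)-(4) could be packed into a genome that an EA mutates; the field names, value ranges, and mutation choices are illustrative assumptions, not the encoding used in any cited paper.

```python
# Minimal sketch of a GAN genome covering components (1)-(4) above.
from dataclasses import dataclass, field
from typing import List
import random

@dataclass
class LayerGene:
    units: int = 128
    activation: str = "relu"        # (1b) activation function

@dataclass
class GANGenome:
    gen_layers: List[LayerGene] = field(default_factory=lambda: [LayerGene()])   # (1) generator structure
    disc_layers: List[LayerGene] = field(default_factory=lambda: [LayerGene()])  # (1) discriminator structure
    latent_dist: str = "normal"     # (2) latent-space distribution
    g_updates: int = 1              # (3) generator updates per iteration
    d_updates: int = 1              # (3) discriminator updates per iteration
    loss: str = "heuristic"         # (4) loss function used for evaluation

def mutate(g: GANGenome) -> GANGenome:
    # Apply one random structural or hyper-parameter change, as an EA would.
    choice = random.choice(["add_layer", "activation", "loss", "updates"])
    if choice == "add_layer":
        g.gen_layers.append(LayerGene(units=random.choice([64, 128, 256])))
    elif choice == "activation":
        random.choice(g.disc_layers).activation = random.choice(["relu", "tanh", "leaky_relu"])
    elif choice == "loss":
        g.loss = random.choice(["minimax", "heuristic", "least_squares"])
    else:
        g.g_updates = random.randint(1, 3)
    return g
```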

In ML, generative models are used to create data samples similar to the training data. Garciarena et al. (2018) first used a neuroevolutionary method to optimize the architecture of deep GANs, mainly using a GA to evolve the network structures of generators and discriminators. During algorithm execution, \((G_i,D_i)\) is represented as a single individual in the evolution, consisting of a generator-discriminator pair. The crossover operation is applied to two different individuals; for example, \((G_1,D_1)\) and \((G_2,D_2)\) can cross to produce the children \((G_1,D_2)\) and \((G_2,D_1)\). Mutation can change the number of layers of a neural network by adding or deleting a layer, and it can also change the internal state of a given layer, such as its parameters or activation function. The number of iterations and the loss function at the algorithm level can also be changed through this GA-based neuroevolution. Regarding the accuracy of the generated solutions, the researchers focused on the Pareto-set approximation problem, using it as a suitable benchmark to evaluate the quality of the generator and adopting the distribution of the generated solutions on the PF as an indicator of whether the GAN suffers from mode collapse. The experiments scaled up to 784 variables, and the approach can create architectures that transfer across dimensions and functions.
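The pair-level crossover described above can be written in a couple of lines (a trivial illustration; the individuals are assumed to be generator-discriminator pairs in any representation):

```python
# Minimal sketch: two (generator, discriminator) individuals exchange their networks.
def crossover(ind1, ind2):
    g1, d1 = ind1
    g2, d2 = ind2
    return (g1, d2), (g2, d1)   # children (G1, D2) and (G2, D1)
```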

Based on the shortcomings of previous work and considering the high resource consumption and computational complexity of neuroevolutionary algorithms, Garciarena et al. (2020) took the loss function as a main aspect of optimization and continued to combine multi-objective optimization with NE, proposing a GA-based neuroevolution method to optimize GANs. Considering that convergence is an element of the NE optimization, a 'loop' parameter is added to the generator and discriminator to represent the number of updates in each training iteration; that is, the generator and discriminator updates can be synchronized. An important breakthrough of the method is its portability: it is not limited to a particular network structure. In other words, the inputs and outputs of the GAN are not part of the encoding, and the network weights are not encoded but learned while evaluating the genotype. Meanwhile, the authors continued the earlier Pareto GAN work and kept using the Pareto-set approximation problem as a measure to evaluate the structural approximation of GANs. Experiments showed that the proposed method exhibits a certain degree of transferability, not only across the scale and domain of PS approximation problems but also to Gaussian mixture approximation problems in different domains.

This method can also be seen as an improvement of E-GAN. E-GAN puts the entire genetic process in the generator, and the discriminator is only a changing environment; such an approach does not fully consider the dependence between the generator and the discriminator or the variability of the network. The new method proposed by Garciarena et al. not only fully considers the relationship between the generator and the discriminator but also adds a second objective to increase the dynamics of the model during training. The proposed algorithm has good generalization ability and great application potential in different fields; in the future, it could be used to guide model optimization in EC.

4.4.3 COEGAN and its variation

In a neural network, both the topology and the parameters of each neuron can be evolved; population-based optimization methods continuously improve the quality of each neural network in the population. As in the basic concepts of EC, intricate neural networks are mapped into gene sequences for selection, crossover, mutation, and evaluation. Costa et al. (2019a, 2019b) defined the mutation operators of COEGAN as adding a layer, removing a layer, or changing a layer. They combined neuroevolution and co-evolution and proposed a neuroevolutionary training algorithm that extends the DeepNEAT-based neuroevolution model and integrates it into the training of GANs. A competitive CEA is added to the confrontation between the generator and the discriminator, which increases the diversity of the GAN genome space and makes the discriminator and generator evolve within their own subpopulations. In the proposed algorithm, the genotype is mapped to a sequence of deep neural network layers, with each gene representing a linear layer, a pooling layer, or a convolution layer. More importantly, activation functions such as Sigmoid, Tanh, ReLU, Leaky ReLU, and ELU can be selected automatically. During evolution, the input size of each layer is computed dynamically from the characteristics of the previous layer. Figure 17 below displays an example of the genotypes of a generator and a discriminator. In this example, the generator and discriminator are each composed of three genes, and their phenotypes contain the same three layers as expressed in the genotypes. The algorithm enhances the stability of network training and automatically discovers the network topology.

Fig. 17

A genotype of a discriminator and a generator
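As a concrete illustration of this genotype-phenotype mapping, the sketch below translates a three-gene genotype into a three-layer PyTorch network; the layer types, kernel sizes, and channel counts are simplified assumptions, not COEGAN's actual implementation.

```python
# Minimal sketch: each gene encodes one layer type; the phenotype is built by
# translating the gene sequence into PyTorch layers with their activations.
import torch.nn as nn

GENE_TO_LAYER = {
    "linear": lambda i, o: nn.Linear(i, o),
    "conv":   lambda i, o: nn.Conv2d(i, o, kernel_size=3, padding=1),
    "deconv": lambda i, o: nn.ConvTranspose2d(i, o, kernel_size=4, stride=2, padding=1),
}
ACTIVATIONS = {"relu": nn.ReLU, "tanh": nn.Tanh, "leaky_relu": lambda: nn.LeakyReLU(0.2),
               "sigmoid": nn.Sigmoid}

def build_phenotype(genes, activations, channels):
    """genes: layer types per gene; channels: (in, out) sizes derived from the previous layer."""
    layers = []
    for gene, act, (c_in, c_out) in zip(genes, activations, channels):
        layers.append(GENE_TO_LAYER[gene](c_in, c_out))
        layers.append(ACTIVATIONS[act]())
    return nn.Sequential(*layers)

# Example: a three-gene generator genotype mapped to a three-layer phenotype.
G = build_phenotype(["deconv", "deconv", "conv"],
                    ["relu", "relu", "tanh"],
                    [(128, 64), (64, 32), (32, 3)])
```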

In order to improve the performance of COEGAN, Costa et al. (2020) proposed a new quality-diversity algorithm and demonstrated its application to GAN evolution. Its goal is to find a feasible solution to the diversity problem; it is based on the novelty search strategy from reinforcement learning and evolves on the basis of COEGAN, with a local competition strategy also introduced. Different from the earlier speciation strategies of NEAT and DeepNEAT, the authors adopt novelty search with local competition (NSLC) to improve the exploration of the search space. NSLC uses a Pareto-based multi-objective evolutionary algorithm (MOEA) and takes quality and individual novelty as neighborhood-based objectives, which avoids the crowding-distance ranking used in NSGAII (Deb et al. 2002). Because the proposed novelty-driven search satisfies the diversity criterion well, it is in essence also an application of elitism. The algorithm retains the genome and phenotype representation of COEGAN, as well as its mutation strategy; the training steps are also consistent, and the loss functions of the generators and discriminators do not change. In addition, the n nearest neighbors of an individual are selected to compute the innovation and competition objectives, where the innovation objective is the average distance between the individual and its neighborhood, and the competition objective is the number of neighbors that the individual outperforms in the fitness ranking.
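A minimal sketch of the two NSLC objectives described above, assuming Euclidean behavior distances and a fixed neighborhood size n (both assumptions of this illustration):

```python
# Minimal sketch: novelty = mean distance to the n nearest neighbors in behavior
# space; local competition = number of those neighbors the individual outperforms.
import numpy as np

def nslc_objectives(behaviors, fitness, n=5):
    """behaviors: (N, d) behavior descriptors; fitness: (N,) fitness values."""
    dists = np.linalg.norm(behaviors[:, None, :] - behaviors[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    novelty, local_comp = [], []
    for i in range(len(behaviors)):
        nbrs = np.argsort(dists[i])[:n]
        novelty.append(dists[i, nbrs].mean())                        # innovation objective
        local_comp.append(int((fitness[i] > fitness[nbrs]).sum()))   # competition objective
    return np.array(novelty), np.array(local_comp)
```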

4.4.4 Automatic parameter and structure optimization in GANs

The choice of model architecture and parameters has always been one of the important issues in training any neural network. In recent years, the tedious manual hyper-parameter optimization process has been eliminated in the context of automatic machine learning (AutoML). Automatic exploration not only improves training efficiency but also often yields more reasonable results than manual settings. AutoML in DL basically includes two levels (He et al. 2018): automatic parameter tuning and automatic network generation. The former focuses on the adjustment of hyper-parameters for a given target network; the latter focuses on the network topology, namely neural architecture search (NAS). EAs can cover almost all combinations in this process.

Lu et al. (2018) introduced a GA-based Bi-GAN to realize the automatic exploration and optimization of DNN parameters. The algorithm can automatically select the numbers of neurons and filters from a large-scale parameter set; it can also select the numbers of convolution and fully connected layers, decide whether to add drop-out layers, pooling layers, and batch normalization, and determine what type of activation function to use in order to improve model accuracy. In short, it can autonomously participate in every stage of training any type of deep network, saving manual trial-and-error time and improving efficiency. Bi-GAN has two generators with the same network structure; during training, each GA operator can act on each model so that the two confront and improve each other. It can explore better solutions within the known combinations but cannot find solutions that have never been defined. Compared with traditional GANs, the Bi-GAN framework includes two generators, two evaluators, and one discriminator. The input of each generator is still Gaussian noise, and the input of each evaluator is real data; the relationship between generator and evaluator is one-to-one, and the role of the discriminator is to assess the results of the two generators so that they are classified in a binary fashion and the poorer-performing generator is optimized. The proposed Bi-GAN framework is shown in Fig. 18. In the whole process, Bi-GAN first updates the hyper-parameters, and then the GA updates and optimizes the network according to the fitness values and genetic operators. The experimental results showed that the proposed algorithm can realize the automatic optimization of DNN parameters with high accuracy.

Fig. 18

The structure of Bi-GAN

Similarly, Alarsan and Younes (2021) proposed a GA-based GAN (GANGA) to adjust the hyper-parameters of the network and obtain the best choices. This process is based on a GA, which allows the network to automatically find the best hyper-parameters, such as the learning rate, drop-out parameters, number of neurons, and training batch size. The experimental results showed that the GA played a vital role in the automatic adjustment of hyper-parameters; after tuning by the algorithm, the accuracy of the final discriminator reached 100%.

Exploiting the advantages of EAs, Saradagi and G (2021) introduced two EA-based components. One is semantic inpainting based on an encoder-decoder model, which fills in missing regions of an image. The other concerns the noise input vector of the adversarial network: the trained generator can map the generated random noise to images, which resolves the mapping behavior in GANs, and when all noise vectors are clustered, the training allows the vectors to be mapped to specific classes. In the first application, the encoder and decoder are fully used: the encoder progressively reduces the dimensionality of the input image, the decoder then expands it again, and a feature map in the middle is used to fill in the image. In this process, the GA evaluates the fitness, and hyper-parameters are filled in once evolution reaches a certain stage, until the basic architecture is ready. In the second part, the authors introduce a reference image, adopt the GA and DE to separate the noise vectors, and then use the mean absolute error for evaluation. The trained generator can map vectors from the randomly created noise vector group to new images.

Du et al. (2020) proposed a method that uses the multi-objective algorithm NSGAII to generate the optimal structure of the DCGAN network. The decision variables represent different parameters, their combination forms the decision space, and a population is generated from it; mapping to the objective space yields n different objective functions. DCGAN first generates images as the input of a classification model, and then 1-TPR and FPR are used as the two objectives of NSGAII for evaluation.

The above studies consider the exploration of either the hyper-parameters or the network architecture alone. Kobayashi and Nagao (2020) used an EA to search not only the structure of the network but also its hyper-parameters. Similar to previous studies, the architecture and parameters are encoded as a genome, and the FID and IS metrics from the GAN performance standards are then used as fitness functions, so that the problem is solved as a MOP.

4.4.5 The performance analysis of above algorithms

Both the hyper-parameters and the structure of the network are worth exploring. From the above description, it can be seen that some works only explore the hyper-parameters of the network, while others only adjust the network structure. Few papers consider the hyper-parameters and the structure at the same time, because doing so consumes substantial resources and the performance is not necessarily optimal.

As can be seen from Table 7 below, only two papers use EAs to automatically search the neural network structure, while the rest mainly exploit the optimization ability of EC to automatically adjust the hyper-parameters. In addition, most researchers prefer the classical GA, and the MNIST dataset and FID metric remain the most popular. In particular, many papers draw on the field of multi-objective optimization to optimize multiple objectives in GANs simultaneously, and the evaluation indicators also include the IGD metric from that field. The following section elaborates on this kind of problem.

Table 7 Comparison of the GAN models mentioned above

4.5 Evolved GANs combined with optimization problems

In recent years, going beyond the simple use of EAs, some studies have used the ideas behind GANs to solve benchmark problems in multi-objective or many-objective optimization, that is, evolutionary GANs are used to drive the solution of optimization problems. Other studies treat the training of GANs itself as a MOP: different aspects of training serve as individual objectives, which are then optimized with EAs. The specific implementation process answers RQ4 well. However, owing to the limitations of single-objective optimization problems, there is no research on that class of problems, and since the number of objectives in GAN training is not as large as in practical applications, most studies so far concern multi-objective optimization. Section 4.5.1 below elaborates how GANs are combined with MOPs, and the subsequent parts cover research on MaOPs and other optimization fields.

4.5.1 Evolved GANs in multi-objective optimization

Since the proposal of E-GAN in 2018, combinations of optimization with ideas from ML and even DL have emerged. As a representative of adversarial game training in DL, balancing the training of the generator and discriminator in GANs bears a certain similarity to exploring the Pareto-optimal solution set in multi-objective optimization. In Pareto GAN (Garciarena et al. 2018) and its subsequent improvement (Garciarena et al. 2020), designing a Pareto-set generator can be treated as a related sub-problem of MOPs. The authors used a new metric to measure whether mode collapse occurs in GANs: the input data are uniformly sampled from the PS of the function, a new set of points is generated to form an approximate PF, and the Inverted Generational Distance (IGD) (Zitzler et al. 2003) is then used as the measure. The algorithm optimizes two objectives, the IGD, which reflects the convergence and distribution performance of the algorithm, and the training time of the network; for both, smaller is better. In Pareto GAN, the authors explored the portability of the algorithm, improved it in later studies, introduced an agility index, and finally succeeded in training transferable GANs.
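For reference, the IGD used here can be computed as follows; this is the standard formulation, with the reference front assumed to be given as a set of points in objective space.

```python
# Minimal sketch: Inverted Generational Distance, i.e., the mean distance from each
# reference-front point to its nearest point in the approximation set (smaller is better).
import numpy as np

def igd(reference_front, approx_front):
    ref = np.asarray(reference_front)
    appr = np.asarray(approx_front)
    dists = np.linalg.norm(ref[:, None, :] - appr[None, :, :], axis=-1)
    return dists.min(axis=1).mean()
```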

EAs can be combined with GANs to improve the ability of GANs to generate or process data, or GANs can be used to drive EAs. Targeting MOPs that suffer from the curse of dimensionality in high-dimensional decision spaces, He et al. (2020) introduced a GAN-driven multi-objective optimization algorithm (GMOEA). In each generation, the parent solutions are divided into real and fake samples to train the GAN: high-quality candidate solutions are labeled as real samples and the rest as fake samples. The trained GAN is then used to sample offspring solutions, with the proposed reproduction operator generating n offspring; finally, a selection operator chooses among the parent and offspring solutions. The training scheme of the proposed GMOEA framework is shown in Fig. 19. Tests were carried out on 10 benchmark problems with up to 200 decision variables, and performance was compared with six classical MOEAs. The statistical results showed that GMOEA has advantages in solving MOPs with relatively high-dimensional decision variables.
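A minimal sketch of the labeling step in GMOEA; quality is approximated here by a simple domination count, which is an assumption about the selection detail rather than the paper's exact criterion.

```python
# Minimal sketch: split the parent population into "real" (high-quality) and "fake"
# samples used to train the GAN that then generates offspring solutions.
import numpy as np

def dominates(a, b):
    return np.all(a <= b) and np.any(a < b)      # all objectives minimized

def split_real_fake(decision_vars, objectives, n_real):
    objs = np.asarray(objectives)
    # Rank each solution by how many others dominate it (0 = non-dominated).
    dom_count = np.array([sum(dominates(objs[j], objs[i]) for j in range(len(objs)))
                          for i in range(len(objs))])
    order = np.argsort(dom_count)
    real = [decision_vars[i] for i in order[:n_real]]    # samples the GAN should imitate
    fake = [decision_vars[i] for i in order[n_real:]]    # negative samples for the discriminator
    return real, fake
```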

Fig. 19

Scheme of model training in GMOEA

In E-GAN, the fitness of generator offspring is evaluated according to two objectives, quality and diversity. Building on this idea, Baioletti et al. (2020) treated the training of the generator population as a MOP and, based on the two conflicting objectives of the quality and diversity of the generated samples, proposed the multi-objective evolutionary GAN (MOEGAN). The method uses Pareto dominance with the non-dominated sorting of NSGAII to retain optimal solutions. Compared with E-GAN, MOEGAN uses evolutionary multi-objective optimization to select between parents and offspring. The algorithm was evaluated qualitatively and quantitatively on synthetic datasets: the qualitative evaluation adopts the same kernel density estimation (KDE) as in E-GAN, and the quantitative evaluation adopts MMD; the PF distribution and time complexity are also analyzed. The results showed that the generator's MMD reaches its minimum on the 8-Gaussian and 25-Gaussian mixture problems, and the approximate PF produced with different population sizes throughout the evolution almost fits the true PF, indicating good performance.

Usually, in a network structure, the architecture settings and hyper-parameters are defined manually, so their impact on performance deserves to be treated as an important issue. Kobayashi and Nagao (2020) used EAs to search for the structure and hyper-parameters of the network, encoding the architecture and parameters as a genome and combining the FID and IS from GAN performance evaluation as fitness functions, so that the training of GANs is solved as a MOP. In the proposed method, the authors consider only part of the parameters for training small networks and use a small number of iterations; as the number of iterations increases, the number of parameters grows as well.

In the field of MOPs, there is a class of problems called large-scale multi-objective optimization problems (LSMOPs), which contain hundreds to thousands of decision variables and objectives. A good algorithm should be able to find the Pareto optima in the search space, and previous studies have shown that these optimal solutions are often distributed on a manifold in a low-dimensional space. Considering the shortcomings of traditional EAs in handling such structures, Wang et al. (2021) proposed a GAN-driven manifold interpolation method for solving LSMOPs, called GAN-LMEF. An effective way to learn a manifold is to learn from the samples that already lie on it, interpolate meaningful samples that were not previously on it, and thereby make the samples on the manifold continuous; the gaps in the manifold are filled, and this knowledge is finally used to guide the direction of the evolutionary search. The framework first uses non-dominated solutions to construct a series of manifolds, then applies three interpolation strategies to fill the gaps in the manifolds, and then adopts generators to generate new manifolds and reinsert more promising solutions. In the selection stage, the authors do not use the previous fitness-function approach but propose a manifold selection mechanism to predict the quality of the generated solutions, avoiding large-scale computation. The framework comprises three parts: central-solution calculation, manifold interpolation, and selection. The central-solution calculation uses principal component analysis (PCA) (Wold et al. 1987) to map the non-dominated solutions onto an \(m-1\)-dimensional manifold and groups solutions with similar characteristics via K-Means clustering (Hartigan and Wong 1979); the solutions are thus divided into several categories, and a central solution is selected in each category.

One important problem in image-to-image translation is that data must be prepared in pairs, which is not always possible in practical applications. The potential of EC for handling large-scale optimization problems is huge. Therefore, on the basis of E-GAN, Bharti et al. (2022) proposed a new evolutionary multi-objective cyclic GAN (EMOCGAN), which combines EC, multi-objective optimization, CycleGAN, and different selection mechanisms in a new model training method. To overcome the local-optimum problem, the network uses the Metropolis criterion (based on the principle of SA) and a Pareto-based selection strategy (selecting the optimal solutions among the non-dominated set). Introducing the concept of evolution helps improve the stability of the model and alleviates mode collapse to a certain extent; this is the first time that EC has been introduced into the training of CycleGAN. In addition to adding different loss functions to the framework, the authors also use a residual network to adjust the structure of the GAN. Each individual is composed of two generators and two discriminators and evolves through three operations: mutation, evaluation, and selection. In the mutation step, randomization is introduced: either a population crossover strategy or a strategy of further fine-tuning the parent is adopted. In the evaluation step, IS and FID are used as the model's fitness functions, and the structural similarity index (SSIM) (Wang et al. 2004) and universal quality index (UQI) (Wang and Bovik 2002) are used to assess model quality. In the selection stage, the two fitness functions, FID and IS, are used to treat the optimization as a MOP. Extensive experiments on real image datasets showed that EMOCGAN is superior to advanced methods in terms of visual fidelity and target prominence.

Gonzalez et al. (2021) developed TaylorGAN, which applies loss-function metalearning to GANs. The purpose is to improve the quality of the generated images by enhancing GANs through Taylor expansion of the loss functions of the generator and discriminator networks. The loss function is rewritten as a cubic Taylor polynomial using the parameterization of TaylorGLO (optimizing loss functions through multivariate Taylor polynomial parameterization), and separate loss functions are constructed for the two networks. A GA then optimizes the parameterization using a combination of two or more indicators in a multi-objective evolutionary manner. A good solution can be found through an objective transformation technique called composite objectives, in which evolution proceeds according to a weighting of the indicators: an improvement in one indicator raises the overall fitness only if there is no comparable regression in another. The dataset is the CMP Facade dataset (Tyleček and Šára 2013), and the model is the pix2pix-HD conditional GAN (Wang et al. 2018). In the task of image-to-image translation, the improved method is analyzed both quantitatively and qualitatively.
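The parameterization idea can be sketched as a loss written as a cubic polynomial in the residual, with the coefficients exposed as genes to the evolutionary search; this is a simplified illustration of the general idea, not the exact TaylorGLO formulation.

import numpy as np

def taylor_loss(y_pred, y_true, coeffs):
    """Cubic polynomial loss in the residual r = y_pred - y_true.
    coeffs = (c0, c1, c2, c3) are treated as genes of a GA individual,
    so evolution can reshape the loss landscape of a network."""
    r = y_pred - y_true
    c0, c1, c2, c3 = coeffs
    return np.mean(c0 + c1 * r + c2 * r**2 + c3 * r**3)

# Example: coefficients (0, 0, 1, 0) recover a mean-squared-error-like loss.
print(taylor_loss(np.array([0.9, 0.2]), np.array([1.0, 0.0]), (0.0, 0.0, 1.0, 0.0)))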

Unlike MOEAs driven by GANs, Baldan and Di Barba (2022) separated the evolutionary search from the use of GANs while introducing a feedforward neural network (FNN). They used the data obtained after solving MOPs with NSGAII for GAN training; the training dataset consisted of Pareto-optimal and non-optimal individuals, with the former labeled as real and the latter labeled as fake. After the GAN's training, new Pareto-optimal solutions could be discovered. Then, in a three-objective coil design task, the design variables are input into the FNN, and the objective function values can be estimated without additional field analyses, which is applicable to MOPs involving expensive FEM analyses.

In the field of evolutionary multitasking (EMT), multi-objective EMT algorithms often suffer from slow convergence and difficulty in generating superior results. To alleviate these problems, Liang et al. (2023) proposed a new multi-objective EMT algorithm, EMT-GS, which is based on two new generative strategies. First, a generative strategy based on GANs is introduced to generate high-quality knowledge to be transferred between different tasks. GANs are used to estimate the distribution of the population and then actively learn the transfer process between tasks without making any assumptions about the distribution model, reducing potential negative transfer. Then, a generative strategy based on inertial differential evolution (IDE) is proposed to guide the population to search in the direction of a promising PF and improve the convergence speed. The performance of EMT-GS is validated on three multi-task multi-objective benchmark problems. The experimental results show that EMT-GS is highly competitive with other state-of-the-art multi-objective EMT algorithms.

The computational cost of GANs has been an obstacle in the field of image processing, and many acceleration techniques cannot properly handle GAN compression because of unstable training and complex network structures. Zhou et al. (2023) proposed a MOEA for compressing GAN models in image translation tasks, called MEGC. In MEGC, the conflict between the computational cost of the GAN and the quality of the generated images is formulated as a bi-objective optimization problem. The set of evolved Pareto solutions guides the sampling process during supernet training, so that the focus of supernet training shifts to compact subnets with good overall performance. To further speed up the computation, an evaluation-free offspring strategy is designed, in which the evaluation is periodically dropped during supernet training; this avoids fine-tuning each individual and shortens the subsequent evolutionary search. Image translation experiments show that MEGC can effectively reduce the computational cost of GANs while improving the quality of the generated images.

To solve MOPs, many algorithms use DL models to drive the progress of EAs. However, problems such as mode collapse still occur when generating candidate solutions. Cheng et al. (2023) proposed a dual-population multi-objective evolutionary algorithm driven by WGAN-GP (DGMOEA), which uses dual-population evolution to cooperatively generate high-quality solutions and employs the WGAN-GP model to improve the performance of the EA. The real and fake data in the model are consistent with the concepts of non-dominated and dominated solutions in MOPs, so the distribution of candidate solutions can be learned with high quality by using WGAN-GP to discriminate and sample them. Meanwhile, a solution classification method based on the manifold distance of the real data is proposed to avoid imbalanced input data. The dual populations are generated using WGAN-GP and NSGAII with adaptive rotation-based simulated binary crossover (NSGAII + ARSBX) to improve the quality of the generated data. First, the population generated by WGAN-GP is used as the primary population and the population generated by NSGAII + ARSBX as the secondary population to replace the inferior solutions generated by WGAN-GP. Then, to increase the diversity of the input data, an information feedback mechanism selects individuals from the first three generations and the offspring in different proportions. Finally, when the WGAN-GP-generated solutions reach a steady state, they are used as the secondary population, and the primary population is set to the NSGAII + ARSBX-generated individuals to strengthen the distribution of offspring. The experimental results verified its effectiveness on MOPs, and the method also achieved competitive results on the LEADS-PEP dataset containing 53 protein-peptide complexes.
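For readers unfamiliar with WGAN-GP, the gradient-penalty term added to the critic loss can be written as a short PyTorch sketch; the tensor shapes and the penalty weight follow common practice rather than the DGMOEA implementation.

import torch

def gradient_penalty(critic, real, fake, lambda_gp=10.0):
    """WGAN-GP penalty: push the critic's gradient norm towards 1
    on random interpolations between real and generated samples."""
    eps = torch.rand(real.size(0), *([1] * (real.dim() - 1)), device=real.device)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(interp)
    grads = torch.autograd.grad(outputs=scores, inputs=interp,
                                grad_outputs=torch.ones_like(scores),
                                create_graph=True)[0]
    grad_norm = grads.reshape(grads.size(0), -1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - 1) ** 2).mean()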

There are also NP-hard problems in neural networks. As an efficient method for solving various optimization problems, from convex to nonconvex and from single-objective to multi-objective, EC draws inspiration from biological evolution and is widely used to solve NP-hard problems. Many researchers have studied the integration of EC and DL, and experimental results show that EC has high potential in training networks.

The details of the above literature are summarized in Table 8 below. As can be seen from the fourth column, unlike the evaluation metrics selected by researchers in the previous sections, the evaluation indexes in these papers fall into two categories: one comprises IGD, hypervolume (HV) (Zitzler et al. 2003), SP, SSIM, and other metrics used to evaluate the solutions generated by the MOEA; the other comprises the traditional indicators for evaluating GANs, such as FID, IS, and MMD. Combined with the eighth column of the table, studies that use only the first kind of index generally employ GANs to drive an EA to solve MOPs, where the problems are either conventional benchmark problems or are defined through solution evaluation indicators. Conversely, studies that use only the second category treat the objectives that the trained network needs to achieve as a MOP; in essence, they use the combination of EA and GANs to solve problems within GANs themselves, and the optimization goals are typically objectives of GAN training, such as the quality of the generated images and the model running time. This can also be seen from the fifth column: several papers list the dataset as 'None', which indicates that they use GANs to drive EAs to solve MOPs. However, in the last row, not only is the GAN model used to drive the EA, but the proposed method is also validated on both benchmark and practical problems, which may be a promising trend for the future.

Another point worth mentioning is that researchers tend to choose algorithms such as NSGAII that provide non-dominated sorting and are willing to adopt the selection strategies they provide. Unlike previous studies, the datasets chosen in this category show a trend of diversity, and the selected network structures are also complex, or even automatically adjusted. In short, these emerging technologies and methods provide guidance both for solving problems in neural networks and for the optimization field in the future, and multi-domain hybrid research has become a trend. A summary of the above work is shown in Table 8.

Time overhead, as an important issue, has been a difficult barrier to break through in both model optimization and EA search. However, in the latest study (Zhou et al. 2023), saving time by using EA-guided GAN compression is an extremely creative idea. With the proposed multi-objective EA, an EA-guided sampling strategy compresses the model during supernet training, while the proposed evaluation-free offspring strategy maintains the visual quality of the generated images while effectively compressing the GAN model and speeding up execution. Future work includes investigating theoretical techniques for efficiently compressing GANs so as to further improve training efficiency.

Table 8 Demonstration of GANs combined with multi-objective optimization problem mentioned above

4.5.2 Evolved GANs in many-objective optimization

Section 4.3 mentioned the use of NSGAII to search for the optimal architecture of DCGAN (Du et al. 2020), which is essentially a many-objective optimization problem. In this problem, the decision variables represent different parameters of the GAN structure, and these decision variables are combined to form the decision space, yielding a population; mapping it to the objective space constitutes n different objective functions. The evaluation indexes are 1-TPR and FPR. Experiments on different datasets prove that the many-objective algorithm gives the model general applicability, and the effective tuning operation makes the design of the network structure more reasonable and its performance better.

Different from ordinary EAs, the estimation of distribution algorithm (EDA), which uses a probability model to generate offspring, exhibits good performance in solving MaOPs. Therefore, Liang et al. (2020) developed a gradient-penalty Wasserstein GAN (WGAN-GP) based reference vector guided evolutionary algorithm (RVEA) to address the performance degradation of EDAs on problems with three or more objectives. First, n random individuals are used to generate the initial population, and a set of reference vectors is initialized at the same time. Then, WGAN-GP is used to generate offspring instead of crossover and mutation operations. In the selection operation, polynomial mutation combined with the parent population is used to form the new generation, and the reference vectors then guide the selection of n individuals. Finally, the reference vectors are adjusted and the cycle repeats until the termination condition is reached. Two training modes are used for WGAN-GP: one trains only the discriminator, and the other trains the whole network. The experiments were performed on two benchmark problems, LSMOP and DTLZ, with RM-MEDA, MOPSO, and NSGAII used as comparison algorithms. The proposed algorithm performs better on LSMOP, while it is not as good as MOPSO on the other problem. There is still much room for future refinement of the algorithm, both in the network and in the implementation of the algorithm.
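A minimal sketch of the reference-vector-guided selection idea is shown below; it assigns solutions to reference vectors by cosine similarity and keeps one solution per vector, a simplification of RVEA (the angle-penalized distance is omitted) rather than the authors' implementation.

import numpy as np

def reference_vector_selection(objectives, ref_vectors):
    """Assign each (minimized, translated) objective vector to its closest
    reference vector by cosine similarity and keep the best solution per vector."""
    f = objectives - objectives.min(axis=0)            # translate to the ideal point
    f_norm = f / (np.linalg.norm(f, axis=1, keepdims=True) + 1e-12)
    v_norm = ref_vectors / np.linalg.norm(ref_vectors, axis=1, keepdims=True)
    assignment = np.argmax(f_norm @ v_norm.T, axis=1)  # largest cosine = smallest angle

    selected = []
    for j in range(len(ref_vectors)):
        members = np.where(assignment == j)[0]
        if members.size:
            # Keep the member with the smallest distance to the ideal point.
            selected.append(members[np.argmin(np.linalg.norm(f[members], axis=1))])
    return selected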

Considering the development of skin cancer detection technology, Lan et al. (2022) proposed a many-objective dual generative adversarial network model (DGANM), which simultaneously optimizes FID, the dual generative adversarial network loss, image sharpness (SOI), and image diversity (DOI) to improve the quality of the generated images. Considering these objectives, they proposed a new many-objective evolutionary algorithm based on an integrated strategy (MaOEA-IS). Federated learning is introduced into the population mutation, and different mating selection operators are used in the sub-populations to generate new parents. Then, the distance from each solution to the ideal point (SID) and the achievement scalarizing function (ASF) value of each solution are combined for environmental selection, while an elimination mechanism is utilized for offspring selection. Ultimately, MaOEA-IS solves the DGANM with good convergence and diversity and generates high-quality images.

Several of the above studies considered optimization problems with more than three objectives: two of them used an EA to optimize multiple objectives of GANs, while the remaining one (the second study) used GANs to drive the EA, mainly by replacing the crossover and mutation operations. The latest research has extended to application areas, which will be a promising trend.

4.5.3 Generative adversarial optimization (GAO) and its variation

Inspired by GANs, Tan and Shi (2019) proposed a generative adversarial optimization (GAO) framework for solving single-objective continuous optimization problems, which is the first application of adversarial learning to continuous function optimization. At the same time, the researchers added the guiding vector from the guided fireworks algorithm (GFWA), a heuristic algorithm, to the generator, which improved the diversity of the generated samples and the stability of the training. GFWA is an algorithm that uses the fireworks explosion process to obtain fitness values. It constructs a guiding vector (GV) with a good direction and adaptive length and then adds the guiding vector to the corresponding firework position to generate an elite solution called a guided spark (GS) (Li et al. 2019). In GAO, different from other EAs that use random sampling to generate elite solutions or guiding vectors, the introduction of GANs increases the variability of the algorithm. GAO employs a generator to produce a guiding vector that moves the current solution in a better direction, while the discriminator judges whether the generated candidate solution is better than the current one. The implementation process of the framework is described in Fig. 20.

Fig. 20 Architecture of GAO
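The guiding-vector construction borrowed from GFWA can be illustrated with the following sketch; the fraction sigma of top and bottom sparks is a hypothetical setting, minimization is assumed, and the snippet only illustrates the mechanism rather than reproducing the GAO or GFWA code.

import numpy as np

def guided_spark(firework, sparks, fitnesses, sigma=0.2):
    """Build a guiding vector from the difference between the centroids of the
    best and worst sparks (minimization), then add it to the firework position."""
    order = np.argsort(fitnesses)                 # ascending: best sparks first
    k = max(1, int(sigma * len(sparks)))
    gv = sparks[order[:k]].mean(axis=0) - sparks[order[-k:]].mean(axis=0)
    return firework + gv                          # the elite "guided spark"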

Guo et al. (2020) proposed a multi-objective combinatorial generative adversarial optimization algorithm (MOCGAO) based on GAO, which combines generative adversarial optimization with NSGAII. The initialization selects the best of the randomly generated individuals using a greedy strategy; a classification strategy then identifies the non-dominated solutions of the current generation, which are used as real data to train the GANs. The reproduction strategy generates feasible solutions from the generator, and each parameter is updated by the Adam algorithm. MOCGAO uses NSGAII's non-dominated ranking and elite selection strategy to select the individuals that constitute the next generation and finally finds the Pareto-optimal solutions. The structures of the generator and discriminator are relatively simple, using single-hidden-layer feedforward neural networks. The comparison algorithms are the mainstream NSGAII and MOEA/D, over which the proposed method performs better. The experiments were conducted on a crowdsensing problem, and the results for different tasks and participants reveal that the proposed algorithm has advantages in convergence and distribution.
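The classification step that labels the current non-dominated solutions as real training data relies on standard Pareto dominance; a minimal sketch for minimization is given below (a brute-force filter, not the fast non-dominated sorting of NSGAII used in the paper).

import numpy as np

def nondominated(objectives):
    """Return the indices of Pareto non-dominated solutions (minimization)."""
    n = len(objectives)
    keep = np.ones(n, dtype=bool)
    for i in range(n):
        for j in range(n):
            if i != j and np.all(objectives[j] <= objectives[i]) \
                      and np.any(objectives[j] < objectives[i]):
                keep[i] = False            # solution i is dominated by solution j
                break
    return np.where(keep)[0]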

4.5.4 Training of GANs with hybrid evolutionary optimization (HEO)

Before the introduction of the Adam optimizer (Radford et al. 2016), GANs were mainly trained with stochastic gradient descent (SGD). Up to now, the optimizer used in most GAN training has been Adam or RMSprop. Later, AGGAN and E-GAN were proposed to make GANs perform better, mainly in terms of quality and diversity. These investigations, however, disregard the impact of initial conditions on GAN training, so, as an improvement of the training method, Korde et al. (2019) introduced a new GAN training method based on hybrid evolutionary optimization (HEO), which first utilizes EAs to stabilize the weights in the earlier generations and then uses the Adam optimizer to regularize the discriminator in the remaining generations. Experiments showed that this method enhances the training stability of GANs and achieves convincing performance on image generation tasks. Its training speed on GPU is about 93% faster than on CPU, and its convergence time is three times faster than on CPU.

4.6 The application examples of GANs combined with EC

The studies mentioned above lean toward the theoretical level, but this work of course also extends to practice. Combining theoretical research with practical application is another purpose of scientific research, so RQ5 is also a focus worth noting. The following are applications of evolutionary GANs in computer vision, multimedia, medicine, architecture, industry, and biology (Table 9).

4.6.1 Computer vision (CV)

Encompassing image processing and pattern recognition, the ultimate goal of CV is to employ computers in place of the human brain to achieve image understanding (Cootes and Taylor 2001), which is an important part of AI. Researchers have tried to obtain the required information from images or multidimensional data to establish artificial intelligence systems (Szeliski 2010). Given the powerful generation ability of GANs and the effective exploration ability of EC, their combination plays a crucial role in tasks such as image processing and edge detection.

As an important research direction of CV, face synthesis has been widely used in police and security systems, but current techniques either cannot reproduce the characteristics of the face or do not fully consider the integrity of the image. Against this background, Zaltron et al. (2020) proposed CG-GAN, a novel face synthesis method based on interactive evolution of a neural network, which uses the powerful generation ability of GANs to let the system learn from the given training data and improve the efficiency of face synthesis. It first uses PG-GAN (Karras et al. 2017) to generate high-resolution face images and then employs interactive evolutionary computation (IEC) to let users participate in the evolution and editing of the entire face. This method builds on latent variable evolution (LVE) (Bontrager et al. 2017). In IEC, users can assign fitness values to evolved individuals according to their own preferences (Takagi 2001), which guides the search process of LVE. The target images are generated by GANs in an unsupervised form, and the latent vectors in the target space can then be fed into the generator by EC. In the experiment, a prerequisite for face synthesis is that the image can be guided to evolve toward a specific target, which is mainly manifested in the following aspects:

  • Due to the randomness of the initialization, users can randomly select images and evolve them. The selected images are used as the parents of the next generation, while locked images remain unchanged.

  • The algorithm uses three different types of variations.

Users can introduce variation either by injecting Gaussian noise or by changing individual features. If users are satisfied with the generated faces, one or more image outputs can be selected. This work is an extension of LVE that allows users to discover hidden characteristics during evolution. The method uses similarity and recognition rate as evaluation indicators, and the generation of face images is controllable under supervision. The experimental results showed that even non-professional users could synthesize good faces in a short time with high accuracy.
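The latent variable evolution loop underlying CG-GAN can be sketched as follows; the generator call, the user-supplied rating function, and the mutation scale are placeholders and assumed values, not details taken from Zaltron et al.

import numpy as np

def evolve_latents(generator, rate_fn, pop_size=8, dim=512, sigma=0.3, generations=20):
    """Interactive latent variable evolution: decode each latent vector to an image,
    let rate_fn (the user) score it, and mutate the preferred vectors."""
    z = np.random.randn(pop_size, dim)
    for _ in range(generations):
        images = [generator(v) for v in z]                 # decode latents to face images
        scores = np.array([rate_fn(img) for img in images])
        parents = z[np.argsort(scores)[-pop_size // 2:]]   # keep the preferred half
        children = parents + sigma * np.random.randn(*parents.shape)
        z = np.vstack([parents, children])                 # kept parents stay unchanged
    return z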

In order to protect the user's identity privacy while ensuring the utility of face data, Song et al. (2019) proposed a new face recognition algorithm based on E-GAN, which makes the generated face similar to, but no longer identifiable as, the original person. The authors adapted the loss function of the generator by introducing the distance and structural similarity between the original face and the face to be recognized, defining an inverse recognition index to evaluate the performance of the generator. Each generator is evaluated with the inverse recognition index, which depends on the current environment. Note that the mutation strategy of E-GAN is not changed in the proposed algorithm. The experiments used the classical face recognition method PCA and the deep face recognition model VGGFace (Parkhi et al. 2015) to evaluate the privacy protection performance on the generated faces.

At present, the edge detection problem itself can be seen as a binary classification problem aimed at separating edges from the rest of the image information (Canny 1986). Zheng et al. (2019) introduced a DE-based GAN for image edge detection and adopted an improved DE for the input of the generator. In the proposed algorithm, the DE process is regarded as a relatively independent process: its input is the original image, and the best individual image it outputs becomes the input of the generator. The discriminator, as a changing environment, can be used to evaluate fitness; in other words, the discriminator provides a loss function to evaluate the image. Different from E-GAN, the proposed algorithm uses a generator to initialize the edge image, and the EA is applied only to the input of the generator, whereas E-GAN evolves a population of generators and finally selects the best-performing generator from the population. The results showed that the proposed algorithm can achieve good detection results while remaining fast.
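As a reference for how the DE step can operate on the generator's input, the classic DE/rand/1/bin mutation and crossover are sketched below; the scale factor F and crossover rate CR are conventional defaults, not the values used by Zheng et al.

import numpy as np

def de_rand_1_bin(population, i, F=0.5, CR=0.9):
    """Produce a trial vector for individual i using DE/rand/1 mutation
    followed by binomial crossover."""
    n, dim = population.shape
    r1, r2, r3 = np.random.choice([k for k in range(n) if k != i], 3, replace=False)
    mutant = population[r1] + F * (population[r2] - population[r3])
    cross = np.random.rand(dim) < CR
    cross[np.random.randint(dim)] = True      # ensure at least one gene is taken
    return np.where(cross, mutant, population[i])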

Recently, there has been interesting research on the aging and rejuvenation of facial images. Gao et al. (2020) proposed an evolutionary GAN, called EvoGAN, to achieve this goal. They divided the long-term aging process into several small stages and split the data into different groups according to age, such as youngsters, adults, and elders. For each pair of age groups, evolutionary transformation methods are designed for facial rejuvenation and aging, respectively; these methods can produce age characteristics while preserving individual identity information. The general framework is shown in Fig. 21. EvoGAN includes an encoder Enc, several evolutionary transformation modules ET, and a decoder Dec. The purpose of the encoder is to mine latent features from the image, while the decoder does the opposite. The evolutionary transformation between different age groups is smooth. Five evolutionary transformation modules are defined according to the age groups of the input images, and the different groups are distinguished by subscripts. The task of the generator is to generate images, while the task of the discriminator is to determine whether a generated facial image is a real one. In the process of growth, the adult stage can be seen as a bridge between the 'child' and the 'elderly'. Based on this, the researchers introduced three conversion streams. The first conversion stream takes a child's facial image as input and outputs adult and elderly images; the second takes adult images as input and outputs child and elderly images. Taking conversion stream 1 as an example, after a childhood image is fed into the encoder Enc, \(ET_{12}\) and \(ET_{22}\) capture the aging characteristics of adulthood and are then fed into \(ET_{21}\); the decoder reconstructs the image to generate an adult image, and if \(ET_{23}\) is applied instead, an elderly image is generated. This completes one conversion stream. At the same time, to suit this problem, another important contribution is the construction of a challenging dataset, FFHQ_Age, which covers different ethnic groups and age groups and includes a wide range of image details such as hats and glasses. Compared with previous face aging methods, the proposed method better predicts age-specific factors and contributes to future research.

Fig. 21 An overview of the proposed EvoGAN

Similar to the above facial-image study, another EvoGAN (an EA-assisted GAN) with the same name was recently proposed by Liu et al. (2022) to generate various compound expressions for any specified target compound expression. In the new EvoGAN, the researchers focus on how to synthesize facial images with arbitrary facial movements and then search for target expressions using an EA. EvoGAN uses an EA to search for target results within the data distribution learned by the GAN. They use the Facial Action Coding System (FACS) as the encoding of the EA, generate face images with a pre-trained GAN, and then use pre-trained classifiers to recognize the expressions of the images and serve as the fitness function guiding the EA search. In the facial expression generator, individuals of the EA can be translated into images, and the expression of each image is then evaluated by the facial expression discriminator. Finally, the distance between the expression of the generated image and the target expression is evaluated. Through a series of evolutionary operations, the EA converges the results to the target expression. EvoGAN shows potential for synthesizing images with the target expression.

Considering the strong generation ability of GANs and the excellent global optimization ability of PSO, Zhang and Zhao (2021) developed a PSO-based GAN for high-quality face image generation. The algorithm is also a good example of turning an adversarial problem into an optimization problem. The network uses the structure of DCGAN for the generator and discriminator and changes the loss function to the least-squares loss of LSGAN. Each particle represents a generator network, and the length of the particle matches the number of parameters in the generator. PSO is applied to these particles, iterating continuously to find each particle's personal best and the global best particle. Compared with DCGAN, LSGAN, and E-GAN on the CelebA and CIFAR-10 datasets, the proposed algorithm achieves the lowest FID and the best quality of generated data, indicating that it can greatly suppress the mode collapse and gradient vanishing of the original GANs; an improved inertia weight is also used in the PSO.
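The particle update is the standard PSO rule applied to the flattened generator parameters; the inertia weight and acceleration coefficients below are common defaults that merely illustrate the mechanism, not the settings of Zhang and Zhao.

import numpy as np

def pso_step(positions, velocities, pbest, gbest, w=0.7, c1=1.5, c2=1.5):
    """One PSO iteration: each particle encodes a full set of generator weights."""
    r1 = np.random.rand(*positions.shape)
    r2 = np.random.rand(*positions.shape)
    velocities = (w * velocities
                  + c1 * r1 * (pbest - positions)     # pull towards personal best
                  + c2 * r2 * (gbest - positions))    # pull towards global best
    return positions + velocities, velocities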

In addition to image synthesis, target detection, and other common visual applications, image translation is an important application that cannot be ignored. So-called image translation is the conversion between images, aiming to convert a source image into a target image through a designed model (Zhu et al. 2017). The proposal of GANs in DL has greatly accelerated the development of image translation. Data need to be prepared in pairs before image translation, but this is not always possible in practical applications. To resolve this problem, Bharti et al. (2022) first introduced EC into the training of CycleGAN and proposed the EMOCGAN framework, in which different loss functions are added and randomization is introduced in the mutation process. The training of the GANs is regarded as a MOP with IS and FID as the two objectives, and NSGAII is used to solve it. Finally, SSIM and UQI are used as indicators to evaluate the quality of the model. To overcome the local-optimum problem, the network also uses the basic ideas of SA and NSGAII. The introduction of the evolution concept helps to improve the stability of the model and alleviate mode collapse.

4.6.2 Multimedia

In the computer field, multimedia is a technology in which language, text, audio, and video information are managed by computer. In this process, users can constantly exchange information with the computer in real time through multiple senses (Hiriyannaiah et al. 2020). As a means of communication, multimedia communication based on human-computer interaction is particularly important. In recent years, its applications have mainly involved using EAs and DL to process text and audio information and even to create large games. User experience occupies a high proportion in this growing technical background.

Although the data generated by GANs are continuous (differentiable), GANs can also play a role in generating discrete text. DL has been used in general text generation with models such as RNNs (Zaremba et al. 2014), SeqGAN (Yu et al. 2017), and TextGAN (Zhang et al. 2016), whose development is relatively mature. Introducing a new category-text generation framework, Liu et al. (2020) first proposed a category-aware model to learn the difference between real samples and generated samples for category text generation and then employed a hierarchical EA to train it; the model includes a generator using a relational memory core (RMC) and a CNN-based discriminator. The algorithm evolves a population of generators using two mutation strategies, temperature mutation and objective mutation, in a given discriminator environment. Finally, the generator with good performance is retained. This allows the model to retain good offspring, and the generated category samples maintain diversity and high quality after each iteration. The experimental results on multiple datasets show that CatGAN performs well in both category text generation and generic text generation.

Recent studies have shown that GANs can be used for semantic learning in the game field, and the LVE technique previously mentioned for CG-GAN is often used for game content generation. Schrum et al. (2020) introduced an interactive LVE tool for game levels. The tool enables direct exploration of the latent space and allows users to experience game levels by setting parameter values themselves. The algorithm is highly interactive and mainly adopts interactive EC. For example, users can explore the latent space through an 'explore the latent space' button, which opens a new window; users can change any single real-valued component of the genotype by manipulating a slider or entering new numbers in the adjacent input boxes. In the selection stage, a selective breeding algorithm with pure elite selection is adopted: m individuals are selected from the n parents to enter the next generation, and the rest is filled with offspring, using a 50% crossover probability and a 30% mutation probability. To avoid the algorithm falling into a local optimum, the system provides a 'Randomize' button so that users can choose whether to replace the selected genome with a new one. GAN models trained on 'Super Mario Bros.' and 'The Legend of Zelda' can use this tool, and it generalizes well to other games. At the same time, users' preferences can be observed in the background to find the advantages and disadvantages of the system, which is a field to be developed. In the future, the system can be combined with big data applications to design better game systems and develop better optimization algorithms by collecting and analyzing large amounts of user interaction data.

Previously, LVE technology has been successfully applied to fingerprint-based biometric identification systems (Bontrager et al. 2018), game creation (Schrum et al. 2020; Volz et al. 2018), face synthesis (Zaltron et al. 2020), and other tasks. The latest research concerns audio processing. In this area, data classification often requires large amounts of data, and data quality is also important. Because data acquisition in the real world is often costly, researchers have to augment their data. Mertes et al. (2020) therefore introduced a new idea for augmenting audio data that exploits the powerful generation ability of GANs: the added audio data fill in and modify the original data while remaining suitable for their respective tasks. The first stage adopts the WaveGAN architecture (Donahue et al. 2019) to generate variants of the original audio data. The model converts Gaussian noise vectors into audio samples; although the generated audio has never been heard before, it still resembles the original sounds. In the second stage, an EA is used to search the input space of the trained GANs. The purpose of this stage is to retain the samples generated by the better-performing noise vectors and then apply uniform crossover recombination and Gaussian mutation; the retained and mutated vectors are fed into WaveGAN again. An overview of the whole method is presented in Fig. 22. Experiments showed that, compared with methods without audio augmentation and with uncontrolled augmentation, the combination of GANs and EA could enhance a support vector machine (SVM)-based audio classifier and improve audio classification performance.

Fig. 22 Overview of the approach based on WaveGAN
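The second-stage search over WaveGAN's noise vectors can be sketched as follows; the elite size and mutation scale are illustrative assumptions, and the fitness values are assumed to come from the downstream classifier's performance on the generated samples.

import numpy as np

def evolve_noise_vectors(vectors, fitnesses, keep=10, sigma=0.1):
    """Keep the best-performing noise vectors, recombine them with uniform
    crossover, apply Gaussian mutation, and return the next generation.
    Assumes len(vectors) > keep and that higher fitness is better."""
    elite = vectors[np.argsort(fitnesses)[-keep:]]
    children = []
    while len(children) < len(vectors) - keep:
        a, b = elite[np.random.choice(keep, 2, replace=False)]
        mask = np.random.rand(a.shape[0]) < 0.5               # uniform crossover
        child = np.where(mask, a, b) + sigma * np.random.randn(a.shape[0])
        children.append(child)
    return np.vstack([elite, np.array(children)])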

4.6.3 Medicine

DL in the field of medicine mostly concerns medical image processing, including the use of computer technology to predict diagnostic results from medical images. Previous studies on medical image diagnosis were based on DCNNs. Generally speaking, the amount of data required by a DCNN is huge, which is often not available in actual medical practice; the number of medical images of some body parts of a patient may be very small. Therefore, researchers tend to use GANs to synthesize data from the given data points. Different from the standard GA, the mutation operator of ES, another branch of EC, depends entirely on random mutation, which makes it well suited to combinatorial optimization problems. Therefore, Junior and Yen (2021) developed a network pruning algorithm combining ES and MCDM for medical imaging diagnosis. The algorithm also uses the idea of multi-objective optimization to effectively balance the generator's loss function against the computational complexity. The experiments proved that, in generating healthy chest images and images of malignant skin lesions, the proposed pruning-based GAN model could reduce the number of FLOPs by about 70% compared with the original unpruned model while maintaining consistent performance.

The 2019 coronavirus disease (COVID-19) pandemic has severely affected the health and economy of many countries. Recently, many DL models have been developed to screen for COVID-19 using chest computed tomography (CT) images. Thangavel and Sasirekha (2022) proposed and implemented a DCGAN-based model that automatically discovers and learns the rules in the input data and generates the desired samples. In this model, the hyper-parameters, such as the learning rate, momentum, and number of neurons, are optimized by a GA. Finally, the proposed model is used to predict COVID-19 and non-COVID-19 images, which helps to improve the diagnosis rate. It is experimentally verified that the prediction accuracy of the GA-optimized DCGAN is 94.50%, which compares favorably with mainstream models.

4.6.4 Building

In practical applications, GANs can also be applied to the generative design of digital architecture. With the development of DL, generative design methods that combine computer technology and algorithmic optimization can improve the diversity and accuracy of architectural schemes. Zhang et al. (2021) studied the generative design of residential building layouts, with a network architecture mainly based on pix2pix. At the same time, to control parameters such as building density and height, different training sample sets and a Pareto-based GA are used. This design can meet certain practical application scenarios in the early stage of architectural design and has strong scalability. Galapagos and Octopus on the Rhino platform are toolboxes based on GAs; the authors use the latter, which combines a GA with Pareto optimality and is a multi-objective optimization tool for architectural design. In the future, environmental parameters such as the wind environment, light environment, energy consumption, and carbon emissions can be added to the Octopus multi-objective optimization to control the generation of architectural forms and truly achieve performance-oriented form generation.

4.6.5 Industry

Flow shop scheduling and fuzzy scheduling in industrial settings are mainstream combinatorial optimization problems, and many researchers have modeled them and proposed their own algorithms (Zhang et al. 2019). In recent years, at the application level, EAs have been able to solve many complex NP-hard combinatorial optimization problems, such as scheduling. Therefore, Chen et al. (2019) proposed a GAN algorithm hybridized with a GA to solve the permutation flow shop scheduling problem (PFSP), in which order crossover is used as the crossover operator and the other ideas are consistent with the basic GA. The algorithm uses a network architecture similar to SeqGAN. Because the genetic code of a sample is usually a sequence such as [1, 2, 3, 4], it needs to be converted into training data that the GANs can recognize, in the form of a two-dimensional matrix. After 300 generations and running the same model 10 times on the same datasets, the average and minimum error rates of the proposed algorithm are lower than those of the standard GA.
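The two ingredients mentioned above, order crossover on job permutations and the conversion of a permutation such as [1, 2, 3, 4] into a two-dimensional matrix that a GAN can consume, can be sketched as follows; the one-hot encoding is one plausible choice, not necessarily the one used by Chen et al.

import numpy as np

def order_crossover(p1, p2):
    """OX: copy a slice from parent 1, then fill the rest in parent 2's order."""
    n = len(p1)
    a, b = sorted(np.random.choice(n, 2, replace=False))
    child = [-1] * n
    child[a:b] = p1[a:b]
    fill = [job for job in p2 if job not in child]
    for i in range(n):
        if child[i] == -1:
            child[i] = fill.pop(0)
    return child

def permutation_to_matrix(perm):
    """Encode a job permutation such as [1, 2, 3, 4] as a one-hot matrix."""
    n = len(perm)
    mat = np.zeros((n, n))
    for position, job in enumerate(perm):
        mat[position, job - 1] = 1.0
    return mat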

4.6.6 Biology

Predators and prey are frequently mentioned in middle school textbooks, and the two seem to have reached a certain balance under the influence of nature. Talas et al. (2020) described CamoGAN, where 'Camo' refers to camouflage, that is, a species evolves to blend almost completely into its environment; how to evolve the best camouflage is therefore a very interesting study. The goal of the predator is to distinguish input images containing prey from images without prey, while the prey constantly evolves to achieve better camouflage. In CamoGAN, the generator represents the genotype of the prey and does not contain the background of the image; through the EA, the next generation inherits the good features of the previous generation to achieve better camouflage. The discriminator can be considered the visual system of the predator, which identifies prey more effectively as it evolves. The camouflage produced by the proposed method provides effective concealment and surpasses two current mainstream camouflage techniques. The experiment measured the reaction time of human participants in recognizing the target to quantify the quality of the model and found that the prey generated by the generator becomes harder and harder to find.

4.6.7 Environment

Outdoor environments are heavily influenced by the morphological design of urban neighborhoods. Considering that performance-based morphological design of urban neighborhoods relies on time-consuming numerical simulation, Huang et al. (2022) proposed an automated design process that applies a GAN as a surrogate model to accelerate environmental performance-oriented urban design. The trained GAN model predicts pedestrian-level wind (PLW), annual cumulative solar radiation (Radiation), and the universal thermal climate index (UTCI) in real time. The GAN-based surrogate model is combined with a multi-objective genetic algorithm to achieve real-time optimization of urban form. The results show that the method is 120-240 times faster than numerical simulation methods and has a time advantage in optimizing the outdoor environment in urban design.

4.6.8 Energy

The accuracy and stability of wind power prediction are important aspects of wind farm operation. For new wind farms, it is difficult to make accurate predictions without enough historical data. Meng et al. (2022) proposed a novel prediction model based on a secondary evolutionary GAN (SEGAN) and a dual-dimension attention mechanism (DDAM)-assisted bidirectional gated recurrent unit (BiGRU) to solve the few-shot learning problem of short-term wind prediction for new wind farms. The secondary evolutionary learning paradigm is used to learn the marginal distributions of real data and generate high-quality realistic data. In the first evolution, a set of mutation operators is used to optimize the GAN and avoid mode collapse. In the second evolution, PSO is used to optimize the population parameters to further improve the quality of the new data. In the prediction stage, DDAM is used to strengthen the important information and weaken the redundant information in both the time and feature dimensions. The proposed approach is validated on data collected from the Galicia Wind Farm in Sotavento and outperforms the other methods in short-term prediction for newly built wind farms.

4.6.9 Chemistry

Drug design is an important research area for pharmaceutical companies, and DL models can provide attractive solutions for novel drug design. Abbasi et al. (2022) proposed a feedback-GAN-based framework, which implements an optimization strategy by interconnecting Encoder-Decoder, GAN, and Predictor deep models with a feedback loop. The Encoder-Decoder converts drug molecule symbols into latent-space vectors. The GAN learns and replicates the distribution of the training data to generate new compounds. The purpose of the feedback loop is to integrate and evaluate the generated molecules at each training step. To develop a more accurate set of molecules, they use NSGAII to select the best generated molecules, evaluate each molecule more comprehensively by analyzing the impact of different optimization objectives, and dynamically feed the results back into the generator. The results show that the proposed framework can generate realistic and novel molecules with a high level of diversity.

Table 9 Research and demonstration of evolutionary GANs in practical application

4.7 Application of GANs in combination with other machine learning models and other evolutionary algorithms

4.7.1 GANs with evolutionary ensemble learning

Toutouh et al. (2020) proposed two EAs to create ensembles so as to reuse generators, based on the observation that ensembles of models perform better than single models in ML. First, the authors take a set of heterogeneous generators optimized for a single objective and then create ensembles of them trained for a different goal; the former approach is limited by the size of the set, while the latter is open but still bounded by an upper limit on the set size. They are called restricted optimization of generators and non-restricted optimization of generators, abbreviated REO-GEN and NREO-GEN. In the first method, the encoding is a vector of double-precision real numbers in which the integer part represents the index of a generator and the decimal part represents the probability of that generator being selected. The recombination strategy randomly selects two points in the representation shown in Fig. 23 below, exchanges their integer parts, and then averages the weight w of the decimal parts. The mutation operation can mutate the integer part, the decimal part, or both. In the second method, the encoding is a vector of length \(2n+1\), as shown in Fig. 24: the first element represents the number of generators in the ensemble, the next n elements represent the generator indices, and the last n elements represent the corresponding selection probabilities. The results showed that both evolutionary methods construct generative ensembles that greatly improve the diversity of the pre-trained generators at a low computational cost.

Fig. 23 REO-GEN solution representation examples

Fig. 24 NREO-GEN solution representation examples
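The REO-GEN representation can be made concrete with a small decoding sketch: the integer part of each gene indexes a pre-trained generator and the fractional part carries its (unnormalized) selection weight. This is an illustrative reading of the encoding, not the authors' code.

import numpy as np

def decode_reo_gen(genome):
    """Decode a REO-GEN genome of double-precision genes into an ensemble:
    integer part -> generator index, fractional part -> selection probability."""
    indices = np.floor(genome).astype(int)
    weights = genome - np.floor(genome)
    weights = weights / weights.sum()          # normalize to a probability vector
    return list(zip(indices, weights))

# Example: generators 3 and 7 selected with probabilities 0.25 and 0.75.
print(decode_reo_gen(np.array([3.2, 7.6])))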

4.7.2 GANs in latent space with GA

Most researchers consider how to further improve the performance of GANs from the perspective of their structure, parameters, and model stability. Fernandes et al. (2020) instead took into account the influence of the latent space on the quality of the generated samples and used a GA and MAP-Elites to explore it and generate diverse input sets. In GANs, the latent space determines the output of the generator. On this basis, the authors model the latent space as a search problem and create sets of latent vectors to find sets of images that maximize diversity. The approach can maximize the loss of the model without randomly generating all vectors; it only needs to generate samples of a specific type. The paper uses EC to search and uses three methods to generate individuals:

  • Random Sampling (RS)

  • GA

  • MAP-Elites

Although all three methods can be considered as evolving a group of individuals, their iterative mechanisms are very different. In the RS method, a new set of random individuals is created at the beginning of each generation and evaluated by an evaluation function. As part of EC, the GA starts from a randomly generated population, and new populations are generated by the mutation operator. The last method is slightly different: the algorithm first generates a random number of individuals and places them on a feature map; MAP-Elites then runs iteratively, and in each iteration a new individual is created by applying the mutation operator to an individual that already exists in the map. Each new individual is then evaluated according to its characteristics and placed on the map, although in each cell of the map only the best individual is retained. Using a specific GAN model, the genotype of an individual is converted into an image. Root mean square error (RMSE) and normalized cross-correlation (NCC) are then used to compare the images, and the values are finally averaged. The proposed method can effectively transform different sets of latent variables into samples that meet specific criteria.
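The pairwise comparison using RMSE and NCC can be sketched as below; averaging the metric over all image pairs is an assumption consistent with the description.

import numpy as np

def rmse(a, b):
    return np.sqrt(np.mean((a - b) ** 2))

def ncc(a, b):
    """Normalized cross-correlation between two images."""
    a0, b0 = a - a.mean(), b - b.mean()
    return float(np.sum(a0 * b0) / (np.linalg.norm(a0) * np.linalg.norm(b0) + 1e-12))

def pairwise_diversity(images, metric):
    """Average a pairwise image-comparison metric over a set of generated images."""
    scores = [metric(images[i], images[j])
              for i in range(len(images)) for j in range(i + 1, len(images))]
    return float(np.mean(scores))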

4.7.3 Evolutionary adversarial attention networks (EAAN)

Each form of information source can be called a modality; the definition of a modality is very broad, and a language can also be called a modality. The application of multi-modal DL has shown an upward trend year by year since 2010. Huang et al. (2021) proposed an evolutionary adversarial attention network (EAAN) based on the concept of multi-modal representation learning in multi-modal ML. EAAN combines the attention mechanism with GANs and trains and optimizes them through an evolutionary process, thereby improving the robustness of multi-modal representation learning.

In multi-modal representation learning, there are two kinds of representation: joint representation and collaborative representation (Guo et al. 2019). Joint representation maps the information of multiple modalities into a single multi-modal vector space. The authors therefore propose a text-visual attention model based on twinning learning that combines the two modalities. The purpose of twinning learning is to use the siamese similarity under an asymmetric attention strategy to guide the learning of the attention weights. As a regularization, GANs match the learned multi-modal representation with a prior distribution. Finally, the above process is fitted into an evolutionary training framework to improve the robustness of the model and effectively select the hyper-parameters. Each hyper-parameter can be regarded as a gene of an individual in the population. The population is initialized by uniformly distributed random sampling, and the hyper-parameter sets are then selected, recombined, and mutated according to the evolutionary steps to generate the next generation of individuals. Experiments on four datasets showed more accurate image classification.

4.8 The comparison about experiments

To measure the quality of some models, we selected the FID metric to compare the results generated by the models on the three datasets MNIST, CIFAR-10, and CelebA. As shown in Table 10, the MNIST results use the FID mean, while the others use the FID minimum. It can be seen that on the MNIST dataset COEGAN+NSGC achieves the best results and E-GAN the worst; on the CIFAR-10 dataset CEGAN performs better than the others and AEGAN worse; and on the CelebA dataset the enhanced E-GAN achieves the best results.

Table 10 Results of FID on MNIST, CIFAR-10, and CelebA datasets

Table 11 presents the IS and FID results of selected models for both the unsupervised and supervised settings on the CIFAR-10 dataset and for the unsupervised setting on the STL-10 dataset. It can be observed that the method of Kobayashi et al. obtains the best IS and FID results in all three settings. In the unsupervised CIFAR-10 setting, DFM achieves the worst IS and SNGAN the worst FID. In the supervised CIFAR-10 setting, SNGAN achieves the poorest IS and FID scores. In the unsupervised STL-10 setting, DFM achieves the worst IS score and SNGAN the worst FID score.

Table 11 Results on CIFAR-10 and STL-10 datasets using unsupervised and supervised settings of IS and FID

Figure 25 gives the SSIM and UQI results of EMOCGAN and Cyclic GAN on the three datasets Apple and orange, Winter and summer, and Monet and pictures, where * denotes the results obtained by the model using the dataset in the forward pass and the others indicate the results of the backward pass. EMOCGAN improves the training of the generator, effectively reducing model collapse, instability, and most of the problems associated with the vanilla GAN as well as Cyclic GAN. It can be concluded that the results of EMOCGAN are better than those of Cyclic GAN overall.

Fig. 25 Comparative analysis of EMOCGAN (Bharti et al. 2022) and Cyclic GAN (Zhu et al. 2017)

For the dataset statistics of the literature covered in this paper, Table 12 shows the availability of each dataset. After querying and verifying that all datasets are publicly available, their resource links are given in the second column, and readers can download them if needed.

Table 12 Availability statistics for each dataset

5 Discussion for future research

As a model with great generative potential, GANs can be used to solve problems in many different fields of society. However, because of their difficult training characteristics and researchers' unfamiliarity with the field of EC, there are not many studies on evolutionary GANs. When improving the adversarial training process of GANs, the two most commonly used approaches in existing research are the following:

  • Modifying the objective function (modifying the objective function of the generator (Warde-Farley and Bengio 2017), modifying the objective function of the discriminator (Metz et al. 2017), or both (Salimans et al. 2016; Arjovsky et al. 2017)).

  • Training a set of additional discriminators (Mu et al. 2020) or generators (Wang et al. 2019) and alleviating the degradation of GANs by heuristic methods.

The literature listed in Sect. 4 points out many directions for later studies of evolutionary GANs, as detailed below.

5.1 Aspects of GANs

As a pioneering work combining EC and GANs, E-GAN has many imperfect aspects. How later variant research has enhanced E-GAN has been described and summarized in the previous parts. This section provides some insights into possible future research directions for reference.

  • In many papers, the generator evolves in the form of a population without considering the variability of the environment. Multiple discriminators can provide multiple environments for the generator population, so future researchers are advised to start directly from the multi-discriminator setting.

  • In terms of datasets, the experiments in MOEGAN are based on synthetic datasets and lack verification on real-world datasets such as MNIST, CelebA, and LSUN. In the future, scholars can combine synthetic and real datasets to improve the stability of the model. Experiments can also be carried out on more intricate datasets, and the applicability of the proposed methods to other datasets and problem areas can be studied to obtain more general conclusions.

  • Compared with traditional neural network optimization, whose goal is a single objective, optimizing the performance metrics of different sets of discriminator models, or of GANs, simultaneously as a MOP is a research direction well worth pursuing. MOEAs can play a unique role in solving such problems, thereby completing the combination of the two directions.

  • With regard to the improvement of loss functions, many researchers have hesitated over the selection of loss functions for generators and discriminators, so how to select an alternative fitness function for the training of discriminators and generators deserves further consideration, to better guide the progress of GANs in EC. In addition, the idea of loss-function customization in TaylorGAN is also worth learning from.

  • Influenced by the Mustangs and Lipizzaner frameworks, researchers can consider how to implement algorithms on parallel platforms employing distributed training for more complex calculations. The design process can be given a visual interface to facilitate interaction with users, for example, allowing users to scan many configurations of GAN architectures, loss functions, and hyper-parameters and view the real-time results of GAN training. Hybrid-level EAs can also be constructed and implemented to run in a distributed environment.

  • EvolGAN does not focus on the training process of GANs but on the idea of exploring the latent space of GANs. Future work can use the powerful search ability of EC to explore the latent space of GANs.

  • Inspired by AEGAN, GANs can be combined not only with the attention mechanism but also with other ML fields in the future, such as transfer learning. For large-scale problems with complex search spaces, such as constrained and disconnected Pareto-optimal regions, evolutionary transfer optimization and advanced ML methods can stimulate further innovation in solving real-world applications with large-scale decision variables.

  • Time efficiency is an important indicator of the quality of a model or algorithm, but it is not taken into account by GAN training in many studies. In the future, the efficiency of the algorithm can be improved by weight sharing. In the selection stage of the EA, instead of using a fitness function to evaluate the quality of solutions, a manifold selection mechanism can be used to predict the quality of the generated solutions, which avoids large-scale calculation.

  • Inspired by Pareto GAN, different update frequencies can be used to optimize the parameters of the generators and discriminators in the proposed model.

  • In practical applications, extending evolutionary GANs to other types of data, such as multi-modal data, is worth considering. In addition, for the application of GANs in the field of scheduling, the idea of multi-objective GANs can be used to optimize multiple scheduling objectives at the same time, such as the maximum completion time and the resource consumption rate in green scheduling.

5.2 Aspects of evolutionary computation

EC has great potential for solving various complex problems, and GANs have been successfully applied to different areas of society. However, the exploration of EC-based GANs is still limited, which will provide opportunities for many researchers in the future. Within EC, much work remains to be done, mainly as follows:

  • Many representative EAs have been widely applied to solve optimization problems with different numbers of objectives. When solving these problems, the loss function of GANs can be combined with the evolutionary indicators used in EC; whether the two can be treated as similar problems is a direction worth studying in the future. Of course, when combining the two, additional consideration should be given to whether other objectives need to be optimized so as to increase the diversity of the population.

  • The evolutionary process includes the stages of selection, crossover, and mutation. Future research can focus on optimizing a specific stage in EC and propose new algorithms or strategies for a particular evolutionary operator. For example, the knowledge distillation strategy used in reinforcement learning could be applied to the retention of elite offspring solutions studied in previous work, and such strategies could be integrated into the optimization process of evolutionary GANs, thereby enhancing the stability of the GAN model.

  • Mainstream EAs such as ES, GA, SA, NSGA-II, PSO, and SPEA2 are widely used to solve MOPs. However, judging from current applications, the use of these algorithms in GANs is still insufficient; many scholars focus only on the standard evolutionary paradigm and ignore these methods. In addition, EC offers many strong algorithms beyond those mentioned above, so interested researchers can apply a wider variety of EAs to optimize GANs.

  • Interactive evolutionary computation (IEC), which builds on interaction between EC and human users, is an almost unexplored field here. If this evolutionary approach can be applied to practical problems such as GANs for game production or GANs for speech recognition, the resulting performance gains would be welcomed by consumers. Researchers can use the idea of IEC to increase interaction between the model and users, so that scholars and practitioners outside the field can experience the appeal of evolutionary GANs and broaden the user base.

  • To bring automation into the field of evolutionary GANs, much research has addressed the automatic search and optimization of neural network architectures and parameters. Neuroevolution and architecture search can be used to automatically adjust network parameters and structure on the basis of EAs, and the architecture and parameters of GANs themselves deserve similar attention (a sketch of a simple architecture-mutation operator is given after this list).

  • In most of the studies listed, researchers have not discussed in depth the runtime of EAs combined with GANs. EAs have become a popular choice for automated problems in the DL field because of their ability to find solutions efficiently in high-dimensional spaces. However, the computational cost of EAs quickly becomes high, since they must iteratively evaluate many candidate solutions, some of which evolve in high-dimensional spaces. It is therefore still worthwhile for researchers to explore effective strategies for reducing the running time of EAs while preserving their strong performance.
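
To make the first direction above more concrete, the sketch below treats each candidate generator as an individual evaluated on two objectives, its adversarial loss against the current discriminator and a crude diversity proxy, and keeps only the non-dominated candidates. The objectives, networks, and selection rule are illustrative assumptions of ours, not a prescription from any particular evolutionary GAN.

```python
import torch
import torch.nn as nn

def objectives(G, D, batch=64, latent_dim=16):
    """Evaluate one candidate generator on two toy objectives (both minimized):
    its adversarial loss against D, and the negative mean pairwise distance
    of its samples (a crude diversity proxy)."""
    with torch.no_grad():
        z = torch.randn(batch, latent_dim)
        fake = G(z)
        adv_loss = nn.functional.binary_cross_entropy_with_logits(
            D(fake), torch.ones(batch, 1)).item()
        diversity = torch.cdist(fake, fake).mean().item()
    return adv_loss, -diversity

def dominates(a, b):
    """Pareto dominance for minimization: a dominates b."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_select(candidates, D, k):
    """Keep up to k non-dominated candidate generators (truncation is arbitrary
    here; a real EA would add crowding or indicator-based tie-breaking)."""
    scored = [(G, objectives(G, D)) for G in candidates]
    front = [(G, f) for G, f in scored
             if not any(dominates(g, f) for _, g in scored if g is not f)]
    return [G for G, _ in front[:k]]

# Toy usage with randomly initialized candidate generators.
latent_dim, data_dim = 16, 2
D = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1))
candidates = [nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(),
                            nn.Linear(32, data_dim)) for _ in range(6)]
survivors = pareto_select(candidates, D, k=3)
print(len(survivors), "non-dominated generators kept")
```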
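For the neuroevolution and architecture-search direction, the following sketch encodes a toy generator architecture as a small genome (hidden-layer widths plus an activation choice) and defines one structural mutation operator. The genome format, the mutation operations, and all names are hypothetical; a full neuroevolution pipeline would wrap crossover, training, and fitness evaluation around such an operator.

```python
import copy
import random
import torch.nn as nn

# Activation choices available to the genome; all names are illustrative.
ACTIVATIONS = {"relu": nn.ReLU, "lrelu": lambda: nn.LeakyReLU(0.2), "tanh": nn.Tanh}

def build_generator(genome, latent_dim=16, data_dim=2):
    """Instantiate a toy generator network from an architecture genome."""
    layers, in_dim = [], latent_dim
    for width in genome["hidden"]:
        layers += [nn.Linear(in_dim, width), ACTIVATIONS[genome["act"]]()]
        in_dim = width
    layers.append(nn.Linear(in_dim, data_dim))
    return nn.Sequential(*layers)

def mutate(genome, rng=random):
    """Apply one random structural mutation: widen a layer, add a layer,
    remove a layer (if more than one remains), or swap the activation."""
    child = copy.deepcopy(genome)
    op = rng.choice(["widen", "add", "remove", "act"])
    if op == "widen":
        i = rng.randrange(len(child["hidden"]))
        child["hidden"][i] = min(512, child["hidden"][i] * 2)
    elif op == "add":
        child["hidden"].insert(rng.randrange(len(child["hidden"]) + 1),
                               rng.choice([32, 64, 128]))
    elif op == "remove" and len(child["hidden"]) > 1:
        child["hidden"].pop(rng.randrange(len(child["hidden"])))
    else:  # "act", or a "remove" that would empty the genome
        child["act"] = rng.choice(list(ACTIVATIONS))
    return child

parent = {"hidden": [64, 64], "act": "relu"}
child = mutate(parent)
print(child, build_generator(child))
```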

6 Conclusion

As a recently developed deep generative model, GANs have been successfully applied in various fields. Owing to its strong exploration ability, EC is well suited to solving large-scale combinatorial optimization problems. GAN training also suffers from many problems caused by search uncertainty, and EC is considered an effective tool for addressing them. In this survey, we first introduce the detailed background of GANs and the problems that tend to arise during training. For these problems, we describe solutions proposed in previous studies and argue that EC is a more effective one: it can provide a better starting point for the GAN training process and enhance the performance of the algorithm. Secondly, to give readers a clearer understanding of the combination of GANs and EC, detailed classifications are presented in Sect. 4, including E-GAN and its variants, GANs using DE and ES, automatic tuning of GANs, combinations of GANs with optimization problems, and research on GANs in practical applications and in combination with other ML algorithms or technologies. The analysis results are displayed in Tables 4, 5, 6, 7, 8, 9, 10, and 11. Classical evaluation metrics, network architectures, and multi-objective optimization algorithms have consistently been the preferred choices. Most of the studies in the field of CV appear in the application part, because any CV study starts from the acquisition and processing of images, and the image-generation ability of GANs is relatively strong. Finally, from the perspectives of both GANs and EC, we outline directions that can be studied in the future.

Without the first paper combining GANs and EC, the field would not have developed so rapidly, and it is difficult to predict whether it will remain at the forefront of academia. Our conclusion is that GANs combined with EC have very promising research prospects, and integrating EC with GANs is expected to enhance performance in many respects. This leads us to believe that the combination of EC and GANs holds great potential for further open research issues and challenges. We hope this paper provides readers with a comprehensive understanding of research at the intersection of EC and GANs.