Evolutionary Computation in Social Propagation over Complex Networks: A Survey

Social propagation denotes the spread phenomena directly correlated to the human world and society, which includes but is not limited to the diffusion of human epidemics, human-made malicious viruses, fake news, social innovation, viral marketing, etc. Simulation and optimization are two major themes in social propagation, where network-based simulation helps to analyze and understand social contagion, and problem-oriented optimization is devoted to containing or improving the infection results. Though there have been many models and optimization techniques, the concern is that the increasing complexity and scale of propagation processes continuously challenge former conclusions. Recently, evolutionary computation (EC) has shown its potential in alleviating these concerns by introducing an evolving and developing perspective. With this insight, this paper intends to develop a comprehensive view of how EC takes effect in social propagation. A taxonomy is provided for classifying the propagation problems, and the applications of EC in solving these problems are reviewed. Furthermore, some open issues of social propagation and the potential applications of EC are discussed. This paper contributes to recognizing the problems in application-oriented EC design and paves the way for the development of evolving propagation dynamics.


Introduction
Network propagation refers to the flow, spread, and diffusion of information or material on complex networks. Network propagation phenomena have been widely seen in real-world engineering applications, such as traffic flow on road networks [1] , virus spreading on crowd networks [2,3] , information diffusion on social networks [4] , fluctuations propagation on power grids [5] , etc. In recent years, with the rapid development of information technology, more and more network propagation problems defined on very large-scale networks have been realized, leading to new challenges to the analysis of network propagation.
Simulating real-world propagation processes is fundamental and necessary in network propagation analysis [6,7]. There are various propagation models, such as epidemic models, linear threshold models, and information cascade models. However, as the network scale grows and the propagation behavior becomes more and more complex, the use of traditional network propagation models becomes more and more complicated. Evolutionary dynamical models have been introduced in propagation models, such as the evolving epidemic models [8,9], malware and anti-malware evolutionary models [10,11], evolutionary graph theory [12], genetic-algorithm-based diffusion models, etc.
Besides simulation, propagation optimization is another classical and significant task. In the area of information explosion, it is of great significance to develop fast and efficient algorithms to minimize negative diffusion and maximize positive diffusion. During these processes, computing efficiency, solution quality, and algorithm stability are simultaneously required. There have been extensive optimization methods developed for solving networked optimization problems, including three major categories.
1) Convex/semi-convex optimization, which is very suitable for problems with a regular, elliptical solution space [13,14]. These kinds of methods converge fast and guarantee convergence to a local optimum [15]. However, when facing a complicated solution space with many peaks and troughs, it is difficult to select suitable methods to map the irregular space into a regular one.
2) Heuristics, which are a kind of approximate optimization method [16−19]. They are usually built on instinct or insights into problem characteristics and can provide a feasible solution under acceptable time or space overhead. Though heuristic methods have been acclaimed for their high availability and high efficiency, they are confined by weak scalability. The reason is that designs customized for particular problems greatly decrease the generalization ability of the methods across problem categories. Even for similar problems, a change of constraint conditions may reduce the efficacy of the heuristics.
3) Evolutionary computation (EC), which encapsulates a set of bio-inspired optimization algorithms [20,21] . Unlike traditional heuristics, EC contains problem-independent optimization strategies and can be quickly migrated to different problems. Different from convex-based optimization methods, most stochastic search methods relax requirements to the shape of solution space, and can be easily extended to large-scale optimization problems. Besides, it can produce a population of solutions that are usually independently generated at each iteration, which contributes to the algorithm′s parallelizability.
EC is good at solving non-deterministic polynomial hard (NP-hard) problems [22], such as multiple knapsack problems [23], travelling salesman problems [23], workflow scheduling problems [24], and resource allocation problems [3]. As most propagation control problems can be formulated as complex optimization problems with arbitrary complexity [13], such as the allocation of resources, the blocking of edges, the status changes of nodes, or the planning of propagation paths, it is promising to apply EC algorithms to solve complex propagation optimization problems.
With these insights, this paper aims to present a comprehensive survey of numerous EC algorithms and their representative applications in social propagation dynamics. The contributions are as follows.
1) A comprehensive taxonomy of social propagation. This paper discusses various research subjects in social propagation and classifies them into three categories: simulation, optimization, and detection & analysis. Simulation studies focus on modeling realistic propagation phenomena and analyzing their dynamic characteristics. Optimization studies are based on simulation but concentrated more on minimizing the negative diffusion or maximizing the positive diffusion. Some other studies are about the detection & analysis of fake news, diffusion sources, or propagation paths, which show great potentials in engineering practice.
2) An overview of EC applications in social propagation. Classical EC algorithms are first reviewed. Then, their applications to solving the simulation, optimization, and other new propagation problems are respectively investigated. The literature survey shows that EC is mainly applied to solve the pollutant minimization and influence maximization problems. For the other problems, some exploratory studies and inspirations are discussed.
3) Highlighting the open issues of social propagation and the corresponding challenges for EC algorithms design. The evolving propagation dynamics, which combine the evolution thought and traditional propagation models, have raised much research attention in recent years. This paper also discusses the major challenges and potential solutions.
The organization of this paper is as follows. Section 2 provides an overview of EC. Section 3 describes the scope of this paper and the taxonomy of propagation surveys. The applications of EC in propagation simulation are introduced in Section 4. Section 5 provides the advanced EC techniques for solving propagation optimization problems. Some new applications are investigated in Section 6. The open issues and future directions in EC-based propagation research are discussed in Section 7. Finally, this paper is concluded in Section 8.

An overview of EC
Evolutionary computation is a branch of artificial intelligence and computational intelligence. It is inspired by the natural selection mechanism of "survival of the fittest" and the process of biological evolution. The natural evolution process in EC is simulated by iterative processes, and the optimal solution is obtained from a population of solutions [25] . Therefore, a standard EC can be described as a generic, stochastic, population-based, and iterative algorithm.

Consider a population with NP individuals. At time t, each individual i holds a position $x_i(t)$ that can change over time. Each position indicates a solution to the problem. The solution can be evaluated by a fitness function $f(\cdot)$, whose value (named the fitness value) represents the individual's adaptability to the environment. Generally, the higher the fitness value is, the better the solution is, so the target optimization problems are usually formulated as maximization problems by adjusting the objective functions. The global best solution is represented by $Gbest(t)$, which records the historically best solution of the whole population. The rules of a typical EC can be formulated as follows.
Initially (when t = 0), all the individuals are randomly assigned values within the scope of the search space, and $Gbest(0)$ is selected from all the initial solutions:
$$Gbest(0) = \arg\max_{x_i(0),\; 1 \le i \le NP} f(x_i(0)) \quad (1)$$
Then (when t ≥ 1), the population iterates several times to update their solutions, holding $Gbest(t)$ as the best-known solution until time t:
$$x_i(t) = x_i(t-1) + \Delta x_i(t), \qquad Gbest(t) = \arg\max_{x \in \{Gbest(t-1),\, x_1(t), \ldots, x_{NP}(t)\}} f(x) \quad (2)$$
where $\Delta x_i(t)$ represents the modification to the solution, which usually consists of a series of rules.
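The generic loop described above (random initialization, iterative modification, and bookkeeping of the global best solution) can be sketched in a few lines of Python. This is a minimal illustration, not an algorithm from the surveyed literature: the function name `evolve`, the box bounds `lower`/`upper`, and the Gaussian perturbation used as the modification rule are all assumptions chosen for brevity.

```python
import random


def evolve(fitness, dim, lower, upper, pop_size=20, iterations=50,
           step=0.1, seed=0):
    """Generic population-based maximization loop that tracks the global best.

    The modification rule is a placeholder (Gaussian perturbation); a real
    EC algorithm would plug in its own operators here.
    """
    rng = random.Random(seed)
    # t = 0: every individual is sampled uniformly inside the search bounds.
    population = [[rng.uniform(lower, upper) for _ in range(dim)]
                  for _ in range(pop_size)]
    gbest = max(population, key=fitness)[:]
    history = [fitness(gbest)]
    for _ in range(iterations):
        # t >= 1: modify every solution, clipping it back into the bounds.
        for individual in population:
            for d in range(dim):
                moved = individual[d] + rng.gauss(0.0, step)
                individual[d] = min(upper, max(lower, moved))
        # The global best is only replaced by a strictly better solution.
        candidate = max(population, key=fitness)
        if fitness(candidate) > fitness(gbest):
            gbest = candidate[:]
        history.append(fitness(gbest))
    return gbest, history
```

Because the global best is only ever replaced by a better solution, the recorded fitness history is non-decreasing regardless of which modification rule is used.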
The scope of EC is shown in Fig. 1, which includes three branches: evolutionary algorithms (EAs), swarm intelligence algorithms (SIAs), and some problem-based EC extensions. Each branch is introduced as follows.

Evolutionary algorithms
Evolutionary algorithms (EAs) are a kind of population-based stochastic algorithm, which updates a population of solutions in an iterative way. Each iteration typically includes three operators: selection, crossover, and mutation. All the operators imitate biological evolution and favor the survival of fitter solutions. Among them, the selection operator is used to select individuals with good adaptability to the environment. The crossover operator generates two new individuals from two parent individuals by following specific crossover probabilities or rules. The mutation operator modifies the selected individuals by specific rules so that the population diversity can be improved.
The popular EAs include genetic algorithm (GA) [26] , evolution strategy (ES) [27] , evolutionary programming (EP) [28] , differential evolution (DE) [29] , genetic programming (GP) [26] , artificial immune system (AIS) [30] , etc. Thereinto, GA is the most basic and widely-used algorithm, which simulates the inheritance process of chromosome genetic genes [31] . ES and EP simulate the natural evolution processes from different aspects, where ES emphasizes the individual-level evolution and EP focuses on the population-level evolution. DE is very similar to GA in the evolutionary process. Both of them include the process of mutation, crossover, and selection. The differences lie in that DE includes differential variation vectors and holds the search space of floating-point encoding, while GA uses the fitness-based probability selection and holds the search space of binary encoding [32]. GP is designed to build programs automatically and usually takes the syntax tree as the representation of evolving programs. AIS is a rule-based machine learning system that simulates the mechanism of vertebrate immune system.
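To make the three operators concrete, the following is a minimal sketch of a binary-encoded GA. The specific operator choices (binary tournament selection, one-point crossover, bit-flip mutation) and all parameter values are assumptions made for illustration; the survey does not prescribe them.

```python
import random


def tournament(population, fitness, rng, k=2):
    """Selection: return the fittest of k randomly drawn individuals."""
    return max(rng.sample(population, k), key=fitness)


def one_point_crossover(a, b, rng):
    """Crossover: swap the tails of two parents at a random cut point."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]


def bit_flip(individual, rate, rng):
    """Mutation: flip each bit independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in individual]


def genetic_algorithm(fitness, length, pop_size=30, generations=40,
                      crossover_rate=0.9, mutation_rate=0.02, seed=1):
    """Maximize `fitness` over bit strings of the given length (length >= 2)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        offspring = []
        while len(offspring) < pop_size:
            p1 = tournament(population, fitness, rng)
            p2 = tournament(population, fitness, rng)
            if rng.random() < crossover_rate:
                c1, c2 = one_point_crossover(p1, p2, rng)
            else:
                c1, c2 = p1[:], p2[:]
            offspring.append(bit_flip(c1, mutation_rate, rng))
            offspring.append(bit_flip(c2, mutation_rate, rng))
        population = offspring[:pop_size]
        # Remember the best solution seen so far across generations.
        best = max(population + [best], key=fitness)
    return best
```

For instance, `genetic_algorithm(sum, 20)` searches for the 20-bit string with the most ones (the OneMax toy problem).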

Swarm intelligence algorithms
Swarm intelligence algorithms (SIAs) originate from research on the swarm behavior of social organisms, such as ants, bees, bacteria, or human organizations. The swarm consists of a series of individuals with simple evolution rules, but their collective efforts produce solutions of good quality. It turns out that the interaction between these simple agents gives rise to "intelligence" on a global scale that is unknown to the individual agents [3]. Classical SIAs are introduced as follows.
1) Ant colony optimization (ACO) [33], first proposed by Dorigo in 1992, is inspired by the foraging behavior of ants. Each ant may walk in a random way, but the ant colony can quickly find the shortest path from home to the food source through the perception of pheromones [34]. Currently, ACO has become a general term for many ant colony algorithms, which include but are not limited to the ant system (AS), ant colony system (ACS) [35], max-min ant system (MMAS) [36], etc. However, the pheromone models used in most ACO algorithms require customized designs for different problems, which weakens their portability.
2) Particle swarm optimization (PSO), proposed by Eberhart and Kennedy [37], simulates the foraging behavior of bird flocks. Each bird in the swarm can learn from its own best historical experience and the best experience of the swarm in the current generation. PSO is easy to use and, at the same time, can produce a good solution; it has therefore been quickly applied to all kinds of NP-hard problems [38], such as the traveling salesman problems and the multiple knapsack problems [23], workflow scheduling problems [24], resource allocation problems [3], etc.
3) Artificial bee colony (ABC) algorithm, proposed by Karaboga [39], learns from the behavior of bees in gathering honey. In ABC, the bee colony includes three kinds of bees: the employed foragers that hold the information of food sources, the scouts that search for new food sources, and the onlookers that follow the behavior of employed foragers. The diversity of the bee population contributes to the formation of collective intelligence. ABC has been widely used to solve multi-variable optimization problems, the parameter training problems of neural networks, and other engineering problems.
Besides, there are a variety of SIAs learning from many other social behaviors, such as the artificial fish swarm algorithm (AFSA) learning from the swimming behavior of fish, glowworm search optimization (GSO) inspired by the bioluminescence of glowworms, the bacterial foraging algorithm (BFA) that simulates the foraging behavior of Escherichia coli, cuckoo search (CS) originating from the obligate brood parasitism of cuckoos, etc.
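As an illustration of how a particle learns from its own best experience and from the best experience of the swarm, the following is a minimal sketch of a global-best PSO for maximization. The inertia weight `w` and the acceleration coefficients `c1`, `c2` are commonly used settings assumed here, not values taken from the surveyed papers, and positions are simply clipped to the bounds.

```python
import random


def pso(fitness, dim, lower, upper, swarm_size=20, iterations=100,
        w=0.7, c1=1.5, c2=1.5, seed=2):
    """Global-best PSO sketch that maximizes `fitness` inside box bounds."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lower, upper) for _ in range(dim)]
           for _ in range(swarm_size)]
    vel = [[0.0] * dim for _ in range(swarm_size)]
    pbest = [p[:] for p in pos]                 # personal best positions
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(swarm_size), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]  # best of the whole swarm
    for _ in range(iterations):
        for i in range(swarm_size):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards personal best + pull towards swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(upper, max(lower, pos[i][d] + vel[i][d]))
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit
```

For discrete propagation problems (e.g., choosing which nodes to immunize), binary or set-based PSO variants replace the real-valued position update; that extension is outside this sketch.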

EC extensions
Some EC extensions have been developed in recent years, designed to solve problems with specific characteristics. A typical example is the multi-objective evolutionary algorithm (MOEA). Traditional EC algorithms are mostly designed for solving problems with a single objective. However, many real-world problems consist of more than one objective. MOEAs are designed for solving this kind of problem, which are mostly based on two classical algorithms: multi-objective evolutionary algorithm based on decomposition (MOEA/D) [40] and non-dominated sorting genetic algorithm (NSGA-II) [41] . Based on the two MOEAs, many variants and applications are developed [42] . Besides, multi-modal optimization algorithms (MMOAs) are designed to discover multiple solutions with a similar quality [34,43] . Surrogate assisted evolutionary algorithm (SAEA) aims to solve expensive optimization problems [44] . Cooperative coevolution (CC) can be applied on larger-scale optimization problems [22,45] . Evolutionary bilevel optimization (EBO) is introduced to solve the problems with two levels of optimization tasks.
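Multi-objective algorithms such as NSGA-II rest on the notion of Pareto dominance between objective vectors. The sketch below shows only that building block (a dominance test and a naive non-dominated filter) under a maximization convention; it is not an implementation of MOEA/D or NSGA-II, and the function names are illustrative.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` in every objective and strictly
    better in at least one (all objectives are maximized)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def non_dominated(points):
    """Return the points that no other point dominates (the first front)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

With two objectives to maximize, `non_dominated([(1, 5), (2, 4), (3, 3), (2, 2), (1, 1)])` keeps the first three points, since `(2, 2)` and `(1, 1)` are both dominated by `(3, 3)`.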
EC has been used to solve non-deterministic polynomial hard problems (such as feature selection problems [46] , mixed-variable problems [47] , etc.) and various engineering problems (such as traffic control [48] , complex network optimization [25] , etc.). Recently, it has been applied to solve the propagation problems in complex networks (such as the propagation process simulation [12] , negative diffusion control [49] , positive diffusion promotion, etc.) and shows good efficacy. However, to the best of our knowledge, few studies provided a holistic view of the applications in the field of networked propagation dynamics.

Literature scope and taxonomy
The scope of our literature review lies in studies of EC-based social propagation. Social propagation is defined as the propagation phenomena happening in social life, and the major participants are human beings (individuals, groups, or organizations) and their virtual agents. Such crossover studies are neither the pure arithmetic improvements nor the pure problem design, but the problem-oriented EC design or the EC-driven problem design. Adopting this view, some studies are excluded from the review, such as the propagation process or communication mechanism inside EC population (e.g., the population structure design in EAs, the information exchange frequency in CC) [50] , the physical communication problems (e.g., the fault propagation in power network) [51] , and the propagation mechanisms in neural networks [52] . These excluded research areas are of great importance and significance in scientific and engineering practice, but they are somehow beyond the scope of social propagation and need other dedicated literature reviews.
Within the scope of social propagation, a detailed taxonomy framework of existing studies is shown in Fig. 2, which contains the most popular research subjects in this field. It consists of three major branches: simulation, optimization, detection & analysis. In Fig. 2, to ensure the integrity of problem classification, some sub-branches with few EC applications are also listed, with their methods underlined by the dotted lines. The methods underlined by solid lines are EC-based methods. A realistic tendency is that the boundaries among different branches may be not that clear in the future; namely, comprehensive research referring to two or more research subjects may be frequently seen in one common study. However, this trend does not go beyond or conflict with the three basic branch units introduced in this paper.

Epidemic spread models
An epidemic denotes the rapid spread of a disease across a biological population. It can be an infectious disease caused by a pathogen or biological virus that enters organisms and triggers wide infection, or a widespread common disease caused by the living and social environment [53,54]. Before containing the epidemic spread in the human world, the first step is to simulate the epidemic propagation dynamics. For most common epidemics, data analysis and mathematical modelling can both realize this goal. However, for emerging infectious diseases (EIDs), only mathematical models are feasible, for there are few reference data for analysis. Existing mathematical modelling of epidemics is mostly based on classical compartmental models or some modified versions [55]. The most classical compartmental models include: 1) the susceptible-infectious (SI) model, which is very simple and usually used to simulate incurable epidemics; 2) the susceptible-infectious-susceptible (SIS) model, which simulates recurrent epidemics such as influenza; and 3) the susceptible-infectious-recovered (SIR) model, which simulates vaccine-preventable epidemics such as smallpox. Besides, the susceptible-infectious-recovered-susceptible (SIRS) model is a combination of the SIS model and the SIR model, which generalizes the SI, SIS, and SIR models. Recently, the susceptible-exposed-infected-vigilant (SEIV) model, a SIRS-variant model, has become prevalent because it further considers the latent period of the epidemic.
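The relationship between the compartmental models can be illustrated with a small discrete-time, mean-field SIRS sketch on population fractions. The parameter names `beta` (infection rate), `gamma` (recovery rate), and `delta` (rate of immunity loss) are assumptions for this example, as is the well-mixed population; network-based versions track one state per node instead. In this sketch, setting `delta = 0` gives SIR-like dynamics and additionally setting `gamma = 0` gives SI-like dynamics.

```python
def sirs_step(s, i, r, beta, gamma, delta):
    """One discrete-time step of a well-mixed SIRS model on fractions."""
    new_infected = beta * s * i      # S -> I
    new_recovered = gamma * i        # I -> R
    new_susceptible = delta * r      # R -> S (loss of immunity)
    return (s - new_infected + new_susceptible,
            i + new_infected - new_recovered,
            r + new_recovered - new_susceptible)


def simulate(s0, i0, r0, beta, gamma, delta, steps):
    """Return the trajectory [(s, i, r), ...] including the initial state."""
    trajectory = [(s0, i0, r0)]
    for _ in range(steps):
        trajectory.append(sirs_step(*trajectory[-1], beta, gamma, delta))
    return trajectory
```

For example, `simulate(0.99, 0.01, 0.0, 0.3, 0.1, 0.0, 200)` produces an SIR-like outbreak that eventually dies out, whereas the same call with `gamma = 0.0` behaves like an SI model in which the infected fraction keeps growing.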

[Fig. 2. Taxonomy of social propagation studies. Surviving labels from the detection & analysis branch: fake news detection, source localization, diffusion path analysis; methods listed: CA [91], GA-LS [116], ADOPT [117], PBA [118−120], IDPC [126].]
... epidemics. They noted that the genetic changes in the pathogen could happen in two ways: the concurrent processes raised by neutral mutation or coevolution between pathogen and hosts, and the adaptive evolution as the pathogen is transmitted among hosts with various constitutions. Similarly, Fraser et al. [56] revealed the apparent conflict between two levels of selection on the acquired immune deficiency syndrome (AIDS) virus in 2016, namely the virus evolution in hosts and the virulence of infection during transmission events. They developed a conceptual model to simulate this kind of reconciled adaptive evolution. Nelson and Holmes [57] emphasized that genetic diversity is of great importance to understanding evolutionary biology, which includes the complex relationships among antigenic evolution, natural selection, reassortment, zonality, and seasonality. Dominique and Maiden [9] investigated meningococcal carriage and disease and used a population and evolutionary model to explain meningococcal virulence. Leventhal et al. [58] studied the dynamics of disease evolution on different "population" structures, where the population was represented as nodes and their connections in a network. These natural evolutions of biological viruses are worthy of follow-up study, which may inspire the design of new bio-inspired algorithms. Typical examples include the bacterial foraging optimization (BFO) [59] and adaptive clonal selection algorithm (ICSA) [60]. The simulated evolution may also enlighten the research of biological evolution.

Computer virus spread models
A computer virus is a piece of executable code with a mischievous or malicious purpose, which spreads quickly and poses a huge threat to Internet security. With the quick development of computer networks, a wide variety of viruses have been produced, spread, and updated, including but not limited to macro viruses, script viruses, network worms, Trojan viruses, phishing attacks, malware, etc. Since these viruses are often hard to eradicate, early detection and prevention have become critical steps. The classical epidemic models, such as the SI, SIS, and SIR models, can be used to model virus propagation dynamics [61]. Among them, the SI and SIS models, which contain two basic states (susceptible and infected), are the most prevalent. Computer viruses can easily infect the same device repeatedly as long as interception measures are neglected.
EC has been used to simulate the evolution and variation of computer viruses. As early as 2009, Noreen et al. [62] proposed the concept of "evolvable malware", which introduced a GA-based evolutionary framework to simulate the mutation process of the virus genotype. Meng et al. [11] further modularized the malicious code from different malware families into different attack features. They then built a malware meta-model to capture the different features and constraints inside malware, and applied a GA with gene crossover and mutation to mimic the evolution of the malware. Considering the renewal of mobile malware, Sen et al. [10] introduced co-evolutionary computation (CEA) techniques to simulate malware evolution and develop anti-malware automatically, namely using a co-evolutionary arms race mechanism for developing more robust systems. EC helps to build evolutionary game models that predict the potential variants of computer viruses and promotes the development of anti-virus programs. However, most existing models take GA as the basic simulator, and other EC algorithms are rarely seen. In fact, for human-made viruses or malware, evolution modes in EC other than genetic variation may also inspire virus variation, e.g., learning from the historical-best infection behavior or from other stronger viruses stored in reservoirs (as inspired by PSO), or leaving pheromones to mark susceptible host machines and cause repeated infection (as inspired by ACO). The reason is that a successful learning operator embedded in an algorithm can accelerate convergence and find high-quality solutions, so it has the potential to accelerate virus development and cause significant threats to network security. Consequently, this is a potential research direction.

Information diffusion models
Most epidemic models can be applied to simulate information propagation. However, they are mostly regarded as oversimplifying the information itself or as overly optimistic about knowledge of the network structure.
Later, there are two classical probabilistic models specified for information dissemination, respectively the independent cascade (IC) model and the linear threshold (LT) model [63] . The IC model generalizes the SIR epidemic model, and the LT model is a probabilistic extension of the tipping model [64] . A broader model is named generalized threshold (GT), which combines the LT and IC models. Besides the three models, some soft computing models are proposed, such as the heat energy-based model [65] and the forest fire model [66] . A logic programming (LP) based diffusion model was proposed by Shakarian et al. [67] , which further considers the attributes of nodes. Later, many extensions and variants were developed, such as probabilistic similarity logic (PSL) [68] and modal logic [48] .
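A minimal simulation of the IC model helps to show how it differs from a compartmental model: every newly activated node gets exactly one chance to activate each inactive out-neighbour. The sketch below assumes a uniform activation probability `p` on all edges and a graph given as a dictionary of out-neighbour lists; both are simplifications chosen for this example.

```python
import random


def independent_cascade(graph, seeds, p, rng):
    """Run one IC cascade and return the set of activated nodes.

    graph: dict mapping each node to a list of its out-neighbours.
    p:     activation probability, assumed identical on every edge.
    """
    active = set(seeds)
    frontier = list(active)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, []):
                # Each newly active node tries each inactive neighbour once.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def expected_spread(graph, seeds, p, runs=200, seed=0):
    """Monte Carlo estimate of the expected number of activated nodes."""
    rng = random.Random(seed)
    total = sum(len(independent_cascade(graph, seeds, p, rng))
                for _ in range(runs))
    return total / runs
```

The Monte Carlo estimate returned by `expected_spread` is the kind of quantity that optimization methods typically use as a fitness value, which is also why fitness evaluation tends to dominate the running time.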
Opinions as a special kind of information are usually diffused in online or offline social groups. There have been many models developed for describing opinion spread dynamics, which can be classified into two specific categories: the discrete models (such as Ising [69] , Voter [70] , Sznajd [71] , Majority Rule [72] ), and the continuous models (such as Deffuant-Weisbuch (DW) [73] , Hegselmann-Krause (HK) [74] ).
In 2005, Lieberman et al. [12] incorporated the thought of evolutionary computation with network science and proposed the evolutionary graph theory (EGT). EGT simulates the nodes in the network as individuals in the population and the network topology as the population structure. The homogeneous population with the Moran process can be modelled as a special case of fully connected and unweighted graphs. In 2010, Lahiri and Cebrián [31] proposed a genetic algorithm diffusion model (GADM) to simulate the diffusion in social networks.
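The Moran process on a graph can be sketched as a birth-death update: a node is chosen to reproduce with probability proportional to its fitness, and its offspring replaces a uniformly chosen neighbour. The code below is an illustrative simulation under these assumptions (undirected, unweighted, connected graph; a single mutant with relative fitness `r`), together with the classical closed-form fixation probability for the well-mixed (complete graph) case, which the simulation should approximate.

```python
import random


def moran_fixation(neighbours, r, start, rng):
    """Simulate one birth-death Moran process on a connected graph.

    neighbours: list where neighbours[i] is the list of nodes adjacent to i.
    r:          relative fitness of mutants (residents have fitness 1).
    Returns True if the mutant lineage takes over the whole graph.
    """
    n = len(neighbours)
    mutant = [False] * n
    mutant[start] = True
    count = 1
    while 0 < count < n:
        weights = [r if m else 1.0 for m in mutant]
        parent = rng.choices(range(n), weights=weights)[0]
        child = rng.choice(neighbours[parent])
        if mutant[child] != mutant[parent]:
            count += 1 if mutant[parent] else -1
            mutant[child] = mutant[parent]
    return count == n


def fixation_probability_complete(r, n):
    """Closed-form fixation probability of one mutant in a well-mixed
    population of size n (Moran process)."""
    if r == 1.0:
        return 1.0 / n
    return (1.0 - 1.0 / r) / (1.0 - 1.0 / r ** n)
```

Running `moran_fixation` many times on a complete graph and averaging the outcomes gives an estimate that can be compared with `fixation_probability_complete`; other topologies can then be swapped in to see how structure changes the result.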

EC for propagation optimization
Propagation dynamics is one of the major branches in complex network science, which includes the propagation control of negative diffusion (e.g., epidemic spread, computer viruses diffusion, rumor propagation) and promoting the positive diffusion (e.g., innovation diffusion [75,76] , social learning [77] , social norm evolution [78] , brand & product popularization [79] ). For more knowledge about social contagion, please refer to [80,81].
With the expansion of network scales, network-based optimization problems have become more and more complicated. EC is naturally designed to optimize NP-hard problems, and many algorithms in the EC realm have been developed to solve the propagation optimization problems in complex networks. According to the different applications, they can be summarized as below.

Pollutant minimization
A pollutant is used to denote hazardous materials or virtual entities with a negative effect, such as epidemics, computer viruses, rumors, fake news, fault information, etc. Minimizing the spread of these pollutants contributes to ecological health and social order.

Epidemic control
An epidemic control problem can be formulated as a constrained resource allocation problem. Consider M kinds of resources and N nodes in the network. Resources denote the goods, materials, or services used in all kinds of prevention methods, which can be divided into two kinds: the node-oriented resources represented by a matrix $X = [x_{ri}]_{M \times N}$, where $x_{ri}$ represents that the resource r is allocated to the i-th node in the network; and the edge-oriented resources represented by the matrix $Y = [y_{rij}]$, where $y_{rij}$ denotes that the resource r takes effect on the edge (i, j). Constraints denote the limited budget, manpower, grounds, or other control conditions, marked as C. Then, the epidemic control problem can be formulated as
$$\min_{X, Y} f(X, Y) \quad \text{s.t.} \quad g(X, Y) \le C \quad (3)$$
where $f(X, Y)$ is the objective function, which is based on the epidemic simulation models and takes X and/or Y as its decision variables, and $g(X, Y)$ is the total cost of the allocated resources.
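To show how such a constrained allocation can be turned into a fitness function for an EC algorithm, the sketch below evaluates a node-oriented allocation with a single resource type (immunization). Everything specific here is an assumption made for illustration: the discrete-time mean-field SIS dynamics, the parameters `beta` and `delta`, the binary decision vector `immunised`, and the penalty term that handles the budget constraint.

```python
def infected_after(adj, immunised, beta, delta, steps=100, p0=0.1):
    """Total infection probability after `steps` of a mean-field SIS model.

    adj:       adjacency lists of an undirected network.
    immunised: 0/1 flag per node; immunised nodes can never be infected.
    beta:      per-contact infection probability; delta: recovery probability.
    """
    n = len(adj)
    p = [0.0 if immunised[i] else p0 for i in range(n)]
    for _ in range(steps):
        nxt = []
        for i in range(n):
            if immunised[i]:
                nxt.append(0.0)
                continue
            escape = 1.0  # probability that no neighbour infects node i
            for j in adj[i]:
                escape *= 1.0 - beta * p[j]
            nxt.append((1.0 - p[i]) * (1.0 - escape) + (1.0 - delta) * p[i])
        p = nxt
    return sum(p)


def penalised_objective(adj, immunised, cost, budget,
                        beta=0.3, delta=0.2, penalty=1e3):
    """Objective to minimise: infection level plus a budget-violation penalty."""
    spent = sum(c for c, x in zip(cost, immunised) if x)
    violation = max(0.0, spent - budget)
    return infected_after(adj, immunised, beta, delta) + penalty * violation
```

On a star network, immunizing the hub yields a much lower objective value than immunizing a leaf, which is the kind of structural difference a search algorithm is expected to discover. A penalty term is only one way to handle the constraint; repair operators or feasibility-preserving encodings are common alternatives.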
Based on the compartmental models, all kinds of immunization strategies are developed to control epidemic spread [49] . Thereinto, population-based intelligent optimization algorithms are mostly hybrid algorithms or frameworks to fit the complex problem characteristics. For example, the memetic structure optimization strategy (MSOS) [82] was designed to adjust the epidemic threshold (τ), which takes the memetic algorithm as the global search strategy and the simulated annealing algorithm combined with the properties of networks as the local search strategy. The optimized homotopy perturbation method (OHPM) [83] was proposed to maximize the immune population, which combines PSO and the homotopy perturbation method. The binary PSO with priority planning and hierarchical learning (PHSO) [3] was designed for accelerating the epidemic decaying rate. Pizzuti and Socievole [84] combined DE and GA to solve the optimal curing policy problem.
Recently, community detection technology has been introduced into network propagation analysis. Once the network has been divided into multiple communities, the problem can be divided into subproblems [55], or the search space can be further narrowed; this may contribute to parallel optimization, and the algorithm efficiency can be further improved. However, it is quite challenging to combine network propagation, community structure detection, and evolutionary algorithms. Some problems need to be addressed, such as how to balance the community size and modularity and how to deal with the subspaces in communities and the global space in the whole network. As an exploration, Wang et al. [85] attempted to use the network community structure to narrow down the node candidates for immunization and then designed a memetic algorithm (MA) to select immunization nodes for epidemic control. Zhao et al. [55] introduced an improved Louvain algorithm to divide the network into communities with relatively balanced sizes and then designed a co-evolutionary algorithm with network-community-based decomposition (NCD-CEA) to accelerate the epidemic decay.

Computer virus prevention
The formulation of the virus control problem is similar to epidemic control. The differences lie in the concrete resource types. Epidemic control tends more to social distancing (edge blocking) and vaccine distribution (node immunization), while computer virus prevention focuses more on bug fixes (addressing problem-source) and preventive measures (increasing the infection threshold of nodes).
Existing EC-based parallel prediction methods can efficiently identify mobile applications as benign or malicious, such as the dynamic hybrid ANFIS-PSO approach (DyHAP) proposed by Afifi [86]. Sen et al. [10] pro-
In addition, some studies formulated malicious attacks as network robustness optimization problems and used EC to solve them. For example, Wu et al. [87] used the memetic algorithm (MA) to locate the host nodes and provided the optimal configuration of the hosts. Zhou and Liu [88] designed a multi-objective evolutionary algorithm (MOEA-RSF$_{\text{MMA}}$) against malicious attacks on nodes and links. Moreover, due to the heterogeneous nature of real-world networks, removing some nodes may be more significant than removing others. Simultaneously, the network connectivity should also be maximized to keep good network efficiency. With this insight, Liu et al. [89] designed an evolutionary algorithm framework (Evol) to achieve network robustness optimization by minimizing the number of removed nodes.

Rumor containment
Rumor refers to misinformation and non-credible content that is diffused in online social media. As smartphones provide straightforward access to social media, every individual can publish their news, stories, opinions, and comments on the Internet. The cost of generating and spreading rumors becomes extremely low, but eliminating their negative impacts is quite expensive. Therefore, containing rumor diffusion is a challenging but significant task.
However, rumor is different from epidemics or computer viruses. Its text content and the spreaders' background cannot be easily ignored but are hard to represent quantitatively. Most related studies still focus on modeling rumor spread processes, analyzing the dynamics of one specific rumor, and heuristics-based rumor control strategies. Due to the diversity of spread scenes, a rumor control problem with a unified form is hard to define. So far, optimization-based methods are rarely investigated. As exploratory work, Chen et al. [90] formulated pollutant propagation prevention as a multi-objective subset selection problem and proposed an algorithm named MOEA/D-ADACO to solve the problem. Shah and Kobti [91] formulated fake news detection as a multimodal task and designed an evolutionary cultural algorithm (CA) to fulfill the task. Strictly speaking, the proposed cultural algorithm is an algorithm with evolutionary thought rather than a standard EC algorithm, but it contributes to rumor detection and control on real-world datasets.

Influence maximization
Influence maximization (IM) is a significant problem in information science, covering information broadcasting, viral marketing, competitiveness improvement, knowledge/belief/innovation diffusion, etc. It can be seen as the inverse problem of pollutant minimization, and epidemic models can be adopted, while the objective is changed to maximizing the influence spreading rate or the number of infected individuals at the steady state [92]. In 2001, Domingos and Richardson [4] pointed out that the customer's network value should be considered in social marketing, and that convincing a subset of individuals can trigger a large spread income. Later in 2003, Kempe et al. [63] formulated the IM problem as a k-seed selection problem and proved that it is NP-hard. Compared to its inverse problem, pollutant minimization, the IM problem has been more popular in recent years owing to its easy implementation and simple principles.
A standard IM problem can be formulated as follows. Consider a network with $N$ nodes, with the node set described by $V = \{v_1, v_2, \ldots, v_N\}$. Let $S = \{s_1, s_2, \ldots, s_k\}$ represent the initial seed set, where $k$ is a non-negative integer and $s_i$ denotes the $i$-th seed node. Each node $v$ in the network can be independently activated under the marketing strategy $\chi$, with an activated probability $h_v(\chi)$. The nodes being successfully activated are selected as the initial seed nodes. Each seed node has the ability to activate the connected nodes, with the final activated set represented by $A(S)$. Then, the expected revenue is formulated as

$$g(\chi) = \sum_{S \subseteq V} \lvert A(S) \rvert \prod_{u \in S} h_u(\chi) \prod_{v \in V \setminus S} \bigl(1 - h_v(\chi)\bigr). \qquad (4)$$

And the IM problem can be formulated as

$$\max_{\chi} \; g(\chi) \quad \text{s.t.} \quad c(\chi) \le b$$

where $c(\cdot)$ is the constraint function of the marketing strategy, and $b$ is the constraint condition. The propagation model is implicitly embedded in the marketing strategy $\chi$. Influence maximization has been formally formulated as selecting an influential set of nodes, which is NP-hard for most influence models. EC techniques have proven effective in solving various NP-hard problems (including subset selection problems [3]), and it is very promising to solve IM problems with EC. There have been three types of EC-based strategies for the IM problems with different settings:
1) The influential seed selection problem, namely the subset selection problem. The original solution method is a greedy hill-climbing algorithm proposed by Kempe et al. [63], which is time-consuming and not efficient enough. Goyal et al. [93] further extended the IM problem. They introduced the minimum target set selection problem (MINTSS) by adding constraints on the size of the seed set and the minimum possible time problem (MINTIME) by adding constraints on the time at which a predefined coverage is achieved. Both problems are proved to be NP-hard, and a simple greedy algorithm and approximation algorithms are applied to solve the two problems, respectively. As improvements, some evolutionary algorithms have been developed and applied to solve the IM problems, such as memetic algorithms (MAs) [94], genetic algorithms (GAs) [95−98], simulated annealing (SA) [99], etc. These algorithms have clear advantages over Kempe et al.'s algorithm in both run time and convergence performance [100].
2) The seed selection and optimization problem, which first finds seed candidates and then optimizes over the reduced search space. In this form, the underlying structures of networks can be explored. For instance, Simsek and Abdollahpouri [101] sorted the nodes according to network metrics such as out-degree centrality and closeness centrality and then applied PSO to look for good solutions. They analyzed the influence level of nodes on their neighbors and then applied ACO to maximize the network profit and minimize the nodes' similarity.
3) Multi-objective IM problems. Some existing IM problems with two objectives can be transformed into single-objective optimization problems, such as influence maximization & cost minimization [102,103], and profit maximization & node-similarity minimization [101]. Other studies treat two or more conflicting objectives as a genuine multi-objective optimization problem. For example, Bucur et al. [104] attempted to find a trade-off between the number of seeds and the influence on the network. Robles et al. [100] applied NSGA-Ⅱ and MOEA/D, a single-objective GA, and a greedy strategy to jointly optimize the revenue and the number of seeds. Olivares et al. [103] applied particle swarm optimization (PSO) to maximize the influence and minimize the seed number.
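The seed selection strategy in 1) can be illustrated with a minimal sketch of a genetic algorithm for k-seed selection. It is not the method of any cited work: the independent cascade model with a uniform activation probability `p`, the Monte Carlo fitness estimate, the truncation selection, and all function and parameter names (`simulate_ic`, `estimate_spread`, `ga_seed_selection`) are assumptions made for illustration. The graph is assumed to be a dictionary mapping every node to a list of its out-neighbors.

```python
import random


def simulate_ic(graph, seeds, p, rng):
    """One independent cascade run; returns the number of activated nodes."""
    active = set(seeds)
    frontier = sorted(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in graph.get(u, ()):
                # Each newly active node gets one chance per out-neighbor.
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return len(active)


def estimate_spread(graph, seeds, p, runs, rng):
    """Monte Carlo estimate of the expected spread of a seed set."""
    return sum(simulate_ic(graph, seeds, p, rng) for _ in range(runs)) / runs


def ga_seed_selection(graph, k, p=0.1, pop_size=20, generations=30,
                      runs=50, mutation_rate=0.2, seed=0):
    """Toy GA: individuals are seed sets of size k (requires 1 <= k <= #nodes)."""
    rng = random.Random(seed)
    nodes = sorted(graph)

    def fitness(ind):
        return estimate_spread(graph, ind, p, runs, rng)

    def crossover(a, b):
        # Child draws k distinct seeds from the union of both parents.
        return frozenset(rng.sample(sorted(a | b), k))

    def mutate(ind):
        if rng.random() >= mutation_rate:
            return ind
        genes = set(ind)
        genes.remove(rng.choice(sorted(genes)))
        genes.add(rng.choice([n for n in nodes if n not in genes]))
        return frozenset(genes)

    population = [frozenset(rng.sample(nodes, k)) for _ in range(pop_size)]
    best, best_fit = None, float("-inf")
    for _ in range(generations):
        scored = sorted(((fitness(ind), ind) for ind in population),
                        key=lambda t: t[0], reverse=True)
        if scored[0][0] > best_fit:
            best_fit, best = scored[0]
        # Truncation selection: the better half survives and reproduces.
        parents = [ind for _, ind in scored[:pop_size // 2]]
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return set(best), best_fit
```

Because the fitness is a noisy simulation estimate, the returned spread value is itself an estimate; in practice the number of Monte Carlo runs dominates the cost, which is one reason the surveyed works pay attention to run time.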

EC for propagation detection and analysis
In recent years, fake news detection has become very popular, and it can be regarded as an extension of rumor diffusion analysis. Due to the rampant growth of social media information, two further research directions have received attention, namely source localization and diffusion path analyses, which have contributed significantly to the analysis of public social opinion.
So far, most EC paradigms applied to complex network propagation concern simulation and optimization. However, the realm of social propagation extends far beyond these two themes. There are other realistic problems worthy of attention.

Fake news detection
The detection of fake news is different from the aforementioned rumor diffusion control. The former emphasizes the news content in terms of diverse subjects and contexts, while the latter focuses more on the network structure and individual attributes. Meanwhile, with the introduction of semantic and contextual information, fake news detection has become a comprehensive task that combines techniques and knowledge [105]. From the technical perspective, the popular methods include temporal analysis [106], semantic analysis [107], machine learning classification [108], network propagation analysis [109], etc. There have been many application-oriented fake news detection systems such as TweetCred, Snopes, Fact check, etc.
Evolutionary computation methods have not yet been frequently used. Recently, Shah and Kobti [91] applied a cultural algorithm (CA), a branch of evolutionary computation, to multimodal fake news detection, where situational and normative knowledge is considered. Experimental results show that their algorithm outperforms the state-of-the-art methods, which demonstrates that EC has the potential to be applied in fake news detection.

Source localization
Locating the sources from which diffusion behavior started is a very generic and realistic problem [110]. This problem becomes challenging due to the dynamic evolution of network structure and real-time data flow. Shelke and Attar [111] reviewed existing source detection methods. They classified the networks into three kinds of observations: 1) complete observation, which contains sufficient knowledge of the network topology, but is hardly possible in social or human-contact networks; 2) snapshot observation, which contains partial knowledge only about infected nodes, such as the event-related tweet stream [112]; and 3) monitor observation, which inserts a sensor into the network and captures the real-time data flow about specific topics. As the snapshot and monitor observations are performed with limited knowledge about the network, source identification cannot simply be treated as a search problem; it is rather an inference problem. In 2011, Shah and Zaman [113] first used the rumor centrality metric to estimate sources and further analyzed its efficacy in [114]. Then in 2013, Luo et al. [115] first applied the rumor centrality metric to locate multiple sources.
Existing EC algorithms are mostly applied to networks with complete observation, namely, networks whose whole structure is already known. This situation mostly happens in industrial networks, such as power networks, water distribution networks, and wireless communication networks. For example, Mahinthakumar and Sayeed [116] designed a GA-local search (GA-LS) approach to solve the inverse problem of groundwater source identification. Liu et al. [117] designed an EA-based approach, the adaptive optimization technique (ADOPT), for contamination source identification in water-flow networks. The particle backtracking algorithm (PBA), first introduced by Zierolf et al. [118] and extended by Shang et al. [119], predicts the particle location according to the local velocity field and has been applied to contamination source detection problems [120].
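To make the rumor centrality metric concrete, the following is a minimal sketch for the special case in which the infected subgraph is a tree. On a tree with $n$ nodes, the rumor centrality of a node $v$ is commonly written as $R(v) = n! / \prod_{u} T_u^v$, where $T_u^v$ is the size of the subtree rooted at $u$ when the tree is rooted at $v$. The code is not an EC method and is not taken from the cited works; the function names and the adjacency-dictionary input format are assumptions, and the quadratic-time loop over all candidates is chosen for readability rather than efficiency.

```python
import math


def subtree_sizes(tree, root):
    """Subtree sizes of an undirected tree (adjacency dict) rooted at `root`."""
    parent = {root: None}
    order = []
    stack = [root]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in tree[u]:
            if v not in parent:
                parent[v] = u
                stack.append(v)
    size = {u: 1 for u in order}
    # Every node appears after its parent in `order`, so reverse it
    # to accumulate child sizes before the parent is visited.
    for u in reversed(order):
        if parent[u] is not None:
            size[parent[u]] += size[u]
    return size


def log_rumor_centrality(tree, v):
    """log R(v) = log(n!) - sum of log subtree sizes, with the tree rooted at v."""
    n = len(tree)
    sizes = subtree_sizes(tree, v)
    return math.lgamma(n + 1) - sum(math.log(s) for s in sizes.values())


def estimate_source(tree):
    """Return the node with maximal rumor centrality."""
    return max(tree, key=lambda v: log_rumor_centrality(tree, v))


# Usage: on the path 0-1-2-3-4 the middle node is the estimated source.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(estimate_source(path))
```

A score of this kind could also serve as a fitness function inside an evolutionary search over candidate sources, although that combination is only a suggestion here and is not described in the surveyed literature.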
However, for the data flow with an incomplete net-

Diffusion path analyses
In traditional face-to-face communication, information diffusion paths are traced manually, e.g., through questionnaire surveys in sociological studies or signed witness statements in lawsuits. Since the beginning of the 21st century, computer-mediated communication has provided a traceable channel to investigate propagation paths and patterns. Currently, the techniques for diffusion path analysis (DPA) include citation-based analyses [121−123], social network analysis [123,124], diffusion long short-term memory (diffusion-LSTM) for predicting image diffusion paths [125], main path analysis [124], reactive diffusion process [126], etc.
EC does not perform well in quantitative analysis but has the potential to solve the information diffusion path construction (IDPC) problem [126]. IDPC aims to reproduce the information diffusion process and select near-practical paths. Most existing algorithms identify the possible paths based on network metric analysis or semantic analysis and rarely consider near-optimal path selection.

Data-driven evolutionary propagation simulation
Though various propagation models exist, it is still challenging to incorporate real-world datasets into the simulated propagation process. The reasons lie in three aspects:
1) The non-trivial processing of a large amount of heterogeneous information. Individual information in a real-world social network is presented in diverse forms, such as profile images, descriptive text, social metrics, etc. To build a computable propagation model, the diverse information first needs to be processed into numeric values. This process may need many techniques, such as natural language processing, image processing, statistical methods, etc., raising the threshold of data-driven propagation simulation.
2) The complicated parameter representation for individuals. In network propagation models, a realistic individual is characterized by a set of parameters of a virtual node. Early propagation models assumed that the nodes share the same parameters. These settings are applicable to scenes lacking network structures (such as the lack of population contact information in epidemic simulation). However, they cannot provide an accurate propagation analysis due to the massive loss of individual information. Recent studies have widely considered individual differences. They tended to assign the parameter variables values generated by a random initialization function. However, it is still a big challenge to transform the individuals' multiple heterogeneous information into usable parameters. Not only does the parameter generation process need to be carefully designed, but the indexes used to pick valuable parameters should also be well designed and well validated.
3) The incorporation of data-driven models and evolution models. With the introduction of real-world datasets, the propagation dynamic naturally becomes a dynamic evolution process. However, as the data-driven models are already complex, the combination of the data-driven model and the evolution model should be carefully designed to reduce the complexity and increase the usability. Simultaneously, important performance indicators such as computing efficiency, model robustness, uncertainty, etc., should be considered.

Coevolution models of multiple contagions
Existing propagation models are mostly based on idealized and oversimplified environment settings. A typical example is the setting of a single contagion. However, the real-world spread environment contains more than one contagion, which may infect the population simultaneously. Some studies have turned to the propagation of multiple contagions, such as 1) homogeneous and interacting contagions, e.g., multiple interacting epidemics [129] and the successively interacting social contagion model [130,131], or 2) two heterogeneous but related contagions, such as the epidemic spread and the epidemic-prevention awareness spread [132,133]. Despite this attention, several issues remain open:
1) The measurement of relationships among multiple contagions. Intuitively, the relationships among three or more contagions can be simulated by multiple pairs of contagions. Nevertheless, for real-world social events, there are usually a series of messages with different angles. These messages may be independent of each other or conflicting with each other, and some of them cooperatively contribute to the formation of public opinions. How to reasonably formulate or measure the co-relationships among many contagions is a challenging but significant problem.
2) More intricate coevolution propagation mechanisms. Multiple cooperative or competitive contagions will trigger interesting propagation phenomena, e.g., the collective agreement motivated by a series of cooperative messages, or the group differentiation motivated by conflicting standpoints, or more complex combinations. Developing quantitative analyses of the cooperation mode, evolutionary process, and propagation results is of great sociological significance.
Besides, some other aspects deserve attention, such as the coevolution among epidemic resource spreading dynamics, and the coevolution dynamics on different network topologies. For more references, see [134].
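As a purely illustrative sketch of two heterogeneous but related contagions, the toy model below couples an epidemic with the spread of prevention awareness on a single network. It is not the model of any cited study. All modeling choices are assumptions: there is no recovery, infected individuals automatically become aware, awareness spreads from aware neighbors with probability `lam`, and awareness scales the infection probability `beta` by a factor `gamma`.

```python
import random


def step(graph, infected, aware, beta, gamma, lam, rng):
    """One synchronous update of the toy epidemic-awareness model."""
    new_infected, new_aware = set(infected), set(aware)
    for u in graph:
        if u in infected:
            new_aware.add(u)  # assumption: infected individuals become aware
            continue
        # Awareness contagion: each aware neighbor informs u with probability lam.
        if u not in aware and any(
                v in aware and rng.random() < lam for v in graph[u]):
            new_aware.add(u)
        # Epidemic contagion: awareness scales the infection probability by gamma.
        rate = beta * gamma if u in aware else beta
        if any(v in infected and rng.random() < rate for v in graph[u]):
            new_infected.add(u)
    return new_infected, new_aware


def simulate(graph, infected, aware, beta, gamma, lam, steps, seed=0):
    """Run the coupled dynamics for a fixed number of steps."""
    rng = random.Random(seed)
    infected, aware = set(infected), set(aware)
    for _ in range(steps):
        infected, aware = step(graph, infected, aware, beta, gamma, lam, rng)
    return infected, aware
```

With `gamma = 0` and a fully aware population the epidemic cannot spread at all, whereas with `gamma = 1` awareness has no effect; intermediate values give the competing dynamics that the open issues above refer to.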

Large-scale and distributed propagation optimization
Evolutionary computation, as a bio-inspired stochastic method, has been widely applied to solve propagation problems in complex social networks. However, most existing EC algorithms are applied to networks of limited scale, such as the co-authorship network of scientists working on network theory (NETSCIENCE) with 1 589 nodes [97], the message network between the users of an online community of students from University of California, Irvine (UCSOCIAL) with 1 899 nodes [97], the email communications between university scholars (EMAIL) with 1 133 nodes [100], etc. For online social networks with far more nodes, efficient algorithms for the propagation optimization problems arising in them are in high demand. The difficulties lie in three aspects:
1) The balance of quality and efficiency. Most EC algorithms are population-based iterative algorithms, such as GA, DE, and PSO. A larger population tends to converge in fewer iterations, but it places higher demands on memory and processing capacity, as does a larger number of iterations. Balancing solution quality and computing efficiency is a central problem in large-scale EC design [38].
2) The multi-population communication mechanism distributed in multiple computing units. Traditional distributed optimization with geographically dispersed computers tends to process mirrored problems in parallel. This approach suits separable problems with many independent search subspaces, but it does not help large-scale optimization problems with many highly coupled decision variables. Namely, both the divide-and-conquer scheme for the global problem and the communication mechanisms among different subproblems on different computing units should be considered. Fortunately, complex networks feature significant community structures, and community detection techniques are currently well developed. Dividing the network into multiple independent or overlapping communities can help achieve problem decomposition [55]. However, after decomposition, cooperative solution methods among different computing units need further exploration.
3) The systematic design for large-scale and distributed optimization. Solving large-scale optimization problems not only relies on algorithm-level distributed design but also requires system-level construction considering both software and hardware modules. Faced with the wide variety of computing frameworks such as Hadoop, Storm, Flink, Spark, etc., designing an efficient and environment-fitted solution scheme is necessary. Besides, GPU-accelerated methods have shown their potential in solving IM problems in large-scale social networks [135]. Some advances attempted to accelerate evolutionary algorithms by GPUs [136,137], which is worth further attention.
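The multi-population communication mechanism in 2) can be sketched with a minimal island model, in which several subpopulations evolve separately and periodically exchange their best individuals. The sketch runs sequentially in one process; in a distributed setting each island would live on its own computing unit. Everything here is an assumption made for illustration: the one-hop coverage proxy used as fitness (instead of a propagation simulation), the ring migration topology, and the names `coverage`, `evolve`, and `island_model`. The graph is assumed to map every node to its neighbor list, and each island is assumed to hold at least two individuals.

```python
import random


def coverage(graph, seeds):
    """Proxy fitness (assumption): nodes in the seed set or one hop from it."""
    covered = set(seeds)
    for s in seeds:
        covered.update(graph[s])
    return len(covered)


def evolve(island, graph, rng, steps):
    """Steady-state evolution of one island (a list of frozenset seed sets)."""
    nodes = sorted(graph)
    for _ in range(steps):
        # Binary tournament, then swap one seed for a random non-member.
        parent = max(rng.sample(island, 2), key=lambda s: coverage(graph, s))
        child = set(parent)
        child.remove(rng.choice(sorted(child)))
        child.add(rng.choice([n for n in nodes if n not in child]))
        worst = min(range(len(island)), key=lambda i: coverage(graph, island[i]))
        if coverage(graph, child) >= coverage(graph, island[worst]):
            island[worst] = frozenset(child)


def island_model(graph, k, n_islands=4, island_size=6, epochs=10, steps=20, seed=0):
    """Toy island model with ring migration; returns the best seed set found."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    islands = [[frozenset(rng.sample(nodes, k)) for _ in range(island_size)]
               for _ in range(n_islands)]
    for _ in range(epochs):
        for island in islands:  # independent work, one island per computing unit
            evolve(island, graph, rng, steps)
        # Communication step: the best of island i replaces the worst of island i+1.
        bests = [max(isl, key=lambda s: coverage(graph, s)) for isl in islands]
        for i, best in enumerate(bests):
            target = islands[(i + 1) % n_islands]
            worst = min(range(len(target)), key=lambda j: coverage(graph, target[j]))
            target[worst] = best
    return max((s for isl in islands for s in isl), key=lambda s: coverage(graph, s))
```

The migration interval (`steps`) and topology control how tightly the islands are coupled; sparse communication keeps the cost low but may be insufficient when decision variables in different communities interact strongly, which is the difficulty raised above.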

Real-time dynamic propagation optimization
Compared to networks with static nodes and fixed structures, real-world networks usually have changeable nodes and dynamically changing links, such as the diffusion networks constructed from real-time Twitter streams and wireless communication networks. In such a situation, traditional algorithms for static network optimization may lose their efficacy or need to be improved to fit the dynamic and extensible optimization environment. Yonas and Yen [138] proposed dynamic evolutionary algorithms (DEA) for solving dynamic optimization problems (DOP). However, there are still some challenges in applying DEA to network-based propagation optimization problems.
1) How to design the fitness functions. So far, though the dynamic propagation models have been widely developed, the boundary conditions of dynamic propagation optimization have not yet been systematically analyzed and demonstrated. The DOP study of propagation dynamics is still in a very early stage.
2) How to reuse the historical information of both the algorithm settings and the network structure. Existing studies have pointed out that the reuse of previous information is crucial in accelerating the convergence of the search process. The previous propagation processes, network structures, and corresponding solutions may contribute to optimization problems in a later similar environment. It is valuable to store the historical environments and solutions and to build a quick index of the archives. Similarly, reusing the parameter settings or population structure of EC algorithms may help increase the algorithm's solution quality and convergence speed in similar problems.
3) The processing of uncertainty. As a dynamic network environment greatly increases the model complexity, the difficulty of simulation-based propagation optimization is greatly increased. The uncertainty in the nonlinear propagation processes will be amplified, decreasing the accuracy and effectiveness of solutions. Tracking the movement of the optima is another highly uncertain factor and increases the difficulty of algorithm design.
Besides, EC algorithms have their own limitations to overcome, such as the trade-off between efficiency and effectiveness, and the balance between exploration and exploitation abilities. These relate to the settings of many parameters, such as the population size, the population structure, the decomposition of dimensions, etc. For example, a large population size is time-consuming or resource-consuming, but it can produce higher solution quality in the same iteration period. Increasing the exploration ability is usually accompanied by decreasing the exploitation ability. Designing appropriate or adaptive parameters is still a long-term concern in problem-oriented EC design.
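The idea of reusing historical environments and solutions in 2) can be sketched as a small archive that warm-starts the population after the network changes. This is only one possible design and is not taken from the cited work: the Jaccard similarity between edge sets as the environment index, the first-in-first-out eviction, the repair of recalled solutions by random filling, and the names `SolutionArchive` and `warm_start_population` are all assumptions.

```python
import random


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


class SolutionArchive:
    """Stores (network snapshot, best seed set) pairs from past environments."""

    def __init__(self, capacity=50):
        self.capacity = capacity
        self.entries = []  # list of (frozenset of edges, frozenset of seeds)

    def store(self, edges, solution):
        self.entries.append((frozenset(edges), frozenset(solution)))
        if len(self.entries) > self.capacity:
            self.entries.pop(0)  # evict the oldest snapshot

    def recall(self, edges, valid_nodes, top=3):
        """Solutions of the most similar past snapshots, restricted to current nodes."""
        edges = frozenset(edges)
        ranked = sorted(self.entries, key=lambda e: jaccard(e[0], edges), reverse=True)
        return [sol & valid_nodes for _, sol in ranked[:top]]


def warm_start_population(archive, edges, nodes, k, pop_size, rng):
    """Initial population: repaired archived solutions first, then random seed sets."""
    nodes = sorted(nodes)
    population = []
    for sol in archive.recall(edges, frozenset(nodes)):
        repaired = set(sorted(sol)[:k])
        pool = [n for n in nodes if n not in repaired]
        rng.shuffle(pool)
        repaired.update(pool[:k - len(repaired)])  # refill seeds that disappeared
        population.append(frozenset(repaired))
    while len(population) < pop_size:
        population.append(frozenset(rng.sample(nodes, k)))
    return population[:pop_size]
```

Keeping a share of random individuals alongside the recalled ones preserves diversity, so that the search can still follow the optimum when the new environment differs substantially from every archived snapshot.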

Conclusions
Social propagation phenomena are prevalent in modern society. This paper builds a taxonomy that divides the propagation problems into three branches: simulation, optimization, and detection & analysis. Evolutionary computation, as a kind of bio-inspired optimization method, has been used to solve propagation problems. This paper focuses on the application of EC in diverse social propagation problems.
Based on a holistic review of EC in propagation problems, this paper points out four challenging and promising research directions, i.e., data-driven evolutionary models, coevolution models of multiple contagions, large-scale and distributed algorithm design, and real-time propagation optimization. This work may be helpful to the development of evolving propagation dynamics.
form of Big Data and Computational Intelligence (No. 2018B050502006) and Guangdong Natural Science Foundation Research Team (No. 2018B030312003).

Open Access
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article′s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article′s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.