Introduction

Networks play an important role in information dissemination and provide a means for intervening in and controlling propagation processes. In many social and human networks, it is highly desirable to maximize the spread of information, which is the objective of the influence maximization (IM) problem. Examples include spreading information through social or human networks (e.g., viral marketing), spreading fake news (rumors) or true news during elections, and maximizing road traffic [1]. Furthermore, detecting the most influential nodes in a network can help immunize it by isolating those nodes, for example, to control the spread of a contagious disease, virus, or rumor. These networks are usually dynamic in nature and evolve continuously over time; in social networks, for instance, new friendship links can appear and old ones can disappear. This dynamism presents many challenges for the analysis of these networks.

During recent decades, many efforts have been made to solve the IM problem over networks. The objective of an IM method is to maximize information propagation through a network by triggering a set of nodes (e.g., by giving a piece of information to certain people) that causes the most significant spread of information. This problem is known to be NP-hard under most propagation processes [2]. Hence, many approximation methods have been proposed for solving it, which try to estimate the future influence of each node. This problem can be viewed as a trade-off between a method's accuracy and its efficiency (space and time complexity). Furthermore, in a network with probabilistic propagation, the problem becomes even more challenging.

In addition to the aforementioned problems in static networks, many challenges arise for the IM problem on dynamic networks. Due to the ever-changing structure of these networks, an IM method should be generalizable or time-efficient enough to handle these dynamics. Several efforts have addressed this problem. In [3], extensions of classical methods to dynamic networks were proposed; while these methods capture some dynamic properties of a network, they cannot always guarantee good performance. In [4], a subgraph-based method for dynamic networks was introduced. This method requires test network snapshots beforehand, which may not always be available, and it must be updated and rerun for newly arriving snapshots. In the method of [5], influential nodes are tracked incrementally by reusing and replacing the seed nodes already found for the previous snapshot. This method assumes that seed sets for consecutive snapshots are similar, which may not hold for highly dynamic networks. Hence, there is a need for a method that captures general properties of the network across time steps, so that the learned properties remain applicable to unseen structures.

The IM problem can be formulated as a decision-making problem and solved by reinforcement learning (RL) methods, which are powerful tools for Markov decision processes (MDPs). These methods can capture the probabilistic structure of the environment (here, the network) and find an optimal policy (the order of triggered nodes, i.e., seed nodes) under certain conditions. Q-learning, as one of the RL methods, has been widely applied in several areas. It uses state–action values (Q values) to find an optimal policy, and the Q function can be represented by Q tables, approximate Q-learning, or deep Q-learning (DQL). The Q-table approach suffers from memory limitations for large state and action spaces, and approximation with naive functions can be inaccurate in some situations. Hence, DQL methods have become more popular, where neural networks (NNs) are used to approximate the Q function. This captures nonlinear dependencies without any assumption about a parametric form of the Q function, which increases the accuracy of these methods.

DQL methods have been applied to solve graph problems and are scalable and efficient for large graphs and graphs with dynamic properties. In [6,7,8,9], DQL methods have been proposed for solving optimization problems over graphs, where the methods of [7] and [9] target dynamic networks. In these methods, which build on the method introduced in [6], NNs are used for embedding networks and approximating the Q function. The approximation function introduced in [6] and employed in our research is extendable and can be reused for networks with similar structures, making it suitable for dynamic networks. The RL frameworks proposed in [9] and [8] apply pre-calculated reward functions, which might not be efficient or scalable for larger dynamic graphs, and computing such rewards at execution time is also inefficient. Hence, in this research, we define an RL framework with the immediate reward function mentioned in [10], which makes it efficient for rather large graphs.

Another drawback of the aforementioned DQL models is that they do not adequately extend the DQL model of [6] to dynamic networks. Recently, several incremental (lifelong) learning methods have been proposed for tackling dynamic and streaming data. Their objective is to avoid forgetting previously learned information while learning from new data (catastrophic forgetting). These methods can be categorized into three groups: (i) regularization, (ii) structural, and (iii) replay memory (rehearsal)-based methods [11]. In regularization-based methods, the importance of parameters for old data is included in a regularization term of the loss function while training on new data. Replay memory methods rehearse previous data while training on new data, and structural methods dynamically change the structure of the NN in the presence of new data by adding new neural layers (or neurons). Because incremental learning methods have shown improved performance when learning from dynamic data, they are utilized in our research to capture some of the network's dynamics: the method of [6] has been extended using two regularization-based methods, elastic weight consolidation (EWC) [12] and synaptic intelligence (SI) [13], and a rehearsal method. As shown in this research, they improve the performance of our method for smaller networks. In addition, a naive transfer learning method is applied, which also improves the results for small networks.

In a nutshell, we summarize our contributions as below:

  • We define an RL framework with immediate rewards [10] for the IM problem, which reduces the time spent computing the reward function.

  • We introduce the DQL optimization method for dynamic networks using incremental and transfer learning algorithms.

  • The proposed methods are evaluated on different real-world and synthetic networks and are compared extensively with several heuristic and sketch-based methods. The experiments demonstrate the good performance of our proposed methods, especially for the larger tested networks.

Related Work

During recent decades, a large number of works have been dedicated to IM over static networks, and some of them are reviewed in [14]. These methods include simulation-based [2, 15,16,17], heuristic [18,19,20,21,22,23,24,25,26], sketch-based [27,28,29,30], and other methods [31]. In these methods, the network structure and propagation probabilities are assumed to remain constant. In simulation-based methods, Monte Carlo (MC) simulations are performed to estimate the influence of nodes [14], which has high time complexity for large networks; some of these methods, such as CELF [15] and CELF++ [16], reduce the number of simulations. In one group of heuristic methods, centrality measures are defined that approximate the influence of nodes, including degree centrality, PageRank [18], distance centrality [19], and the collective influence (CI) method [14, 20]. In the CI method, the degrees of distant neighbors are used as a proxy measure instead of the direct degree of a node. These heuristics suffer from the overlap of nodes' influences; therefore, methods such as degree discount [21], group PageRank [22], and [26] have been introduced to mitigate this effect [14]. The degree discount method applies a discount factor to the influence of the selected seeds' neighbor nodes. In sketch-based methods, to improve the theoretical efficiency of simulation-based methods while retaining their accuracy, sketches are constructed that speed up the simulation computations [14]. One well-known method in this category is the reverse influence sampling (RIS) method [28], where a number of reverse reachable (RR) sets for random nodes are generated using sampled graphs (sketches). These sets are then used to select influential nodes (the nodes that appear most frequently in the sets), and the number of RR sets is chosen so as to guarantee the approximation ratio.

To the best of our knowledge, despite the vast number of studies on static networks, few have been dedicated to dynamic networks. In [3], three extensions of static IM methods (i.e., degree discount [21], CI [20], and RIS [28]) to dynamic networks are proposed. Dynamic degrees [32], which account for the varying neighborhoods of nodes, are employed in the dynamic degree discount and dynamic CI methods. In the dynamic extension of the RIS method, all network snapshots are considered when calculating RR sets. These methods have lower time complexity than simulation-based methods, but they do not always guarantee good performance. In [33], a greedy adaptive IM method for networks with dynamic propagation probabilities has been proposed that guarantees its performance under the dynamic independent cascade model. In [4], a DIM method for evolving networks is proposed in which sample subgraphs are generated from different network snapshots and compressed by two approaches (horizontal and vertical compression) to calculate the average reachability of each node and select seed nodes. In [34], an IM method for evolving networks is proposed in which network changes are assumed to be observed through node probing, and a node probing algorithm approximates the best seed set for every network snapshot. In [5], an influential node tracking method for evolving networks (both structure and propagation probabilities) is proposed: the most influential nodes for each snapshot (seeds) are incrementally approximated from the seeds found for the previous snapshot using an interchange heuristic (upper bound interchange greedy), which replaces previous seed nodes based on the upper bound of the replacement gain from [35] and thereby reduces MC simulations.

Recently, RL methods have been employed extensively in several applications and have shown good performance on different optimization problems over networks, including IM. In [10], a combination of RL and game theory has been used for multi-party IM over static networks. In [6], a DQL method has been proposed to solve three combinatorial optimization problems on static networks. This work, which is employed in our research, introduces an NN embedding and a Q-function approximation method in which the nodes of the network are mapped into a low-dimensional space using a nonlinear embedding formula. An advantage of this method is that the number of its parameters is independent of the input data; it has been shown that a model trained on small networks can be used for larger networks with similar structures. In [6], only static networks are considered, but its approximation formula can also be applied to dynamic networks, as demonstrated in our research.

The method of [6] has been employed as the base method in several studies [7,8,9]. In [9], it is extended to topic-aware IM: attributes of users are considered in addition to graph properties in the embedding formula, and a Double Q-learning algorithm is employed. As stated in that paper, the method can also handle dynamic propagation probabilities, but dynamic network structure is not considered. In [7], the method of [6] is used for the graph protection problem in dynamic networks by converting it into a Minimum Vertex Cover problem; several neural models are trained on several network snapshots and the best one is selected, so the approach cannot properly capture the dynamic properties of the network. In [8], a combination of a supervised method (for selecting good nodes) and the method of [6] is used for three combinatorial problems including IM; that work confirms its scalability for large static networks.

In our research, the method of [6] is employed as the base method, and an RL framework with immediate rewards is applied. Furthermore, this method is extended to dynamic networks using incremental and transfer learning algorithms.

Preliminaries and Influence Maximization

In this research, the proposed IM method is extended to dynamic networks. To define the environment precisely, this section first explains the assumptions made and then presents the IM problem and the proposed methods.

Dynamic Networks and Diffusion Process

In this research, a dynamic network \({\{G^t\}}_1^T\) is represented as a series of network snapshots, \(G^1, G^2,\ldots , G^T\), one at each time step. Each snapshot \(G^t\) is a graph \(\{V, E^t\}\), where V is the set of nodes, assumed to remain unchanged over time, and \(E^t\) is the set of undirected edges at time t. Only dynamics over edges are considered. For the propagation process over this network, a weighted network with weight matrix W is considered, where W is a symmetric zero-diagonal matrix and each entry w(i, j) represents the probability of propagating information over the edge between nodes i and j. The propagation process is also assumed to remain constant; in other words, when links are present, the propagation probabilities over them do not change over time.

Different diffusion processes can be assumed in this model; the propagation model adopted in this research is an epidemic process for weighted networks. This diffusion model is the susceptible–infected process introduced in [36, 37], which has also been defined for weighted networks [38]. In this process, nodes are in one of two states, S (susceptible) or I (infectious), and each node, once infected, remains in state I forever. At each time step, every infected node i can infect each of its susceptible neighbors j with probability equal to the edge weight w(i, j). This process continues until no more nodes can be infected.
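As a concrete illustration, the following sketch (our own, not code from the cited works) performs one step of this weighted susceptible–infected propagation with NetworkX; the toy graph, the edge attribute name `w`, and the helper name `propagate_one_step` are assumptions made for the example.

```python
import random
import networkx as nx

def propagate_one_step(G: nx.Graph, infected: set, weight="w", rng=random.random) -> set:
    """One SI step: every infected node tries to infect each susceptible
    neighbor j with probability w(i, j); newly infected nodes are returned."""
    newly_infected = set()
    for i in infected:
        for j in G.neighbors(i):
            if j not in infected and rng() < G[i][j][weight]:
                newly_infected.add(j)
    return newly_infected

# Toy usage on a 3-node weighted graph.
G = nx.Graph()
G.add_edge(0, 1, w=0.9)
G.add_edge(1, 2, w=0.2)
infected = {0}
infected |= propagate_one_step(G, infected)
print(infected)
```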

Fig. 1  IM process in three snapshots of a dynamic network (part of the raccoon dataset [39]). Red, yellow, and gray nodes represent seed, active, and inactive nodes, respectively. Edge thicknesses are proportional to the propagation probabilities over them

IM Problem over Dynamic Networks

The objective of IM is to maximize the spread of information through the network, over a limited or unlimited time horizon. Therefore, it is crucial to select the best seed nodes for activation. The activation process can happen in one or several steps, and the latter is assumed in this paper: after each seed node is selected, the propagation process runs for one step. Seed selection and propagation are repeated until no inactive node remains in the network. An IM solution in this environment tries to find an order of seed nodes \(\{s_1, s_2,\ldots , s_k\}\), activated one per step, such that the overall discounted influence is maximized.

In the case of dynamic networks, because of the evolving structure, the network can change at each step of seed selection. Therefore, in the IM problem over a dynamic network, the objective is to find an order of seed nodes that are activated one per network snapshot \(G^t\). This process is illustrated in Fig. 1: after a seed node is selected at each network snapshot, the propagation process is performed for one step according to that snapshot, and this continues until the whole network is activated.
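The seed-selection and propagation loop over snapshots can be summarized with the following hedged sketch; `select_seed` stands for any seeding policy (heuristic or learned), `propagate_one_step` is the SI step sketched earlier, and both names are placeholders rather than functions from the paper's implementation. Reusing the last snapshot when the seed count exceeds the number of snapshots is also our assumption.

```python
def run_dynamic_im(snapshots, select_seed, propagate_one_step):
    """Alternate seed selection and one-step propagation, one seed per snapshot,
    until every node is active; returns the ordered seed list."""
    active, seeds = set(), []
    t = 0
    while True:
        G_t = snapshots[min(t, len(snapshots) - 1)]  # reuse last snapshot if we run out
        if len(active) == G_t.number_of_nodes():
            break
        s = select_seed(G_t, active)                 # pick an inactive node as the next seed
        seeds.append(s)
        active.add(s)
        active |= propagate_one_step(G_t, active)    # environmental effect for this snapshot
        t += 1
    return seeds

# Example policy: pick the inactive node with the highest weighted degree in G_t.
def select_seed(G_t, active):
    return max((v for v in G_t if v not in active),
               key=lambda v: sum(d["w"] for _, _, d in G_t.edges(v, data=True)))
```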

Reinforcement Learning

In this research, the IM problem described above has been formulated as an MDP and solved by an RL algorithm. In this section, after formulating the RL problem and briefly introducing Q-learning, the proposed methods are presented.

Framework

RL, which is based on MDPs, is a powerful optimization tool for decision-making problems because it can solve them without complete knowledge of the system. An RL framework consists of a tuple \((S, A, T, R, \gamma )\), where S represents the state space, A is the set of actions, T is a function \(S\times A\rightarrow S\) representing transition probabilities, R is the reward function defined as \(S\times A\times S\rightarrow \Re\), and \(\gamma\) is a discount factor. Given this information about the environment, the goal of RL is to find an optimal policy \(\pi\), defined as a function \(S\rightarrow A\) (for a deterministic policy), that maximizes the total return J under policy \(\pi\), which is formulated as:

$$\begin{aligned} J(\pi ) = \mathbb {E}_{\tau \sim \pi }\left[ \sum _{t=0}^{\infty } \gamma^t r_t(\tau )\right] , \end{aligned}$$
(1)

where \(\tau\) represents an episode under policy \(\pi\) and \(r_t\) is the immediate reward at step t of this episode.

The IM problem to be solved in this research is formulated as an RL framework which is defined as the following:


Environment

The environment consists of a dynamic network, as defined at the beginning of the "Preliminaries and Influence Maximization" section, together with its diffusion model, which in this research is the susceptible–infected process. The propagation process of the network represents the environmental effect: information is propagated through the network after an action is selected.


State Space

States consist of the network status at each time step (i.e., the snapshot with its propagation probabilities) and the nodes activated up to the current step. This state space can be very large (on the order of \(2^N\) for a static network with N nodes), but as described in the next section, states are converted into real vectors, so it is not necessary to retain a Q-table.


Action Space

The action at each state is to select the next node as a seed node and activate it. Hence, there are N possible actions at each state.


Reward

After selecting an action, the environment returns an immediate reward evaluated by the number of activated nodes after one-step propagation, defined as follows:

$$\begin{aligned} r_t = [I(t) - I(t-1)], \end{aligned}$$
(2)

where I(t) is the number of nodes activated up to time t. With this definition of the reward function, it does not need to be precomputed, which would be time-consuming and would require a large amount of space to store the values. In addition, the reward is not assumed to be constant for each (state, action) pair, which reflects the uncertainty of the network due to the propagation probabilities. Therefore, reward values may differ between simulations, but in the case of static networks they converge in expectation. The expected value of the reward at each step can be calculated as follows:

$$\begin{aligned} \mathbb {E}[r_t] = \sum _{v \notin I(t-1)} \left( 1 - \prod _{\begin{array}{c} u \in N_t(v),\\ u \in I(t-1) \end{array}} (1 - w(u,v)) \right) , \end{aligned}$$
(3)

where \(N_t(v)\) represents the set of neighbors of node v at time step t. Since calculating this expectation at each state requires considering all edges between activated and inactive nodes, it is time-consuming; hence, immediate (sampled) rewards are used in this research.
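For completeness, the expectation in Eq. 3 can be computed as in the following sketch (our own illustration; in the proposed framework this quantity is deliberately not computed, and the sampled immediate reward of Eq. 2 is used instead):

```python
import networkx as nx

def expected_one_step_reward(G, active, weight="w"):
    """Expected number of newly activated nodes after one SI step (Eq. 3):
    each inactive node v is activated with prob. 1 - prod over its active
    neighbors u of (1 - w(u, v))."""
    expected = 0.0
    for v in G.nodes():
        if v in active:
            continue
        p_not_activated = 1.0
        for u in G.neighbors(v):
            if u in active:
                p_not_activated *= 1.0 - G[u][v][weight]
        expected += 1.0 - p_not_activated
    return expected

# Toy usage: node 0 active, nodes 1 and 2 inactive.
G = nx.Graph()
G.add_edge(0, 1, w=0.9)
G.add_edge(1, 2, w=0.2)
print(expected_one_step_reward(G, active={0}))   # 0.9 expected new activations
```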

Q-Learning

Our RL approach is based on the Q-learning method. In this method, optimization is performed by updating state-action values (Q values) according to the following equation

$$\begin{aligned} Q_{t+1}(s_t, a_t) = Q_t(s_t, a_t) + \alpha _t \Big (r_t + \gamma \max _a Q_t(s_{t+1}, a) - Q_t(s_t, a_t) \Big ), \end{aligned}$$
(4)

where \(Q_t(s, a)\) is the value of selecting action a in state s at time t, \(r_t\) is the immediate reward at time t, and \(\alpha _t\) is the learning rate at time t. Since the original Q-learning algorithm needs to store and constantly update the Q values, it is not suitable for large state and action spaces. Hence, in this research, an approximation-based Q-learning algorithm using deep NNs, i.e., the DQL method, has been used, as explained in the next section.

Deep Q-Learning and Double Q-Learning

In DQL, Q values are approximated using deep NNs and the parameters of these networks can be updated using stochastic gradient methods. This parameter update formula is as follows:

$$\begin{aligned} {\varvec{\psi _{t+1}}} = {\varvec{\psi _t}} + \alpha _t(Y_t^Q - Q(s_t, a_t;{\varvec{\psi _t}})) \nabla _{{\varvec{\psi _t}}}Q(s_t, a_t; {\varvec{\psi _t}}), \end{aligned}$$
(5)

where \(\psi _t\) are the model parameters, \(Y_t^Q\) is the target Q-value, and \(\alpha _t\) is the learning rate. The target Q-value \(Y_t^Q\) is defined by

$$\begin{aligned} Y_t^Q = r_{t+1} + \gamma \max _a Q(s_{t+1}, a; {\varvec{\psi _t}}). \end{aligned}$$
(6)

As noted in [40], this method suffers from an overestimation problem; to address it, the Double DQL (DDQL) algorithm was introduced in [41]. This algorithm uses two separate models, a policy network and a target network, for action selection and evaluation, respectively. In this method, the target Q-value is formed as follows [41]:

$$\begin{aligned} Y_t^{\textrm{DoubleQ}} = r_{t+1} + \gamma Q\big (s_{t+1}, \mathop {\textrm{argmax}}\limits _a Q(s_{t+1}, a; {\varvec{\psi _t}}); {\varvec{\psi _t'}}\big ), \end{aligned}$$
(7)

where \({\varvec{\psi _t}}\) and \({\varvec{\psi _t'}}\) represent the parameters of the policy and target networks, respectively. The DDQL algorithm increases the convergence speed of Q-learning; hence, it is used in this research.
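A minimal PyTorch-style sketch of the Double-Q target of Eq. 7 is given below; the batched tensor shapes and the `done` mask are conventions we assume for the example, not details fixed by the paper.

```python
import torch

@torch.no_grad()
def double_q_target(r, q_policy_next, q_target_next, gamma, done):
    """Double-Q target (Eq. 7): the policy net picks the argmax action,
    the target net evaluates it. Q inputs have shape [batch, num_actions]."""
    best_actions = q_policy_next.argmax(dim=1, keepdim=True)        # action selection
    next_values = q_target_next.gather(1, best_actions).squeeze(1)  # action evaluation
    return r + gamma * next_values * (1.0 - done)                   # no bootstrap at episode end

# Toy usage with a batch of 2 states and 5 candidate actions.
r = torch.tensor([1.0, 0.0])
done = torch.tensor([0.0, 1.0])
print(double_q_target(r, torch.rand(2, 5), torch.rand(2, 5), gamma=0.99, done=done))
```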

Proposed Methods

The Q-learning approach proposed in this research is used for both static and dynamic networks and is extended to dynamic networks using incremental and transfer learning algorithms. In this section, the Q-learning solution is explained first, followed by the details of its extensions.

Q-Function Approximation

The approximation function for the Q values is adapted from [6]. To transform the raw state information into a low-dimensional vector space, that work proposes a network embedding method that exploits the power of recurrent NNs. In this research, this structure is used with some modifications:

$$\begin{aligned} \mu _v^{k+1} = \textrm{Relu}\Big (\psi _1 x_v + \psi _2 \sum _{u \in N_t(v)\cup \{v\}}w(u,v)\, \mu _u^k + \psi _3 \sum _{u \in N_t(v)} \textrm{Relu}(\psi _4\, w(u,v))\Big ), \end{aligned}$$
(8)
$$\begin{aligned} \mu _g = \sum _{v \in Nodes(G)}\mu _v^K, \end{aligned}$$
(9)

where \(\mu _v^k\) is the d-dimensional vector of node v at iteration k of the embedding process, \(x_v\) represents the activation status of node v (1 if the node is activated and 0 otherwise), w(u, v) is the propagation probability between nodes u and v, and \(\psi _1 \in \Re ^{1\times d}\), \(\psi _2 \in \Re ^{d\times d}\), \(\psi _3 \in \Re ^{d\times d}\), and \(\psi _4 \in \Re ^{1\times d}\) are model parameters. The variable \(\mu _g\), the sum of the vectors of all nodes, represents the network state. The iteration in Eq. 8 is performed for K steps starting from zero vectors. This embedding captures the network's activation and structure information (including propagation probabilities). In this research, edge weights are included in the summation over neighboring nodes (the second term in Eq. 8), and the vector of the node itself is also included in this summation as its history. An advantage of this embedding formula is that the number of model parameters is independent of the network size, so it can be applied to large networks. In addition, as shown in the experiments ("Experimental Setup and Numerical Results" section), a model trained on a small network can be employed for testing (or training) on larger networks with similar structures.

To calculate the Q values, another NN is employed as in [6]. The computed network and selected-node vectors, i.e., \(\mu _g\) and \(\mu _a\), are given as the state s and action a inputs to the Q function, as shown in the following formula:

$$\begin{aligned} Q(s,a) = \psi _5^T \textrm{Relu}(\textrm{concat}(\psi _6 \mu _g, \psi _7 \mu _a)), \end{aligned}$$
(10)

where \(\mu _a\) is the vector of node a, and \(\psi _5 \in \Re ^{2d\times 1}\), \(\psi _6 \in \Re ^{d\times d}\), and \(\psi _7 \in \Re ^{d\times d}\) are model parameters.
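The following PyTorch sketch shows one possible implementation of Eqs. 8–10; treating the self term with weight 1, using dense matrices, and the class name `EmbedQNet` are our assumptions for readability, not details specified by the paper.

```python
import torch
import torch.nn as nn

class EmbedQNet(nn.Module):
    """Sketch of the embedding (Eqs. 8-9) and Q head (Eq. 10). W is the [N, N]
    propagation-probability matrix of the current snapshot and x the [N] 0/1
    activation vector."""

    def __init__(self, d: int = 64, K: int = 4):
        super().__init__()
        self.K = K
        self.psi1 = nn.Linear(1, d, bias=False)   # psi_1: activation status
        self.psi2 = nn.Linear(d, d, bias=False)   # psi_2: weighted neighbor aggregation
        self.psi3 = nn.Linear(d, d, bias=False)   # psi_3: aggregated edge features
        self.psi4 = nn.Linear(1, d, bias=False)   # psi_4: per-edge weight lift
        self.psi5 = nn.Linear(2 * d, 1, bias=False)
        self.psi6 = nn.Linear(d, d, bias=False)
        self.psi7 = nn.Linear(d, d, bias=False)

    def embed(self, W: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        N = W.shape[0]
        W_self = W + torch.eye(N)                            # include each node itself in the sum
        edge_feat = torch.relu(self.psi4(W.unsqueeze(-1)))   # [N, N, d]; zero weights contribute nothing
        edge_term = self.psi3(edge_feat.sum(dim=1))          # [N, d]
        mu = torch.zeros(N, self.psi2.in_features)
        for _ in range(self.K):                              # K embedding iterations (Eq. 8)
            neighbor_term = self.psi2(W_self @ mu)           # weighted sum over N_t(v) and v itself
            mu = torch.relu(self.psi1(x.unsqueeze(-1)) + neighbor_term + edge_term)
        return mu

    def forward(self, W: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        mu = self.embed(W, x)
        mu_g = mu.sum(dim=0, keepdim=True).expand_as(mu)     # state vector (Eq. 9), repeated per action
        h = torch.relu(torch.cat([self.psi6(mu_g), self.psi7(mu)], dim=1))
        return self.psi5(h).squeeze(-1)                      # one Q value per candidate seed node (Eq. 10)

# Toy usage: a 3-node snapshot with one activated node.
W = torch.tensor([[0.0, 0.9, 0.0], [0.9, 0.0, 0.2], [0.0, 0.2, 0.0]])
x = torch.tensor([1.0, 0.0, 0.0])
net = EmbedQNet(d=8, K=2)
print(net(W, x))
```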

In the vanilla–DDQL model, benefiting from the extendable property of the model, training is performed on one training snapshot and testing on several test snapshots. Experiments with training on several snapshots of a small dynamic network show no significant difference compared with training on one or two snapshots, and because of the memory requirements this is not efficient for large networks. The algorithm is given in Algorithm 1. It uses a replay memory to retain traversed samples and reuse them for training the model in subsequent iterations. In the first iterations of the algorithm, a policy relaxation phase is used in which no exploitation step (i.e., selecting the action from the model) is performed; the algorithm only traverses the state space randomly and fills the replay memory. Once the replay memory is full, an epsilon-greedy policy [42] is followed for action selection: with probability \(\epsilon\), exploration (random action selection) is performed; otherwise, the action is selected using the model. This policy helps the model explore the environment as it learns, but to ensure convergence, the exploration rate decays over time so that the policy concentrates increasingly on the model. After action selection, information is propagated as the environmental effect, the next state and reward are determined, and the tuple \((state, action, reward, next\;state)\) is written to the replay memory for use in training. Then, if the memory is full, a batch b is sampled from it at random, and the mean squared error (MSE) loss for the batch is calculated according to:

$$\begin{aligned} \textrm{Loss}_b = \frac{1}{|b|}\sum _{(s_t,a_t,r_{t+1},s_{t+1}) \in b} \Big (r_{t+1}+\gamma Q\big (s_{t+1}, \mathop {\textrm{argmax}}\limits _a Q(s_{t+1},a;\psi _{{policy}}); \psi _{{target}}\big ) - Q(s_t, a_t;\psi _{{policy}})\Big )^2. \end{aligned}$$
(11)

The parameters of the policy network are updated by the Adam optimizer, and the target network parameters are updated using the Polyak averaging method [43] as follows:

$$\begin{aligned} {\varvec{\psi }}_{target} = \tau *{\varvec{\psi }}_{policy} + (1-\tau )*{\varvec{\psi }}_{target}. \end{aligned}$$
(12)

An episode terminates when all nodes of the network are activated.

[Algorithm 1]
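For concreteness, a hedged sketch of one training update, combining the batch loss of Eq. 11 with the Polyak target update of Eq. 12, is given below; the per-sample loop and the batch layout `(W, x, a, r, W_next, x_next)` are simplifications of ours, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def ddql_train_step(policy_net, target_net, optimizer, batch, gamma=0.99, tau=0.01):
    """One DDQL update on a sampled batch (Eq. 11) followed by the Polyak
    target update (Eq. 12). policy_net/target_net map (W, x) to per-node Q values."""
    losses = []
    for W, x, a, r, W_next, x_next in batch:
        q_sa = policy_net(W, x)[a]
        with torch.no_grad():
            a_star = policy_net(W_next, x_next).argmax()              # action chosen by policy net
            target = r + gamma * target_net(W_next, x_next)[a_star]   # evaluated by target net
        losses.append(F.mse_loss(q_sa, target))
    loss = torch.stack(losses).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                                  # Adam step on the policy net
    with torch.no_grad():                                             # Polyak averaging (Eq. 12)
        for p_t, p_p in zip(target_net.parameters(), policy_net.parameters()):
            p_t.mul_(1.0 - tau).add_(tau * p_p)
    return loss.item()
```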

Incremental and Transfer Learning

To tackle the ever-changing structure of a dynamic network, incremental and transfer learning methods are also applied, which benefit from having several network snapshots available for training (two snapshots are used in this research). In this section, these methods and the overall extended DDQL algorithms are presented.

Incremental learning refers to training an NN model in the presence of dynamic or streaming data. A major problem that arises is catastrophic forgetting of already learned information while learning new information: the previous weights of the NN change so dramatically that the model no longer fits the previous data. Incremental learning methods try to alleviate this catastrophic forgetting so that the learned model remains useful for both previous and new data.

The incremental learning methods employed in this research are EWC [12], SI [13], and a replay memory method. In addition, a naive transfer learning method is used to train the model with more dynamic information. In the incremental methods, the model trained on the first snapshot is updated, according to the chosen incremental method, while training on the second snapshot. These methods are explained in the following:

  • EWC method EWC is a regularization-based incremental learning method introduced in [12]. In this method, the importance of parameters in the previous task is evaluated using the Fisher information matrix and their training in the new task is slowed down according to these values. To meet this goal, after training the model for task A, this model is then trained on task B using the modified loss function according to:

    $$\begin{aligned} \mathcal {L}(\psi ) = \mathcal {L}_B(\psi ) + \sum _{i} \dfrac{\lambda }{2} F_i (\psi _i - \psi _{A,i}*)^2, \end{aligned}$$
    (13)

    where \(\mathcal {L}_B\) is the loss for task B only, \(\lambda\) sets the importance of the old task relative to the new task, \(\psi _{A,i}*\) is the final value of parameter \(\psi _i\) from the previous task A, and F is the Fisher information matrix, defined as:

    $$\begin{aligned} F_i = \dfrac{1}{L} \sum _{l=1}^{L}\big (\nabla _{\psi _i} \log \mathcal {L}_l(\psi )\big )^2, \end{aligned}$$
    (14)

    where L is the number of samples chosen from the previous task A for calculating the Fisher values and \(\mathcal {L}_l\) is the likelihood of sample l. These values represent the importance of the parameters in the previous task and control how strongly they are updated in the next task (a combined sketch of the EWC and SI penalty terms is given after this list).

  • SI method The SI method, proposed in [13], is another regularization-based method for incremental learning, which evaluates the importance of parameters in the previous task A by measuring how much each parameter contributed to the change in the loss. The loss function for the new task B is:

    $$\begin{aligned} \mathcal {L}(\psi ) = \mathcal {L}_B(\psi ) + c \sum _i \Omega _i^B (\psi _{A,i}* - \psi _i)^2, \end{aligned}$$
    (15)

    where c is the strength parameter and \(\Omega _i^B\) denotes per-parameter regularization strength which is defined by:

    $$\begin{aligned} \Omega _i^B = \dfrac{\omega _i^A}{(\varDelta _i^A)^2 + \zeta }, \end{aligned}$$
    (16)

    where \(\omega _i^A\) is the importance measure of parameter \(\psi _i\) in task A, \(\varDelta _i^A = \psi _{A,i}* - \psi _{i_\textrm{init}}\) is the total change in parameter \(\psi _i\) during training on task A, and \(\zeta\) is a damping parameter that avoids division by zero when \(\varDelta _i^A \rightarrow 0\). The importance measure for each parameter is given by:

    $$\begin{aligned} \omega _i^A = \sum _{k=1}^T \nabla _{\psi _i} \mathcal {L}_A(\psi ^{(k)})\, \delta \psi _i^{(k)}, \end{aligned}$$
    (17)

    which measures the contribution of parameter \(\psi _i\) to the change in loss during the previous task A, where \(\psi ^{(k)}\) and \(\delta \psi _i^{(k)}\) are the parameter vector and the update of \(\psi _i\) at step k, and T is the total number of parameter update steps during task A.

  • Replay memory The replay memory (rehearsal) method is common in RL and alleviates catastrophic forgetting by replaying data from previously learned tasks. In this method, after training the model on task A, some samples, i.e., \((state, action, reward, next\ state)\) tuples, are stored in memory and reused when training on the next task B: during that training, the stored samples from the previous task are selected with a certain probability. For task B, the model is initialized with the parameters learned for task A.

  • Transfer learning While incremental learning algorithms try to find a model suitable for both new and old data, in transfer learning the goal is to obtain a model for the new task using the model already learned for the previous data. In this research, a simple transfer learning algorithm is employed in which the model for the new data is initialized with the parameters learned from the previous data. This speeds up training for the new task and carries over some information from the prior task.
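As referenced above, the following sketch illustrates how the EWC and SI penalty terms (Eqs. 13 and 15–16) can be added to the training loss; the dictionaries `fisher`, `omega`, `delta`, and `old_params`, keyed by parameter name and assumed to be precomputed after training on the first snapshot, are our conventions for the example.

```python
import torch

def ewc_penalty(model, fisher, old_params, lam):
    """EWC regularizer (Eq. 13): (lambda/2) * sum_i F_i (psi_i - psi*_{A,i})^2."""
    penalty = 0.0
    for name, p in model.named_parameters():
        penalty = penalty + (fisher[name] * (p - old_params[name]).pow(2)).sum()
    return 0.5 * lam * penalty

def si_penalty(model, omega, old_params, delta, c, zeta=1e-3):
    """SI regularizer (Eqs. 15-16): c * sum_i omega_i / (delta_i^2 + zeta) * (psi*_{A,i} - psi_i)^2."""
    penalty = 0.0
    for name, p in model.named_parameters():
        strength = omega[name] / (delta[name].pow(2) + zeta)
        penalty = penalty + (strength * (old_params[name] - p).pow(2)).sum()
    return c * penalty

# While training on the second snapshot, one of these terms is added to the DDQL loss, e.g.:
# loss = ddql_loss + ewc_penalty(policy_net, fisher, old_params, lam)
```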

The combinations of these algorithms with the vanilla–DDQL algorithm are given in Algorithms 2, 3, and 4. Algorithm 2 illustrates the use of the regularization-based incremental learning methods, i.e., EWC and SI. In this algorithm, as in the DDQL Algorithm 1, the model is trained on the first training snapshot; the learned parameters are then used as the initialization for training on the second training snapshot. In this step, the regularization term of the chosen incremental method is added to the loss function, and the model is trained with this new loss. Algorithm 3 illustrates the replay memory method for the DDQL algorithm: after training the model on the first snapshot, samples from this step are selected and stored for replay in the next training step; while the model is being trained on the second snapshot, the stored samples from the previous training step are selected with a certain probability p and used for training. Finally, Algorithm 4 represents the naive transfer learning algorithm, which simply keeps the previously trained model and updates it using the second training snapshot.

[Algorithm 2]
[Algorithm 3]
[Algorithm 4]

Experimental Setup and Numerical Results

In this section, the implementation details and the obtained results are presented. First, the IM methods against which the proposed methods are compared are explained, and then the experimental details for both synthetic and real-world datasets are presented.

Baseline IM Methods

Our proposed IM methods are compared with several heuristics and a sketch-based IM method, including extensions of these methods to dynamic networks. They are briefly explained in the following:

Heuristic Methods

The heuristic algorithms compared in this research include heuristics for static and dynamic networks. The heuristics for static networks include max degree and degree discount. For dynamic networks, in addition to static heuristic algorithms, some dynamic heuristics, such as dynamic degree and dynamic degree discount algorithms, were also tested. These algorithms are explained in the following:

  • Degree In the max degree algorithm, seed nodes are selected in descending order of their degrees (or weighted degrees), skipping nodes already activated by the current time step. In the static case, node degrees are calculated once at the start of execution; in the dynamic case, they are recalculated at every step from the current snapshot.

  • Degree discount The degree discount algorithm introduced in [21] assumes a discount factor for neighboring nodes of seed nodes in calculating their degrees. The discounted degrees of nodes are obtained using the following equation:

    $$\begin{aligned} dd_u = k_u - 2t_u - (k_u-t_u)t_u\lambda , \end{aligned}$$
    (18)

    where \(k_u\) denotes the degree of node u without discount (both weighted and plain degrees are used in this research), \(t_u\) is the number of times node u has appeared in the neighborhood of a seed node, and \(\lambda\) represents the probability that node u is infected by the currently selected seed; in this research, \(\lambda\) corresponds to the propagation probabilities. A sketch of this heuristic is given after this list.

  • Dynamic degree In the max dynamic degree algorithm, nodes are ordered by their dynamic degrees, defined by [32]:

    $$\begin{aligned} D_T(v) = \sum _{1<t \le T} \dfrac{|N_{t-1}(v)\setminus N_t(v) |}{|N_{t-1}(v)\bigcup N_t(v) |} |N_t(v) |, \end{aligned}$$
    (19)

    where \(D_T(v)\) represents the dynamic degree of node v calculated during T network snapshots, and \(|\cdot |\) represents cardinality operator. This method evaluates changes in neighbors of the nodes as their dynamic degrees. In this research, both weighted and unweighted dynamic degrees have been calculated.

  • Dynamic degree discount The dynamic degree discount heuristic, introduced in [3], uses the dynamic degrees defined in Eq. 19 as the node degrees in Eq. 18. For calculating discounted degrees, all nodes adjacent to node u during all time steps (T) are considered as its neighborhood. Weighted dynamic degrees are also used to calculate weighted dynamic discounted degrees.
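As referenced in the degree discount item above, the following sketch shows the basic degree-discount selection of Eq. 18; using a single propagation probability `lam` for all edges and unweighted degrees is a simplification of ours, whereas the paper also uses weighted degrees and per-edge probabilities.

```python
import networkx as nx

def degree_discount_seeds(G: nx.Graph, k: int, lam: float = 0.1):
    """Degree-discount heuristic (Eq. 18) with a uniform propagation probability lam."""
    dd = {v: G.degree(v) for v in G}          # discounted degrees, start at the plain degree
    t = {v: 0 for v in G}                     # number of already-selected neighbors per node
    seeds = []
    for _ in range(min(k, G.number_of_nodes())):
        u = max((v for v in G if v not in seeds), key=lambda v: dd[v])
        seeds.append(u)
        for v in G.neighbors(u):              # discount the neighbors of the new seed
            if v in seeds:
                continue
            t[v] += 1
            k_v = G.degree(v)
            dd[v] = k_v - 2 * t[v] - (k_v - t[v]) * t[v] * lam   # Eq. 18
    return seeds

# Toy usage on a small random graph.
G = nx.gnp_random_graph(30, 0.2, seed=1)
print(degree_discount_seeds(G, k=5))
```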

RIS Methods

In addition to the heuristic methods, our methods have also been compared with the RIS method. In this research, both a static and a dynamic extension of the RIS method have been tested, as explained in the following:

  • Static RIS In the static RIS method, RIS is applied to the aggregated training snapshots (which are different from the test snapshots). The aggregated network of the training snapshots is used to generate instance graphs (transition probabilities are used for selecting edges). RR sets are constructed using a breadth-first search with a maximum distance of N (the number of nodes in the network) from the randomly selected node. The number of generated RR sets was chosen between 500 and 10,000 depending on the size of the network (fewer RR sets were generated for larger networks because of the increasing time complexity). Since the seed set obtained by this method might not cover all nodes (because of the probabilistic selection) and the assumed IM problem requires infecting all nodes, if the obtained seed list is shorter than the number of nodes, the remaining nodes are appended to the end of the list in random order. A simplified sketch of the RR-set construction is given after this list.

  • RIS for each snapshot In this variation of the RIS method for dynamic networks, RIS is applied to each test snapshot separately to obtain an ordered seed list, from which the first inactive seed is selected as the next seed for the current snapshot. If a seed set does not cover all nodes, the remaining seeds are again chosen at random. Because of the higher time complexity of this method at test time, fewer RR sets were created, in the range of 200–1000 depending on the size of the network.
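As referenced above, a simplified sketch of RR-set construction and greedy seed selection follows; it assumes undirected graphs with a `w` edge attribute, uses a fixed number of RR sets rather than a theoretically derived one, and omits the random fill-up of uncovered nodes described above.

```python
import random
from collections import Counter
import networkx as nx

def random_rr_set(G: nx.Graph, weight="w", rng=random.random):
    """One reverse-reachable set: sample a live-edge subgraph by keeping each
    edge with its propagation probability, then BFS from a random root node."""
    root = random.choice(list(G.nodes()))
    visited, frontier = {root}, [root]
    while frontier:
        nxt = []
        for v in frontier:
            for u in G.neighbors(v):
                if u not in visited and rng() < G[v][u][weight]:
                    visited.add(u)
                    nxt.append(u)
        frontier = nxt
    return visited

def ris_seeds(G: nx.Graph, k: int, num_sets: int = 1000):
    """Greedy RIS selection: repeatedly pick the node covering the most RR sets."""
    rr_sets = [random_rr_set(G) for _ in range(num_sets)]
    seeds = []
    for _ in range(k):
        counts = Counter(v for rr in rr_sets for v in rr)
        if not counts:
            break
        s, _ = counts.most_common(1)[0]
        seeds.append(s)
        rr_sets = [rr for rr in rr_sets if s not in rr]   # drop sets already covered
    return seeds

# Toy usage on a random graph with uniform propagation probability 0.3.
G = nx.gnp_random_graph(30, 0.2, seed=1)
nx.set_edge_attributes(G, 0.3, "w")
print(ris_seeds(G, k=5, num_sets=200))
```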

For the methods that require obtaining seed sets beforehand (i.e., degree, degree discount, dynamic degree, dynamic degree discount, and static RIS), the training snapshots were used for the calculations, and these sets were tested on the test snapshots (discarding already active seeds). For the methods that obtain seeds for each snapshot (i.e., the per-snapshot degree algorithms and RIS for each snapshot), the test snapshots were used for the calculations.

Although all of the explained methods have been compared with the proposed methods in this research, for clarity only a few of them are illustrated in this paper. These results are discussed in the next section.

Evaluation metrics

We mainly use two metrics to evaluate the proposed methods: the average discounted reward and the average seed length. The first is the main evaluation metric, as our proposed methods are reward-based optimizers. The second, seed length, is reported only for completeness and to show how the models perform even though we do not directly optimize for it. They are explained in detail below:

  • Average discounted reward This metric evaluates total discounted rewards as defined in Eq. 20:

    $$\begin{aligned} R = \sum _{t=0}^{M-1} \gamma ^t r_t, \end{aligned}$$
    (20)

    where \(r_t\) represents the immediate reward at time step t, i.e., the number of nodes activated at time t, and M is the total number of steps required to activate all nodes in the network. Since the number of nodes activated at each time step is discounted by the factor \(\gamma\), this metric evaluates how efficiently the model selects nodes so that information spreads quickly. The average value of this measure over several test iterations is calculated and reported in the experiments; a higher value indicates that the model identifies influential nodes quickly.

  • Average seed length This metric represents the average number of selected seed nodes for activating the whole network. In other words, it represents the number of time steps required for activating all nodes of the network through the seed selection and propagation process. According to this definition, the lower value of this metric represents faster activation of the network.

Experimental Settings

This section explains the experimental setup and provides detailed experiments on both static and dynamic networks. The general setup is as follows. The NetworkX library has been used for managing network data, and CUDA has been used to speed up the computations. The models are implemented on a system with an Intel Core i5 processor and an NVIDIA GeForce GTX 1650 GPU with 4 GB of GPU RAM.

The parameters of the policy and target networks in the DDQL algorithm are initialized using the Xavier method [44], which performed better than the other initialization methods tested. In the epsilon-greedy policy for training the DDQL model, an exponentially decreasing epsilon has been used:

$$\begin{aligned} \epsilon = \epsilon _\textrm{end} + (\epsilon _\textrm{start} - \epsilon _\textrm{end}) \exp \left( -\textrm{steps}\_\textrm{done}/\textrm{decay}\_\textrm{factor}\right) , \end{aligned}$$
(21)

where \(\epsilon _\textrm{start}\) and \(\epsilon _\textrm{end}\) represent the initial and final values of \(\epsilon\), which decays according to the \(\textrm{decay}\_\textrm{factor}\) parameter, and \(\textrm{steps}\_\textrm{done}\) is the number of steps performed (actions selected) during the execution of the epsilon-greedy policy. For training the model, the Adam optimization method with constant and decreasing learning rates has been used. For the decreasing learning rate, an exponential decay function has been utilized:

$$\begin{aligned} \textrm{lr} = \textrm{lr}_0 \exp \left( -\textrm{lr}\_\textrm{decay} \cdot \textrm{lr}\_\textrm{steps}\right) , \end{aligned}$$
(22)

where \(\textrm{lr}_0\) is the initial learning rate and \(\textrm{lr}\_\textrm{steps}\) is the number of learning steps (parameter updates) performed. The hyperparameter values are listed in Table 1.
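The two schedules of Eqs. 21 and 22 can be implemented as below; the numeric defaults are placeholders of ours, and the actual hyperparameter values are those listed in Table 1.

```python
import math

def epsilon_schedule(steps_done, eps_start=0.9, eps_end=0.05, decay_factor=2000):
    """Exponentially decaying exploration rate (Eq. 21)."""
    return eps_end + (eps_start - eps_end) * math.exp(-steps_done / decay_factor)

def lr_schedule(lr_steps, lr0=1e-3, lr_decay=1e-4):
    """Exponentially decaying learning rate (Eq. 22)."""
    return lr0 * math.exp(-lr_decay * lr_steps)

# Example: epsilon after 0, 1000, and 5000 action selections.
print([round(epsilon_schedule(s), 3) for s in (0, 1000, 5000)])
```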

Table 1 Hyperparameters for DDQL algorithm

Experiments on Static Networks

The DDQL algorithm has been tested on two static real-world networks, the Highschool and Memetracker datasets. These datasets are explained in the following, and a summary of their characteristics is given in Table 2:

Table 2 Datasets for static networks
Table 3 Results for real-world static networks

Highschool 2013 Dataset

This dataset contains contact information between high school students in Marseilles, France, over five days in December 2013, collected by the work in [45]. It contains 120 nodes and is a directed network; in our research, it has been converted to an undirected network by assuming a link between each pair of nodes connected by a directed link. The network contains edge weights that represent contact duration between nodes, reported as the discrete values 1, 2, 3, or 4. In our research, these weights have been mapped to the values 0.25, 0.5, 0.75, and 1 and treated as propagation probabilities. When two directed links exist between a pair of nodes, the average of their edge weights is used as the weight of the undirected link.


MemeTracker Dataset

This dataset contains news media sites and blogs as nodes and who-copies-whom relationships as edges, and was obtained by the work of Gomez-Rodriguez et al. [46]. The network contains 960 nodes and 5000 edges derived from 5000 phrase clusters containing cascade information. It is a directed network in which an edge indicates that the destination node frequently copies from the source node. In our research, the network has been converted into an undirected network by considering a link between two nodes whenever a directed link exists between them. The dataset contains the average propagation time between every two nodes, \(\Delta t_{uv}\), i.e., the average delay with which the destination node v copies from the source node u. Propagation probabilities are calculated by the formula \(\alpha /\Delta t_{uv}\) [47], with \(\alpha\) set to 0.1 as in [6, 47].

The results of the experiments on these datasets are presented in Table 3, which reports the average total (discounted) rewards and average seed lengths of the heuristic and DDQL algorithms. As the table shows, the improvement obtained by the DDQL method is not significant, and a heuristic algorithm can achieve comparable results. Hence, given its computational complexity, the proposed algorithm is not efficient for static networks with known features.

Experiments on Dynamic Networks

Our proposed methods are the DDQL method, DDQL with transfer learning (DDQL-TL), DDQL with the EWC method (DDQL-EWC), DDQL with the SI method (DDQL-SI), and DDQL with the rehearsal method (DDQL-Rehearsal), which have been tested on both synthetic and real-world dynamic networks. The details of these experiments and their results are presented in this section.

Table 4 Synthetic dynamic networks characteristics
Fig. 2  Comparing average discounted rewards for different sizes of synthetic dynamic Markov SBM networks [48]


Synthetic Networks

To generate synthetic networks, different dynamic network models can be used that satisfy the dynamic network properties explained in the "Preliminaries and Influence Maximization" section. In this section, the employed dynamic network model and the obtained results are presented.

The dynamic network model used in this research is the model introduced in [48], which combines the stochastic block model [49] with a Markovian property for edge dynamics and defines the edge-state transition probability matrices \(P_\textrm{in}\) and \(P_\textrm{out}\) by the following formulas:

$$\begin{aligned} {\left\{ \begin{array}{ll} P_\textrm{in}(1,1) = q_\textrm{in},\\ P_\textrm{in}(0,1) = \mu _\textrm{in} (1-q_\textrm{in})/(1-\mu _\textrm{in}),\\ \end{array}\right. } \end{aligned}$$
(23)

where \(P_\textrm{in}(a,b)\) represents the intra-community probability of an edge-state transition from a to b (edge state 1 means the link is present and 0 means it is absent), \(q_\textrm{in}\) is the within-community link persistence probability, and \(\mu _\textrm{in}\) is the within-community link creation probability in the first snapshot. The inter-community probabilities \(P_\textrm{out}(a,b)\) are defined analogously with the corresponding parameters. To obtain realistic parameters for the synthesized networks, the dynamic network model parameters estimated from the Highschool dataset [45] are used: \(\mu _\textrm{in} = 0.029,\;\mu _\textrm{out} = 0.000042,\;q_\textrm{in} = 0.19,\;q_\textrm{out} = 0.096\). Different networks have been generated using these parameters; their details are given in Table 4.
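A hedged sketch of this snapshot generator follows; the function name, the `communities` mapping, and the dense pairwise loop are our assumptions, and the defaults are the Highschool-derived parameters quoted above.

```python
import random
import networkx as nx

def markov_sbm_snapshots(n, T, communities, mu_in=0.029, mu_out=0.000042,
                         q_in=0.19, q_out=0.096, rng=random.random):
    """Generate T snapshots of the Markovian SBM of Eq. 23: edges are drawn with
    prob. mu in the first snapshot, then persist with prob. q or (re)appear with
    prob. mu*(1-q)/(1-mu). `communities` maps each node to its block label."""
    def params(u, v):
        return (mu_in, q_in) if communities[u] == communities[v] else (mu_out, q_out)

    snapshots = []
    G = nx.Graph()
    G.add_nodes_from(range(n))
    for u in range(n):                         # first snapshot: independent Bernoulli edges
        for v in range(u + 1, n):
            mu, _ = params(u, v)
            if rng() < mu:
                G.add_edge(u, v)
    snapshots.append(G.copy())
    for _ in range(T - 1):                     # Markovian edge updates for later snapshots
        H = nx.Graph()
        H.add_nodes_from(range(n))
        for u in range(n):
            for v in range(u + 1, n):
                mu, q = params(u, v)
                p = q if G.has_edge(u, v) else mu * (1 - q) / (1 - mu)
                if rng() < p:
                    H.add_edge(u, v)
        G = H
        snapshots.append(G.copy())
    return snapshots

# Example: 50 nodes in two equal communities, 10 snapshots.
comm = {v: v // 25 for v in range(50)}
snaps = markov_sbm_snapshots(50, 10, comm)
print([g.number_of_edges() for g in snaps[:3]])
```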

To train and test the different methods (heuristic and DDQL), separate training and test sets (network snapshots) have been used. For the dynamic heuristic algorithms, 5–50 training snapshots were used to calculate dynamic degrees. For training the vanilla–DDQL algorithm, only the last snapshot in the training set was used, and training proceeded as for a static network; for the incremental and transfer learning algorithms, the last two training snapshots were used, with the model trained on the first snapshot then trained further on the second. The remaining snapshots formed the test set, with a different snapshot used at each step of seed selection and propagation. Note that the dynamic heuristics (except the per-snapshot max degree methods) by design need more information (more snapshots) to give better results, whereas our methods do not need additional information at test time: during training, the NN can capture some intrinsic properties of the network, which are then used at test time.

Table 5 Comparing average discounted reward for synthetic dynamic networks with different sizes (node numbers)
Table 6 Comparing average seed lengths for synthetic dynamic networks with different sizes (node numbers)

The results of the experiments on synthetic networks are presented in Fig. 2 and Tables 5 and 6; they are averages over 100 test iterations. According to these results, the weighted degree heuristic applied to each network snapshot performs best among the heuristic and RIS methods. The results in Table 5 confirm the good performance of the DDQL methods, which becomes more pronounced for larger networks. For the network with 500 nodes, the vanilla–DDQL method achieves the best average discounted reward, about a 9.8% improvement over the best heuristic method, as indicated in Table 5. The degradation of the incremental and transfer learning methods on larger networks may result from the complexity of retraining the model for incremental (or transfer) learning on larger datasets. In addition, as shown in Table 6, the average seed lengths of the DDQL methods are also lower than those of the other methods, which is another advantage of using DDQL.

To measure the extendability of the proposed method, in another experiment different models were trained on smaller networks using the replay memory incremental DDQL method. These models were trained on synthetic networks with 50, 100, and 200 nodes and tested directly on larger networks of up to 500 nodes generated with the same dynamic network model parameters (\(\mu\) and q values). The results of this experiment, the average discounted rewards of the compared methods, are given in Table 7. They are compared with the max weighted degree per-snapshot heuristic, which is the best of the compared heuristics. As Table 7 shows, the pre-trained DDQL models outperform the heuristic method on all these networks, demonstrating the reusability of the DDQL method, which reduces the training overhead for large datasets.

Table 7 Results of testing pre-trained replay memory DDQL models on larger networks (represented in columns)

Real-World Networks

The proposed methods have been tested on different real-world dynamic networks, which are summarized in Table 8 and are explained in more detail in the following:


Hospital Dataset

This dataset contains interaction information between 75 patients and healthcare workers during about 5 days (6–10 December 2010) in a hospital ward in Lyon, France.


Hypertext Dataset

This dataset represents contact information between attendees of the ACM Hypertext 2009 conference and consists of 113 nodes. The data were collected over about 2.5 days.

Table 8 Datasets for dynamic networks

Primary School Dataset

This dataset contains interaction information between students and teachers, used in the study published in BMC Infectious Diseases in 2014. The network contains 242 nodes in total and was collected over about 1.5 days.


Highschool 2013 Dataset

This dataset is the dynamic version of the static Highschool 2013 dataset; it contains 327 nodes and additionally records the time intervals (every 20 s) in which pairs of nodes were in contact.


Infectious Dataset

This dataset contains face-to-face interactions between 405 participants of the 2009 SFHH conference in Nice, France, during two days (June 4–5, 2009). The data were released by the publication [50].

To generate network snapshots from these datasets, aggregated networks over specified time intervals are constructed: in each time interval, an edge is placed between any two nodes that interacted during that interval. In this way, all network snapshots over the total interaction time are generated. To avoid sparse graphs, snapshots in which a large fraction of nodes (\(\ge 90\%\)) have zero degree are removed. The time intervals considered for each dataset are given in Table 8. Propagation probabilities were generated uniformly at random in the interval [0, 1] for all datasets.
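A sketch of this aggregation step is given below; the contact-record format `(t, i, j)`, the helper name `build_snapshots`, and the 10% active-node threshold (the complement of the ≥90% zero-degree filter) are assumptions made for the example.

```python
import networkx as nx

def build_snapshots(contacts, nodes, interval, min_active_frac=0.1):
    """Aggregate timestamped contacts (t, i, j) into snapshots of length `interval`
    and drop snapshots where fewer than `min_active_frac` of nodes have any edge."""
    if not contacts:
        return []
    t0 = min(t for t, _, _ in contacts)
    buckets = {}
    for t, i, j in contacts:                       # group contacts by time bucket
        buckets.setdefault((t - t0) // interval, set()).add((i, j))
    snapshots = []
    for idx in sorted(buckets):
        G = nx.Graph()
        G.add_nodes_from(nodes)
        G.add_edges_from(buckets[idx])
        if sum(1 for v in G if G.degree(v) > 0) >= min_active_frac * len(nodes):
            snapshots.append(G)                    # keep only sufficiently dense snapshots
    return snapshots

# Example: 20-second snapshots from toy contact records.
contacts = [(0, "a", "b"), (5, "b", "c"), (40, "a", "c")]
print(len(build_snapshots(contacts, ["a", "b", "c"], interval=20)))
```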

The results of testing the different methods on the real dynamic networks are illustrated in Fig. 3 and Tables 9 and 10. As Fig. 3 and Table 9 demonstrate, except for the Hospital dataset, the max weighted degree per-snapshot heuristic achieves the best discounted rewards among the heuristic and RIS methods. On these datasets, the DDQL methods perform better (in terms of discounted reward) than the other methods. In addition, the replay memory DDQL gives better results for larger networks, achieving about an 18% improvement in average reward on the Infectious data over the best heuristic (weighted degree per snapshot).

Fig. 3  Results for different real-world dynamic networks

Table 9 Comparing average discounted rewards for real dynamic networks
Table 10 Comparing average seed lengths for real dynamic networks

Practical Consideration

In this section, we analyze the efficiency of the different methods on different types of data. In addition to the general features of the datasets, we extract some additional features: the size, the average density, and the dynamic features of the networks, which are explained and measured in this section.

Regarding network size, the experimental results in the "Experiments on Dynamic Networks" section show that the replay memory method performs better on larger real networks. This may be because the other incremental methods require more careful tuning on larger datasets and need more exploration to find their best settings.

The other measured feature is the average density of the network over several snapshots. The density of a static network is defined by

$$\begin{aligned} \textrm{Density }= \frac{2|E|}{N(N-1)}, \end{aligned}$$
(24)

where \(|E|\) is the number of edges in the network. For dynamic networks, this metric is evaluated on consecutive snapshots and averaged over them. Its value for the five real-world networks is given in Table 11. According to this feature and the average discounted rewards in Table 9, the DDQL-Rehearsal method is more effective on sparse networks (the Infectious dataset), whereas the other methods (transfer learning, SI, and EWC) perform better on denser networks.

Table 11 Extracted features of dynamic networks

The other network feature considered is the degree of dynamicity. As Tables 3 and 9 show, our methods do not offer a significant improvement over heuristic methods on static networks, but they are more efficient on dynamic networks in terms of average reward. Furthermore, the edge appearance and disappearance probabilities of the aforementioned Markov model in the "Experiments on Dynamic Networks" section are estimated for the different real networks using 50 snapshots of each dataset, assuming no clustering, and are reported in Table 11. Note that, because different time intervals were used to construct the snapshots of different datasets, these values cannot be used to compare the dynamicity of the networks directly, but some characteristics of the methods can be deduced from the snapshot dynamicity. Tables 11 and 9 show that for the networks with higher transition probabilities (the Hospital and Hypertext data), the EWC method achieves higher reward values (Table 9). In addition, the transfer learning method also gives good results on smaller and more dynamic network snapshots.

Complexity Analysis

The main motivation here is that the time complexity of a method devised for dynamic networks should not be too high, so that it can operate in a dynamic environment on demand. We now analyze the running time complexity of our methods, i.e., the time required to take an action at each step (for each snapshot of the network). This comprises the time required for embedding the network, which is \(O(N^2d+d^2N)\) per iteration of the embedding process (Eq. 8) and \(O(K(N^2d+d^2N))\) for the entire embedding, the complexity of calculating the Q values (Eq. 10), which is \(O(d^2N)\), and finally selecting the best action (with maximum Q value), which takes O(N) time. Therefore, selecting an action has \(O(KN^2d)\) total time complexity. Since K and d are set to 4 and 64 in the experiments, respectively, and only N (and possibly d) grows with the size of the network, the time complexity can be controlled by choosing an appropriate embedding dimension when applying these algorithms to larger networks.

Conclusion

In this research, DQL and incremental learning-based methods for solving the IM problem over dynamic networks are proposed. In the first proposed method, a DDQL model based on the method of [6] is trained on only one network snapshot and used on several snapshots. In this method, an immediate reward function is applied, which increases the method's efficiency (in terms of memory and time). In addition, the method has been extended with incremental learning methods, i.e., EWC, SI, and rehearsal, and a naive transfer learning algorithm for tackling dynamic data. Empirical results demonstrate the superiority of these methods over heuristic and RIS methods (in terms of total influence) as the network size grows. In addition, the use of incremental and transfer learning algorithms yields better results for real-world networks.

There are several avenues for future research: (i) using more powerful and extendable incremental learning methods to learn from more network snapshots and capture more dynamic information; (ii) defining a more accurate reward function that considers the dynamics of the network; (iii) inferring the dynamic network model in addition to the optimization task; (iv) inferring network properties (structure and propagation probabilities) for unknown networks during the IM task; and (v) devising an extendable method for larger networks.