1 Introduction

The social networks like LinkedIn, Facebook, and Twitter provide the grounds for people to interact with no time and space limitation [1]. Consequently, information diffusion occurs across a wide range of social networks at high speed. That’s why business companies are using the speed of information diffusion on social networks to achieve widespread advertising and business optimization. Given the limited advertising resources, the main challenge in this problem is to select a specific set of influential persons that most effectively on other people in a short time. To solve this challenge, the influence maximization problem in social networks has been provided.

The influence maximization problem identifies the active nodes as seed nodes. Algorithms in this problem take the graph G and a number k as input and generate seed set, with the intention that the expected number of nodes influenced by the seed set by the stochastic process of the diffusion model, One major problem in the influence maximization problem to maximize the expected size of the final active set, given some constraints on the seed set. The diffusion process of the seed nodes is performed to maximize the influence spread. The influence of seed nodes is determined by the number of nodes activated. The influence maximization problem under both IC and LT models are NP-hard. In other words, the goal of the influence maximization problem is to find initial active people who have the most influence on other people in a short time under a diffusion model [2, 3].

Viral marketing [1, 4,5,6,7,8,9], terrorist attack prevention [4], and network control [10, 11] constitute the applications of the influence maximization problem that viral marketing has attracted researchers more than other applications. In viral marketing, unlike traditional marketing where information diffusion was through newspaper advertising and the mass media, the focus is on peoples effective through family members, relatives, acquaintances, and friends by word of mouth [12]. In this way, diffusion is information transmission from person to other person, such as virus transmission [1]. Domingos and Richardson were the first who addressed the influence maximization problem [6]. Kempe et al. formulated and proved that this problem is an NP-hard [2], who then proposed an approximation algorithm called General Greedy algorithm to solve the influence maximization problem [2]. This algorithm has very low speed and high running time [2]. Some researchers proposed some algorithms to overcome drawbacks in the General Greedy algorithm. The existing influence maximization algorithms have several major drawbacks:

1. They have no effective methods to reduce the search space for influence spread calculation and do not optimally select the candidate nodes.

2. They are not suitable for large-scale networks due to high computational time.

Handling the NP-Hardness and to optimize the seed selection by reducing search space while keeping the running time faster are important to influence maximization problem. So, to address these issues, we propose a new algorithm named GIN heuristic algorithm. In this algorithm, by the identified nodes with a more common number of connections and neighbors than other nodes of the graph, different groups of nodes are generated. By selecting certain nodes from each group, the search space where the final effective nodes are to be found is reduced. Its goal is to increase the speed of the algorithm. During this process, the nodes with the highest expected diffusion value are selected as the seed nodes.

The contributions of our work are as follows:

• Use a grouping method for graph nodes.

• We use candidate nodes optimally to reduce the search space for influence spread calculation.

• The experimental results on seven real-world networks show that the proposed algorithm GIN performs better than the other algorithms in terms of influence spread and running time.

The structure of this article is organized as follows. In Sect. 2, the related works for influence maximization problem is introduced. In Sect. 3, problem definition is expressed. Our proposed algorithm is presented in Sect. 4. Experiments results and evaluations are described in Sect. 5 and we conclude this article in Sect. 6.

2 Related works

Identifying seed nodes is one of the key issues for influence maximization problem in social networks. This is one of the NP-hard problems [2] that has many challenges like high influence spread and low running time, especially on large-scale graphs. Therefore, researchers have presented different algorithms to solve the influence maximization problem. For the first time, Kempe et al. presented an approximate algorithm called General Greedy algorithm with high influence spread that has an optimal approximation [2]. However, this algorithm has a high running time, especially on large-scale graphs [13]. So they proposed the High Degree algorithm by selecting nodes with the highest degree in graph [2]. The High Degree algorithm is very fast, even on large-scale graphs, but has a lower influence spread than the General Greedy algorithm.

Leskovec et al. proposed the CELF algorithm [14]. This algorithm uses lazy evaluation in functions that have the submodularity, significantly reduces influence spread evaluation. Although the CELF algorithm is faster than the General Greedy algorithm, it does not have acceptable speed. Then, Chen et al. proposed the NewGreedyIC algorithm to improve the CELF algorithm [15]. This algorithm computes the influence spread by creating sample graphs from the original graph. However, the NewGreedyIC algorithm improves the running time of the CELF algorithm, but this algorithm still needs to improve the running time, especially on large-scale graphs.

Chen et al. presented the DegreeDiscount algorithm to improve the running time of the CELF algorithm [15]. This algorithm uses a degree discount method for influence spread computations. The DegreeDiscount algorithm has a better running time than the CELF algorithm but does not guarantee optimal approximation. Also, Jiang et al. presented an evolutionary algorithm SA to improve the DegreeDiscount algorithm [16]. This algorithm has gained more influence spread than the DegreeDiscount algorithm, with a high number of iterations and increased running time. Then presented the SAEDV algorithm to reduce the running time of the SA algorithm [16]. In this algorithm, the seed nodes are selected with the maximum expected diffusion value based on the SA algorithm. The SAEDV algorithm has a lower running time than the SA algorithm.

Goyal et al. introduced the SIMPATH algorithm to improve the running time of the CELF algorithm [17]. This algorithm computes the influence spread without using Monte Carlo simulation and using simple paths count. The SIMPATH algorithm has a low memory overhead but does not guarantee optimal approximation. Another algorithm proposed to solve the influence maximization problem is the MIA algorithm [18]. This algorithm was proposed by Wang et al. to improve the computation of the influence spread of the DegreeDiscount algorithm. In the MIA algorithm, using the Dijkstra algorithm, the shortest maximum influence path for each node of the graph is calculated. Although the MIA algorithm has an acceptable running time by eliminating the number of Monte Carlo simulation, the disadvantage of this algorithm is the high memory overhead.

Jung et al. proposed the IRIE algorithm to improve the MIA algorithm [19]. In this algorithm, the influence spread is calculated recursively based on the influence spread of the neighbors of each node in the graph. In the IRIE algorithm, the memory overhead is very low, but the influence spread is low compared to the General Greedy algorithm. In the following, He et al. presented the CLDAG algorithm to improve the General Greedy algorithm [20]. The focus of this algorithm is on blocking the influence maximization problem. Thus, the most important advantage of this algorithm is not dependent on the number of seed nodes, and its main disadvantage is the high memory overhead.

Tang et al. proposed the TIM algorithm to improve the SIMPATH algorithm [21]. In this algorithm, the first set of RR-Set includes nodes that have paths to each other, and then the seed nodes are selected using graph nodes covering. Benefits of TIM algorithm are low running time and low memory overhead. Then, Lu et al. developed the RR-CIM algorithm to improve the High Degree algorithm [22]. This algorithm solves the influence maximization problem by using complementary products. The parameters of this algorithm close the influence maximization problem to the real-world, but the algorithm has a high running time.

In the following, He et al. presented a new algorithm based on community detection to improve the High Degree algorithm [23]. In this algorithm, each community is considered as one node. Then influential nodes are selected based on the red-black tree. This algorithm avoids the rich-club phenomenon but has a high running time. Also, Morone et al. Presented the CI algorithm based on localization of influence spread computation [24]. In this algorithm, the influence spread computation in a circle is limited to the radius L. The main advantage of the CI algorithm is to provide acceptable influence spread into high-density social networks, however, that the calculation of influence spread and running time of the CI algorithm depends on the parameter L and the number of seed nodes.

Then the topology-based algorithm LIR is presented by Liu et al. to improve DegreeDiscount and NewGreedyIC algorithms [25]. First, in this algorithm is calculated the LI of each node, which is indicated the local index of each node that avoids the Rich-club Effect but does not guarantee optimal approximation. The CoFIM algorithm is also based on community detection proposed by Shang et al. to improve High Degree algorithm [26]. In this algorithm, first, the influence spread is calculated based on diffusion between communities and then diffusion inside communities. The advantage of this algorithm is low running time, but the overlapping nodes are not considered. Then Wu et al. developed the LAIM algorithm to improve CoFIM algorithm [27]. By this algorithm, the influence is estimated locally. The main advantage of the LAIM algorithm is its very low running time, and its main disadvantage is that it does not guarantee optimal approximation.

Cui et al. presented an evolutionary algorithm called DDSE to improve the running time of the CELF algorithm [28]. In this algorithm, using the genetic algorithm and local and global search, the seed nodes are selected with the highest expected diffusion value. The DDSE algorithm has influence spread very close to the CELF algorithm on some small datasets. Xie et al. developed the IRR algorithm based on the MBIC model [29]. This algorithm uses the same idea as the DegreeDiscount algorithm. The main advantage of the IRR algorithm is the use of different states based on real-world criteria for nodes, but this algorithm is dependent on the number of nodes in the graph.

The C2IM algorithm is presented by Singh et al. to improve the time-efficiency of the TIM algorithm for the influence maximization problem [10]. This algorithm uses the user’s interests and community framework to find effective seed nodes. Also, Ahmadi Beni et al. presented an algorithm called TI-SC based on community detection to improve the influence spread of the DegreeDiscount algorithm [30]. This algorithm selects seed nodes by examining the relationships between the core nodes. The TI-SC algorithm is useful to reduce the overlap in selecting the seed nodes, but this algorithm has not an efficient performance in large-scale networks.

3 Problem definition

The social networks’ graph is mapped as G (V, E), where V and E are described as follows:

  • V = {v1, v2, …, vn} is a set of graph nodes, where vi is the individuals, and n is the number of nodes in the graph.

  • E = {e1, 2, e1, 3, …, e1, n, e2, 1, e2, 3, …, en, n − 1} is a set of graph edges, where, ei, j is the relationship between two individuals or an edge between each pair of nodes (i ≠ j).

Considering G (V, E) to find k seed nodes for influence maximization problem, the subset S select from V = {v1, v2, …, vn} due to the diffusion model. The process of seed nodes selection is done until ∣ S ∣  = k so that the influence spread of S, σ(S) is maximized by Eq. (1) under the given diffusion model [31]:

$${S}^{\ast }={argmax}_{S\subset V, |S|=k}\sigma (S)$$
(1)

The influence spread is calculated due to the diffusion model for influence maximization problem. One of the most widely used information diffusion models is the independent cascade model proposed by Kempe et al. The independent cascade model has two important properties in the influence spread function which these properties are submodularity and monotonicity. A set function f : 2V →  is submodular if, for any S ⊆ T ⊆ V subsets and anyv ∈ V\T element , the marginal contribution of v element, when added to a T set can not exceed the marginal contribution of v element when added to a S ⊆ T subset due to Eq. (2) [31]:

$$f\left(S\cup \left\{v\right\}\right)-f(S)\ge f\left(T\cup \left\{v\right\}\right)-f(T)$$
(2)

The objective function f : 2v →  is monotone if we have f(S) ≤ f(T) for eachS ⊆ T ⊆ V subset [31].

4 Proposed algorithm

In GIN algorithm, a heuristic method is proposed to identify the influential nodes for the influence maximization problem. In this algorithm, different graph nodes are created to identify nodes with more communication and more common neighbors than other graph nodes. Then, certain nodes are selected from each group. This process is done to create new search space and reduce the number of influence spread calculations. At last, the final influential nodes are selected among the nodes in the new search space which maximize the expected diffusion value as the seed nodes. Figure 1 shows an overview of the general levels of GIN algorithm.

Fig. 1
figure 1

General levels of the GIN algorithm

According to Fig. 1, the GIN algorithm consists of three levels: (1) grouping graph nodes, (2) creating new search space, and (3) selecting seed nodes.

4.1 Grouping graph nodes

The first level of algorithm GIN is due to AntCBO [32] algorithm by modification. This level of GIN algorithm includes the process of creating different groups of graph nodes with more communication and common neighbors than other nodes. This level consists of six basic steps:

Step 1 For every node of graph, an available variable is defined which determines the available or non- available of each node. The value of this variable is either True or False. At first, the value of this variable is considered True for all graph nodes. If the node has an available = True value, it will be available; so, it can be selected for calculations, otherwise non-available; thus, it cannot be selected for calculations. Figure 2 shows a graph consists of 16 nodes with available = True variable for all graph nodes.

Fig. 2
figure 2

Example of the graph containing 16 nodes with the available = True variable

Step 2 The graph nodes are sorted from highest to lowest degree in descending order in an undirected graph, or out-degree in a directed graph and placed in list L. Thus, in the first place of list, the node is with the highest degree, and in the last place, the node is with the lowest degree from graph, according to Fig. 3.

Fig. 3
figure 3

Example of placing nodes in list L

Step 3 The group gi = {} is created (i = {1, 2, 3, …, l}). Then, the first node from list L which has an available = True value, is selected and added to gi group (gi ∪ {vi}). Then, the availability value of this node changes into available = False. In the following, vj node with the highest degree among vi node ’s neighbors which has the available = True value, is selected and added to gi group (gi ∪ {vj}). The availability value of vj node is changed into available = False (Fig. 4).

Fig. 4
figure 4

An example of placing graph nodes in the first group

Step 4 If vi and vj nodes do not have common neighbors, the computation continues from the previous step for the next node from L list with the value of avalable = True.

Step 5 If vi and vj nodes have common neighbors, ComNei set will be created including common neighbors of vi and vj nodes (ComNei = {v1, v2, …, vm}) which {v1, v2, …, vm} will be always belonging to ComNei for each vi and vj. The vm node is selected from ComNei set with available = True value randomly and is added to gi group (gi ∪ {vm}). Then, the availability value of this node is changed to available = False. If the neighbors of vm node do not have any common with nodes within ComNei set , the availability of next node from ComNei set will be checked which is added to gi group ; otherwise, Z set including the common nodes of vm’s neighbors and ComNei set is created (Z = neighbor(vm) ∩ ComNei). Then, ComNei set is replaced with Z set . This process is repeated until ComNei = {}, or the nodes in this set have available = false value (Fig. 5).

Fig. 5
figure 5

An example of adding other nodes to the first group

Step 6 The previous described steps are repeated until the created groups reach a specified number (Itrgi = l). l is the maximum number of groups created; also, that is constant.

4.2 Creating new search space

In this level, the two nodes vi and vj which include the first and second added nodes are selected from each gi group. The first node selected from each gi group is the node vi with the highest degree in that group, and the second node selected is the vj node with the highest degree among the neighbors vi. Each group gi contains at least one vi node. If there exists no node vj in a gi, that group has only one node, thus the only vi node is selected. Also, if there are multiple nodes vi that have the same maximum degree value in each gi group, and similarly, if there are such multiple nodes vj, these are selected randomly. In this way, by identifying and selecting these nodes from each group, the search space to find influential nodes is reduced. This level is repeated until the selected nodes reach a specified number (Ncandidate = m). m is the maximum number of candidate nodes selected.

4.3 Selection of seed nodes

In this level, the selection of seed nodes from a new search space is done by the greedy method. Thus, according to Eq. (3), the Expected Diffusion Value (EDV) of these nodes is calculated. This equation has also been used in other studies [16, 28]. The first time, Jiang et al. proposed the expected diffusion value to estimate the actual influence spread under the independent Cascade model and employed it in their algorithm to replace Monte Carlo simulation [16]. The EDV approximates the spread by computing how many direct neighbors of the nodes in the candidate node set S are expected to be activated. According to the Eq. (3), k is the number of seed node, p is the activation probability the non-seed nodes, \({N}_S^l\backslash S\) is the neighbors of the nodes S except the nodes of set S, and τ (i) is the number of neighbors of i node in set S.

$$EDV(S)=k+{\sum}_{i\epsilon {N}_S^l\backslash S}1-{\left(1-p\right)}^{\tau (i)}$$
(3)

Consequently, by applying the greedy selection method, a node from the new search space which causes maximization of the marginal gain function (EDV(S ∪ {w}) − EDV(S)) as the seed node added to the set S. This process is repeated until ∣ S ∣  = k. Then, the influence spread of seed nodes is calculated using the independent cascade model. Algorithm1 shows the pseudo-code of the GIN algorithm.

figure a

5 Experiments and evaluation of the proposed algorithm

The GIN algorithm has been evaluated on seven real datasets. Then the results of influence spread and running time of algorithms evaluated under the independent cascade model in a similar system. Influence spread means the number of final nodes activated by the seed nodes in the diffusion process. The running time means the time has passed to find the seed nodes. All the codes are written in C#, and all algorithms are performed on a Windows 10 operating system with 2.20 GHz CPU (Intel® Core (TN) i5-5200U) and 8 GB of RAM.

5.1 Description of the datasets

The GIN algorithm is implemented on seven datasets of different sizes. The characteristics of these datasets are shown in Table 1.

Table 1 Specifications of seven real-world social networks

Email [33]: This dataset is a network of emails at URV University that includes professors, administrators, technicians, researchers, and graduates. In this dataset, each node represents the email address, and the link between the emails is an edge.

NetScience [34]: This dataset contains a collaboration network of scientists working on network theory. Newman compiles this dataset in the year 2006. The nodes in this dataset represent the authors of the articles, and the edges indicate the collaboration between the authors in a common article.

Power [35]: This dataset contains information about the power grid of the Western States of the United States of America that a node is either a generator, a transformator, or a substation. Also, an edge represents a power supply line.

NetHEPT [15]: This dataset contains arXiv articles in the “High Energy Physics Theory” section, published in the years 1991 to 2003. The nodes in this dataset represent the authors and, if they work together in an article, an edge is created between them.

NetPHY [15]: This dataset contains arXiv articles in the “Physics” section. In this network, each node is an author, and the edges between two nodes represent the collaboration of two authors in one article.

Epinions1 Footnote 1: This dataset is an online social network trust-based, collected from www.epinions.com. Each node in this dataset represents customers who are users of the site, and each edge represents a secure connection.

Slashdot0902 Footnote 2: This dataset is a version of the popular site www.slashdot.org about technology news where users can tag other users as friends or foes. The network was collected in February 2009.

5.2 The compared algorithms

The GIN algorithm is compared with various algorithms, including General Greedy, High Degree, SAEDV, LIR, and DDSE algorithms based on two criteria of influence spread and running time. The algorithm is implemented with parameter l = 150, Monte Carlo simulation R = 10000, and the activation probability p = 0.01.

General Greedy According to 2], this algorithm is evaluated by Monte Carlo simulation R = 10,000 and the activation probability p = 0.01 under independent cascade model.

High Degree According to [2], in this algorithm, the influence spread of k seed nodes with the highest degree of the graph is calculated under the independent cascade model.

SAEDV According to [16], in this algorithm, choose to initialize a node set randomly and optimize it with iterations. The parameters for SAEDV algorithm are set as follows: the initial temperature T0 = 500000, the final temperature Tf = 100000, the number of inner loop q = 10 and the temperature drop after each inner loop ∆T = 2000.

LIR According to [25], in this algorithm, nodes with 0-LI are selected for the NetScience, NetHEPT, NetPHY, Epinions1, and Slashdot0902 datasets, while in the Email dataset, nodes with 0-LI are low, then nodes are selected with 1-LI. Influence spread calculations are performed by Monte Carlo simulation R = 10,000 and the activation probability p = 0.01 under independent cascade model.

DDSE According to [28], the results of this algorithm are obtained with the size of the population Npop = 10, mutation probability OPMutation = 0.1, Crossover probability OPCrossover = 0.4, Number of generation the evolution lasts MaxIt = 200 and the diversity of the population Diversity = 0.6.

5.3 Diffusion model

Researchers have proposed various types of information diffusion models [2, 36,37,38,39,40]. One of the most widely used diffusion models is the independent cascade model, which has been used in most studies [2, 5, 37, 41, 42]. Our algorithm focuses on the influence maximization problem under the independent cascade model too. This model was introduced by Goldenberg et al. [5]. In this model, the nodes are either in an active or inactive state. Each edge between two nodes v and u has the activation probability Pvu that means inactive node u is active by node v. Each active node v at time t can only activate its neighbor’s inactive node once. And then, nodes that are activated at time t − 1 can activate their neighbor inactive nodes. Thus, if every active node v at time t with probability Pvu activates its neighbor inactive nodes, then at time t + 1 considered as an active node. The diffusion process will stop when no more new nodes are activated.

5.4 Experimental results

According to the description of the proposed algorithm, first, l groups are created. Here, the GIN algorithm is tested with different l to selecting the best value of parameter l with the highest influence spread on each seven real-world datasets. The influence spread results of this algorithm with different l are shown in Tables 2, 3, 4, 5, 6, 7, and 8 on Email, NetScience, Power, NetHEPT, NetPHY, Epinions1, and Slashdot0902 datasets respectively. According to these results, obviously, the influence spread of the different l is very close to each other algorithms, but the influence spread of the GIN with l = 150 is higher than other values of the parameter l in most of the k nodes. Thus, to compare the influence spread of the proposed algorithm with other algorithms is used parameter l=150 for the GIN algorithm on all of seven datasets.

Table 2 The influence spread of GIN algorithm with different numbers of l on Email dataset
Table 3 The influence spread of GIN algorithm with different numbers of l on NetScience dataset
Table 4 The influence spread of GIN algorithm with different numbers of l on Power dataset
Table 5 The influence spread of GIN algorithm with different numbers of l on NetHEPT dataset
Table 6 The influence spread of GIN algorithm with different numbers of l on NetPHY dataset
Table 7 The influence spread of GIN algorithm with different numbers of l on Epinions1 dataset
Table 8 The influence spread of GIN algorithm with different numbers of l on Slashdot0902 dataset

The results of the evaluated algorithms in terms of influence spread and running time are shown in Fig. 6. These algorithms were implemented with fixed activation probability (p = 0.01) and varying sizes of the seed set with the range from k = 5 to k = 50 based on the independent cascade model. The x-axis indicates the seed nodes, and the y-axis indicates the influence spread.

Fig. 6
figure 6

Diagram of the influence spread on (a) Email dataset, (b) NetScience dataset, (c) Power dataset, (d) NetHEPT dataset, (e) NetPHY dataset, (f) Epinions1 dataset and (g) Slashdot0902 dataset

The influence spread of the evaluated algorithms on the Email dataset is shown in Fig. 6a. In this figure, the results of the algorithms General Greedy and GIN are close to each other. The influence spread in the SAEDV algorithm is higher than the High Degree and DDSE algorithms. The influence spread in the DDSE algorithm from k = 5 to k = 20 is very close to that of the High Degree algorithm, and from k = 25 to k = 50, it follows a gradual reduction. The LIR algorithm has the lowest influence spread than the other algorithms in the Email dataset.

In Fig. 6b, the influence spread of the evaluated algorithms on the NetScience dataset is shown. In this figure, the influence spread of General Greedy, GIN, and DDSE algorithms are higher than other algorithms and are close to each other. The influence spread in LIR and SAEDV algorithms is less than the High Degree algorithm. Also, the influence spread of the comparable algorithms on the Power dataset is shown in Fig. 6c. According to Fig. 6c, General Greedy and GIN have higher influence spread than other algorithms. The highest influence spread belongs to DDSE after than the General Greedy and GIN algorithms. Then, the influence spread from high to low belongs to the High Degree, LIR, and SAEDV in most k nodes, respectively.

The influence spread of the evaluated algorithms on the NetHEPT dataset is shown in Fig. 6d, where the highest influence spread is for General Greedy and then GIN algorithm. The influence spread in the GIN algorithm, for k = 5 to k = 25, is very close to the General Greedy algorithm. The DDSE algorithm has a higher influence spread than the High Degree, LIR, and SAEDV algorithms. Of course, the results of the DDSE algorithm from k = 5 to k = 20 are close to the GIN algorithm. After the DDSE algorithm, the highest influence spread is for the High Degree algorithm and then the LIR algorithm. The influence spread of the SAEDV algorithm is very low on the NetHEPT dataset.

The influence spread of the evaluated algorithms on the NetPHY dataset is shown in Fig. 6e. In this figure, the influence spread of General Greedy and then GIN are higher than other algorithms. Also, the influence spread from most to less is for the DDSE algorithm and then the High Degree, LIR, and SAEDV algorithms. Also, the influence spread of the evaluated algorithms on the Epinions1 dataset is shown in Fig. 6f. Epinions1 is a large-scale dataset, and the General Greedy algorithm cannot run on it because this algorithm has a high running time. Therefore, the run of the General Greedy algorithm on the Epinions1 dataset is ignored. As observed in Fig. 6f, the GIN and High Degree algorithms have higher influence spread than other algorithms. However, GIN and High Degree algorithms have very close results in some k nodes, but the GIN algorithm in most k nodes has a higher influence spread than the High Degree. Then the influence spread of the DDSE algorithm is more than SAEDV and LIR algorithms.

In Fig. 6g, the influence spread of the algorithms on the Slashdot0902 dataset is shown. This dataset, like the Epinions1 dataset, has a large-scale, and the General Greedy algorithm cannot run on it because this algorithm has a high running time. Therefore, the run of the General Greedy algorithm on the Slashdot0902 dataset is ignored. According to Fig. 6g, the influence spread of the GIN algorithm is higher than other algorithms. Of course, the High Degree and DDSE algorithms in some k nodes have results close to the GIN algorithm. But in general, the GIN algorithm in the Slashdot0902 dataset has shown more influence spread than the compared algorithms. The SAEDV algorithm and then the LIR algorithm in this dataset has the lowest influence spread.

The speedup % (in terms of influence spread) of the GIN algorithm over other algorithms is shown in Table 9. In this table, the General Greedy algorithm called GG and the High Degree algorithm called HD. Due to the General Greedy lack of scalability, its results are not reported for Epinions1 and Slashdot0902 datasets. Table 9 shows the speedup % (in terms of influence spread) of our proposed algorithm GIN over other algorithms. The speedup is computed using Eq. (4). We can see that the GIN algorithm has a positive speedup over other algorithms in this table.

Table 9 Speedup % (in terms of influence spread) on different datasets
$$Speedup=((\alpha -\beta)/\alpha )\times 100$$
(4)

According to the Eq. (4), α is the influence spread of the GIN algorithm, and β is the influence spread of the other algorithms for different k seed nodes.

The results of the running time algorithms are shown in Fig. 7 on the seven datasets Email, NetScience, Power, NetHEPT, NetPHY, Epinions1, and Slashdot0902. These results are for k = 50 and the number of the Monte Carlo simulation R = 10000 under the independent cascade model per second. According to Fig. 7, the High Degree and LIR algorithms have the lowest running time in all seven datasets. Of course, the High Degree and LIR algorithms have low the influence spread on more comparable datasets. Then, the running time from lowest to highest belongs to the GIN, DDSE, SAEDV algorithms, and then to the General Greedy algorithm.

Fig. 7
figure 7

Diagram of running time of different algorithms for k = 50

Given the results of influence spread and running time of the compared algorithms in Figs. 6 and 7, it is quite obvious that the GIN algorithm has high influence spread along with low running time in all seven datasets.

6 Conclusion

Influence maximization is a problem that has been addressed in social networks analysis and viral marketing. The purpose of this problem is to find the k individual that, by activating them, the highest number of people in a social network can be activated at an acceptable time. For this purpose, the GIN algorithm is presented in this article. In this algorithm, the process of creating different groups, including graph nodes with more number of connections and more common neighbors than other groups, is performed. Hence, the search space to find the final influential nodes is reduced. This process reduces the number of computations and increases the speed of the proposed algorithm. Next, nodes from the new search space that maximize the marginal gain function of the expected diffusion value are selected as the seed node. Experimental results show that the GIN algorithm, in comparison algorithms on all seven real-world datasets, has provided high influence spread along with low running time.