To compare the impacts of the active and passive scenarios, we carried out a data-driven investigation that models the social graph with both synthetic networks and real-world network datasets, as described in Datasets. The analytical protocol we adopted is described in Analytical protocol, and the evaluation in Analysis.
Datasets
For our simulations we use a real-world dataset, the FB network. This is a sample of the WOSN2009 (Viswanath et al. 2009) dataset and describes online interactions between Facebook users. The FB graph is composed of 31 daily snapshots covering January 2007; its statistics are reported in Table 1. Since we conduct our analysis on a static scenario, we collapsed all FB snapshots into a single network whose node and edge sets are the unions of the individual snapshots' node and edge sets.
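The collapsing step can be sketched as follows. This is a minimal illustration, assuming each snapshot is available as a node set plus a list of undirected (u, v) edge pairs; the helper name `collapse_snapshots` and the toy snapshots are hypothetical, and the actual WOSN2009 file format is not described here.

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[int, int]
Snapshot = Tuple[Set[int], Iterable[Edge]]


def collapse_snapshots(snapshots: List[Snapshot]) -> Dict[int, Set[int]]:
    """Union of all snapshot node and edge sets, as an undirected adjacency map."""
    adjacency: Dict[int, Set[int]] = {}
    for nodes, edges in snapshots:
        for node in nodes:
            adjacency.setdefault(node, set())  # keep nodes even if isolated
        for u, v in edges:
            adjacency.setdefault(u, set()).add(v)
            adjacency.setdefault(v, set()).add(u)
    return adjacency


# Toy snapshots (hypothetical), not data from WOSN2009.
day1 = ({1, 2, 3}, [(1, 2)])
day2 = ({2, 3, 4}, [(2, 3), (1, 2)])  # (1, 2) appears again
graph = collapse_snapshots([day1, day2])
n_edges = sum(len(nbrs) for nbrs in graph.values()) // 2
print(len(graph), n_edges)  # 4 2
```

Using sets means an edge observed on several days appears only once in the static graph.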
Table 1 Basic statistics of the analyzed Facebook graphs
We also simulate the introduced diffusion models on networks produced by three synthetic generator models: (i) Barabási-Albert (1999), (ii) Erdős-Rényi (1959) and (iii) Watts-Strogatz (1998). To obtain networks comparable to the real one, we set the number of nodes and the average degree to the values reported in Table 1. The generated networks therefore have 63,392 nodes each and are obtained with the following parameter values:
- Barabási-Albert graph: number of connections per new node m=13;
- Erdős-Rényi graph: edge creation probability p=0.0004;
- Watts-Strogatz graph: node neighbors k=13, rewiring probability p=0.01.
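As a quick arithmetic check on the parameters above, the sketch below computes the expected average degree of the two random models under their textbook formulas: about 2m for Barabási-Albert (each new node attaches m edges) and p(n − 1) for Erdős-Rényi, giving similar values (26 and about 25.4). The Watts-Strogatz case is omitted because the resulting degree depends on how a given implementation handles an odd k. Function names are illustrative.

```python
N_NODES = 63392  # node count shared by all generated networks


def ba_average_degree(m: int) -> float:
    """Asymptotic Barabási-Albert mean degree: each new node adds m edges."""
    return 2.0 * m


def er_average_degree(n: int, p: float) -> float:
    """Expected Erdős-Rényi G(n, p) mean degree."""
    return p * (n - 1)


print(ba_average_degree(13))                         # 26.0
print(round(er_average_degree(N_NODES, 0.0004), 2))  # 25.36
```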
Analytical protocol
To compare the diffusion scenarios previously described, we designed the following analytical protocol:
- For each dataset, we randomly selected 100 node sets, each covering 5% of V: for each scenario and model, these sets define 100 different initial infection seed configurations, \(I_{t_{0}}\);
- For each dataset, scenario and \(I_{t_{0}}\), we executed the active, passive and mixed diffusion models introduced previously for the same fixed number of iterations (30 for all networks);
- Finally, we compared the models by analyzing the resulting infection trends as well as the percentage of infected nodes at the end of each simulation.
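The seed-selection step of the protocol can be sketched as below, assuming V is given as a list of node ids. The function name, the fixed random seed and the rounding of 5% of |V| to the nearest integer are illustrative choices, not details taken from the protocol.

```python
import random
from typing import List, Sequence, Set


def draw_seed_sets(nodes: Sequence[int], n_sets: int = 100,
                   fraction: float = 0.05, seed: int = 0) -> List[Set[int]]:
    """Draw n_sets independent random seed sets I_t0, each covering `fraction` of V."""
    rng = random.Random(seed)  # fixed seed only for reproducibility of the sketch
    size = round(fraction * len(nodes))  # rounding rule is an assumption
    return [set(rng.sample(nodes, size)) for _ in range(n_sets)]


nodes = list(range(63392))
seed_sets = draw_seed_sets(nodes)
print(len(seed_sets), len(seed_sets[0]))  # 100 3170
```

Each of these sets would then be fed to every model and scenario for the same 30 iterations.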
To mitigate the effect of the initial seed selection, for each configuration we take as infection trend the iteration-wise average of the runs performed with the different seed sets. The same strategy is applied to obtain the final percentage of infected nodes at the end of each configuration's simulation.
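The averaging step amounts to an element-wise mean over runs, sketched below with a hypothetical function name and toy values (fractions of infected nodes per iteration, not results from the study). The final infected percentage of a configuration is then the last entry of the averaged trend, which equals the mean of the runs' final values.

```python
from typing import List, Sequence


def average_trend(runs: Sequence[Sequence[float]]) -> List[float]:
    """Iteration-wise mean of infection trends over runs with different seed sets."""
    n_iter = len(runs[0])
    if any(len(run) != n_iter for run in runs):
        raise ValueError("all runs must cover the same number of iterations")
    return [sum(run[t] for run in runs) / len(runs) for t in range(n_iter)]


# Toy trends: fraction of infected nodes at each iteration of two runs.
runs = [[0.0, 0.5, 1.0], [0.0, 0.25, 0.5]]
trend = average_trend(runs)
final_pct = 100 * trend[-1]
print(trend, final_pct)  # [0.0, 0.375, 0.75] 75.0
```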
Finally, to understand the impact that different model parameter values have on the diffusion process, we simulated the three scenarios with several configurations of the node threshold, τ, and the node profile, γ. We also varied the immunization probability, p, and the spontaneous adoption rate, a. We thus instantiated all valid parameter combinations for the selected models, with values in the following ranges:
- Threshold, τ: [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
- Node Profile, γ: [0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
- Percentage of blocked nodes, p: [0, 0.1, 0.2, 0.3]
- Probability of spontaneous adoption, a: [0, 0.001, 0.005, 0.01]
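Enumerating the parameter grid is a Cartesian product. The text does not spell out what makes a combination "valid"; the sketch assumes each model receives only the parameters it uses (τ for Threshold, γ for Profile, both for Profile-Threshold), each combined with every p and a value. Under that assumption the counts are 128, 144 and 1152 configurations. Model keys and names are illustrative.

```python
from itertools import product
from typing import Dict, List, Sequence

TAUS = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
GAMMAS = [0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
BLOCKED = [0, 0.1, 0.2, 0.3]            # p
SPONTANEOUS = [0, 0.001, 0.005, 0.01]   # a

# Assumption: each model only takes the parameters it uses.
MODEL_PARAMS: Dict[str, Dict[str, Sequence[float]]] = {
    "threshold": {"tau": TAUS},
    "profile": {"gamma": GAMMAS},
    "profile_threshold": {"tau": TAUS, "gamma": GAMMAS},
}


def configurations(model: str) -> List[Dict[str, float]]:
    """All parameter combinations for one model, with every p and a value."""
    grid = {**MODEL_PARAMS[model], "p": BLOCKED, "a": SPONTANEOUS}
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*grid.values())]


for model in MODEL_PARAMS:
    print(model, len(configurations(model)))
# threshold 128
# profile 144
# profile_threshold 1152
```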
Analysis
The typical strategy for addressing this diffusion problem is to use a Threshold model (a passive approach). The question, then, is: is a passive approach the right way to address the problem of information diffusion?
To answer this question, we first report the diffusion trends obtained in our simulations for all networks in a simple scenario, without immunization or spontaneous adoptions. All other scenarios (in which p≠0 and/or a≠0) are detailed only for the Facebook graph.
Results
To better characterize the results, we analyze the models that include blocked nodes separately from those that do not, and we do the same for results obtained with and without spontaneous adoptions.
Without immunization, without spontaneous adoptions. This scenario corresponds to the standard implementation of the three methods; we analyze each network separately to better characterize the differences between the methods.
Barabási-Albert graph. The diffusion trends obtained by simulating the three methods on the Barabási-Albert graph are shown in Fig. 1.
As we can observe, with the threshold τ fixed at 0.1 and low values of the node profile γ, the diffusion trends of the three methods are very similar (Fig. 1a, b). All three models grow fast: after only four iterations almost all nodes of the network are infected. If we instead set γ to 0.8, as shown in Fig. 1c, the number of infected nodes under the Profile model grows more slowly than in the previous figures, and the whole network is infected only at the end of the observation period. This result shows that peer pressure is significant: even if people do not like the content being spread (each person has a 20% probability of accepting content similar to the one currently spreading), they end up adopting it. With a threshold of just 0.2, the spread does not start at all (Fig. 1d): on average each node has 13 neighbors, so with τ=0.2 a node can become infected only after three of its neighbors are infected.
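The neighbor count in the last sentence follows from the threshold rule. A minimal sketch, assuming a node is exposed once the fraction of its infected neighbors is at least τ: the smallest sufficient number of infected neighbors is ⌈τ·k⌉ for degree k, which with k = 13 gives 2 neighbors for τ = 0.1 and 3 for τ = 0.2. The function name is hypothetical.

```python
import math
from fractions import Fraction


def min_infected_neighbors(tau: float, degree: int) -> int:
    """Smallest c such that c / degree >= tau (the assumed exposure condition)."""
    # Fraction(str(tau)) turns 0.2 into exactly 1/5, so ceil() works on an
    # exact product instead of one carrying binary floating-point error.
    return math.ceil(Fraction(str(tau)) * degree)


print(min_infected_neighbors(0.1, 13))  # 2
print(min_infected_neighbors(0.2, 13))  # 3
```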
Erdős-Rényi graph. As Fig. 2 shows, the diffusion trends for the Erdős-Rényi graph are very similar to those obtained for the Barabási-Albert graph.
Watts-Strogatz graph. The diffusion trends for the Watts-Strogatz graph are shown in Fig. 3. For this network the diffusion process is slower than for the other networks: there are no special nodes, such as hubs, that speed up diffusion, nor an evident small-world effect (due to the chosen parameter values). Indeed, with p=0.01 the network is closer to a lattice than to a random network, so the results of the three diffusion models depend little on the initial infected node sets \(I_{t_{0}}\). Unlike the previous graphs, for this network the diffusion process does start with a threshold of 0.2. As shown in Fig. 3d, although the number of infected nodes is lower than that obtained with the Profile model (with γ=0.4), the two threshold-based models reach around 70% infected nodes at the end of the process. Conversely, with a threshold of 0.3 only 16% of the nodes become infected, and with τ=0.4 the diffusion process does not start.
Facebook graph. For the real network, too, we obtain results similar to those for the synthetic networks: with low values of γ and τ the trends of the three methods are very similar, as shown in Fig. 4.
For the real network and the Watts-Strogatz graph we obtain an expected result: the active diffusion trends show the fastest growth, while the passive diffusion trends show a slower start. This is to be expected: the former model assumes that a susceptible node can decide to adopt as soon as it discovers the existence of a given piece of information (e.g., when at least one of its neighbors has already adopted it), while the latter fixes an exposure threshold below which the node does not come into contact with the information. Particular attention should be paid to the mixed approach, described by the Profile-Threshold model: for the first two synthetic networks the mixed and passive models behave alike, while in the Facebook and Watts-Strogatz networks the Profile-Threshold trend lies below the Threshold one.
This result is also expected: with this approach, a susceptible node first has to overcome the exposure threshold to come into contact with the information, and then has to decide to adopt it. Adoption therefore requires two conditions: (i) the node must have a sufficient number of adopting neighbors, and (ii) it must autonomously decide to adopt the information because it likes it.
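The three decision rules can be contrasted in a single synchronous update step. This is a hypothetical sketch, not the implementation used in the study; it assumes that (i) Threshold exposure means an infected-neighbor fraction of at least τ, (ii) the active rule requires at least one infected neighbor, and (iii) the profile check passes when a uniform draw in [0, 1) is at least γ, so that γ = 0.8 yields the 20% acceptance chance mentioned for Fig. 1c. Model keys and the function name are illustrative.

```python
import random
from typing import Dict, Set

MODELS = ("threshold", "profile", "profile_threshold")


def step(adj: Dict[int, Set[int]], infected: Set[int], model: str,
         tau: float, gamma: float, rng: random.Random) -> Set[int]:
    """One synchronous iteration; returns the new set of infected nodes.

    model is "threshold" (passive), "profile" (active) or
    "profile_threshold" (mixed). Decisions use the state at the start of the step.
    """
    if model not in MODELS:
        raise ValueError(f"unknown model: {model}")
    new_infected = set(infected)
    for node, nbrs in adj.items():
        if node in infected or not nbrs:
            continue  # already infected, or isolated: nothing to decide
        frac = len(nbrs & infected) / len(nbrs)
        if model == "profile":
            exposed = frac > 0        # active: one infected neighbor suffices
        else:
            exposed = frac >= tau     # passive and mixed: exposure threshold
        if not exposed:
            continue
        # Threshold adopts on exposure; the other two also need the profile draw.
        if model == "threshold" or rng.random() >= gamma:
            new_infected.add(node)
    return new_infected


adj = {0: {1, 2}, 1: {0}, 2: {0}}
print(step(adj, {1}, "threshold", tau=0.5, gamma=0.0, rng=random.Random(1)))  # {0, 1}
```

Because the Profile-Threshold branch applies the threshold test and the profile draw together, the nodes it can infect in one step are a subset of those the Threshold rule infects from the same state, which is consistent with the ordering discussed above.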
Without immunization, with spontaneous adoptions. For this scenario, Fig. 5 shows the heatmaps obtained with the three methods on the real Facebook network. Each cell of a heatmap represents the percentage of infected nodes at the end of the observed period (in our case, at the end of the 31st day) for a given parameter combination. Darker red cells correspond to a high percentage of infected nodes, lighter red cells to a low percentage. We expected the percentage of infected nodes to increase with the introduction of spontaneous adoptions, and we observe this for all the methods.
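Arranging final percentages into heatmap cells is a small reshaping step. The sketch below assumes the results are stored in a dictionary keyed by (x, y) parameter pairs; rows follow y in descending order so that the smallest y value (e.g. a = 0 or p = 0) ends up in the bottom row, as in the figures. The dictionary layout, function name and toy numbers are illustrative only, not results from the study.

```python
from typing import Dict, List, Sequence, Tuple


def heatmap_grid(final_pct: Dict[Tuple[float, float], float],
                 xs: Sequence[float], ys: Sequence[float]) -> List[List[float]]:
    """Rows ordered by descending y (largest y on top), columns by ascending x."""
    return [[final_pct[(x, y)] for x in sorted(xs)]
            for y in sorted(ys, reverse=True)]


# Toy percentages keyed by (tau, a); placeholders, not results from the study.
toy = {(0.1, 0.0): 10.0, (0.2, 0.0): 5.0, (0.1, 0.01): 40.0, (0.2, 0.01): 30.0}
for row in heatmap_grid(toy, xs=[0.1, 0.2], ys=[0.0, 0.01]):
    print(row)
# [40.0, 30.0]
# [10.0, 5.0]
```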
For the Threshold model (Fig. 5b), the x-axis reports the τ parameter and the y-axis the spontaneous adoption rate a. At the top of the heatmap the percentage of infected nodes is greater than at the bottom, where the value of a is small (a ranges from 0 to 0.01). For the Profile model (Fig. 5f), unlike the previous case, the x-axis reports the γ parameter. Here, for every value of γ and with p=0, the percentage of infected nodes is high; as observed previously for the diffusion trends, it is mainly the threshold value that affects the diffusion process. If the fraction of infected neighbors is below the fixed threshold, the node does not come into contact with the information and can adopt it only spontaneously. This can also be observed in Fig. 5i: if γ is large (i.e. ≥0.4), the diffusion process cannot start; this effect is mitigated by the introduction of spontaneous adoptions (Fig. 5j).
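Spontaneous adoption can be modeled as an independent draw per susceptible node and iteration, which is how a node below the threshold could still adopt. The sketch assumes each susceptible node adopts with probability a in every iteration, regardless of its neighbors; this per-iteration interpretation and the function name are assumptions. Under it, with a = 0.01 about 1% of the remaining susceptible nodes adopt in each iteration.

```python
import random
from typing import Iterable, Set


def spontaneous_adoptions(susceptible: Iterable[int], a: float,
                          rng: random.Random) -> Set[int]:
    """Susceptible nodes that adopt on their own this iteration, each with probability a."""
    return {node for node in susceptible if rng.random() < a}


rng = random.Random(7)
adopters = spontaneous_adoptions(range(10_000), a=0.01, rng=rng)
print(len(adopters))  # about 100 in expectation; the exact count depends on the seed
```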
In this case, too, we find the previous ordering: the active diffusion trends show the fastest growth, followed by the trends obtained with the passive approach and finally by the mixed approach.
With immunization, without spontaneous adoptions. When we introduce the concept of “blocked nodes”, the diffusion patterns change. In this case we simulate random immunization: the nodes that become immune are picked at random. As expected, the percentage of infected nodes is dampened in all three models compared with the previous analysis. In particular, for the Profile model (Fig. 5e), after the same period the percentage of infected nodes halves compared with the simulation with p=0 (the bottom row of the heatmap). Indeed, our experiments highlight a linear correlation between the value chosen for p and the observed reduction of the infected population.
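Random immunization can be sketched as drawing a fraction p of V at random and excluding those nodes from every update. The assumptions here are that blocked nodes are drawn among the non-seed nodes and that p·|V| is rounded to the nearest integer; the function name and the toy graph size are hypothetical.

```python
import random
from typing import Sequence, Set


def pick_blocked(nodes: Sequence[int], seeds: Set[int], p: float,
                 rng: random.Random) -> Set[int]:
    """Randomly immunize round(p * |V|) nodes, drawn among non-seed nodes (assumption)."""
    candidates = [node for node in nodes if node not in seeds]
    return set(rng.sample(candidates, round(p * len(nodes))))


nodes = list(range(1000))
seeds = set(range(50))          # 5% of V, as in the protocol
blocked = pick_blocked(nodes, seeds, p=0.3, rng=random.Random(3))
print(len(blocked), blocked.isdisjoint(seeds))  # 300 True
# During the simulation, blocked nodes would simply be skipped by every update rule.
```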
With immunization, with spontaneous adoptions. When we introduce both “blocked nodes” and “spontaneous adoption”, we obtain the expected diffusion patterns: as a increases the percentage of infected nodes grows, while as p increases it decreases, as can be observed in Fig. 5c, d, g, h, k, l.