Abstract
Probabilistic cascade models view information diffusion as an iterative process in which content transits between users of a network. The problem of diffusion modeling then comes down to learning transmission probability distributions, which depend on hidden influence relationships between users, in order to discover the main diffusion channels of the network. Various learning models have been proposed in the literature, but we argue that the diffusion mechanisms defined in most of them are not well adapted to the noisy diffusion events observed in real social networks, where content is transmitted between humans. Classical models usually have difficulty extracting the main regularities in such real-world settings. In this paper, we propose a relaxed learning process for the well-known independent cascade model that, rather than attempting to explain the exact timestamps of users’ infections, focuses on infection probabilities given the sets of previously infected users. Furthermore, we propose a regularized learning scheme that allows the model to extract more generalizable transmission probabilities from training social data. Experiments on real-world prediction tasks show the effectiveness of our proposals.
Notes
Throughout this paper, we use the terms infection and contamination interchangeably to denote the fact that the propagated content has reached a given user of the network.
The extraction of diffusion sequences from the data, which may not be straightforward with non-binary participation in the diffusion or in the case of polymorphic diffused content, is beyond the scope of this paper. We assume that diffusion episodes have already been extracted by a preliminary process.
The ending time of diffusion T is arbitrarily set to the infection timestamp \(t^D(u)\) of the last contaminated user u in the longest diffusion episode D.
Note that the second term of formula 3 remains unchanged, since this part does not depend on any latent factor and can be treated as a constant in the optimization process.
In our setting, a counter-example of diffusion from user u to user v is an episode contained in \(\mathscr {D}_{u,v}^-\) (see formula 9): an episode where u is infected but v is not.
A relationship (u, v) is considered only if there exists at least one diffusion episode in the training set where u is infected before v. With all the approaches studied hereafter, relationships with no positive example would obtain a null weight anyway; they can therefore be ignored during the learning step.
Acknowledgments
This work has been partially supported by the REQUEST project (projet Investissement d’avenir, 2014–2017) and the project ARESOS from the CNRS program MASTODONS.
Appendix
1.1 Proof of Proposition 1
Let us denote by \(\theta _{u,v}^{(i)}\) the transmission probability from user u to user v at the i-th iteration of the learning process. Let us also denote by \(P^{D^{(i)}}_{v}\) the estimate of the infection probability of v in episode D (computed with formula 1 using the current transmission probabilities) at the i-th iteration of the learning process.
First, with \(A_{u,v}=\frac{|\mathscr {D}^+_{u,v}|}{|\mathscr {D}^+_{u,v}|+|\mathscr {D}^-_{u,v}|}\), let us consider the following lemma:
Lemma 1
Proof
Lemma 1 follows directly from the update formula applied at each step of the learning process (Eq. 7), since we know from Eq. 1 that \(\frac{\theta ^{(i)}_{u,v}}{P_v^{D^{(i)}}}\le 1\) for all \(I_{u,v} \in \mathscr {I}\) at every iteration \(i>0\) of the process. Note that, without loss of generality, to make the lemma valid for \(i=0\), we assume that the probabilities \(\theta\) are initialized such that \(\theta ^{(0)}_{u,v} \in [0,A_{u,v}]\) for all \(I_{u,v} \in \mathscr {I}\). \(\square\)
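The mechanics behind Lemma 1 can be illustrated with a small numerical sketch. It assumes that the update of formula 7 (not reproduced here) takes the classical EM form for independent cascade learning, \(\theta^{(i+1)}_{u,v} = \frac{1}{|\mathscr{D}^+_{u,v}|+|\mathscr{D}^-_{u,v}|}\sum_{D \in \mathscr{D}^+_{u,v}} \theta^{(i)}_{u,v}/P_v^{D^{(i)}}\); under that assumption, each ratio is at most 1, so the update can never exceed \(A_{u,v}\):

```python
import random

def em_update(theta, probs_P, n_neg):
    """One EM-style update of theta_{u,v} (assumed form of formula 7):
    average of theta / P_v^D over positive episodes; the n_neg
    counter-examples contribute zero to the sum."""
    return sum(theta / P for P in probs_P) / (len(probs_P) + n_neg)

random.seed(0)
theta, n_pos, n_neg = 0.4, 5, 3
A = n_pos / (n_pos + n_neg)  # A_{u,v} from the lemma
# Formula 1 guarantees P_v^D >= theta_{u,v}, hence each ratio is <= 1.
probs_P = [random.uniform(theta, 1.0) for _ in range(n_pos)]
assert em_update(theta, probs_P, n_neg) <= A  # Lemma 1 bound holds
```

The bound is tight exactly when every positive episode has \(P_v^{D} = \theta_{u,v}\), i.e., when u is the only possible influencer of v.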
Let us now consider the following lemma:
Lemma 2
Proof
If \(|\mathscr {D}^-_{u,v}|=0\), we get, from formula 7:
where we used the fact that \(P_v^{D^{(i)}}\) lies in ]0; 1[. \(\square\)
For simplicity, let us now write \(I_v^D =(U^D_{v} \cap Preds_v)\). For every episode \(D \in \mathscr {D}\) and every user \(v \in U^D_{\infty }\), we have at any iteration i of the process:
Let us also write \(B_v^D = \prod \limits _{u \in I_v^D, |\mathscr {D}_{u,v}^-|>0} (1-A_{u,v})\). Note that \(B_v^D\) is constant over the whole learning process. Now, let us consider the case of the proposition, where there exists at least one user \(u \in I_v^D\) such that \(|\mathscr {D}_{u,v}^-|=0\). In that case, we can rewrite the inequality as:
Now, let us consider the sequence V defined as:
From Lemma 2, we know that V is decreasing, since none of the relationships appearing in the max function has a counter-example in the training set. Moreover, this sequence is lower-bounded by 0. Then, V converges toward its fixed point, which we denote by l. Two cases arise: either \(l=0\) or \(l>0\).
If \(l=0\), then we know that:
Now, formula 1 implies that, at every iteration i, \(\forall u' \in I_v^D: P_v^{D^{(i)}} \ge \theta ^{(i)}_{u',v}\). Therefore, at every iteration i, we have: \(P_v^{D^{(i)}} \ge \max \limits _{u' \in I_v^D\setminus \{u\}, |\mathscr {D}_{u',v}^-|=0} \theta ^{(i)}_{u',v}\). Since \(P_v^{D^{(i)}}\) is also upper-bounded by 1 at every iteration, we can state that, in this case, \(\lim \limits _{n \rightarrow \infty } P_v^{D^{(n)}}=1\).
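The domination \(P_v^{D^{(i)}} \ge \theta^{(i)}_{u',v}\) used above follows from the complement-product structure of the infection probability. Assuming formula 1 is the usual independent cascade estimate \(P_v^D = 1 - \prod_{u' \in I_v^D}(1-\theta_{u',v})\) (an assumption, since the formula is not reproduced here), a short sketch makes it explicit:

```python
def infection_prob(thetas):
    """Assumed complement-product form of formula 1: v stays uninfected
    only if every infected predecessor independently fails to transmit."""
    escape = 1.0
    for t in thetas:
        escape *= (1.0 - t)
    return 1.0 - escape

thetas = [0.1, 0.35, 0.6, 0.05]  # hypothetical theta_{u',v} values
P = infection_prob(thetas)
assert P >= max(thetas)  # P_v^D dominates every single transmission probability
assert P <= 1.0
```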
Otherwise, we have at every iteration i:
Plugging this in inequality (19), we get for every i:
with \(\lambda = l B_v^D\). Then, we can rewrite the update formula 7 as:
Let us consider now the sequence W such that:
Then, since W takes its values in ]0; 1[ and \(\lambda\) is also in ]0; 1[, we can state that:
The sequence is thus strictly increasing. Since it is upper-bounded by its fixed point 1, it converges to 1. Now, since inequality (20) gives \(\forall n : \theta _{u,v}^{(n)} \ge W_n\), we obtain \(\lim \limits _{n \rightarrow \infty }\theta ^{(n)}_{u,v}=1\). This concludes the proof, since it follows that \(\lim \limits _{n \rightarrow \infty } P_v^{D^{(n)}}=1\).
1.2 Proof of Proposition 2
If, for a given relationship \(I_{u,v} \in \mathscr {I}\) such that \(|\mathscr {D}^-_{u,v}|>0\), there exists in each \(D \in \mathscr {D}^+_{u,v}\) at least one user \(u' \in U^D_v \cap Preds_v\) such that \(|\mathscr {D}^-_{u',v}|=0\), we can deduce from Proposition 1 that:
In that case, we can state that, after a given iteration m, there exists a value \(x \in ]A_{u,v};1[\) such that \(\forall n>m, \forall D \in \mathscr {D}^+_{u,v}: P^{D^{(n)}}_v > x\). Then, we know that: \(\forall n>m, \theta ^{(n+1)}_{u,v} < \theta ^{(n)}_{u,v}\frac{A_{u,v}}{x} = \gamma \theta ^{(n)}_{u,v}\), with \(\gamma = \frac{A_{u,v}}{x}\). Note that \(\gamma \in ]0;1[\) since \(x>A_{u,v}\). Let us now consider the following sequence V:
This sequence converges to its unique fixed point 0 since \(\gamma \in ]0;1[\). Since we know that: \(\forall n>m, \theta ^{(n)}_{u,v} \le V_n\) and that \(\theta ^{(n)}_{u,v}\) is lower-bounded by 0, then we get: \(\lim \limits _{n \rightarrow +\infty } \theta ^{(n)}_{u,v} = 0\).
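The geometric squeezing argument can be checked numerically. With hypothetical values for \(\gamma\) and the starting point, the dominating sequence \(V_n\) collapses to the fixed point 0, dragging \(\theta^{(n)}_{u,v}\) down with it:

```python
# Once theta^{(n+1)} < gamma * theta^{(n)} holds with gamma = A_{u,v}/x < 1,
# the transmission probability is dominated by a geometric sequence.
gamma, start = 0.7, 0.9  # hypothetical gamma and starting value V_m
V = [start]
for _ in range(200):
    V.append(gamma * V[-1])  # V_{n+1} = gamma * V_n

assert all(V[i + 1] < V[i] for i in range(len(V) - 1))  # strictly decreasing
assert V[-1] < 1e-20  # converges to the unique fixed point 0
```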
1.3 Proof of Proposition 3
Proving that the solution given by (14), denoted hereafter \(\theta ^*_{u,v}\), is nonnegative is straightforward. The inequality \(\theta ^*_{u,v} \ge 0\) can indeed be transformed into \(\upbeta \ge \sqrt{\varDelta }\), both sides of which are nonnegative; it can thus be verified by considering its square: as \(\varDelta -\upbeta ^2 = - 4\lambda \gamma \le 0\), \(\upbeta ^2 \ge \varDelta\) always holds.
Proving that \(\theta ^*_{u,v} \le 1\) requires showing that \(\upbeta - \sqrt{\varDelta } \le 2\lambda\), which is equivalent to \(\upbeta -2\lambda \le \sqrt{\varDelta }\). If \(\lambda \ge (|\mathscr {D}_{u,v}^-|+|\mathscr {D}_{u,v}^+|)\), the latter holds directly, since in that case \(\upbeta -2\lambda \le 0\) (and \(\sqrt{\varDelta } \ge 0\)). Otherwise, both sides of the inequality are nonnegative, and we may consider its square: \((\upbeta - 2\lambda )^2 \le \varDelta\) is equivalent to \(|\mathscr {D}_{u,v}^-|+|\mathscr {D}_{u,v}^+|-\gamma \ge 0\), which always holds since \(|\mathscr {D}_{u,v}^+| \ge \gamma\). Hence, \(\theta ^*_{u,v}\) always lies in [0, 1].
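These bounds can be sanity-checked numerically. The sketch below takes \(\theta^*_{u,v} = \frac{\upbeta - \sqrt{\varDelta}}{2\lambda}\) and \(\varDelta = \upbeta^2 - 4\lambda\gamma\) as read off the proof, and additionally assumes, consistently with the case analysis above, that \(\upbeta = \lambda + |\mathscr{D}^-_{u,v}| + |\mathscr{D}^+_{u,v}|\) and \(\gamma \le |\mathscr{D}^+_{u,v}|\) (the exact definitions appear in formulas 13 and 14, not reproduced here):

```python
import math
import random

random.seed(1)
for _ in range(1000):
    n_pos = random.randint(1, 20)        # |D^+_{u,v}|
    n_neg = random.randint(0, 20)        # |D^-_{u,v}|
    lam = random.uniform(0.01, 50.0)     # regularization weight lambda
    gamma = random.uniform(1e-6, n_pos)  # gamma <= |D^+| as used in the proof
    beta = lam + n_pos + n_neg           # assumed composition of beta
    delta = beta ** 2 - 4 * lam * gamma  # Delta = beta^2 - 4*lambda*gamma
    theta_star = (beta - math.sqrt(delta)) / (2 * lam)
    assert 0.0 <= theta_star <= 1.0      # Proposition 3: theta* in [0, 1]
```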
Proving that the solution given by (14) can be used as an update rule at each maximization step for solving the estimator of formula (12) implies to show that it maximizes, for any pair (u, v), the quantity \(Q=\mathscr {Q}(\theta |\hat{\theta }) - \lambda \sum _{\theta _{u,v} \in \theta } \theta _{u,v}\). Since we already know that \(\theta ^*_{u,v}\) corresponds to one of the two possible solutions of the cancellation of the derivative of Q from Eq. (13), it suffices to show that it corresponds to a maximum. This can be easily verified by considering the second derivative of Q w.r.t. \(\theta _{u,v}\), which equals:
where \(\hat{\theta }^D_{u\rightarrow v}\) is a shortcut for \(\frac{\hat{\theta }_{u,v}}{\hat{P}^D_v}\). From this formulation, it is easy to see that the second derivative of Q w.r.t. \(\theta _{u,v}\) is always negative on ]0; 1[, which concludes the proof: taking \(\theta ^*_{u,v}\) as an update of \(\theta _{u,v}\) allows us to maximize Q at each step of the EM algorithm.
Lamprier, S., Bourigault, S. & Gallinari, P. Influence learning for cascade diffusion models: focus on partial orders of infections. Soc. Netw. Anal. Min. 6, 93 (2016). https://doi.org/10.1007/s13278-016-0406-1