
Influence learning for cascade diffusion models: focus on partial orders of infections

Original Article · Social Network Analysis and Mining

Abstract

Probabilistic cascade models view information diffusion as an iterative process in which information is transmitted between users of a network. The problem of diffusion modeling then comes down to learning transmission probability distributions, which depend on hidden influence relationships between users, in order to discover the main diffusion channels of the network. Various learning models have been proposed in the literature, but we argue that the diffusion mechanisms defined in most of them are ill-suited to the noisy diffusion events observed in real social networks, where content is transmitted between humans. Classical models usually have difficulty extracting the main regularities in such real-world settings. In this paper, we propose a relaxed learning process for the well-known independent cascade model that, rather than attempting to explain the exact timestamps of users’ infections, focuses on infection probabilities given the sets of previously infected users. Furthermore, we propose a regularized learning scheme that allows the model to extract more generalizable transmission probabilities from training social data. Experiments that apply the learned models to real-world prediction tasks show the effectiveness of our proposals.


Notes

  1. Throughout this paper, we use the terms infection and contamination interchangeably to denote the fact that the propagated content has reached a given user of the network.

  2. The extraction of diffusion sequences from the data, which may not be straightforward when participation in the diffusion is not binary or when the diffused content is polymorphic, is not our concern here. We assume that diffusion episodes have already been extracted by a preliminary process.

  3. The ending time of the diffusion, T, is arbitrarily set to the infection timestamp \(t^D(u)\) of the last contaminated user u in the longest diffusion episode D.

  4. Note that the second term of formula 3 remains unchanged, since this part does not depend on any latent factor and can thus be kept as is in the optimization process.

  5. In our setting, a counter-example of diffusion from user u to user v is an episode contained in \(\mathscr {D}_{u,v}^-\) (see formula 9): an episode where u is infected but v is not.

  6. A relation \(I_{u,v}\) is considered only if there exists at least one diffusion episode in the training set where u is infected before v. With all the approaches studied hereafter, relationships with no positive example would obtain a null weight anyway, so they can be ignored during the learning step (a sketch of how these positive and counter-example episode sets can be built is given just after these notes).
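A minimal sketch of how the positive and counter-example episode sets mentioned in notes 5 and 6 could be built from already-extracted diffusion episodes. The episode representation and helper names below are our own illustration, not taken from the paper: an episode is a dict mapping each infected user to its infection timestamp, and users absent from the dict are not infected.

```python
from collections import defaultdict

def positive_negative_sets(episodes):
    """Return D_plus[(u, v)] and D_minus[(u, v)] as lists of episode indices."""
    d_plus, d_minus = defaultdict(list), defaultdict(list)
    users = {u for ep in episodes for u in ep}
    for idx, ep in enumerate(episodes):
        for u, t_u in ep.items():
            for v in users:
                if v == u:
                    continue
                if v in ep and ep[v] > t_u:
                    d_plus[(u, v)].append(idx)   # u infected before v: positive example (note 6)
                elif v not in ep:
                    d_minus[(u, v)].append(idx)  # u infected but v never is: counter-example (note 5)
    return d_plus, d_minus

episodes = [{"a": 0, "b": 1, "v": 2}, {"a": 0, "b": 1}, {"b": 0, "v": 1}]
d_plus, d_minus = positive_negative_sets(episodes)
print(len(d_plus[("a", "v")]), len(d_minus[("a", "v")]))  # 1 positive example, 1 counter-example
```

Only pairs with at least one entry in d_plus would then be kept for learning, as stated in note 6.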


Acknowledgments

This work was partially supported by the REQUEST project (an Investissement d’avenir project, 2014–2017) and by the ARESOS project from the CNRS MASTODONS program.

Author information

Corresponding author

Correspondence to Sylvain Lamprier.

Appendix

1.1 Proof of Proposition 1

Let us denote by \(\theta _{u,v}^{(i)}\) the transmission probability from user u to user v at the i-th iteration of the learning process. Let us also denote by \(P^{D^{(i)}}_{v}\) the estimate of the infection probability of v in episode D at the i-th iteration of the learning process, computed with formula 1 from the current transmission probabilities.

First, with \(A_{u,v}=\frac{|\mathscr {D}^+_{u,v}|}{|\mathscr {D}^+_{u,v}|+|\mathscr {D}^-_{u,v}|}\), let us consider the following lemma:

Lemma 1

$$\begin{aligned} \forall i \in \mathbb {N}, \quad \forall I_{u,v} \in \mathscr {I} : \left( \theta ^{(i)}_{u,v} \le A_{u,v} \right) \end{aligned}$$

Proof

Lemma 1 can easily be deduced from the update formula applied at each step of the learning process (Eq. 7), since we know from (Eq. 1) that \(\frac{\theta ^{(i)}_{u,v}}{P_v^{D^{(i)}}}\le 1\) for all \(I_{u,v} \in \mathscr {I}\) at every iteration \(i>0\) of the process. Note that, without loss of generality, to make the lemma hold for \(i=0\) as well, we assume that the probabilities \(\theta\) are all initialized such that, for all \(I_{u,v} \in \mathscr {I}\), \(\theta ^{(0)}_{u,v} \in [0,A_{u,v}]\). \(\square\)

Let’s now consider the following lemma:

Lemma 2

$$\begin{aligned} \forall I_{u,v} \in \mathscr {I} : (|\mathscr {D}_{u,v}^-|=0 \implies \forall i \in \mathbb {N} : (\theta ^{(i+1)}_{u,v} \ge \theta ^{(i)}_{u,v})) \end{aligned}$$

Proof

If \(|\mathscr {D}^-_{u,v}|=0\), we get, from formula 7:

$$\begin{aligned} \frac{\theta ^{(i+1)}_{u,v}}{\theta ^{(i)}_{u,v}} = \frac{1}{|\mathscr {D}^+_{u,v}|} {\sum _{D\in \mathscr {D}^+_{u,v}} \frac{1}{P_v^{D^{(i)}}} } \ge \frac{1}{|\mathscr {D}^+_{u,v}|} {\sum _{D\in \mathscr {D}^+_{u,v}} 1} = 1 \end{aligned}$$

where we used the fact that \(P_v^{D^{(i)}}\) lies in ]0; 1[. \(\square\)

For simplicity, let us now define \(I_v^D = U^D_{v} \cap Preds_v\). For every episode \(D \in \mathscr {D}\) and every user \(v \in U^D_{\infty }\), we have, at any iteration i of the process:

$$\begin{aligned} P_v^{D^{(i)}}= & {} 1- \prod _{u \in I_v^D} (1-\theta ^{(i)}_{u,v}) \\= & {} 1- \prod _{u \in I_v^D, |\mathscr {D}_{u,v}^-|>0} (1-\theta ^{(i)}_{u,v}) \prod _{u \in I_v^D, |\mathscr {D}_{u,v}^-|=0} (1-\theta ^{(i)}_{u,v}) \\\le & {} 1- \prod _{u \in I_v^D, |\mathscr {D}_{u,v}^-|>0} (1-A_{u,v}) \prod _{u \in I_v^D, |\mathscr {D}_{u,v}^-|=0} (1-\theta ^{(i)}_{u,v}) \end{aligned}$$

Let us define \(B_v^D = \prod \limits _{u \in I_v^D, |\mathscr {D}_{u,v}^-|>0} (1-A_{u,v})\). Note that \(B_v^D\) is constant over the whole learning process. Now, let us consider the case of the proposition, where there exists at least one user \(u \in I_v^D\) such that \(|\mathscr {D}_{u,v}^-|=0\). In that case, we can rewrite the inequality as:

$$\begin{aligned} P_v^{D^{(i)}}\le & {} 1 - B_v^D (1-\theta ^{(i)}_{u,v}) \prod _{u' \in I_v^D\setminus \{u\}, |\mathscr {D}_{u',v}^-|=0} (1-\theta ^{(i)}_{u',v}) \nonumber \\\le & {} 1 - B_v^D (1-\theta ^{(i)}_{u,v}) \left(1 - \max \limits _{u' \in I_v^D\setminus \{u\}, |\mathscr {D}_{u',v}^-|=0} \theta ^{(i)}_{u',v}\right)^{|\{u' \in I_v^D\setminus \{u\}, |\mathscr {D}_{u',v}^-|=0\}|} \end{aligned}$$
(19)

Now, let us consider the sequence V defined as:

$$\begin{aligned} V_n=\left(1 - \max \limits _{u' \in I_v^D\setminus \{u\}, |\mathscr {D}_{u',v}^-|=0} \theta ^{(n)}_{u',v}\right)^{|\{u' \in I_v^D\setminus \{u\}, |\mathscr {D}_{u',v}^-|=0\}|} \end{aligned}$$

From Lemma 2, we know that V is non-increasing, since none of the components of the max has any counter-example in the training set. Moreover, this sequence is lower-bounded by 0. Hence, V converges toward a limit, which we denote by l. Two cases are possible: either l equals 0 or l is strictly positive.

If \(l=0\), then we know that:

$$\begin{aligned} \lim \limits _{n \rightarrow \infty } \max \limits _{u' \in I_v^D\setminus \{u\}, |\mathscr {D}_{u',v}^-|=0} \theta ^{(n)}_{u',v} = 1 \end{aligned}$$

Now, formula 1 implies that, at every iteration i, \(\forall u' \in I_v^D: P_v^{D^{(i)}} \ge \theta ^{(i)}_{u',v}\). Therefore, at every iteration i, we have: \(P_v^{D^{(i)}} \ge \max \limits _{u' \in I_v^D\setminus \{u\}, |\mathscr {D}_{u',v}^-|=0} \theta ^{(i)}_{u',v}\). Since \(P_v^{D^{(i)}}\) is also upper-bounded by 1 at every iteration i, we can conclude that, in that case, \(\lim \limits _{n \rightarrow \infty } P_v^{D^{(n)}}=1\).

Otherwise (\(l>0\)), we have at every iteration i:

$$\begin{aligned} \left(1-\max \limits _{u' \in I_v^D\setminus \{u\}, |\mathscr {D}_{u',v}^-|=0} \theta _{u',v}^{(i)}\right)^{|\{u' \in I_v^D\setminus \{u\}, |\mathscr {D}_{u',v}^-|=0\}|} \ge l \end{aligned}$$

Plugging this into inequality (19), we get, for every i:

$$\begin{aligned} P_v^{D^{(i)}}\le & {} 1 - l B_v^D (1-\theta ^{(i)}_{u,v}) \le 1- \lambda + \lambda \theta ^{(i)}_{u,v} \end{aligned}$$

with \(\lambda = l B_v^D\). Then, we can rewrite the update formula 7 as:

$$\begin{aligned} \theta ^{(i+1)}_{u,v} = \frac{\sum _{D'\in \mathscr {D}^+_{u,v} \setminus \{D\}} \frac{\theta ^{(i)}_{u,v}}{P_v^{D'^{(i)}}} + \frac{\theta ^{(i)}_{u,v}}{P_v^{D^{(i)}}}}{|\mathscr {D}^+_{u,v}|} \ge \frac{(|\mathscr {D}^+_{u,v}|-1)\, \theta ^{(i)}_{u,v} + \frac{\theta ^{(i)}_{u,v}}{1- \lambda + \lambda \theta ^{(i)}_{u,v}}}{|\mathscr {D}^+_{u,v}|} \end{aligned}$$
(20)

Let us consider now the sequence W such that:

$$\begin{aligned} {\left\{ \begin{array}{l} W_0 = \theta _{u,v}^{(0)} \\ W_{n+1} = \frac{(|\mathscr {D}^+_{u,v}|-1) W_{n} + \frac{W_{n}}{1-\lambda + \lambda W_{n}}}{|\mathscr {D}^+_{u,v}|} \end{array}\right. } \end{aligned}$$

Then, since W takes its values in ]0; 1[ and \(\lambda\) is also in ]0; 1[, we can state that:

$$\begin{aligned} \frac{W_{n+1}}{W_{n}} = \frac{|\mathscr {D}^+_{u,v}|-1+\frac{1}{1-\lambda + \lambda W_{n}}}{|\mathscr {D}^+_{u,v}|} > 1 \end{aligned}$$

The sequence W is thus strictly increasing. Since it is upper-bounded by 1, it converges, and its limit must be a fixed point of the recursion; the only such fixed point in ]0; 1] being 1, W converges to 1. Now, since inequality (20) implies \(\forall n : \theta _{u,v}^{(n)} \ge W_n\), we get \(\lim \limits _{n \rightarrow \infty }\theta ^{(n)}_{u,v}=1\). This concludes the proof, since, from formula 1, \(P_v^{D^{(n)}} \ge \theta ^{(n)}_{u,v}\) and therefore \(\lim \limits _{n \rightarrow \infty } P_v^{D^{(n)}}=1\).
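As a minimal illustration of this degenerate behavior (our own toy case, not taken from the paper): if u is the only predecessor of v in every episode of \(\mathscr {D}^+_{u,v}\) and \(|\mathscr {D}^-_{u,v}|=0\), then \(P_v^{D^{(i)}}=\theta ^{(i)}_{u,v}\) in each of these episodes, and the update of formula 7 yields \(\theta ^{(i+1)}_{u,v} = \frac{1}{|\mathscr {D}^+_{u,v}|}\sum _{D \in \mathscr {D}^+_{u,v}} \frac{\theta ^{(i)}_{u,v}}{\theta ^{(i)}_{u,v}} = 1\) after a single iteration.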

1.2 Proof of Proposition 2

If, for a given relationship \(I_{u,v} \in \mathscr {I}\) such that \(|\mathscr {D}^-_{u,v}|>0\), there exists in each \(D \in \mathscr {D}^+_{u,v}\) at least one user \(u' \in U^D_v \cap Preds_v\) such that \(|\mathscr {D}^-_{u',v}|=0\), we can deduce from Proposition 1 that:

$$\begin{aligned} \forall D \in \mathscr {D}^+_{u,v} : \lim _{n \rightarrow +\infty } P^{D^{(n)}}_v = 1 \end{aligned}$$

In that case, there exist an iteration m and a value \(x \in ]A_{u,v};1[\) such that \(\forall n>m, \forall D \in \mathscr {D}^+_{u,v}: P^{D^{(n)}}_v > x\). Then, we know that \(\forall n>m: \theta ^{(n+1)}_{u,v} < \theta ^{(n)}_{u,v}\frac{A_{u,v}}{x} = \gamma \theta ^{(n)}_{u,v}\), with \(\gamma = \frac{A_{u,v}}{x}\). Note that \(\gamma \in ]0;1[\) since \(x>A_{u,v}\). Let us now consider the following sequence V:

$$\begin{aligned} {\left\{ \begin{array}{ll} V_0 = \theta _{u,v}^{(m+1)} &{} \\ V_{n+1} = \gamma V_n \end{array}\right. } \end{aligned}$$

This sequence converges to its unique fixed point 0 since \(\gamma \in ]0;1[\). Since, by induction from \(\theta ^{(m+1)}_{u,v}=V_0\) and the contraction above, \(\forall n>m: \theta ^{(n)}_{u,v} \le V_{n-m-1}\), and since \(\theta ^{(n)}_{u,v}\) is lower-bounded by 0, we get: \(\lim \limits _{n \rightarrow +\infty } \theta ^{(n)}_{u,v} = 0\).
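As a numerical illustration of Propositions 1 and 2 (a toy sketch of our own: the episode data and variable names are hypothetical, and the update is written as we read formula 7, i.e., an EM-style update for the independent cascade model), the following snippet iterates the update on a small set of episodes where v has two predecessors: a, with no counter-example, and b, with two counter-examples. The transmission probability of (a, v) converges to 1 while that of (b, v) collapses to 0:

```python
def p_infection(influencers, theta):
    """P_v^D = 1 - prod_{u in influencers}(1 - theta_{u,v})  (formula 1)."""
    p = 1.0
    for u in influencers:
        p *= 1.0 - theta[u]
    return 1.0 - p

# Three positive episodes where both a and b precede the infection of v,
# plus counter-example counts |D^-_{u,v}|: none for (a, v), two for (b, v).
pos_episodes = [{"a", "b"}, {"a", "b"}, {"a", "b"}]
n_neg = {"a": 0, "b": 2}
theta = {"a": 0.3, "b": 0.3}  # arbitrary initialization in ]0, A_{u,v}]

for _ in range(200):
    theta = {
        u: sum(theta[u] / p_infection(D, theta) for D in pos_episodes if u in D)
           / (sum(1 for D in pos_episodes if u in D) + n_neg[u])
        for u in theta
    }

print(theta)  # theta['a'] -> ~1.0 (Proposition 1), theta['b'] -> ~0.0 (Proposition 2)
```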

1.3 Proof of Proposition 3

Proving that the solution given by (14), denoted \(\theta ^*_{u,v}\) hereafter, is nonnegative is straightforward. The inequality \(\theta ^*_{u,v} \ge 0\) can indeed be transformed into \(\upbeta \ge \sqrt{\varDelta }\), both sides of which are nonnegative, so it can be verified by considering its square: as \(\varDelta -\upbeta ^2 = - 4\lambda \gamma \le 0\), \(\upbeta ^2 \ge \varDelta\) always holds.

Proving that \(\theta ^*_{u,v} \le 1\) requires showing that \(\upbeta - \sqrt{\varDelta } \le 2\lambda\), which is equivalent to \(\upbeta -2\lambda \le \sqrt{\varDelta }\). If \(\lambda \ge |\mathscr {D}_{u,v}^-|+|\mathscr {D}_{u,v}^+|\), the latter holds directly, since in that case \(\upbeta -2\lambda \le 0\) (and we know that \(\sqrt{\varDelta } \ge 0\)). In the opposite case, both sides of the inequality are nonnegative, and it is then possible to consider its square: \((\upbeta - 2\lambda )^2 \le \varDelta\) is equivalent to \(|\mathscr {D}_{u,v}^-|+|\mathscr {D}_{u,v}^+|-\gamma \ge 0\), which always holds since we know that \(|\mathscr {D}_{u,v}^+| \ge \gamma\). Then, \(\theta ^*_{u,v}\) always lies in [0, 1].
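For readability, the first-order condition that (14) solves can be reconstructed from the quantities used above (our notation, with \(\gamma =\sum _{D \in \mathscr {D}_{u,v}^+}\hat{\theta }^D_{u\rightarrow v}\) and \(\upbeta =|\mathscr {D}_{u,v}^+|+|\mathscr {D}_{u,v}^-|+\lambda\), which is consistent with the identities \(\varDelta -\upbeta ^2=-4\lambda \gamma\) and \((\upbeta -2\lambda )^2\le \varDelta \Leftrightarrow |\mathscr {D}_{u,v}^-|+|\mathscr {D}_{u,v}^+|-\gamma \ge 0\) used in this proof; the exact statement of Eqs. (13) and (14) is given in the main text). Setting the first derivative of Q with respect to \(\theta _{u,v}\) to zero yields:

$$\begin{aligned} \frac{\partial Q}{\partial \theta _{u,v}} = \frac{\gamma }{\theta _{u,v}} - \frac{|\mathscr {D}_{u,v}^+|+|\mathscr {D}_{u,v}^-|-\gamma }{1-\theta _{u,v}} - \lambda = 0 \; \Leftrightarrow \; \lambda \theta _{u,v}^2 - \upbeta \, \theta _{u,v} + \gamma = 0, \end{aligned}$$

whose smaller root is \(\theta ^*_{u,v} = \frac{\upbeta - \sqrt{\varDelta }}{2\lambda }\) with \(\varDelta = \upbeta ^2 - 4\lambda \gamma\).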

Proving that the solution given by (14) can be used as an update rule at each maximization step for solving the estimator of formula (12) requires showing that it maximizes, for any pair (u, v), the quantity \(Q=\mathscr {Q}(\theta |\hat{\theta }) - \lambda \sum _{\theta _{u,v} \in \theta } \theta _{u,v}\). Since we already know that \(\theta ^*_{u,v}\) corresponds to one of the two solutions obtained by setting the derivative of Q from Eq. (13) to zero, it suffices to show that it corresponds to a maximum. This can easily be verified by considering the second derivative of Q w.r.t. \(\theta _{u,v}\), which equals:

$$\begin{aligned} \frac{\partial ^2 Q}{\partial \theta _{u,v}^2}=-\sum \limits _{D \in \mathscr {D}_{u,v}^+} \left(\frac{\hat{\theta }^D_{u\rightarrow v}}{\theta _{u,v}^2} + \frac{(1-\hat{\theta }^D_{u\rightarrow v})}{(1-\theta _{u,v})^2}\right) - \sum \limits _{D \in \mathscr {D}_{u,v}^-} \frac{1}{(1-\theta _{u,v})^2} \end{aligned}$$

where \(\hat{\theta }^D_{u\rightarrow v}\) is shorthand for \(\frac{\hat{\theta }_{u,v}}{\hat{P}^D_v}\). From this formulation, it is easy to see that the second derivative of Q w.r.t. \(\theta _{u,v}\) is always negative on ]0; 1[, which concludes the proof: taking \(\theta ^*_{u,v}\) as an update of \(\theta _{u,v}\) maximizes Q at each step of the EM algorithm.
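As a quick numeric check (our own sketch; the closed form below follows the quadratic reconstructed above, with \(\upbeta\) and \(\varDelta\) written in our notation rather than copied verbatim from formula 14, and the input values are hypothetical), the update indeed stays in [0, 1] and shrinks toward the unregularized EM value \(\gamma /(|\mathscr {D}_{u,v}^+|+|\mathscr {D}_{u,v}^-|)\) as \(\lambda \rightarrow 0\):

```python
import math

def regularized_update(gamma, n_pos, n_neg, lam):
    """Smaller root of lam*theta^2 - beta*theta + gamma = 0 (our reading of formula 14)."""
    beta = n_pos + n_neg + lam
    delta = beta ** 2 - 4.0 * lam * gamma   # Delta = beta^2 - 4*lambda*gamma
    return (beta - math.sqrt(delta)) / (2.0 * lam)

# gamma = sum over D+ of theta_hat, so gamma <= n_pos; hypothetical values below.
for lam in (2.0, 0.5, 1e-6):
    print(lam, regularized_update(gamma=2.4, n_pos=3, n_neg=1, lam=lam))
# The result always lies in [0, 1] and tends to 2.4 / 4 = 0.6 as lambda -> 0.
```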

Cite this article

Lamprier, S., Bourigault, S. & Gallinari, P. Influence learning for cascade diffusion models: focus on partial orders of infections. Soc. Netw. Anal. Min. 6, 93 (2016). https://doi.org/10.1007/s13278-016-0406-1
