1 Introduction

Nonnegative matrix factorization (NMF) methods (Paatero and Tapper 1994; Lee and Seung 2001) aim to find a latent representation of a nonnegative \(n\times m\) matrix A as a sum of K nonnegative factors. For integer-valued data, Poisson factorization models (Dunson and Herring 2005) offer a flexible probabilistic framework for nonnegative matrix factorization and have found wide applicability in signal processing (Virtanen et al. 2008; Cemgil 2009) and recommender systems (Ma et al. 2011; Gopalan et al. 2015). In this paper, we focus on the application to network analysis, where \(m=n\) and the \(n\times n\) count matrix A, the adjacency matrix, represents the number of directed or undirected interactions between n individuals; the latent factors may be interpreted as latent and potentially overlapping communities (Ball et al. 2011), such as members of a sports team or of other social circles. We also consider binary data where the matrix represents the existence or absence of a directed or undirected link between individuals. The estimated latent factors can be used for the prediction of missing links/interactions, or for interpretation of the uncovered latent community structure.

Poisson factorization approaches require the user to set the number K of latent factors, which is typically assumed to be independent of the sample size n. To address this problem, Zhou et al. (2012), Gopalan et al. (2014) and Zhou (2015) proposed Bayesian nonparametric approaches that allow the number of latent factors to be estimated from the data, and to grow unboundedly with the size n of the matrix. In particular, Gopalan et al. (2014) and Zhou (2015) considered a Poisson factorization model

$$\begin{aligned} A_{ij}\sim {{\,\mathrm{Poisson}\,}}\left( \sum _{k=1}^{+\infty } r_k v_{ik}v_{jk}\right) ,~1\le i,j\le n \end{aligned}$$
(1)

where the positive weights \((r_k)_{k\ge 1}\) represent the importance of community k, and \(v_{ik}>0\) represents the level of affiliation of individual i to community k. Gopalan et al. (2014) and Zhou (2015), extending work from Titsias (2008), assume that the weights \((r_k)\) are the jumps of a gamma process, ensuring the sum in Eq. (1) is almost surely finite. Using properties of Poisson random variables, the model (1) can be equivalently represented as

$$\begin{aligned} A_{ij}&=\sum _{k=1}^{+\infty } Z_{ijk} \end{aligned}$$
(2)
$$\begin{aligned} Z_{ijk}&\sim {{\,\mathrm{Poisson}\,}}(r_k v_{ik}v_{jk}),~k=1,2,\ldots \end{aligned}$$
(3)

for \(1\le i,j\le n\). The latent count variables \(Z_{ijk}\) may be interpreted as the number of latent interactions between two individuals i and j via community k, the overall number \(A_{ij}\) of interactions being the sum of those community interactions. For example, two members of the same company who also play sport together may meet five times at the company, and twice at the sport center, resulting in seven interactions overall. The overall number

$$\begin{aligned} K_n=\sum _{k=1}^{+\infty } \mathbbm {1}_{\sum _{1\le i, j\le n} Z_{ijk}>0} \end{aligned}$$
(4)

of communities k that generated at least one interaction between the n individuals is termed the number of active communities. For the gamma process Poisson factor model (Zhou 2015), the number of active communities \(K_n\) grows logarithmically with the number n of individuals. The logarithmic growth assumption may be too restrictive. For example, the number of active communities may actually be unknown but bounded above; alternatively, it may increase at a rate faster or slower than logarithmic.
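
As an illustration of the representation in Eqs. (2) and (3), the following minimal sketch samples the latent community counts and sums them into the interaction counts. It assumes a finite truncation to two communities with made-up weights and affiliations; the helper names `sample_poisson` and `sample_interactions` are hypothetical and not part of the original work.

```python
import random

def sample_poisson(rate, rng):
    """Draw a Poisson(rate) variate by counting unit-rate arrivals in [0, rate]."""
    count, t = 0, rng.expovariate(1.0)
    while t <= rate:
        count += 1
        t += rng.expovariate(1.0)
    return count

def sample_interactions(r, v, rng):
    """Sample Z[i][j][k] ~ Poisson(r[k] * v[i][k] * v[j][k]) and A[i][j] = sum_k Z[i][j][k]."""
    n, K = len(v), len(r)
    Z = [[[sample_poisson(r[k] * v[i][k] * v[j][k], rng) for k in range(K)]
          for j in range(n)] for i in range(n)]
    A = [[sum(Z[i][j]) for j in range(n)] for i in range(n)]
    return Z, A

rng = random.Random(0)
r = [2.0, 0.5]                             # assumed community weights r_k
v = [[1.0, 0.2], [0.8, 1.5], [0.1, 1.0]]   # assumed affiliations v[i][k]
Z, A = sample_interactions(r, v, rng)
```

By construction, each entry of `A` is the sum over communities of the corresponding entries of `Z`, which mirrors the company and sport center example above.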

In this paper, we consider generalizations of the gamma process Poisson factorization model, using completely random measures (CRMs) (Kingman 1967). CRMs offer a flexible and tractable modeling framework (Lijoi and Prünster 2010). The proposed models fit in the class of multivariate generalized Indian Buffet process priors recently developed by James (2017) and are also related to compound completely random measures (Griffin and Leisen 2017). We assume that \((r_k)\) are the points of a Poisson point process with mean measure \(\rho \). Depending on the properties of this measure, the number of active communities \(K_n\) is either (i) bounded, with a random upper bound, (ii) unbounded and grows sub-polynomially (e.g., \(\log n\) or \(\log \log n\)) or (iii) unbounded and grows as \(n^{2 \sigma }\), for some \(\sigma \in (0,1)\). For the implementation, we focus in particular on the generalized gamma process (Brix 1999) where a single parameter flexibly controls all three behaviors.

The article is organized as follows. In Sect. 2, we describe the statistical model for count and binary matrices. The asymptotic properties of the model are derived in Sect. 3. In particular, we relate the asymptotic growth of the number of active features to the regular variation properties of the measure \(\rho \). In Sect. 4, we derive a Markov chain Monte Carlo algorithm for posterior inference that does not require any approximation to the original model. In Sect. 5, we consider applications of our approach to overlapping community detection and link detection in networks, considering real network data with up to tens of thousands of nodes.

2 Statistical model for count and binary data

2.1 General construction

We present here the model for directed count or binary observations, but the model can be straightforwardly adapted to undirected interactions. Let \((r_k)_{k=1,2,\ldots }\) be the points of a Poisson point process with \(\sigma \)-finite mean measure \(\rho \) on \((0,+\infty )\), and assume that \(v_{ik}\), \(i=1,\ldots ,n\), \(k\ge 1\), are independent and identically distributed from some probability distribution F on \({\mathbb {R}}_+=[0,+\infty )\). The variable \(v_{ik}\) can be interpreted as the level of affiliation of an individual i to community k, and \(r_k\) as the importance of that community.

For count data \((A_{ij})\), where \(A_{ij}\) denotes the number of directed interactions from node i to node j, we consider the Poisson factorization model

$$\begin{aligned} A_{ij}\mid (r_k,v_{ik})\sim {{\,\mathrm{Poisson}\,}}\left( \sum _{k=1}^{+\infty } r_k v_{ik}v_{jk}\right) ,~1\le i,j\le n.\nonumber \\ \end{aligned}$$
(5)

Denoting \(\varLambda _{ij}=\sum _{k=1}^{+\infty } r_k v_{ik}v_{jk}\) the Poisson rate for \(A_{ij}\), the \(n\times n\) rate matrix \(\varLambda ^{(n)}=(\varLambda _{ij})_{1\le i,j\le n}\) admits the following factorization as an infinite sum of rank-1 matrices

$$\begin{aligned} \varLambda ^{(n)}=\sum _{k=1}^{+\infty } r_k v_{1:n,k}v^\intercal _{1:n,k} \end{aligned}$$

where \(v_{1:n,k}=(v_{1k},\ldots ,v_{nk})^\intercal \). For the model to be well specified, the sum in the right-hand side of Eq. (5) needs to be almost surely finite. A necessary and sufficient condition is

$$\begin{aligned}&\iint (1-e^{-rv^2})\rho (dr)F(dv)<+\infty ~~\text { and }\nonumber \\&~~\iint (1-e^{-rv_1 v_2})\rho (dr)F(dv_1)F(dv_2)<+\infty . \end{aligned}$$
(6)

A sufficient set of conditions, which we will assume to hold in the rest of this article, is that \(\rho \) is a Lévy measure and F has finite second moment, that is

$$\begin{aligned} \int _0^{+\infty }(1-e^{-r})\rho (dr)&<+\infty ~~\text { and } \end{aligned}$$
(A1)
$$\begin{aligned} \int _0^{+\infty } v^2F(dv)&<+\infty . \end{aligned}$$
(A2)

In this case, denoting \(\delta _v\) the Dirac measure at vector v, the community affiliations and weights for n nodes can be conveniently represented by a completely random measure

$$\begin{aligned} G=\sum _{k\ge 1} r_k\delta _{v_{1:n,k}} \end{aligned}$$
(7)

on \({\mathbb {R}}_+^n\) with mean measure \(\rho (dr)F^{\bigotimes ^n}(dv_1,\ldots ,dv_n)\) where \(F^{\bigotimes ^n}\) denotes the nth product measure of F; see Kingman (1967) and Lijoi and Prünster (2010) for background on CRMs and their applications. If the Lévy measure is finite, that is, if

$$\begin{aligned} \int _0^{+\infty } \rho (dr)<+\infty \end{aligned}$$

then the number of points \((r_k)\), and therefore the number of communities, is almost surely finite. Otherwise, when \(\int \rho (dr)=+\infty \), the number of communities is infinite.

When we have binary observations \((Y_{ij})\), we treat the count matrix \((A_{ij})\) as a latent variable, and set \(Y_{ij}=\mathbbm {1}_{A_{ij}>0}\) as in Caron and Fox (2017) and Zhou (2015), where \(\mathbbm {1}\) is the indicator function. Integrating out \((A_{ij})\), this leads to the following model for binary observations

$$\begin{aligned}&Y_{ij}\mid (r_k,v_{ik})\sim {{\,\mathrm{Ber}\,}}\left( 1-\exp \left[ -\sum _{k=1}^{+\infty } r_k v_{ik}v_{jk}\right] \right) ,\nonumber \\&~1\le i,j\le n. \end{aligned}$$
(8)
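
Since \(Y_{ij}=\mathbbm {1}_{A_{ij}>0}\) and \(A_{ij}\) is Poisson with rate \(\varLambda _{ij}\), the link probability is \(1-e^{-\varLambda _{ij}}\). The sketch below evaluates it for a finite truncation of the sum; the function name `link_probability` and the truncation are assumptions made for illustration.

```python
import math

def link_probability(r, v_i, v_j):
    """P(Y_ij = 1) = 1 - exp(-sum_k r[k] * v_i[k] * v_j[k]) for a truncated sum."""
    rate = sum(r_k * a * b for r_k, a, b in zip(r, v_i, v_j))
    # -expm1(-rate) equals 1 - exp(-rate) and stays accurate for small rates.
    return -math.expm1(-rate)

p = link_probability([2.0, 0.5], [1.0, 0.2], [0.8, 1.5])
```

Using `math.expm1` avoids the cancellation that a direct `1 - math.exp(-rate)` would suffer when the rate is close to zero, which is the typical regime in sparse networks.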

2.2 Specific model

In the inference and experimental part, we use the following choice for the \(\rho \) and F. The Lévy measure \(\rho \) is taken to be that of a generalized gamma process (GGP, see Hougaard (1986), Brix (1999), James (2002), Pitman (2003))

$$\begin{aligned} \rho (dr) = \frac{\kappa }{\varGamma (1-\sigma _0)} r^{-1-\sigma _0} e^{-\tau r} dr \end{aligned}$$
(9)

where \(\sigma _0\in (-\infty , 1)\), \(\kappa >0\) and \(\tau >0\). When \(\sigma _0=0\), we obtain a gamma process, and the model corresponds to that of Zhou (2015). When \(\sigma _0 < 0\), the Lévy measure is finite, while when \(\sigma _0 \ge 0\), the Lévy measure is infinite.

Concerning the affiliations, we will assume that F is a gamma distribution with parameters \(\alpha >0\) and \(\beta >0\). That is, the probability density function (pdf) f is given by

$$\begin{aligned} f(v) = \frac{\beta ^\alpha }{\varGamma (\alpha )} v^{\alpha -1} e^{-\beta v} \end{aligned}$$

where \(\varGamma \) denotes the usual gamma function. The hyperparameters \((\kappa ,\sigma _0,\tau ,\alpha ,\beta )\) and \((\kappa ^\prime =\kappa / \beta ^{2\sigma _0},\) \(\sigma _0, \tau ^\prime =\tau \beta ^2,\alpha ,1)\) induce the same distribution for the latent factors \((\varLambda _{ij})\). In order to guarantee the identifiability of the hyperparameters, we therefore set \(\beta =1\).

2.3 Related work

Several network models building on latent factors have been proposed in recent years and have proven to be very useful tools (Hoff et al. 2002; Airoldi et al. 2008; Hoff 2009; Durante and Dunson 2014). In general, these models differ from Poisson factor models since they use a different likelihood for the connections. However, they share a similar approach: every node i is embedded in \({\mathbb {R}}_+^{K}\) (where K is the number of latent factors or communities), resulting in a latent representation \(X_i\) quantifying the affiliation of node i to each latent factor. Then, the probability of an edge (ij) is a function of the similarity between \(X_i\) and \(X_j\).

The model introduced in this section can be seen from different perspectives that nicely connect it to the existing literature. First, the model can be seen as obtained from a functional of a CRM. Recall the definition of the CRM G in Eq. (7). Define the \(n\times n\) matrix \(\varLambda ^{(n)}\) as the following functional of G

$$\begin{aligned} \varLambda ^{(n)}=\int _{(0,+\infty )^n} h(u)G(du)=\sum _{k\ge 1} r_k v_{1:n,k} v^\intercal _{1:n,k} \end{aligned}$$

where \(h(u)=u u^\intercal \). Alternatively, this can be interpreted in the framework of compound completely random measures (Griffin and Leisen 2017). For each \(1\le i,j\le n\), denote \(G_{ij}=\sum _{k\ge 1} r_k v_{ik}v_{jk}\delta _{\zeta _{k}}\) where \(\zeta _k\) are some community locations in some domain \(\varTheta \), iid from some distribution H, irrelevant here. Then, \((G_{ij})_{1\le i,j\le n}\) are compound CRMs on \(\varTheta \) and \(\varLambda _{ij}=G_{ij}(\varTheta )\). In the same vein, the model can also be interpreted as an instance of the class of Generalized Indian Buffet Processes introduced by James (2017), where the Bernoulli likelihood (of the classical IBP) is replaced by any likelihood as long as the observation can take the value 0 with strictly positive probability. More precisely, if we denote \(Z^{(n)}_k\) the \(n\times n\) matrix with entries \(Z_{ijk}\), then the matrix-valued process \(\sum _{k\ge 1} Z_k^{(n)}\delta _{\zeta _k}\) is a draw from a generalized multivariate Indian buffet process.

Finally, as mentioned in the introduction, the model admits as a special case the Poisson factorization based on the gamma process of Zhou (2015).

3 Asymptotic Properties

In this section, we study the asymptotic properties of the proposed class of models, and in particular the growth rate of the number of active communities as the sample size n grows, and the asymptotic proportion of communities of a given size. For a given sequence \((r_k)_{k\ge 1}\) and \((v_{ik})_{i\ge 1,k\ge 1}\), denote \(A^{(n)}_{ij}\) and \(Z^{(n)}_{ijk}\) where \(n\ge 1\), \(1\le i,j\le n\), \(k\ge 1\), respectively, the number of directed interactions and the number of community directed interactions distributed from Eqs. (2) and (3). We consider two different asymptotic settings

  • Unconstrained setting. This setting is more general, and we only assume that \(A^{(n)}_{ij}\) and \(Z^{(n)}_{ijk}\) are marginally sampled from Eqs. (2) and (3).

  • Constrained setting. For any \(1\le m \le n\), and \(1\le i,j \le m\), \(A_{ij}^{(n)}=A_{ij}^{(m)}\). In this setting, we suppose that the connections between the already observed nodes remain unchanged. It is equivalent to assuming that there is an infinite but fixed graph and \(A^{(n)}\) represents the connections between the n first nodes of that graph.

Unless otherwise stated, all the results of this section hold for the unconstrained setting. We indicate when a stronger result holds in the constrained setting. All proofs are given in Appendix A.

3.1 General model

Let \(d^{(n)}_k\) be the degree of the community/feature k, corresponding to the number of interactions amongst n individuals due to community k, and defined as

$$\begin{aligned} d^{(n)}_k = \sum \limits _{1\le i,j\le n} Z^{(n)}_{ijk}. \end{aligned}$$
(10)

A community is active if \(d^{(n)}_k\ge 1\). The number of active communities is therefore defined as

$$\begin{aligned} K_n = \sum \limits _{k=1}^{+\infty } \mathbbm {1}_{d^{(n)}_k \ge 1} \end{aligned}$$
(11)

Denote \(K_{n,j}\) the number of communities with degree \(j\ge 1\)

$$\begin{aligned} K_{n,j} = \sum \limits _{k=1}^{+\infty } \mathbbm {1}_{d^{(n)}_k = j} \end{aligned}$$

Note that under the constrained setting, \(d^{(n)}_{k}\), \(K_n\) and \(\sum _{\ell \ge j}K_{n,\ell }\) are all almost surely non-decreasing with the sample size n, whereas this is not necessarily the case for the unconstrained setting.
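
The quantities \(d^{(n)}_k\), \(K_n\) and \(K_{n,j}\) can be computed directly from the latent counts. The following sketch assumes the counts are stored as a list of \(n\times n\) integer matrices, one per community; the name `community_counts` and the toy data are hypothetical.

```python
from collections import Counter

def community_counts(Z):
    """Z[k][i][j] is the number of interactions from i to j via community k."""
    degrees = [sum(sum(row) for row in Z_k) for Z_k in Z]   # d_k, Eq. (10)
    K_n = sum(1 for d in degrees if d >= 1)                 # active communities, Eq. (11)
    K_nj = Counter(d for d in degrees if d >= 1)            # K_nj[j] = communities of degree j
    return degrees, K_n, K_nj

Z = [
    [[0, 2], [1, 0]],   # community 0: degree 3
    [[0, 0], [0, 0]],   # community 1: inactive
    [[1, 0], [0, 0]],   # community 2: degree 1
]
degrees, K_n, K_nj = community_counts(Z)
```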

Proposition 1

Under Assumptions (A1) and (A2), the number of active communities \(K_n\) is a Poisson random variable with mean

$$\begin{aligned} \varPsi (n)= & {} \int \left( 1-e^{-r (\sum _{i=1}^n v_i)^2 } \right) \left[ \prod \limits _{i=1}^n F(dv_i) \right] \nonumber \\&\rho (dr) < +\infty . \end{aligned}$$
(12)

The number \(K_{n,j}\) of communities with degree j is also Poisson distributed, with mean

$$\begin{aligned} \varPsi _j(n)=\frac{1}{j!} \int r^j \left( \sum _{i=1}^n v_i\right) ^{2j} e^{-r \left( \sum _{i=1}^n v_i\right) ^2 } \left[ \prod \limits _{i=1}^n F(dv_i) \right] \ \rho (dr).\nonumber \\ \end{aligned}$$
(13)

Finally, for \(j\ge 1\), \(\sum \limits _{k \ge j} K_{n,k}\), the number of communities with degree at least j is also Poisson distributed with mean \(\sum \limits _{k \ge j} \varPsi _k(n)\).
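
For the GGP with gamma affiliations of Sect. 2.2, the inner integral over r in Eq. (12) is available in closed form, so that \(\varPsi (n)={\mathbb {E}}[\psi (S_n^2)]\), where \(\psi (t)=\int (1-e^{-rt})\rho (dr)\) and \(S_n=\sum _i v_i\) is gamma distributed with shape \(n\alpha \) and rate \(\beta \). The sketch below is a plain Monte Carlo estimate of this expectation under those assumptions; the function names, the number of draws and the parameter values are illustrative choices, not the authors' implementation.

```python
import math
import random

def ggp_laplace_exponent(t, kappa, sigma0, tau):
    """psi(t) for the GGP Levy measure; the sigma0 = 0 case is the gamma process."""
    if sigma0 == 0.0:
        return kappa * math.log1p(t / tau)
    return kappa / sigma0 * ((t + tau) ** sigma0 - tau ** sigma0)

def expected_active_communities(n, kappa, sigma0, tau, alpha, beta=1.0,
                                num_draws=20000, seed=0):
    """Monte Carlo estimate of Psi(n) = E[psi(S^2)], S ~ Gamma(shape n*alpha, rate beta)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_draws):
        s = rng.gammavariate(n * alpha, 1.0 / beta)   # gammavariate takes a scale
        total += ggp_laplace_exponent(s * s, kappa, sigma0, tau)
    return total / num_draws

psi_200 = expected_active_communities(200, kappa=1.0, sigma0=0.5, tau=1.0, alpha=1.0)
```

With these values, \(\psi (s^2)=2(\sqrt{s^2+1}-1)\approx 2s-2\), so the estimate should be close to \(2n\alpha -2\).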

In the rest of the section, we relate the asymptotic behavior of quantities of interest to the properties of the mean measure \(\rho \). Let us consider the tail Lévy intensity defined as

$$\begin{aligned} \forall x > 0,\ \overline{\rho }(x) = \int _{x}^{+\infty } \rho (dr). \end{aligned}$$

We assume that \(\overline{\rho }\) is a regularly varying function at 0, that is

$$\begin{aligned} \overline{\rho }(x) \asymp x^{-\sigma } \ell (1/x)\text { as }x\rightarrow 0 \end{aligned}$$
(A4)

where \(\sigma \in [0,1)\) and \(\ell \) is a slowly varying function verifying \(\lim _{t \rightarrow +\infty } \ell (at)/\ell (t) = 1\) for all \(a>0\). Besides, we write \(a(x) \asymp b(x)\) if \(\lim a(x)/b(x) = 1\). Examples of slowly varying functions include functions converging to a constant, \(\log ^a t\) for any a, \(\log \log t\), etc. Note that the CRM is finite activity if and only if \(\sigma =0\) and \(\ell (t)\rightarrow C<+\infty \).

Now, let us consider the asymptotic behavior of the number of active communities \(K_n\).

Proposition 2

Let \(K_n\) be the number of active communities. Then for \(0\le \sigma < 1\),

$$\begin{aligned} {\mathbb {E}}[K_n] \asymp \varGamma (1-\sigma ) m_f^{2\sigma } n^{2\sigma } \ell (n^2) \end{aligned}$$
(14)

as n tends to infinity, where \(m_f = \int v F(dv)\). Additionally, for \(0< \sigma < 1\),

$$\begin{aligned} K_n \asymp {\mathbb {E}}[K_{n}] \ \ \text {a.s.} \end{aligned}$$
(15)

If we further assume that the sequence \((K_n)_{n\ge 1}\) is almost surely non-decreasing (as in the constrained setting), then (15) holds for \(\sigma = 0\) and \(\ell (t)\rightarrow +\infty \) as well. In the finite activity case, that is \(\sigma =0\) and \(\ell (t)\rightarrow {\overline{\rho }}(0)=\int _0^{+\infty } \rho (dr)<\infty \), we have

$$\begin{aligned} K_n\rightarrow K_\infty \end{aligned}$$

as n tends to infinity, where \(K_\infty \) is a Poisson random variable with mean \({\overline{\rho }}(0)\). The above convergence holds in distribution for the unconstrained setting and almost surely for the constrained setting.

Proposition 3

Let \(K_{n,j}\) be the number of communities of degree j. Then for \( 0< \sigma < 1\) and any \(j\ge 1\),

$$\begin{aligned} K_{n,j} \asymp \frac{\sigma \varGamma (j-\sigma ) }{j!} m_f^{2\sigma } n^{2 \sigma } \ell (n^2)\ \ \text {a.s.} \end{aligned}$$
(16)

as n tends to infinity. Therefore,

$$\begin{aligned} \frac{K_{n,j}}{K_n} \rightarrow \frac{\sigma \varGamma (j-\sigma )}{ \varGamma (1-\sigma ) j!}\ \ \text {a.s.} \end{aligned}$$
(17)

as n tends to infinity. This corresponds to a power-law behavior as

$$\begin{aligned} \frac{\sigma \varGamma (j-\sigma )}{ \varGamma (1-\sigma ) j!} \asymp \frac{\sigma }{\varGamma (1-\sigma ) j^{\sigma +1}} \end{aligned}$$

for large j. If we further assume that for all \(k \ge 1\), \(\left( \sum \limits _{j \ge k} K_{n,j}\right) _{n\ge 1}\) is non-decreasing (constrained setting), then (17) holds also for \(\sigma = 0\) and \(\ell (t)\rightarrow +\infty \).
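
The limiting proportions in Eq. (17) are easy to evaluate numerically. The sketch below computes them on the log scale and estimates the local decay exponent from the ratio of the proportions at 2j and j, which should approach \(1+\sigma \) for large j. The function names are hypothetical.

```python
import math

def degree_proportion(j, sigma):
    """Limit of K_{n,j} / K_n in Eq. (17), for 0 < sigma < 1 and integer j >= 1."""
    log_p = (math.log(sigma) + math.lgamma(j - sigma)
             - math.lgamma(1.0 - sigma) - math.lgamma(j + 1.0))
    return math.exp(log_p)

def local_tail_exponent(j, sigma):
    """Decay exponent estimated from the proportions at j and 2j."""
    ratio = degree_proportion(2 * j, sigma) / degree_proportion(j, sigma)
    return -math.log(ratio) / math.log(2.0)

proportions = [degree_proportion(j, 0.5) for j in range(1, 6)]
exponent = local_tail_exponent(10**4, 0.5)
```

Working with `math.lgamma` rather than `math.gamma` avoids overflow of the gamma function for large j. Note that the proportion of communities of degree 1 is exactly \(\sigma \).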

It has been observed empirically that in many networks, the distribution of community sizes displays a power-law behavior \(f_S(s) \sim s^{-1-\sigma }\), where \(f_S\) is the distribution of community sizes and \(\sigma > 0\) (see for example Stegehuis et al. 2016; Radicchi et al. 2004; Clauset et al. 2004; Arenas et al. 2004). As Eq. (17) of Proposition 3 shows, this property cannot be captured in the framework of Zhou (2015), for example, where \(\sigma = 0\) is fixed. These empirical observations suggest that models with flexible \(\sigma \) are needed.

Finally, for two communities k and \(k'\), let \(c^{(n)}(k,k')\) denote the cosine between their affiliation vectors

$$\begin{aligned} c^{(n)}(k,k') = \frac{\sum _{i=1}^n v_{ik}v_{ik'}}{\sqrt{\sum _i v_{ik}^2}\sqrt{\sum _i v_{ik'}^2}} . \end{aligned}$$

This coefficient gives a measure of the overlap between two communities k and \(k'\). By the law of large numbers, for any \(k\ne k'\),

$$\begin{aligned} c^{(n)}(k,k'){\asymp } \frac{(\int v F(dv))^2}{\int v^2 F(dv)} \ \ \textit{ a.s. as }n\rightarrow +\infty . \end{aligned}$$
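
The limit above can be checked by simulation. The following sketch assumes gamma distributed affiliations with shape \(\alpha \) and unit rate, for which \((\int vF(dv))^2/\int v^2F(dv)=\alpha ^2/(\alpha (\alpha +1))\); the sample size, the seed and the function name `cosine_overlap` are arbitrary illustrative choices.

```python
import math
import random

def cosine_overlap(v_k, v_kp):
    """Cosine between two affiliation vectors of equal length."""
    dot = sum(a * b for a, b in zip(v_k, v_kp))
    norm_k = math.sqrt(sum(a * a for a in v_k))
    norm_kp = math.sqrt(sum(b * b for b in v_kp))
    return dot / (norm_k * norm_kp)

rng = random.Random(1)
alpha, n = 2.0, 50000
v_k = [rng.gammavariate(alpha, 1.0) for _ in range(n)]    # affiliations to community k
v_kp = [rng.gammavariate(alpha, 1.0) for _ in range(n)]   # affiliations to community k'
empirical = cosine_overlap(v_k, v_kp)
limit = alpha ** 2 / (alpha * (alpha + 1.0))   # (E v)^2 / E v^2 for Gamma(alpha, 1)
```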

3.2 Specific case of the GGP

In the case of the GGP, we have

$$\begin{aligned} {\overline{\rho }} (x)=\frac{\kappa \tau ^{\sigma _0}\varGamma (-\sigma _0,\tau x)}{\varGamma (1-\sigma _0)}\asymp \left\{ \begin{array}{ll} -\frac{\kappa \tau ^{\sigma _0}}{\sigma _0} &{}\quad \text {if }\sigma _0<0 \\ \kappa \log (1/x) &{}\quad \text {if }\sigma _0=0 \\ \frac{\kappa x^{-\sigma _0}}{\sigma _0\varGamma (1-\sigma _0)} &{}\quad \text {if }\sigma _0>0 \end{array}\right. \end{aligned}$$

as x tends to 0, where \(\varGamma (a,x)\) is the incomplete gamma function. Note that \({\overline{\rho }} (x)\) is of the form \(x^{-\sigma }\ell (1/x)\) where \(\sigma =\max (0,\sigma _0)\) and

$$\begin{aligned} \ell (t)=\left\{ \begin{array}{ll} -\frac{\kappa \tau ^{\sigma _0}}{\sigma _0} &{}\quad \text {if }\sigma _0<0 \\ \kappa \log (t) &{}\quad \text {if }\sigma _0=0 \\ \frac{\kappa }{\sigma _0\varGamma (1-\sigma _0)} &{}\quad \text {if }\sigma _0>0 \end{array}\right. \end{aligned}$$

is a slowly varying function at infinity. The results of the previous subsection therefore apply. For simplicity, we state the results for the constrained setting. We have, almost surely as \(n\rightarrow +\infty \)

$$\begin{aligned} K_n\asymp \left\{ \begin{array}{ll} K_\infty &{}\quad \text {if }\sigma _0<0 \\ 2\kappa \log (n) &{}\quad \text {if }\sigma _0=0 \\ \kappa \alpha ^{2\sigma _0}n^{2\sigma _0} /\sigma _0 &{}\quad \text {if }\sigma _0>0 \end{array}\right. \end{aligned}$$

where \(K_\infty \sim {{\,\mathrm{Poisson}\,}}(-\kappa \tau ^{\sigma _0}/\sigma _0)\). Additionally, for \(\sigma _0\ge 0\),

$$\begin{aligned} \frac{K_{n,j}}{K_n}\rightarrow \frac{\sigma _0 \varGamma (j-\sigma _0)}{ \varGamma (1-\sigma _0) j!} \end{aligned}$$

almost surely as \(n\rightarrow +\infty \). Finally,

$$\begin{aligned} c^{(n)}(k,k')\rightarrow \frac{\alpha }{\alpha +1}. \end{aligned}$$

Therefore, \(\sigma _0\) governs the asymptotic behavior of the number of active communities: \(K_n\) is bounded with a random upper bound (\(\sigma _0<0\)), increases logarithmically (\(\sigma _0=0\)) or polynomially (\(\sigma _0>0\)). In the polynomial case, \(\sigma _0\) also controls the power-law exponent of the proportion of communities of a given size. The parameter \(\kappa \) is an overall linear scaling parameter. Finally, the parameter \(\alpha \) governs the amount of overlap between two communities.
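
The three regimes can be summarized in a small helper that returns the leading-order term of \(K_n\) for the GGP with \(\beta =1\). This is only a sketch: for \(\sigma _0<0\) the limit \(K_\infty \) is random and the function returns its Poisson mean, and the name `asymptotic_active_communities` is hypothetical.

```python
import math

def asymptotic_active_communities(n, kappa, sigma0, tau, alpha):
    """Leading-order growth of K_n for the GGP model with beta = 1."""
    if sigma0 < 0:
        # Finite activity: K_n converges to K_infinity; return its Poisson mean.
        return -kappa * tau ** sigma0 / sigma0
    if sigma0 == 0:
        # Gamma process: logarithmic growth.
        return 2.0 * kappa * math.log(n)
    # Polynomial growth of order n^(2 * sigma0).
    return kappa * alpha ** (2.0 * sigma0) * n ** (2.0 * sigma0) / sigma0

growth = [asymptotic_active_communities(n, 1.0, 0.25, 1.0, 1.0) for n in (100, 10000)]
```

With \(\sigma _0=0.25\), multiplying n by 100 multiplies the leading term by \(100^{0.5}=10\), which illustrates the \(\sqrt{n}\) growth rate in that case.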

4 Simulation, posterior characterization and inference

In this section, we describe the marginal distribution and conditional characterization of the model. Building on these, we derive an exact sampler for simulating from the model, and a Markov chain Monte Carlo algorithm to approximate the posterior distribution. Importantly, the sampler targets the distribution of interest and does not require any truncation or approximation. For simplicity of exposition, we assume that \(\rho \) and F are absolutely continuous with respect to the Lebesgue measure, with \(\rho (dr)=\rho (r)dr\) and \(F(dx)=f(x)dx\).

4.1 Marginal distribution and simulation

For a fixed n, recall that \(K_n\) denotes the number of active communities. Let \((({\widetilde{r}}_1,{\widetilde{v}}_{1:n,1}),\ldots ,\) \(({\widetilde{r}}_{K_n},{\widetilde{v}}_{1:n,K_n}))\) be the subsequence of \((r_k,v_{1:n,k})\) such that community k is active, meaning that \(\sum _{1\le i,j\le n} Z_{ijk}\ge 1 \), arranged in random order. Let \({\widetilde{Z}}_{ijk}\) be the number of community interactions corresponding to the active community \(({\widetilde{r}}_k, \widetilde{v}_{1:n,k})\). Note that

$$\begin{aligned} A_{ij}=\sum _{k=1}^{K_n}{\widetilde{Z}}_{ijk}. \end{aligned}$$
(18)

Let \({\widetilde{Z}}_k=({\widetilde{Z}}_{ijk})_{1\le i,j\le n}\). Using Proposition 5.2 of James (2017), we obtain the following lemma.

Lemma 1

(Marginal distribution) The joint distribution of \((K_n, ({\widetilde{r}}_{1:K_n}, {\widetilde{v}}_{1:n,1:K_n}), (\widetilde{Z}_k)_{k=1,\ldots ,K_n})\) is given by

$$\begin{aligned} K_n&\sim {{\,\mathrm{Poisson}\,}}(\varPsi (n)) \end{aligned}$$
(19)

where \(\varPsi (n)\) is defined in Eq.(12), and

$$\begin{aligned} p(({\widetilde{r}}_{1:K_n}, \widetilde{v}_{1:n,1:K_n})|K_n)&=\prod _{k=1}^{K_n} p({\widetilde{r}}_k,\widetilde{v}_{1:n,k}|K_n) \end{aligned}$$

where

$$\begin{aligned} p({\widetilde{r}}_k,{\widetilde{v}}_{1:n,k}|K_n)\propto (1-e^{-\widetilde{r}_k (\sum _{i=1}^n {\widetilde{v}}_{ik})^2})\rho (\widetilde{r}_k)\prod _{i=1}^n f({\widetilde{v}}_{ik}). \end{aligned}$$
(20)

Finally, for each \(k=1,\ldots ,K_n\),

$$\begin{aligned} {\widetilde{Z}}_k|({\widetilde{r}}_{1:K_n}, {\widetilde{v}}_{1:n,1:K_n})&\sim {{\,\mathrm{tPoisson}\,}}({\widetilde{r}}_k {\widetilde{v}}_{1:n,k}\widetilde{v}^\intercal _{1:n,k} ) \end{aligned}$$
(21)

where \({{\,\mathrm{tPoisson}\,}}(\varLambda )\) denotes the distribution of an integer-valued matrix with Poisson entries with mean values \(\varLambda _{ij}\), conditionally on the sum of the entries being strictly positive. This has probability mass function

$$\begin{aligned} p(A)= \left\{ \begin{array}{ll} (1-e^{-\sum _{ij} \varLambda _{ij}})^{-1}\prod _{1\le i,j\le n} \frac{\varLambda _{ij}^{A_{ij}}e^{-\varLambda _{ij}}}{A_{ij}!} &{}\quad \text {if }\sum _{ij} A_{ij}>0 \\ 0 &{}\quad \text {otherwise}\\ \end{array} \right. \end{aligned}$$

The model has an infinite number of parameters, but Lemma 1 allows us to derive an algorithm to exactly sample from it, by successively simulating \(K_n\), \((\widetilde{r}_{1:K_n}, {\widetilde{v}}_{1:n,1:K_n})\), \((\widetilde{Z}_k)_{k=1,\ldots ,K_n}\) and A using Eqs. (19), (20), (21) and (18).

Sampling from the conditional distribution (21) can be done efficiently by first sampling the number of multiedges \(\sum _{i,j} {\widetilde{Z}}_{i,j,k}\) from a zero-truncated Poisson distribution with rate \({\widetilde{r}}_k (\sum _i {\widetilde{v}}_{i,k})^2\), then sampling the end nodes of each edge iid with probabilities proportional to the affiliation vector. Simulating from the conditional distribution (20) can be more challenging since it requires sampling an \((n+1)\)-dimensional vector. However, if we suppose that the affiliations are gamma distributed, the problem reduces to sampling \(({\widetilde{r}}_k,\sum _i {\widetilde{v}}_{i,k})\), which is a two-dimensional vector, and independently sampling the normalized affiliations from a Dirichlet distribution. Indeed, if the affiliations are gamma distributed, we consider the following change of variables.

$$\begin{aligned} {\widetilde{\varsigma }}_k&=\sum _{i=1}^n \widetilde{v}_{ik},~~~k=1,\ldots ,K_n \end{aligned}$$
(22)
$$\begin{aligned} \widetilde{\varphi }_{ik}&=\frac{{\widetilde{v}}_{ik}}{\widetilde{\varsigma }_k},~~~~k=1,\ldots ,K_n;~i=1,\ldots ,n \end{aligned}$$
(23)

This gives the following algorithm for exact simulation from the model.

  1. Sample \(K_n\) from Eq. (19)

  2. For \(k=1,\ldots ,K_n\)

    (a) Sample \(({\widetilde{\varphi }}_{1k},\ldots ,{\widetilde{\varphi }}_{nk})\sim {{\,\mathrm{Dirichlet}\,}}(\alpha ,\ldots ,\alpha )\)

    (b) Sample \({\widetilde{\varsigma }}_{k}\) from

      $$\begin{aligned} p({\widetilde{\varsigma }})\propto \psi ({\widetilde{\varsigma }}^2){{\,\mathrm{Gamma}\,}}({\widetilde{\varsigma }};n\alpha ,\beta ) \end{aligned}$$
      (24)

    (c) Sample \({\widetilde{r}}_{k}|{\widetilde{\varsigma }}_{k}\) from

      $$\begin{aligned} p({\widetilde{r}}\mid {\widetilde{\varsigma }})\propto (1-e^{-\widetilde{r}{\widetilde{\varsigma }}^2}) \rho ({\widetilde{r}}) \end{aligned}$$
      (25)

    (d) Sample \({\widetilde{Z}}_k^{(n)}\) from Eq. (21)

  3. For \(1\le i,j\le n\), set \(A_{ij}=\sum _{k=1}^{K_n} {\widetilde{Z}}_{ijk}\)

where \(\psi (t)=\int _0^{+\infty } (1-e^{-wt})\rho (dw)\) is the Laplace exponent, \({{\,\mathrm{Dirichlet}\,}}(\alpha ,\ldots ,\alpha )\) denotes the standard Dirichlet distribution and \({{\,\mathrm{Gamma}\,}}(x;a,b)\) denotes the probability density function of a Gamma random variable with parameters a and b, evaluated at x. In the case of the GGP, the Laplace exponent is

$$\begin{aligned} \psi (t)=\frac{\kappa }{\sigma }((t+\tau )^\sigma - \tau ^\sigma ). \end{aligned}$$
(26)

One can sample from Eqs. (24) and (25) using rejection.
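
The last step for each community, sampling the count matrix from Eq. (21), can be sketched as follows. The zero-truncated Poisson draw uses the first arrival time of a unit-rate Poisson process restricted to the interval, a standard construction that is our choice here and is not claimed to be the one used by the authors. Function names are hypothetical, and steps (b) and (c) are not covered.

```python
import math
import random

def sample_poisson(rate, rng):
    """Draw a Poisson(rate) variate by counting unit-rate arrivals in [0, rate]."""
    count, t = 0, rng.expovariate(1.0)
    while t <= rate:
        count += 1
        t += rng.expovariate(1.0)
    return count

def sample_zero_truncated_poisson(rate, rng):
    """Poisson(rate) conditioned on being >= 1."""
    u = rng.random()
    # First arrival time, conditioned to fall in [0, rate] (truncated exponential).
    first = -math.log1p(u * math.expm1(-rate))
    # One arrival at `first`, plus the arrivals in the remaining interval.
    return 1 + sample_poisson(rate - first, rng)

def sample_tpoisson_matrix(r_k, v_k, rng):
    """Sample the n x n count matrix of one active community, as in Eq. (21)."""
    n, total = len(v_k), sum(v_k)
    d_k = sample_zero_truncated_poisson(r_k * total * total, rng)
    Z_k = [[0] * n for _ in range(n)]
    for _ in range(d_k):
        # Source and target are drawn iid, proportionally to the affiliations.
        i, j = rng.choices(range(n), weights=v_k, k=2)
        Z_k[i][j] += 1
    return Z_k

rng = random.Random(0)
Z_k = sample_tpoisson_matrix(0.01, [1.0, 2.0, 0.5], rng)
```

The cost of one draw is linear in the number of sampled edges rather than quadratic in n, which is what makes this two-stage scheme attractive for sparse networks.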

4.2 Posterior characterization

Using Proposition 5.1 in (James 2017), one can characterize the conditional distribution of the CRM G given the latent community counts \({\widetilde{Z}}_{ijk}\).

Lemma 2

Conditionally on \(({\widetilde{Z}}_{k}^{(n)})_{k=1,\ldots ,K_n}\), the CRM G has the same distribution as

$$\begin{aligned} G^\prime + \sum \limits _{k = 1}^{K_n} {\tilde{r}}_k \delta _{{\tilde{v}}_{1:n,k}} \end{aligned}$$

where \(G^\prime \) is an inhomogeneous CRM on \({\mathbb {R}}_+^n\) with mean intensity

$$\begin{aligned} e^{-r (\sum \limits _{i=1}^n v_{i})^2} \rho (r)\prod _{i=1}^n f(v_i) \end{aligned}$$

and \(({\tilde{r}}_k,{\tilde{v}}_{1:n,k})_{k=1,\ldots ,K_n}\) are independent of \(G^\prime \) and iid with density

$$\begin{aligned} p({\tilde{r}}_k,{\tilde{v}}_{1:n,k}|{\widetilde{Z}}_k^{(n)})\propto e^{-\widetilde{r}_k (\sum _i {\widetilde{v}}_{ik})^2 } {\widetilde{r}}_k^{\widetilde{d}_k}\rho ({\widetilde{r}}_k) \prod _{i=1}^n {\widetilde{v}}_{ik}^{\widetilde{m}_{ik}}f({\widetilde{v}}_{ik}) \end{aligned}$$
(27)

where \({\widetilde{m}}_{ik}=\sum _j ({\widetilde{Z}}_{ijk}+\widetilde{Z}_{jik})\) and \({\widetilde{d}}_k=\sum _{i,j} {\widetilde{Z}}_{ijk}\).

In the case where f is a gamma pdf, we can use the same reparameterization as in the previous subsection with \((\widetilde{\varsigma }_k, {\widetilde{\varphi }}_{1:n,k})\) in place of \(\widetilde{v}_{1:n,k}\). This leads to the following conditional distributions.

$$\begin{aligned} {\widetilde{\varphi }}_{1:n,k}|{\widetilde{Z}}_k^{(n)}&\sim {{\,\mathrm{Dirichlet}\,}}(\alpha +{\widetilde{m}}_{1k},\ldots ,\alpha +{\widetilde{m}}_{nk})\\ p({\widetilde{\varsigma }}_k|{\widetilde{Z}}_k^{(n)})&\propto \varkappa ({\widetilde{d}}_k,{\widetilde{\varsigma }}_k^2) {{\,\mathrm{Gamma}\,}}({\widetilde{\varsigma }}_k;n\alpha +2{\widetilde{d}}_k,\beta ) \\ p({\widetilde{r}}_k|{\widetilde{\varsigma }}_k,\widetilde{Z}_k^{(n)})&\propto e^{-{\widetilde{r}}_k\widetilde{\varsigma }_k^2}{\widetilde{r}}_k^{{\widetilde{d}}_k}\rho ({\widetilde{r}}_k) \end{aligned}$$

where \(\varkappa (m,t)=\int _0^{+\infty } r^m e^{-rt}\rho (r)dr\). In the GGP case, we have

$$\begin{aligned} \varkappa (m,t)=\kappa \frac{\varGamma (m-\sigma )}{\varGamma (1-\sigma )}(t+\tau )^{\sigma -m} \end{aligned}$$

and

$$\begin{aligned} {\widetilde{r}}_k|{\widetilde{\varsigma }}_k,{\widetilde{Z}}_k^{(n)}\sim {{\,\mathrm{Gamma}\,}}({\widetilde{d}}_k-\sigma , {\widetilde{\varsigma }}_k^2+\tau ). \end{aligned}$$
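
Two of these conditional updates only involve standard distributions and can be sketched directly. The code below computes the statistics \({\widetilde{d}}_k\) and \({\widetilde{m}}_{ik}\) from a count matrix, then draws the weight from its gamma conditional and the normalized affiliations from their Dirichlet conditional by normalizing independent gamma variates. The conditional of \({\widetilde{\varsigma }}_k\) is not standard and is left out; all function names and numerical values are hypothetical.

```python
import random

def sufficient_statistics(Z_k):
    """Return d_k = sum_ij Z_ijk and m_ik = sum_j (Z_ijk + Z_jik)."""
    n = len(Z_k)
    d_k = sum(sum(row) for row in Z_k)
    m_k = [sum(Z_k[i][j] + Z_k[j][i] for j in range(n)) for i in range(n)]
    return d_k, m_k

def sample_weight(d_k, varsigma_k, sigma, tau, rng):
    """Gamma(shape d_k - sigma, rate varsigma_k^2 + tau); gammavariate takes a scale."""
    return rng.gammavariate(d_k - sigma, 1.0 / (varsigma_k ** 2 + tau))

def sample_normalized_affiliations(m_k, alpha, rng):
    """Dirichlet(alpha + m_1k, ..., alpha + m_nk) via normalized gamma variates."""
    g = [rng.gammavariate(alpha + m, 1.0) for m in m_k]
    total = sum(g)
    return [x / total for x in g]

rng = random.Random(0)
Z_k = [[0, 2], [1, 0]]                      # toy count matrix for one community
d_k, m_k = sufficient_statistics(Z_k)
r_k = sample_weight(d_k, varsigma_k=1.5, sigma=0.2, tau=1.0, rng=rng)
phi_k = sample_normalized_affiliations(m_k, alpha=0.5, rng=rng)
```

Each interaction contributes once to the degree and twice to the affiliation counts (once for its source and once for its target), so the entries of `m_k` sum to `2 * d_k`, matching the shape parameter \(n\alpha +2{\widetilde{d}}_k\) above.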

4.3 Slice sampler for posterior inference

Let \(\theta \) denote the set of hyperparameters of the mean measure \(\rho \) and pdf f. To simplify the presentation, here we suppose that we observe the complete adjacency matrix A, which means that we observe a directed and weighted graph with no missing (hidden) edge. The objective is to obtain samples distributed from the conditional distribution

$$\begin{aligned} p(K_n,({\widetilde{r}}_k,{\widetilde{v}}_{1:n,k})_{k=1,\ldots ,K_n},\theta \mid A). \end{aligned}$$

In the Appendix, we show how to do inference when we only observe a partial graph (with missing edges to predict) that can be directed or undirected, weighted or binary. In order to leverage the Poisson factorization construction, we augment the model with the latent community counts \({\widetilde{Z}}_k\). Additionally, to deal with the unknown number of active communities \(K_n\), we use auxiliary slice variables, similarly to other Gibbs samplers for Bayesian nonparametric models (Walker 2007; Kalli et al. 2011; Favaro and Teh 2013). For each directed pair (ij) such that \(A_{ij}\ge 1\), consider the scalar latent variable

$$\begin{aligned} s_{ij}|({\widetilde{r}}_k,{\widetilde{Z}}_{ijk})_{k=1,\ldots ,K_n}\sim {{\,\mathrm{Unif}\,}}\left( 0, \min _{\{k|{\widetilde{Z}}_{ijk}\ge 1\}} \widetilde{r}_k\right) \end{aligned}$$
(28)

and denote \(s=\min _{ij} s_{ij}\). Note that by definition, \({\widetilde{r}}_k\ge s\) for all \(k=1,\ldots ,K_n\). In the following, we will consider the communities k, active or inactive, whose weight \(r_k\) is at least s. We use the notation \(\overline{\ \varvec{\cdot }\ }\) to denote the corresponding variables. More precisely, let

$$\begin{aligned} {\overline{G}}&=\sum _{k} r_k\delta _{v_{1:n,k}}\mathbbm {1}_{r_k\ge s}:=\sum _{k=1}^{{\overline{K}}_n} {\overline{r}}_k\delta _{\overline{v}_{1:n,k}} \end{aligned}$$

be the CRM corresponding to the set of active or inactive communities with weight \(r_k\ge s\), of (almost surely finite) cardinality \({\overline{K}}_n\ge K_n\). Denote by \({\overline{Z}}_{ijk}\ge 0\) the associated community interactions, and let \(\overline{Z}_{k}=({\overline{Z}}_{ijk})\). The data-augmented slice sampler draws samples asymptotically distributed from

$$\begin{aligned} p(({\overline{Z}}_k)_{k=1,\ldots ,{\overline{K}}_n},{\overline{G}},\theta , s \mid A). \end{aligned}$$
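The construction of the slice variables in Eq. (28) and of the threshold s can be sketched as follows. This is only an illustration of the definitions above, not the update used in Appendix B; the function name and the sparse dictionary layout for the counts are assumptions made for the example.

```python
import random


def sample_slice_threshold(r, Z, rng):
    """Draw s_ij ~ Unif(0, min_{k : Z_ijk >= 1} r_k) and return (s_ij, s).

    r : list of community weights r_k.
    Z : dict mapping a directed pair (i, j) with A_ij >= 1 to a dict
        {k: Z_ijk} of its community counts (hypothetical sparse layout).
    """
    s_ij = {}
    for pair, counts in Z.items():
        # Smallest weight among communities that contribute to this pair.
        upper = min(r[k] for k, z in counts.items() if z >= 1)
        s_ij[pair] = rng.uniform(0.0, upper)
    # Global threshold: every active community has weight at least s.
    s = min(s_ij.values())
    return s_ij, s
```

Communities with weight below s cannot contribute to any observed interaction, which is why only the finitely many communities with \(r_k\ge s\) need to be represented.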

The main steps of the algorithm are as follows.

  1. For each directed pair (ij) such that \(A_{ij}\ge 1\), update \(({\overline{Z}}_{ijk})_{k=1,\ldots ,{\overline{K}}_n}\) given the rest of the variables,

  2. Update the hyperparameters \(\theta \) given the rest of the variables,

  3. Update \(({\overline{G}}, s)\) given the rest of the variables.

The details of each step are given in Appendix B. Each iteration of the Gibbs sampler has a time complexity scaling in \({\overline{K}}_n S\) where S is the number of nonzero entries of the matrix. Therefore, the algorithm takes advantage of the sparsity of the networks. Additionally, each entry of the sparse graph can be dealt with independently, making the algorithm straightforwardly parallelizable.
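To illustrate why the cost per iteration scales with the number of nonzero entries, the sketch below allocates one observed count \(A_{ij}\) among communities using the standard property that independent Poisson counts, conditioned on their sum, are multinomial with probabilities proportional to their rates \(r_k v_{ik} v_{jk}\). This is a simplified stand-in and not the update of Appendix B: it deliberately ignores the conditioning on the slice variables, and the function name is hypothetical.

```python
import random
from collections import Counter


def allocate_counts(a_ij, r, v_i, v_j, rng):
    """Split the count a_ij among K communities.

    Simplified sketch (no slice variables): Z_ij. | A_ij is multinomial with
    probabilities proportional to r_k * v_ik * v_jk.
    """
    weights = [rk * vik * vjk for rk, vik, vjk in zip(r, v_i, v_j)]
    # One categorical draw per unit of interaction, then tally per community.
    draws = rng.choices(range(len(r)), weights=weights, k=a_ij)
    tally = Counter(draws)
    return [tally.get(k, 0) for k in range(len(r))]
```

Each call touches a single nonzero entry and costs on the order of the number of represented communities, and calls for different pairs do not interact, which is consistent with the parallelization remark above.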

Fig. 1 Trace plots of (a) the number of active communities \(K_n\) and (b) \(\sigma \), on a synthetic example

5 Experiments

We implement the algorithm described in the previous section with the GGP-Gamma scores model. We assign Gamma priors with parameters (0.1, 0.1) to the hyperparameters \(\kappa , \tau , \alpha \), and we fix \(\beta = 1\). For small datasets, we allow up to a linear growth of the number of communities, corresponding to \(\sigma < 0.5\), and use a Gamma prior with parameters (0.1, 0.1) on \(1-2\sigma \). For larger datasets, we restrict \(\sigma < 0.25\), meaning that the number of communities cannot grow at a faster rate than \(\sqrt{n}\). This is obtained by using a Gamma prior with parameters (0.1, 0.1) on \(1-4\sigma \).
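The constraint on \(\sigma \) follows from a change of variable: if \(x=1-\sigma /u\) has a Gamma prior, then \(x>0\) and \(\sigma =u(1-x)<u\), with \(u=0.5\) or \(u=0.25\). The sketch below draws from this implied prior. It assumes that (0.1, 0.1) denotes a (shape, rate) pair, which is not stated explicitly in this section; the function name is hypothetical.

```python
import random


def sample_sigma(upper, shape, rate, rng):
    """Draw sigma = upper * (1 - x) with x ~ Gamma(shape, rate).

    Assumes a (shape, rate) parametrization; random.gammavariate expects
    (shape, scale), hence the 1 / rate.
    """
    x = rng.gammavariate(shape, 1.0 / rate)  # x plays the role of 1 - sigma / upper
    return upper * (1.0 - x)
```

For example, `sample_sigma(0.5, 0.1, 0.1, random.Random(0))` corresponds to the prior on \(1-2\sigma \), and `upper=0.25` to the prior on \(1-4\sigma \). Draws can be negative, in line with the negative values of \(\sigma \) discussed in the experiments.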

The model allows overlapping communities but, for visualization purposes for example, it is useful to obtain an associated partition of the nodes. For each iteration, one can cluster the nodes by assigning each node to the community where it is most active. That is, at iteration t of the MCMC algorithm, define for \(i=1,\ldots ,n\)

$$\begin{aligned} c_i^{(t)}={{\,\mathrm{argmax}\,}}_{k} \{\sqrt{r^{(t)}_k} v^{(t)}_{ik}\} \end{aligned}$$
(29)

the cluster membership of node i. We then compute an approximate Bayesian point estimate \({\widehat{c}}=({\widehat{c}}_1,\ldots ,\widehat{c}_n)\) of the partition of the nodes, using Binder’s loss function (Lau and Green 2007).
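The assignment rule (29) can be sketched for a single MCMC iteration as follows. The function name and the nested-list layout of the affiliations are assumptions made for the example; the subsequent Binder-loss point estimate is not shown.

```python
import math


def hard_assignments(r, v):
    """Return c_i = argmax_k sqrt(r_k) * v_ik for every node i.

    r : list of K community weights at the current iteration.
    v : list of n rows, each a list of K affiliations v_ik.
    """
    scale = [math.sqrt(rk) for rk in r]
    return [
        max(range(len(r)), key=lambda k: scale[k] * row[k])
        for row in v
    ]
```

With `r = [4.0, 1.0]` and a node with affiliations `[0.1, 0.3]`, the scaled scores are 0.2 and 0.3, so the node is assigned to the second community even though the first community has the larger weight.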

5.1 Synthetic datasets

Data generated from the GGP-gamma model. We first run the algorithm on a synthetic dataset simulated from our model, to check that the algorithm can recover the true parameters. We sample a directed and unweighted graph from the GGP-gamma model with size \(n=800\) and \(\sigma = 0.2,\kappa =1,\tau =0.15,\alpha =0.05,\beta =0.2\). The obtained graph has 20198 edges, and the true number of active communities is 42. We run three chains in parallel for 500,000 iterations each, of which the first 250,000 are discarded as burn-in. We show in Fig. 1 trace plots of the number of active communities \(K_n\) and of the parameter \(\sigma \), which indicate that the MCMC algorithm can recover these parameters.

Data generated from a Poisson factor model with a fixed number of communities. Following Miller and Harrison (2013), we know that the Dirichlet process is inconsistent for estimating the number of clusters if the number of clusters is fixed and does not increase with n. Similarly, we conjecture that when \(\sigma =0\) is fixed, corresponding to the setting of Zhou (2015), the model will fail to recover the right number of communities, whereas if \(\sigma \) is free, we can expect it to concentrate on negative values. The model should then recover the correct number of communities if the data is generated from a Poisson factor model. A proper posterior consistency analysis is beyond the scope of this article. However, we design a simple numerical experiment to support this conjecture. We take \(K=5\) true communities; the affiliations \(v_{i,k}\) are iid \(\text {Gamma}(0.1, 1)\) and the five community importances \((r_1,\ldots , r_5)\) are iid \(\text {Gamma}(2, 0.2)\). Networks of increasing sizes \(n=500,1000,1500,3000\) are then generated from the Poisson factor model (3).
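A data-generating procedure of this kind can be sketched as below, using a finite-K version of the Poisson rate \(\sum _k r_k v_{ik}v_{jk}\) from Eq. (1). Two points are assumptions of the sketch rather than statements of the paper: the Gamma distributions are read as (shape, rate), and Poisson variates are drawn by counting unit-rate exponential arrivals because the Python standard library has no Poisson sampler. All function names are hypothetical.

```python
import random


def sample_poisson(lam, rng):
    """Poisson(lam) draw: number of unit-rate arrivals in the interval [0, lam)."""
    count, t = 0, rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_poisson_factor_graph(n, K, rng):
    """Simulate (r, v, A) with A_ij ~ Poisson(sum_k r_k v_ik v_jk)."""
    # Assumed (shape, rate) reading: Gamma(2, 0.2) has scale 1 / 0.2.
    r = [rng.gammavariate(2.0, 1.0 / 0.2) for _ in range(K)]
    v = [[rng.gammavariate(0.1, 1.0) for _ in range(K)] for _ in range(n)]
    A = [
        [
            sample_poisson(sum(r[k] * v[i][k] * v[j][k] for k in range(K)), rng)
            for j in range(n)
        ]
        for i in range(n)
    ]
    return r, v, A
```

A call such as `simulate_poisson_factor_graph(500, 5, random.Random(0))` returns a dense list-of-lists count matrix; for the larger sizes used in the experiment a sparse representation would be preferable.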

Table 1 Concentration of the posterior of \(K_n\) when increasing n on synthetic dataset with a number of communities \(K=5\) fixed (independent of n)

We estimate the number of active communities \(K_n\) under the model with \(\sigma =0\) and with \(\sigma \) unknown. In Table 1, we report the posterior mean and variance of the recovered number of communities. We can see that the gamma process model (\(\sigma =0\)) does not seem to concentrate around the right value. On the contrary, when \(\sigma \) is free, the posterior of \(K_n\) seems to concentrate around \(K=5\). For \(n=500\), the posterior on \(\sigma \) ranges from negative to positive values, which explains the higher variance. However, from \(n=1000\) onward, the posterior of \(\sigma \) concentrates on negative values, which translates into a significant decrease of variance for the posterior of \(K_n\).

Data generated from the Stochastic Block Model (SBM). From our experimentation, it seems that Poisson factorization models do not capture well the generating process of the SBM. Both when \(\sigma =0\) and when \(\sigma \) is free, the model creates many very small communities. However, the model is able to recover the true communities with high precision once we cluster the nodes using (29).

We generate a synthetic dataset from an SBM with \(n=600\) nodes and three communities of size 200 each. The probability of an edge between nodes of the same community is \(p_{in} = 0.1\); the probability of an edge between nodes from different communities is \(p_{out} = 0.01\). The resulting dataset is hence undirected and unweighted, with 7239 edges.
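A graph of this type can be simulated with a few lines of code. The sketch below assumes no self-loops and independent Bernoulli edges over unordered pairs; the function name is hypothetical and the random seed used for the paper's dataset is not known.

```python
import random


def simulate_sbm(sizes, p_in, p_out, rng):
    """Simulate an undirected, unweighted SBM; returns (labels, edges).

    sizes : list of community sizes, e.g. [200, 200, 200].
    Edges are pairs (i, j) with i < j (no self-loops, an assumption here).
    """
    labels = [c for c, size in enumerate(sizes) for _ in range(size)]
    n = len(labels)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            p = p_in if labels[i] == labels[j] else p_out
            if rng.random() < p:
                edges.append((i, j))
    return labels, edges
```

With the settings above, the expected number of edges is \(3\cdot \binom{200}{2}\cdot 0.1 + 3\cdot 200^2\cdot 0.01 = 5970 + 1200 = 7170\), which is in line with the 7239 edges of the dataset used here.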

The GGP-Gamma model does not capture well the SBM generating process: the posterior mean of the number of communities is \(K_n = 38.6\), with \(\sigma \) positive. We obtain 3 main communities, and the rest are very small (composed of a few edges each). As explained previously, we cluster the nodes by assigning each of them to the community to which it has the highest scaled affiliation. We then get an average of 8.35 communities, 3 of which are on average composed of 195.5 nodes each. We plot the distribution of the sizes of the small communities in Fig. 2. In Table 2, we report the contingency table of the posterior clusters.

We also use the model with \(\sigma = 0\) on this dataset and find very similar results. After clustering the nodes, we find that the average number of communities is 7.66, which is slightly less than previously, but the small communities are slightly larger on average, so that the average size of each large cluster is again 195.5. We do not report the contingency matrix here, as it is very similar to the one we obtain with \(\sigma \) free.

Fig. 2 Posterior density of the sizes of the small communities for the dataset generated from a Stochastic Block Model

Table 2 Contingency table of the posterior communities in per cent of the true communities size

Fig. 3 (Left) Estimated communities and (right) posterior on \(\sigma \) for the polblogs dataset with (top row) \(\alpha = 0.8\), (middle row) \(\alpha =0.4\) and (bottom row) \(\alpha = 0.2\)

Table 3 Proportion of the interactions of the features in each block for different values of overlapping

Fig. 4 Posterior of \(K_n\) and \(\sigma \) for the Wiki-topcats dataset

Fig. 5 Reordered adjacency matrix of the Wikipedia topcats dataset

5.2 Political blogs

The polblogs network (Adamic and Glance 2005) is the network of the American political blogosphere in February 2005. It is a directed unweighted graph, where there is an edge (ij) if blog i cites blog j. It is composed of 1490 nodes and 19025 edges. For each node, some ground truth information about its political affiliation (republican/democrat) is known.

We use this dataset to illustrate the role of the parameter \(\alpha \) in the model. As indicated in Sect. 3, this parameter tunes the amount of overlapping between the communities. A smaller value enforces less overlap between communities. We run three chains with 500,000 iterations. The posterior samples of \(\sigma \) for three different values of \(\alpha \) are shown in Fig. 3. Nodes are reordered according to their estimated membership \({\widehat{c}}\) (through (29)), and Fig. 3 also shows the densities of connection between and within clusters for the three values of \(\alpha \). Depending on the amount of overlapping, we obtain two (\(\alpha =0.8\)), three (\(\alpha =0.4\)) or four (\(\alpha =0.2\)) communities. In order to interpret those communities, we calculate in Table 3, for each community, the proportion of interactions between democrat blogs, between a democrat and a republican blog, and between two republican blogs. For \(\alpha =0.8\), there are two estimated communities which can clearly be identified as democrat (community #1) and republican (community #2). For \(\alpha =0.4\), we have three communities. One is mostly associated with democrat blogs (#1), while the other two correspond to a split of the republican blogs into right (#2) and center-right (#3) groups. For \(\alpha =0.2\), we obtain a further split of the democrat blogs into left (#1) and center-left (#2) groups. Decreasing the value of \(\alpha \) therefore leads to a finer and finer partition of the nodes.
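The reordering of the nodes and the block densities can be computed from a hard partition as sketched below. The density convention is an assumption of this sketch (number of directed edges from cluster a to cluster b divided by the number of ordered node pairs, excluding self-pairs); the paper does not spell out the normalization, and the function name is hypothetical.

```python
from collections import Counter


def block_densities(edges, labels):
    """Return (order, density) for a directed graph and a hard partition.

    edges  : iterable of directed pairs (i, j).
    labels : list with the cluster label of each node.
    order  : node indices sorted by cluster, used to reorder the adjacency matrix.
    density: dict mapping (a, b) to edges from a to b per ordered node pair.
    """
    sizes = Counter(labels)
    clusters = sorted(sizes)
    counts = Counter((labels[i], labels[j]) for i, j in edges if i != j)
    density = {}
    for a in clusters:
        for b in clusters:
            # Ordered pairs between the two clusters, without self-pairs.
            pairs = sizes[a] * (sizes[b] - 1 if a == b else sizes[b])
            density[(a, b)] = counts[(a, b)] / pairs if pairs else 0.0
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    return order, density
```

Permuting the rows and columns of the adjacency matrix by `order` places nodes of the same cluster next to each other, which is what makes the block structure visible in plots such as Fig. 3.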

5.3 Wikipedia topcats

The network is a partial web graph of Wikipedia hyperlinks collected in September 2011 (Klymko et al. 2014). It is a directed unweighted graph where an edge (ij) corresponds to a citation from page i to page j. We restrict it to the first 3000 nodes and the associated 5687 edges. We run three MCMC chains for 200,000 iterations. Trace plots of the number of active communities and of the parameter \(\sigma \) are given in Fig. 4. Figure 5 shows the adjacency matrix reordered by communities, as explained in the previous section. In order to check that the learnt communities/features are meaningful, we report in Figs. 6 and 7 the proportions of webpages associated with a given category within a given community/feature (note that a webpage can be associated with multiple categories, hence the proportions do not sum to 1).

Note that, while the approach is able to estimate the latent block structure, this dataset has the particularity of having star nodes, a feature that is not captured by our model.

Fig. 6 Features compared to categories for the Wikipedia dataset

Fig. 7 Features compared to categories for the Wikipedia dataset

5.4 Deezer

The dataset was collected from the music streaming service Deezer in November 2017 (Rozemberczki et al. 2018). It represents the friendship network of a subset of Deezer users from Romania. It is an undirected unweighted graph where nodes represent the users and edges are the mutual friendships. There are 41773 nodes and 125826 edges. We run three chains with 100,000 iterations each. Posterior histograms of the number of active communities and \(\sigma \) are given in Fig. 8. The algorithm finds around 45 communities/features for this dataset. The reordered adjacency matrix and block densities based on the point estimate of the partition are given in Fig. 9.

Fig. 8 Posterior of \(K_n\) and \(\sigma \) on Deezer’s dataset

We reorder the nodes using approximate MAP clustering, as previously, and obtain the adjacency matrix shown in Fig. 9.

Fig. 9 Reordered adjacency matrix and block densities for Deezer’s dataset

For each individual in the network, a list of musical genres liked by that person is available. There are in total 84 distinct genres. We represent in Fig. 10 the proportion of individuals who liked a subset of the 84 genres for three different communities where the interpretation in terms of genres is quite clear. The overall proportion of individuals liking a given genre is shown at the bottom of Fig. 10. If the bar is red, this indicates that the proportion is 10% higher in the community than in the population. If the bar is blue, this means the proportion is 10% lower. Community 11 can be interpreted as \( R \& B\), Community 8 as Dance, and Community 3 as Rock music. For some of the communities, not reported here, the interpretation in terms of the liked genres is less clear, and may be due to other covariates.

Fig. 10 Features compared to genres for Deezer’s dataset

6 Discussion

The model presented in this paper assumes the same parameter \(\beta \) for each node. We can also consider a degree-corrected version of the model, similarly to Zhou (2015), where each node is assigned a different parameter \(\beta _i > 0\) and we define \(Z_{ijk} \sim {{\,\mathrm{Poisson}\,}}(\frac{r_k v_{ik} v_{jk}}{\beta _i \beta _j})\). It is unclear, however, whether an MCMC sampler targeting the exact posterior distribution could be implemented, and one may need to resort to some truncation approximation as in Zhou (2015).

The count matrix \((A_{ij})\) is infinitely exchangeable; hence, the model presented in this article leads to asymptotically dense graphs. That is, \(\sum _{1\le i,j\le n} A_{ij}\asymp n^2\) as n tends to infinity. In order to obtain sparse graphs, we could consider two different strategies. The first solution consists in dropping the infinite exchangeability property and taking \(\beta ^{(n)}_i \rightarrow +\infty \) with n; the number of edges then behaves as \((n/\beta ^{(n)})^2\) (we can for instance take \(\beta ^{(n)}_i = \sqrt{n}\) for any node i to obtain a linear growth of the number of edges). The model would still be finitely exchangeable for any fixed n, but not projective anymore. The second solution would be to consider the different notion of infinite exchangeability developed in Caron and Fox (2017) and consider \((\beta _i)_i\) as a realization of a Poisson point process.
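As a quick check of the scaling in the first strategy, suppose (an illustrative assumption, not a choice made in this paper) that \(\beta ^{(n)}_i = n^{a}\) for all nodes i, with \(a\ge 0\). The stated rate then gives

$$\begin{aligned} \left( \frac{n}{\beta ^{(n)}}\right) ^2 = \left( \frac{n}{n^{a}}\right) ^2 = n^{2-2a}, \end{aligned}$$

so that \(a=0\) recovers the dense regime \(n^2\), \(a=1/2\) gives the linear growth obtained with \(\beta ^{(n)}_i=\sqrt{n}\), and any \(a>0\) yields a number of edges that is \(o(n^2)\).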

Finally, we presented a model for count (and binary) data. The results build on the additive contributions of the communities, which is why we chose the Poisson distribution on the entries of the adjacency matrix \((A_{ij})\). We can generalize to non-count data using other probability distributions which are closed under convolution. For example, one could consider \(A_{ij} \sim {{\,\mathrm{Gamma}\,}}(\sum _k r_k v_{ik} v_{jk},1)\) for \(A_{ij} \in {\mathbb {R}}_+\) or \(A_{ij} \sim \mathcal {N}(\sum _k r_k v_{ik} v_{jk},1)\) for \(A_{ij} \in {\mathbb {R}}\).