Under the LSBM, the inferential objective is to recover the community allocations \(\mathbf {z}=(z_1,\dots ,z_n)\) given a realisation of the adjacency matrix \(\varvec{A}\). Assuming normality of the rows of ASE for LSBMs (3), the inferential problem consists of making joint inference about \(\mathbf {z}\) and the latent functions \(\mathbf {f}_k=(f_{k,1},\dots ,f_{k,d}):{\mathcal {G}}\rightarrow {\mathbb {R}}^d\). The prior for \(\mathbf {z}\) follows a Categorical-Dirichlet structure:
$$\begin{aligned} z_i&\sim \text {Categorical}({\varvec{\eta }}),\ {\varvec{\eta }}=(\eta _1,\dots ,\eta _K),\ i=1,\dots ,n,\nonumber \\ {\varvec{\eta }}&\sim \text {Dirichlet}(\nu /K,\dots ,\nu /K), \end{aligned}$$
(4)
where \(\nu ,\eta _k\in {\mathbb {R}}_+,\ k\in \{1,\dots ,K\}\), and \(\sum _{k=1}^K\eta _k=1\).
Following the ASE-CLT in Theorem 1, the estimated latent positions are assumed to be drawn from Gaussian distributions centred at the underlying function value. Conditional on the pair \((\theta _i,z_i)\), the following distribution is postulated for \(\hat{\mathbf {x}}_i\):
$$\begin{aligned} \hat{\mathbf {x}}_i\ \vert \ \theta _i,\mathbf {f}_{z_i},{\varvec{\sigma }}^2_{z_i} \sim {\mathbb {N}}_d\left\{ \mathbf {f}_{z_i}(\theta _i),{\varvec{\sigma }}^2_{z_i}\varvec{I}_d\right\} ,\ i=1,\dots ,n, \end{aligned}$$
(5)
where \({\varvec{\sigma }}^2_k=(\sigma ^2_{k,1},\dots ,\sigma ^2_{k,d})\in {\mathbb {R}}_+^d\) is a community-specific vector of variances and \(\varvec{I}_d\) is the \(d\times d\) identity matrix. Note that, for simplicity, the components of the estimated latent positions are assumed to be independent. This assumption loosely corresponds to the k-means clustering approach, which has been successfully deployed in spectral graph clustering under the SBM (Rohe et al. 2011). Here, the same idea is extended to a functional setting. Furthermore, for tractability, (5) assumes that the variance of \(\hat{\mathbf {x}}_i\) does not depend on \(\mathbf {x}_i\), but only on the community allocation \(z_i\).
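To make the generative reading of (4) and (5) concrete, the following minimal sketch simulates allocations and embeddings for \(K=2\) communities. The latent curves, noise variances and numeric values are arbitrary illustrations, not part of the model specification, and the Gaussian prior on \(\theta _i\) anticipates the one introduced at the end of this section.

```python
# Minimal sketch of the generative model (4)-(5); all numeric values and
# the latent curves f_k below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, K, d, nu = 500, 2, 2, 1.0

# (4): community probabilities and allocations
eta = rng.dirichlet(np.full(K, nu / K))
z = rng.choice(K, size=n, p=eta)

# Latent parameters theta_i ~ N(mu_theta, sigma2_theta), cf. end of section
theta = rng.normal(loc=0.0, scale=1.0, size=n)

# Illustrative community-specific latent curves f_k : R -> R^d
f = [lambda t: np.stack([t, t ** 2], axis=-1),           # quadratic curve
     lambda t: np.stack([t, np.ones_like(t)], axis=-1)]  # flat second dim

# Community- and dimension-specific variances sigma^2_{k,j}
sigma2 = np.array([[0.01, 0.01], [0.02, 0.02]])

# (5): rows of the embedding, centred at f_{z_i}(theta_i)
X_hat = np.stack([f[z[i]](theta[i]) + rng.normal(0.0, np.sqrt(sigma2[z[i]]))
                  for i in range(n)])
print(X_hat.shape)  # (n, d)
```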
For a full Bayesian model specification, prior distributions are required for the latent functions and the variances. The most popular prior for functions is the Gaussian process (GP; see, for example, Rasmussen and Williams 2006). Here, for each community \(k\), the \(j\)-th dimension of the true latent positions is assumed to lie on a one-dimensional manifold described by a function \(f_{k,j}\), which is given a hierarchical GP-IG prior: a GP prior on the function, combined with an inverse gamma (IG) prior on the variance:
$$\begin{aligned} f_{k,j} \vert \sigma ^2_{k,j}&\sim \text {GP}(0,\sigma ^2_{k,j}\xi _{k,j}),\ k=1,\dots ,K,\ j=1,\dots ,d, \nonumber \\ \sigma ^2_{k,j}&\sim \text {IG}(a_0,b_0),\ k=1,\dots ,K,\ j=1,\dots ,d, \end{aligned}$$
(6)
where \(\xi _{k,j}(\cdot ,\cdot )\) is a positive semi-definite kernel function and \(a_0,b_0\in {\mathbb {R}}_+\). Note that the terminology “kernel” is used in the literature for both the GP covariance function \(\xi _{k,j}(\cdot ,\cdot )\) and the function \(\kappa (\cdot ,\cdot )\) used in LPMs (cf. Sect. 1), but their meaning is fundamentally different. In particular, \(\kappa :{\mathbb {R}}^d\times {\mathbb {R}}^d\rightarrow [0,1]\) is a component of the graph generating process, and assumed here to be the inner product, corresponding to RDPGs. On the other hand, \(\xi _{k,j}:{\mathbb {R}}\times {\mathbb {R}}\rightarrow {\mathbb {R}}\) is the scaled covariance function of the GP prior on the unknown function \(f_{k,j}\), which is used for modelling the observed graph embeddings, or equivalently the embedding generating process. There are no restrictions on the possible forms of \(\xi _{k,j}\), except positive semi-definiteness. Overall, the approach is similar to the overlapping mixture of Gaussian processes method (Lázaro-Gredilla et al. 2012).
The class of models that can be expressed in form (6) is vast, and includes, for example, polynomial regression and splines, under a conjugate normal-inverse-gamma prior for the regression coefficients. For example, consider any function that can be expressed in the form \(f_{z_i,j}(\theta _i)= {\varvec{\phi }}_{z_i,j}(\theta _i)^\intercal \mathbf {w}_{z_i,j}\) for some community-specific basis functions \({\varvec{\phi }}_{k,j}:{\mathbb {R}}\rightarrow {\mathbb {R}}^{q_{k,j}}, q_{k,j}\in {\mathbb {Z}}_+,\) and corresponding coefficients \(\mathbf {w}_{k,j}\in {\mathbb {R}}^{q_{k,j}}\). If the coefficients are given a normal-inverse-gamma prior
$$\begin{aligned} (\mathbf {w}_{k,j},\sigma ^2_{k,j})&\sim \text {NIG}(\mathbf {0},{\varvec{\varDelta }}_{k,j},a_0,b_0) \\&={\mathbb {N}}_{q_{k,j}}(\mathbf {0},\sigma ^2_{k,j}{\varvec{\varDelta }}_{k,j})\times \text {IG}(a_0,b_0), \end{aligned}$$
where \({\varvec{\varDelta }}_{k,j}\in {\mathbb {R}}^{q_{k,j}\times q_{k,j}}\) is a positive definite matrix, then \(f_{k,j}\) takes form (6), with the kernel function
$$\begin{aligned} \xi _{k,j}(\theta ,\theta ^\prime ) = {\varvec{\phi }}_{k,j}(\theta )^\intercal {\varvec{\varDelta }}_{k,j}{\varvec{\phi }}_{k,j}(\theta ^\prime ). \end{aligned}$$
(7)
Considering the examples in Sect. 3, the SBM (cf. Example 1) corresponds to \(\xi _{k,j}(\theta ,\theta ^\prime )=\varDelta _{k,j},\ \varDelta _{k,j}\in {\mathbb {R}}_+\), whereas the DCSBM (cf. Example 2) corresponds to \(\xi _{k,j}(\theta ,\theta ^\prime )=\theta \theta ^\prime \varDelta _{k,j},\ \varDelta _{k,j}\in {\mathbb {R}}_+\). For the quadratic LSBM (cf. Example 3), the GP kernel takes the form \(\xi _{k,j}(\theta ,\theta ^\prime )=(1,\theta ,\theta ^2){\varvec{\varDelta }}_{k,j}(1,\theta ^\prime ,\theta ^{\prime 2})^\intercal \) for a positive definite scaling matrix \({\varvec{\varDelta }}_{k,j}\in {\mathbb {R}}^{3\times 3}\).
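These three cases can be checked directly by writing (7) as code. The sketch below is a hedged illustration: the helper name `basis_kernel` and the \({\varvec{\varDelta }}_{k,j}\) values are arbitrary choices, not part of the model.

```python
# Dot-product kernels of form (7): xi(t, s) = phi(t)^T Delta phi(s).
import numpy as np

def basis_kernel(phi, Delta):
    """Return xi(t, s) = phi(t)^T Delta phi(s), vectorised over t and s."""
    def xi(t, s):
        Pt = phi(np.atleast_1d(np.asarray(t, dtype=float)))
        Ps = phi(np.atleast_1d(np.asarray(s, dtype=float)))
        return Pt @ Delta @ Ps.T
    return xi

# SBM: constant basis phi(t) = 1, so xi(t, s) = Delta (a positive scalar)
xi_sbm = basis_kernel(lambda t: np.ones((t.size, 1)), np.array([[2.0]]))

# DCSBM: linear basis phi(t) = t, so xi(t, s) = t * s * Delta
xi_dcsbm = basis_kernel(lambda t: t.reshape(-1, 1), np.array([[2.0]]))

# Quadratic LSBM: phi(t) = (1, t, t^2) with a 3x3 positive definite Delta
xi_quad = basis_kernel(lambda t: np.vander(t, 3, increasing=True),
                       np.diag([1.0, 1.0, 1.0]))

theta = np.linspace(-1.0, 1.0, 5)
print(xi_quad(theta, theta).shape)  # (5, 5) Gram matrix
```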
The LSBM specification is completed with a prior for each \(\theta _i\) value, which specifies the unobserved location of the latent position \(\mathbf {x}_i\) along each submanifold curve; for \(\mu _\theta \in {\mathbb {R}}\), \(\sigma ^2_\theta \in {\mathbb {R}}_+\),
$$\begin{aligned} \theta _i \sim {\mathbb {N}}(\mu _\theta ,\sigma ^2_\theta ),\ i=1,\dots ,n. \end{aligned}$$
Posterior and marginal distributions
The posterior distribution for \((f_{k,j},\sigma ^2_{k,j})\) has the same GP-IG structure as (6), with updated parameters:
$$\begin{aligned} f_{k,j} \vert \sigma ^2_{k,j}, \mathbf {z}, {\varvec{\theta }}, \hat{\varvec{X}}&\sim \text {GP}(\mu _{k,j}^\star ,\sigma ^2_{k,j}\xi _{k,j}^\star ), \\ \sigma ^2_{k,j} \vert \mathbf {z}, {\varvec{\theta }}, \hat{\varvec{X}}&\sim \text {IG}(a_k,b_{k,j}), \end{aligned}$$
with \(k=1,\dots ,K\) and \(j=1,\dots ,d\). The parameters are updated as follows:
$$\begin{aligned}&\mu _{k,j}^\star (\theta ) = {\varvec{\varXi }}_{k,j}(\theta , {\varvec{\theta }}_k^\star )\{{\varvec{\varXi }}_{k,j}({\varvec{\theta }}_k^\star , {\varvec{\theta }}_k^\star ) + \varvec{I}_{n_k}\}^{-1}\hat{\varvec{X}}_{k,j}, \nonumber \\&\xi _{k,j}^\star (\theta ,\theta ^\prime ) = \xi _{k,j}(\theta ,\theta ^\prime ) \nonumber \\&- {\varvec{\varXi }}_{k,j}(\theta ,{\varvec{\theta }}_k^\star )\{{\varvec{\varXi }}_{k,j}({\varvec{\theta }}_k^\star ,{\varvec{\theta }}_k^\star )+\varvec{I}_{n_k}\}^{-1} {\varvec{\varXi }}_{k,j}({\varvec{\theta }}_k^\star ,\theta ^\prime ), \nonumber \\&a_k = a_0 + n_k/2, \nonumber \\&b_{k,j} = b_0+\hat{\varvec{X}}_{k,j}^\intercal \{{\varvec{\varXi }}_{k,j}({\varvec{\theta }}_k^\star ,{\varvec{\theta }}_k^\star )+\varvec{I}_{n_k}\}^{-1}\hat{\varvec{X}}_{k,j}/2, \end{aligned}$$
(8)
where \(n_k=\sum _{i=1}^n\mathbbm {1}_k\{z_i\}\), \(\hat{\varvec{X}}_{k,j}\in {\mathbb {R}}^{n_k}\) is the subset of values of \(\hat{\varvec{X}}_j\) for which \(z_i=k\), and \({\varvec{\theta }}_k^\star \in {\mathbb {R}}^{n_k}\) is the vector \({\varvec{\theta }}\), restricted to the entries such that \(z_i=k\). Furthermore, \({\varvec{\varXi }}_{k,j}\) is a vector-valued and matrix-valued extension of \(\xi _{k,j}\), such that \([{\varvec{\varXi }}_{k,j}({\varvec{\theta }},{\varvec{\theta }}^\prime )]_{\ell ,\ell ^\prime }=\xi _{k,j}(\theta _\ell ,\theta ^\prime _{\ell ^\prime })\). The structure of the GP-IG yields an analytic expression for the posterior predictive distribution for a new observation \(\mathbf {x}^*=(x^*_1,\dots ,x^*_d)\) in community \(z^*\),
$$\begin{aligned}&{{\hat{x}}}^*_j\vert z^*,\mathbf {z},{\varvec{\theta }},\theta ^*,\hat{\varvec{X}} \nonumber \\&\quad \sim t_{2a_{z^*}}\left( \mu _{z^*,j}^{\star }(\theta ^*),\frac{b_{z^*,j}}{a_{z^*}}\left\{ 1+\xi _{z^*,j}^{\star }(\theta ^*, \theta ^*)\right\} \right) , \end{aligned}$$
(9)
where \(t_\nu (\mu ,\sigma )\) denotes a Student’s t distribution with \(\nu \) degrees of freedom, mean \(\mu \) and scale parameter \(\sigma \). Furthermore, the prior probabilities \({\varvec{\eta }}\) for the community assignments can be integrated out, obtaining
$$\begin{aligned} p(\mathbf {z}) = \frac{\Gamma (\nu )\prod _{k=1}^K \Gamma (n_k+\nu /K)}{\Gamma (\nu /K)^K\Gamma (n+\nu )}, \end{aligned}$$
(10)
where \(n_k=\sum _{i=1}^n \mathbbm {1}_k\{z_i\}\). The two distributions (9) and (10) are key components of the Bayesian inference algorithm discussed in the next section.
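In an implementation, (10) is best evaluated on the log scale. A minimal sketch, assuming community labels in \(\{0,\dots ,K-1\}\):

```python
# Log of the marginal allocation prior (10), computed with log-gamma
# functions for numerical stability.
import numpy as np
from scipy.special import gammaln

def log_p_z(z, K, nu=1.0):
    n = z.size
    n_k = np.bincount(z, minlength=K)
    return (gammaln(nu) + gammaln(n_k + nu / K).sum()
            - K * gammaln(nu / K) - gammaln(n + nu))
```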
Posterior inference
After marginalisation of the pairs \((f_{k,j},\sigma ^2_{k,j})\) and \({\varvec{\eta }}\), inference is limited to the community allocations \(\mathbf {z}\) and latent parameters \({\varvec{\theta }}\). The marginal posterior distribution \(p(\mathbf {z},{\varvec{\theta }}\mid \hat{\varvec{X}})\) is analytically intractable; therefore, inference is performed using collapsed Metropolis-within-Gibbs Markov chain Monte Carlo (MCMC) sampling. In this work, MCMC methods are used, but an alternative inferential algorithm often deployed in the community-detection literature is variational Bayesian inference (see, for example, Latouche et al. 2012), which is also applicable to GPs (Cheng and Boots 2017).
For the community allocations \(\mathbf {z}\), the Gibbs sampling step uses the following decomposition:
$$\begin{aligned} p(z_i=k\mid \mathbf {z}^{-i}, \hat{\varvec{X}},{\varvec{\theta }}) \propto p(z_i=k\mid \mathbf {z}^{-i})\, p(\hat{\mathbf {x}}_i\mid z_i=k,\mathbf {z}^{-i},{\varvec{\theta }},\hat{\varvec{X}}^{-i}), \end{aligned}$$
where the superscript \(-i\) denotes that the i-th row (or element) is removed from the corresponding matrix (or vector). Using (10), the first term is
$$\begin{aligned} p(z_i=k\mid \mathbf {z}^{-i})=\frac{n^{-i}_k+\nu /K}{n-1+\nu }. \end{aligned}$$
For the second term, using (9), the posterior predictive distribution for \(\hat{\mathbf {x}}_i\) given \(z_i=k\) can be written as the product of d independent Student’s t distributions, where
$$\begin{aligned}&{\hat{x}}_{i,j}\vert z_i=k,\mathbf {z}^{-i},{\varvec{\theta }},\hat{\varvec{X}}^{-i} \nonumber \\&\quad \sim t_{2a_k^{-i}}\left( \mu _{k,j}^{\star -i}(\theta _i),\frac{b_{k,j}^{-i}}{a_k^{-i}}\left\{ 1+\xi _{k,j}^{\star -i}(\theta _i, \theta _i)\right\} \right) . \end{aligned}$$
(11)
Note that the quantities \(\mu _{k,j}^{\star -i},\xi _{k,j}^{\star -i}, a_k^{-i}\) and \(b_{k,j}^{-i}\) are calculated as described in (8), excluding the contribution of the i-th node.
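Putting the two terms together, a single collapsed Gibbs draw of \(z_i\) might be implemented as in the hedged sketch below. The kernel `xi` is assumed to return Gram matrices (as in the earlier `basis_kernel` sketch), the hyperparameters \(a_0=b_0=1\) are arbitrary, a common kernel is assumed across communities (cf. Sect. 4.4 for community-specific kernels), and the scale \((b/a)\{1+\xi ^\star \}\) in (11) is interpreted as a squared scale when calling scipy.

```python
# One collapsed Gibbs update of z_i, combining the Dirichlet weight
# p(z_i = k | z^{-i}) with the Student's t predictive (11), whose
# parameters follow the conjugate updates in (8) with node i held out.
import numpy as np
from scipy.stats import t as student_t

def predictive_params(theta_i, theta_k, x_kj, xi, a0=1.0, b0=1.0):
    """Leave-one-out predictive parameters from (8) and (11) for one
    community and dimension: returns (mu*, squared scale, df).
    An empty community falls back to the prior predictive."""
    n_k = theta_k.size
    S = xi(theta_k, theta_k) + np.eye(n_k)   # Xi + I_{n_k}
    S_inv_x = np.linalg.solve(S, x_kj)
    k_vec = xi(theta_i, theta_k)             # shape (1, n_k)
    mu = (k_vec @ S_inv_x).item()
    xi_star = (xi(theta_i, theta_i)
               - k_vec @ np.linalg.solve(S, k_vec.T)).item()
    a_k = a0 + n_k / 2.0
    b_kj = b0 + (x_kj @ S_inv_x) / 2.0
    # (b/a){1 + xi*} is read as a squared scale for scipy's t distribution
    return mu, (b_kj / a_k) * (1.0 + xi_star), 2.0 * a_k

def resample_z_i(i, x, z, theta, xi, K, nu=1.0, rng=None):
    """One Gibbs draw of z_i given all other allocations."""
    rng = rng or np.random.default_rng()
    n, d = x.shape
    mask = np.arange(n) != i
    log_w = np.empty(K)
    for k in range(K):
        in_k = mask & (z == k)
        log_w[k] = np.log(in_k.sum() + nu / K) - np.log(n - 1 + nu)
        for j in range(d):
            mu, s2, df = predictive_params(theta[i], theta[in_k],
                                           x[in_k, j], xi)
            log_w[k] += student_t.logpdf(x[i, j], df=df, loc=mu,
                                         scale=np.sqrt(s2))
    w = np.exp(log_w - log_w.max())
    return rng.choice(K, p=w / w.sum())
```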
To mitigate identifiability issues, some of the parameters must be assumed known a priori. For example, assuming for each community k that \(f_{k,1}(\theta _i)=\theta _i\), corresponding to a linear model in \(\theta \) with no intercept and unit slope in the first dimension, gives the predictive distribution:
$$\begin{aligned}&{\hat{x}}_{i,1}\vert z_i=k,\mathbf {z}^{-i},{\varvec{\theta }},\hat{\varvec{X}}^{-i} \nonumber \\&\quad \sim t_{2a_k^{-i}}\left( \theta _i, \frac{1}{a_k^{-i}} \left\{ b_0 + \frac{1}{2}\sum _{h\ne i:z_h=k} ({\hat{x}}_{h,1} - \theta _h)^2 \right\} \right) . \end{aligned}$$
(12)
Finally, for updates of \(\theta _i\), a standard Metropolis-within-Gibbs step can be used. For a proposed value \(\theta ^*\) sampled from a proposal distribution \(q(\cdot \mid \theta _i)\), the acceptance probability takes the value
$$\begin{aligned} \min \left\{ 1,\frac{p(\hat{\mathbf {x}}_i\vert z_i,\mathbf {z}^{-i},\theta ^*,{\varvec{\theta }}^{-i},\hat{\varvec{X}}^{-i})p(\theta ^*)q(\theta _i\vert \theta ^*)}{p(\hat{\mathbf {x}}_i\vert z_i,\mathbf {z}^{-i},\theta _i,{\varvec{\theta }}^{-i},\hat{\varvec{X}}^{-i})p(\theta _i)q(\theta ^*\vert \theta _i)}\right\} . \end{aligned}$$
(13)
The proposal distribution \(q(\theta ^*\vert \theta _i)\) in this work is a normal distribution \({\mathbb {N}}(\theta ^*\vert \theta _i,\sigma ^2_*)\), \(\sigma ^2_*\in {\mathbb {R}}_+\), implying that the ratio of proposal distributions in (13) cancels out by symmetry.
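A minimal sketch of this update, with hypothetical `log_lik` and `log_prior` callables supplied by the surrounding sampler; under the symmetric Gaussian proposal no proposal correction is needed.

```python
# Metropolis-within-Gibbs update (13) for theta_i under a symmetric
# Gaussian random-walk proposal, so the q-ratio in (13) cancels.
import numpy as np

def update_theta_i(theta_i, log_lik, log_prior, sigma_prop=0.1, rng=None):
    """log_lik(t): log p(x_i | z_i, t, ...); log_prior(t): log N density."""
    rng = rng or np.random.default_rng()
    theta_star = rng.normal(theta_i, sigma_prop)
    log_alpha = (log_lik(theta_star) + log_prior(theta_star)
                 - log_lik(theta_i) - log_prior(theta_i))
    return theta_star if np.log(rng.uniform()) < min(0.0, log_alpha) \
        else theta_i
```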
Inference on the number of communities K
So far, it has been assumed that the number of communities K is known. The LSBM prior specification (4) naturally admits a prior distribution on the number of communities K. Following Sanna Passino and Heard (2020), it could be assumed:
$$\begin{aligned} K\sim \text {Geometric}(\omega ), \end{aligned}$$
where \(\omega \in (0,1)\). The MCMC algorithm is then augmented with additional moves for posterior inference on K: (i) Split or merge two communities; and (ii) Add or remove an empty community. An alternative approach when K is unknown could also be a nonparametric mixture of Gaussian processes (Ross and Dy 2013). For simplicity, in the next two sections, it will be initially assumed that all communities have the same functional form, corresponding, for example, to the same basis functions \({\varvec{\phi }}_{k,j}(\cdot )\) for dot product kernels (7). Then, in Sect. 4.4, the algorithm will be extended to admit a prior distribution on the community-specific kernels.
Split or merge two communities
In this case, the proposal distribution follows Sanna Passino and Heard (2020). First, two nodes i and j are sampled randomly. For simplicity, assume \(z_i\le z_j\). If \(z_i\ne z_j\), then the two corresponding communities are merged into a unique cluster: all nodes in community \(z_j\) are assigned to \(z_i\). Otherwise, if \(z_i=z_j\), the cluster is split into two different communities, proposed as follows: (i) Node i is assigned to community \(z_i\) (\(z_i^*=z_i\)), and node j to community \(K^*=K+1\) (\(z_j^*=K^*\)); (ii) The remaining nodes in community \(z_i\) are allocated in random order to clusters \(z_i^*\) or \(z_j^*\) according to their posterior predictive distribution (11) or (12), restricted to the two communities, and calculated sequentially.
It follows that the proposal distribution \(q(K^*,\mathbf {z}^*\vert K, \mathbf {z})\) for a split move corresponds to the product of renormalised posterior predictive distributions, leading to the following acceptance probability:
$$\begin{aligned}&\alpha (K^*,\mathbf {z}^*\vert K,\mathbf {z}) \nonumber \\&\quad =\min \left\{ 1, \frac{p(\hat{\varvec{X}}\vert K^*,\mathbf {z}^*,{\varvec{\theta }}) p(\mathbf {z}^*\vert K^*)p(K^*)}{p(\hat{\varvec{X}}\vert K,\mathbf {z},{\varvec{\theta }})p(\mathbf {z}\vert K)p(K) q(K^*,\mathbf {z}^*\vert K, \mathbf {z})} \right\} , \nonumber \\ \end{aligned}$$
(14)
where \(p(\hat{\varvec{X}}\vert K,\mathbf {z},{\varvec{\theta }})\) is the marginal likelihood. Note that the ratio of marginal likelihoods only depends on the two communities involved in the split or merge move. Under (5) and (6), the community-specific marginal on the j-th dimension is:
$$\begin{aligned} \hat{\varvec{X}}_{k,j} \vert K, \mathbf {z}, {\varvec{\theta }}\sim t_{2a_0}\left( \mathbf {0}_{n_k}, \frac{b_0}{a_0}\{{\varvec{\varXi }}_{k,j}({\varvec{\theta }}_k^\star , {\varvec{\theta }}_k^\star ) + \varvec{I}_{n_k}\} \right) , \end{aligned}$$
(15)
where the notation is identical to (8), and the Student’s t distribution is \(n_k\)-dimensional, with mean equal to the \(n_k\)-dimensional zero vector \(\mathbf {0}_{n_k}\). The full marginal \(p(\hat{\varvec{X}}\vert K,\mathbf {z},{\varvec{\theta }})\) is the product of the marginals (15) over all dimensions and communities. If \(f_{k,1}(\theta _i)=\theta _i\), cf. (12), the marginal likelihood is
$$\begin{aligned} p(\hat{\varvec{X}}_{k,1}\vert K,\mathbf {z},{\varvec{\theta }}) = \frac{\Gamma (a_{n_k})}{\Gamma (a_0)} \frac{b_0^{a_0}}{b_{n_k}^{a_{n_k}}} (2\pi )^{-n_k/2}, \end{aligned}$$
where \(a_{n_k}=a_0+n_k/2\) and \(b_{n_k}=b_0+\sum _{i:z_i=k}({\hat{x}}_{i,1}-\theta _i)^2/2\).
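For implementers, the log of marginal (15) follows directly from the standard normal-inverse-gamma integral. The sketch below assumes arbitrary \(a_0,b_0\) and a precomputed Gram matrix.

```python
# Log marginal likelihood (15) for one community k and dimension j,
# integrating out (f_{k,j}, sigma^2_{k,j}) under the GP-IG prior.
import numpy as np
from scipy.special import gammaln

def log_marginal_community(x_kj, Gram, a0=1.0, b0=1.0):
    """x_kj: (n_k,) embedding values; Gram: Xi_{k,j}(theta_k*, theta_k*)."""
    n_k = x_kj.size
    S = Gram + np.eye(n_k)
    _, logdet = np.linalg.slogdet(S)
    b_kj = b0 + (x_kj @ np.linalg.solve(S, x_kj)) / 2.0
    return (gammaln(a0 + n_k / 2) - gammaln(a0)
            + a0 * np.log(b0) - (a0 + n_k / 2) * np.log(b_kj)
            - n_k / 2 * np.log(2 * np.pi) - logdet / 2)
```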
Add or remove an empty community
When adding or removing an empty community, the acceptance probability is:
$$\begin{aligned} \alpha (K^*\vert K )= \min \left\{ 1, \frac{p(\mathbf {z}\vert K^*)p(K^*)q_\varnothing }{p(\mathbf {z}\vert K)p(K)} \right\} , \end{aligned}$$
where \(q_\varnothing =q(K\vert K^*,\mathbf {z})/q(K^*\vert K,\mathbf {z})\) is the proposal ratio, equal to (i) \(q_\varnothing =2\) if the proposed number of clusters \(K^*\) equals the number of non-empty communities in \(\mathbf {z}\); (ii) \(q_\varnothing =0.5\) if there are no empty clusters in \(\mathbf {z}\); and (iii) \(q_\varnothing =1\) otherwise. Note that the acceptance probability is identical to that in Sect. 4.3 of Sanna Passino and Heard (2020), and it does not depend on the marginal likelihoods.
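On the log scale, this move might be implemented as in the following sketch, which assumes a Geometric(\(\omega \)) prior with mass \(\omega (1-\omega )^{K-1}\) on \(K=1,2,\dots \), and that a removed community is an empty one carrying the largest label.

```python
# Acceptance log-probability for adding/removing an empty community,
# combining p(z | K) from (10), the Geometric prior on K, and the
# proposal ratio q_empty described above.
import numpy as np
from scipy.special import gammaln

def log_accept_empty(z, K, K_star, omega=0.5, nu=1.0, q_empty=1.0):
    def log_p_z_given_K(K_):
        n, n_k = z.size, np.bincount(z, minlength=K_)
        return (gammaln(nu) + gammaln(n_k + nu / K_).sum()
                - K_ * gammaln(nu / K_) - gammaln(n + nu))

    def log_p_K(K_):  # Geometric prior pmf, an assumption of this sketch
        return np.log(omega) + (K_ - 1) * np.log(1 - omega)

    log_alpha = (log_p_z_given_K(K_star) + log_p_K(K_star) + np.log(q_empty)
                 - log_p_z_given_K(K) - log_p_K(K))
    return min(0.0, log_alpha)
```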
Inference with different community-specific kernels
When communities are assumed to have different functional forms, a prior distribution \(p({\varvec{\xi }}_k),\ {\varvec{\xi }}_k=(\xi _{k,1},\dots ,\xi _{k,d})\), must be introduced on the GP kernels, supported on one or more classes of possible kernels \({\mathcal {K}}\). Under this formulation, a proposal to change the community-specific kernel can be introduced. Conditional on the allocations \(\mathbf {z}\), the k-th community is assigned a kernel \({\varvec{\xi }}^*=(\xi ^*_1,\dots ,\xi ^*_d)\) with probability:
$$\begin{aligned} p({\varvec{\xi }}_k={\varvec{\xi }}^*\vert \hat{\varvec{X}},K,\mathbf {z},{\varvec{\theta }}) \propto p({\varvec{\xi }}^*)\prod _{j=1}^d p(\hat{\varvec{X}}_{k,j}\vert K,\mathbf {z},{\varvec{\theta }}, \xi ^*_j), \end{aligned}$$
normalised over \({\varvec{\xi }}^*\in {\mathcal {K}}\), where \(p(\hat{\varvec{X}}_{k,j}\vert K,\mathbf {z},{\varvec{\theta }}, \xi ^*_j)\) is the marginal (15) calculated under the kernel \(\xi ^*_j\). The prior distribution \(p({\varvec{\xi }}^*)\) could also be used as the proposal for the kernel of an empty community (cf. Sect. 4.3.2). Similarly, in the merge move (cf. Sect. 4.3.1), the GP kernel could be sampled at random from the two kernels assigned to \(z_i\) and \(z_j\), correcting the acceptance probability (14) accordingly.
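As a final hedged sketch, this kernel resampling step could be implemented by scoring each candidate kernel with the marginal (15) across dimensions; the candidate set, its log prior and the `log_marginal` helper are assumptions of the sketch, not part of the paper's specification.

```python
# Resample the kernel of community k proportionally to prior times
# marginal likelihood (15), as in the displayed probability above.
import numpy as np

def resample_kernel(X_k, theta_k, candidates, log_prior, log_marginal,
                    rng=None):
    """candidates[c] is a length-d tuple of kernels (xi_1, ..., xi_d);
    log_prior[c] is log p(xi*); log_marginal(x, Gram) evaluates (15)."""
    rng = rng or np.random.default_rng()
    d = X_k.shape[1]
    log_w = np.array([log_prior[c]
                      + sum(log_marginal(X_k[:, j],
                                         candidates[c][j](theta_k, theta_k))
                            for j in range(d))
                      for c in range(len(candidates))])
    w = np.exp(log_w - log_w.max())
    return rng.choice(len(candidates), p=w / w.sum())  # index of new kernel
```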