Link prediction in dynamic networks using random dot product graphs

The problem of predicting links in large networks is an important task in a variety of practical applications, including social sciences, biology and computer security. In this paper, statistical techniques for link prediction based on the popular random dot product graph model are carefully presented, analysed and extended to dynamic settings. Motivated by a practical application in cyber-security, this paper demonstrates that random dot product graphs not only represent a powerful tool for inferring differences between multiple networks, but are also efficient for prediction purposes and for understanding the temporal evolution of the network. The probabilities of links are obtained by fusing information at two stages: spectral methods provide estimates of latent positions for each node, and time series models are used to capture temporal dynamics. In this way, traditional link prediction methods, usually based on decompositions of the entire network adjacency matrix, are extended using temporal information. The methods presented in this article are applied to a number of simulated and real-world graphs, showing promising results.


Introduction
Link prediction is defined as the task of predicting the presence of an edge between two nodes in a network, based on latent characteristics of the graph (Liben-Nowell and Kleinberg, 2007). The problem has been widely studied in the literature (Lü and Zhou, 2011;Menon and Elkan, 2011), and has relevant applications in a variety of different fields. In this paper, the discussion about link prediction is motivated by applications in cyber-security and computer network mon-* The author is currently at Securonix Threat Labs, Securonix Inc., Addison (TX). This work was completed when the author was at Microsoft 365 Defender, Microsoft Corporation, Redmond (WA).
itoring (Jeske et al., 2018). The ability to correctly predict and associate anomaly scores with the connections in a network is valuable for the cyber-defence of enterprises. In cyber settings, adversaries may introduce changes in the structure of an enterprise network in the course of their attack. Therefore, predicting links in order to identify significant deviations in expected behaviour could lead to the detection of an otherwise extremely damaging network breach. In particular, it is necessary to correctly score new links (Metelli and Heard, 2019), representing previously unobserved connections. The task is particularly important since it is common to observe malicious activity associated with new links (Neil et al., 2013), and it is therefore crucial to understand the normal process of link formation in order to detect a cyber-attack.
In this article, it is assumed that snapshots of a dynamic network are observed at discrete time points t = 1, . . . , T , obtaining a sequence of graphs G t = (V, E t ). The set V represents the set of nodes, which is invariant over time, whereas the set E t is a time dependent edge set, where (i, j) ∈ E t for i, j ∈ V , if i connected to j at least once during the time period (t−1, t]. Each snapshot of the graph can be characterised by the adjacency matrix A t ∈ {0, 1} n×n , where n = |V | and for 1 ≤ i, j ≤ n, A ijt = 1 Et {(i, j)}, such that A ijt = 1 if a link between the nodes i and j exists in (t − 1, t], and A ijt = 0 otherwise. The graph is said to be undirected if (i, j) ∈ E t ⇐⇒ (j, i) ∈ E t , implying that A t is symmetric; otherwise, the graph is said to be directed. It will be assumed that the graph has no self-edges, implying A t is a hollow matrix. Similarly, bipartite graphs G t = (V 1 , V 2 , E t ) can be represented using two node sets V 1 and V 2 , and rectangular adjacency matrices A t ∈ {0, 1} n1×n2 , n 1 = |V 1 | , n 2 = |V 2 |, where A ijt = 1 if i ∈ V 1 connects to j ∈ V 2 in (t − 1, t]. This paper discusses methods for temporal link prediction (Dunlavy et al., 2011): given a sequence of adjacency matrices A 1 , . . . , A T observed over time, the main objective is to reliably predict A T +1 . In this article, temporal link prediction techniques based on random dot product graphs (RDPG, Young and Scheinerman, 2007) are discussed and compared.
graphs (Arroyo-Relión et al., 2020), will be analysed for link prediction purposes, and compared to standard spectral embedding techniques.
The main contribution of this work is to adapt the existing methods for multiple RDPG graph inference for temporal link prediction. Furthermore, this article proposes methods to combine the information obtained via spectral methods with time series models, to capture the temporal dynamics of the observed graphs. The proposed methodologies will be extensively compared on real world and simulated networks. It will be shown that this approach significantly improves the predictive performance of multiple RDPG models, especially when the network presents a seasonal or temporal evolution. Overall, this article provides insights into the predictive capability of random dot product graphs, and gives guidelines for practitioners on the optimal choice of the embedding for temporal link prediction.
Importantly, the strategies for combination of individual embeddings, and their time series extensions, can in principle be applied to any embedding method for static graphs, despite the main focus on RDPGs of this article. This article primarily focuses on RDPGs because of the wide variety of embedding techniques which have been suggested in the literature under this model, but that so far have not been compared for link prediction purposes.
The article is organised as follows: Section 2 discusses related literature around link prediction, and Section 3 introduces background on the generalised random dot product graph and adjacency spectral embeddings, the main statistical tools used in this paper. Methods for link prediction based on random dot product graphs are discussed in Section 4. Section 5 presents techniques to improve the predictive performance of the RDPG models, based on time series models. Results and applications on simulated and real world networks are finally discussed in Section 6.

Related literature
Many other models other than RDPGs have been proposed in the literature for link prediction. Traditionally, the temporal link prediction task is tackled using tensor decompositions (Dunlavy et al., 2011). Dynamic models have also been proposed in the literature of Poisson matrix factorisation and recommender systems (Charlin et al., 2015;Hosseini et al., 2020), and extended to Bayesian tensor decompositions (Schein et al., 2015). In general, including time has been shown to significantly improve the predictive performance in a variety of model settings, for example stochastic blockmodels (Ishiguro et al., 2010;Xu and Hero III, 2014;Xing et al., 2010). More generic latent feature models for dynamic networks have also been extensively discussed in the literature (Sarkar and Moore, 2006;Krivitsky and Handcock, 2014;Sewell and Chen, 2015).
Latent features are usually obtained via matrix factorisation, considering constant and time-varying components within the decomposition (Deng et al., 2016;Yu et al., 2017a,b). Usually, a Markov assumption is placed on the evolution of the latent positions Chen and Li, 2018). Gao et al. (2011) propose to combine node features into a temporal link prediction framework based on matrix factorisation. Nonparametric (Sarkar et al., 2014;Durante and Dunson, 2014) and deep learning (Li et al., 2014) approaches have also been considered.
More recently, advancements have been made in the application of deep learning to graph-valued data. In particular, deep learning methods on graphs are classified by Zhang et al. (2020) into five categories: graph recurrent neural networks, graph convolutional networks, graph autoencoders, graph reinforcement learning, graph adversarial methods. A comprehensive survey of existing static network embedding methods, including deep learning techniques, is provided in Cai et al. (2018). Commonly used static embedding methods in machine learning are DeepWalk (Perozzi et al., 2014), SDNE (Wang et al., 2016), node2vec (Grover and Leskovec, 2016), GraphSAGE (Hamilton et al., 2017), graph convolutional networks (GCN, Kipf and Welling, 2017), and Watch Your Step with Graph Attention (WYS-GA, Abu-El-Haija et al., 2018). Many of such methods could be unified under a matrix factorisation framework (Qiu et al., 2018). A systematic comparison of some of the aforementioned methodologies for unsupervised network embedding is provided in Khosla et al. (2021). Methodologies have also been recently proposed in the dynamic network setting, within the context of representation learning (for example, Nguyen et al., 2018;Kumar et al., 2019;Liu et al., 2019;Qu et al., 2020), and deep generative models (for example, Zhou et al., 2020). The interested reader is referred to the survey of Kazemi et al. (2020) and references therein.
Again, it is emphasised that the objective of this paper is not to claim that the RDPG is superior to competing models, but to provide guidelines for practitioners using RDPGs in their application domains, offering insights on the performance of these models for link prediction purposes.

Random dot product graphs and adjacency spectral embedding
In this section, the generalised random dot product graph (Rubin-Delanchy et al., 2017) and methods for estimation of the latent positions are formally introduced. Suppose A ∈ {0, 1} n×n is a symmetric adjacency matrix of an undirected graph with n nodes.
Definition 1 (Generalised random dot product graph -GRDPG). Let d + and d − be non-negative integers such that with p ones and q minus ones. Let F be a probability measure on X, A ∈ {0, 1} n×n be a symmetric matrix and X = ( The adjacency spectral embedding (ASE) provides consistent estimates of the latent positions in GRDPGs (Rubin-Delanchy et al., 2017), up to indefinite orthogonal transformations.
Definition 2 (Adjacency spectral embedding -ASE). For d ∈ {1, . . . , n}, consider the spectral decomposition A = ΓΛΓ +Γ ⊥Λ⊥Γ ⊥ , whereΛ is a d × d diagonal matrix containing the top d eigenvalues in magnitude, in decreasing order,Γ is a n × d matrix containing the corresponding orthonormal eigenvectors, and the matricesΛ ⊥ andΓ ⊥ contain the remaining n − d eigenvalues and eigenvectors. The adja- where the operator |·| applied to a matrix returns the absolute value of its entries.
If the graph is directed, and the adjacency matrix is not symmetric, it could be implicitly assumed that the generating model is P(A ij = 1) = x i y j , x i , y j ∈ X, where each node is given two different latent positions, representing the behaviour of the node as source or destination of the link. In this case, the embeddings can be estimated using the singular value decomposition (SVD).
Definition 3 (Adjacency embedding of the directed graph -DASE). Given a directed graph with adjacency matrix A ∈ {0, 1} n×n , and a positive integer d, 1 ≤ d ≤ n, consider the singular value decomposition is diagonal matrix containing the top d singular values in decreasing order,Û ∈ R n×d andV ∈ R n×d contain the corresponding left and right singular vectors, and the matricesD ⊥ ,Û ⊥ , andV ⊥ contain the remaining n − d singular values and vectors. The d-dimensional directed adjacency embedding of A in R d , is defined as the pair The DASE can be also naturally extended to bipartite graphs.

Dynamic link prediction in random dot product graphs
Given a time series of network adjacency matrices A 1 , A 2 , . . . , A T , the objective is to correctly predict A T +1 . The most common approach in the literature (Sharan and Neville, 2008;Scheinerman and Tucker, 2010;Dunlavy et al., 2011) is to analyse a collapsed versionÃ of the adjacency matrices:Ã where ψ 1 , . . . , ψ T ∈ R is a sequence of weights. Scheinerman and Tucker (2010) propose to consider an average adjacency matrix, setting ψ t = 1/T ∀ t = 1, . . . , T , which corresponds to the maximum likelihood estimate of E(A t ) if A 1 , . . . , A T are sampled independently from the same Bernoulli(XX ) distribution. The main limitation of such a model is that it is assumed that the graphs do not display any temporal evolution. Furthermore, if (4.1) is used, it is assumed that all the possible edges of the adjacency matrix follow the same dynamics. Obtaining the ASEX = [x 1 , . . . ,x n ] ofÃ leads to an estimate the scores: For simplicity, the inner product (4.2) is not weighted by the matrix I(d + , d − ), implicitly assuming d + = d and d − = 0. The estimation approaches in (4.1) and (4.2) will be used as baselines for comparison with alternative methods for temporal link prediction techniques using RDPGs, which will be proposed and discussed in the remainder of this section. The proposed methods will be classified in the following two categories: • averages of inner products of embeddings (AIP), • inner products of an average of embeddings (IPA). First, it is possible to consider the individual ASE for each adjacency matrix A t , and calculate an AIP score: A second option is to obtain an averaged embeddinḡ X fromX 1 ,X 2 , . . . ,X T , and use this for predicting the link probabilities. This procedure is slightly more complex than (4.3), since the embeddings are invariant to orthogonal transformations: given an orthogonal matrix Ω t ∈ R d×d , . Therefore, the embed-dingsX 1 , . . . ,X T only provide estimates of the corresponding latent positions up to an orthogonal transformation, which could vary for different values of t. Consequently, the embeddings must be suitably aligned or registered, before a direct comparison can be carried out. Discussion of a technique to align the embeddings is deferred to Appendix A. Assuming that an averaged embeddingX is obtained, the matrix of IPA scores for prediction of A T +1 is: (4.4) Similar scoring mechanisms can be derived for the techniques of multiple graph inference described in Section 1, such as the omnibus embedding (Levin et al., 2017), based on the the omnibus matrix: The ASEX ofÃ gives T latent positions for each node. The individual estimatesX t = [x 1t , . . . ,x nt ] of the latent positions for the t-th adjacency matrix are represented by the submatrix formed by the estimates between the ((t − 1)n + 1)-th and tn-th row ofX. Then, from the time serieŝ X 1 ,X 2 , . . . ,X T of omnibus embeddings, a matrix of scores can be obtained using either AIP (4.3) or IPA (4.4). In this case, the individual embeddings are directly comparable and an alignment step is not required. On the other hand, the omnibus embedding cannot easily be updated when new graphs A T +1 , A T +2 , . . . become available, sinceÃ and the embedding must be recomputed for each new snapshot. The idea of an omnibus embedding can be also easily extended to directed and bipartite graphs, constructing the matrixÃ analogously and then calculating the DASE. Embeddings generated using the more parsimonious COSIE model (Arroyo-Relión et al., 2020) are also considered. In COSIE networks, the latent positions are assumed to be common across the T snapshots of the graph, but the link probabilities are scaled by a time-varying matrix R t ∈ R d×d : The common latent positions X and the time series of weighting matrices R 1 , . . . , R T can be estimated via multiple adjacency spectral embedding (MASE, Arroyo-Relión et al., 2020).
Definition 4 (Multiple adjacency spectral embedding -MASE). Given a sequence of network adjacency matrices A 1 , . . . , A T , and an integer d ∈ {1, . . . , n}, obtain the individual ASEsX t =Γ t |Λ t | 1/2 ∈ R n×d . Then, construct the n × T d matrixΓ = [Γ 1 , . . . ,Γ T ] ∈ R n×T d , and consider its singular value decompositionΓ =ÛDV +Û ⊥D⊥V ⊥ , whereD ∈ R d×d + is a diagonal matrix containing the top d singular values in decreasing order,Û ∈ R n×d andV ∈ R T d×d contain the corresponding left and right singular vectors, and the matricesD ⊥ ,Û ⊥ , andV ⊥ contain the remain-ing singular values and vectors. The d-dimensional multiple adjacency embedding of A 1 , . . . , A T in R d is given bŷ X =Û, which provides an estimate of X, and the sequencê R 1 , . . . ,R T , whereR For prediction, the matrix of AIP scores could be obtained from the time series of estimated link probabilitieŝ XR 1X , . . . ,XR TX : Alternatively, an averagedR could be equivalently obtained from the time series of estimatesR 1 , . . . ,R T . Com-biningR with the estimate of the latent positionsX yields the IPA scores: S IPA =XRX . (4.7) The COSIE model can also be extended to directed and bipartite graphs assuming E(A t ) = XR t Y . This construction leads to estimatesR t =Û A tV , whereÛ andV are estimates of X and Y obtained from MASE on the DASE embeddingsX 1 , . . . ,X T andŶ 1 , . . . ,Ŷ T , based on two ma-tricesΓ constructed from the left and right singular vectors (cf. Definition 4). In summary, several link prediction schemes based on random dot product graph models have been proposed, corresponding to three different types of spectral embedding: • scores based on individual embeddings, cf. (4.3) and (4.4), • omnibus scores, cf. (4.3) and (4.4), based on the matrix representation in (4.5), • COSIE scores, cf. (4.6) and (4.7). Two scores, denoted AIP and IPA, are calculated for each embedding type. All methods will be compared to the popular collapsed adjacency matrix method in (4.2).
The methods described in this section are all based on truncated eigendecompositions of some form of the adjacency matrix. The full eigendecomposition of a n × n dense matrix requires a cubic computational cost O(n 3 ), but only d eigenvectors and eigenvalues, where d is in general O(1), are required in the algorithms presented in this section. This reduces the computational effort to O(n 2 ) (Yang et al., 2021). Also, graph adjacency matrices are binary and normally highly sparse. In this setting, efficient algorithms based on the power method calculate the required decomposition of a matrix A in O{nnz(A)P(1/ε)} (Ghashami et al., 2016), where nnz(·) denotes the number of non-zero entries of the matrix, P(·) is a polynomial function and ε ∈ (0, 1) is an approximation error parameter. This allows the methodologies in this section to be scalable to large networks with the support of modern computer libraries. For example, calculating the individual embeddingsX 1 , . . . ,X T only has complexity O{P(1/ε) T t=1 nnz(A t )}. COSIE adds a further SVD decomposition in the MASE algorithm, which requires further O(nd 2 ) operations. On the other hand, calculating the omnibus embedding for large graphs might quickly become cumbersome, especially if T is large, since up to O(n 2 T 2 ) operations are required.

Improving prediction using time series models
The collapsed matrix used in (4.1) assumes that the underlying dynamics of each link are the same across the entire graph. This assumption is particularly limiting in real world applications, where different behaviours might be associated with different nodes or links. Instead, edge specific matrix parameters Ψ 1 , . . . , Ψ T ∈ R n×n might be able to more reliably capture the behaviour of each edge. A modification of the collapsed matrixÃ in (4.1) is therefore proposed: where Ψ 1 , . . . , Ψ T ∈ R n×n is a sequence of weighting matrices, and is the Hadamard element-wise product. The matrix in (5.1) is denoted predicted adjacency matrix. Note that in (5.1), the weights can only be estimated for those entries such that A ijt = 1 for at least one t ∈ {1, . . . , T }, but the ASE ofÃ still allows to estimate non-zero link probabilities even for those edges such that A ijt = 0 ∀t. The idea could be easily extended to all the other prediction settings proposed in Section 4, replacing the average link probability or average embedding with an autoregressive combination. For example, from the sequence of standard em-beddingsX 1 ,X 2 , . . . ,X T , it could be possible to obtain the scores as: Alternatively, it could be possible to use a similar technique to obtain an estimateX The scores are then obtained as: Similarly, for COSIE, the scores could be based on a linear combination ofR 1 , . . . ,R T to estimate R T +1 . The equations (5.2) and (5.3) define two different methodologies to extend multiple RDPG inference models to a dynamic setting. Following the same nomenclature used in Section 4, the scores based on time series models have been respectively denoted as PIP for the predicted inner product (5.2), and IPP for the inner product of the prediction (5.3). The PIP scores have a particularly interesting property: the node-specific embeddings are combined with time series models at the edge levels, fusing information at two different network resolutions.
Note that, for COSIE scores, S AIP = S IPA from (4.6) and (4.7), but S PIP = S IPP . This is because for S PIP , the scores are obtained directly from a prediction based on the estimated link probabilities, whereas for S IPA , the prediction is based on an estimate of R T +1 fromR 1 , . . . ,R T .
For estimation of the weighting matrices Ψ 1 , . . . , Ψ T , the time series of link probabilities or node embeddings will be modelled independently. Seasonal autoregressive integrated (SARI) processes represent a flexible modelling assumption. A univariate time series Z 1 , . . . , Z T ∈ R, is a SARI(p, b)(P, B) s with period s if the series is a causal autoregressive process defined by the equation (Brockwell and Davis, 1987). For example, consider a pro- The parameters of the process p, b, P, B and s are all integervalued, and can be interpreted as follows: s is the seasonal periodicity; b corresponds to the differencing required to make the process stationary in mean, variance, and autocorrelations, and B refers to the seasonal differencing; p is to the number of autoregressive terms appearing in the equation, and similarly P refers to the number of seasonal autoregressive terms. More details about SARI models and their interpretation are given in Brockwell and Davis (1987).
The value of s usually depends on the application domain. For example, in computer networks with daily network snapshots, it is reasonable to assume s = 7, which represents a periodicity of one week. The remaining parameters, p, b, P and B, could be estimated using information criteria. For small values of T , the corrected Akaike information criterion (AICc) is preferred. The corresponding coefficients of the polynomials φ(v) and Φ(v), and the variance σ 2 , can be estimated via maximum likelihood (Brockwell and Davis, 1987). For a discussion on automatic selection of the parameters in SARI models, see Hyndman and Khandakar (2008).
For prediction of future values Z t+1 conditional on Z 1 , . . . , Z t , the general forecasting equation is obtained from (5.4), setting ε t+1 to its expected value E(ε t+1 ) = 0, and obtaining an estimateẐ t+1 solving from the known terms of the equation. Analogously, k-steps ahead forecasts for Z t+k can also be obtained.
In this article, the univariate time series Z 1 , . . . , Z T modelled using SARI are of four different types: • Time series of estimated link probabilities obtained from any embedding method: for examplê x i1x j1 , . . . ,x iTx jT for the standard ASE, representing the sequence of estimated scores for the edge (i, j). This type of time series is used to obtain PIP scores, see (5.2); • Time series of node embeddings on a given embedding dimension: for examplex ir1 , . . . ,x irT , obtained considering only the i-th node embedding on the r-th dimension fromX 1 , . . . ,X T . Such values are used for the IPP scores, see (5.3); • Time series of entries of the COSIE matrix, used to obtain IPP scores, see (5.3). For example: R kh1 , . . . ,R khT , corresponding to the (k, h)-th entry in R 1 , . . . ,R T ; • Times series of binary entries of the network adjacency matrices: for example A ij1 , . . . , A ijT for the edge (i, j), used for (5.1). The coefficients for each entry of the weighting matrices Ψ 1 , . . . , Ψ T are obtained by matching (5.1), (5.2) and (5.3) with the model equation (5.4).
In this work, the binary time series A 1 , A 2 , . . . , A T is also modelled using indipendent SARI models for estimation of (5.1). Such a modelling approach might not be entirely technically suited for binary-valued time series, but this choice has relevant practical advantages: most programming languages have packages for automatic estimation of the parameters in SARI models, whereas the choice of initial values and estimation of the parameters in most generalised process for binary time series (MacDonald and Zucchini, 1997; Kauppi and Saikkonen, 2008;Benjamin et al., 2003) is notoriously difficult, which is not desirable when the estimation task should be performed automatically and in parallel over a large set of time series.
The time series modelling extensions presented in this section are computationally expensive, since up to n 2 or nd time series model are fitted, each with complexity O(T k), where k is the number of models compared for the purpose of model selection using AICc. This results in a computational cost of O(n 2 T k) for PIP scores or O(ndT k) for IPP scores, difficult to manage for large networks. Therefore, with the exception of the IPP COSIE scores, the methodologies proposed in this section do not scale well to large networks, unlike the techniques proposed in Section 4. In the case of the IPP COSIE score (4.7), the cost for prediction of future values of R t is only O(d 2 T k), independent of n.
The methodologies for constructing link prediction scores discussed in this work are summarised in Figure 1 in a flowchart. In summary, three choices must be made: • embedding method (collapsed adjacency matrix, standard ASE, omnibus ASE, COSIE embedding); • combination method (average, cf. Section 4, or predic-tion, cf. Section 5); • inner product (combination of inner products or inner product of a combination). Clearly, the same methodology could be applied to any embedding method, not necessarily based on the RDPG. Some examples will be given in Section 6.2.2.

Results
The proposed methods were tested on synthetic data and on real world dynamic networks from different application domains: transportation systems, cyber-security, and coauthorship networks. For all examples, the parameter d was selected using the profile likelihood elbow criterion of Zhu and Ghodsi (2006), unless otherwise specified.
6.1 Simulated data 6.1.1 Seasonal stochastic blockmodel The performance of the rival link prediction techniques discussed in this article is initially compared on simulated data from stochastic blockmodels. The stochastic blockmodel (Holland et al., 1983) can be interpreted as a special case of a GRDPG (Rubin-Delanchy et al., 2017): each node i is assigned a latent community z i ∈ {1, . . . , K}, with corresponding latent position µ zi ∈ R d ; the probability of a link (i, j) only depends on the community allocation of the two nodes: P(A ij = 1) = µ zi I(d + , d − )µ zj . To simulate a stochastic blockmodel, a within-community probability matrix B = {B ij } ∈ [0, 1] K×K was generated, where B ij ∼ Beta(1.2, 1.2) is the probability of a link between two nodes in communities i and j, and K is the number of communities. The matrix has full rank with probability 1, hence K = d. In the simulation, T = 100 graph snapshots with n = 100 and K = 5 were generated. The community allocations were chosen to be time dependent, assuming a seasonality of one week. For each node, community allocations z i,u , u = 0, . . . , s − 1, with s = 7, were sampled uniformly from {1, . . . , K}. Then, the adjacency matrices were obtained as: Therefore, the link probabilities change over time, with a periodicity of 7 days. The models presented in Section 4 were fitted using the first T = 80 snapshots of the graph, with the objective of predicting the remaining T − T adjacency matrices. The methods that are initially compared are: • ASE of the averaged adjacency matrix, cf. (4.1) and (4.2), • AIP and IPA scores calculated from the standard ASE, cf. (4.3) and (4.4), • AIP and IPA scores calculated from the omnibus embedding, cf. (4.5), • AIP and IPA scores calculated from COSIE, cf. (4.6) and (4.7). The link prediction problem can be framed as a binary classification task. Hence, the performances of the methods presented in this article are evaluated using the area under the receiver operating characteristic (ROC) curve, commonly known as AUC scores. The results are plotted in Figure 2. The best performance is achieved by the AIP score based on the standard ASE, which outperforms the commonly used method of the collapsed adjacency matrix (4.1).
It is anticipated that the predictions should be improved upon by the PIP and IPP extensions presented in Section 5, since the simulated network has clear dynamics which are not explicitly taken into account using the techniques from Section 4. In particular, four methods are discussed: • ASE of the predicted adjacency matrix, cf. (5.1), with weights obtained from independent seasonal SARI(p, b)(P, B) 7 processes fitted on each binary sequence A ij1 , A ij2 , . . . , A ijT such that at least one A ijt = 1; • PIP scores calculated from the standard ASE, cf.
(5.2), based on the edge time serieŝ x i1x j1 ,x i2x j2 , . . . ,x iT x jT obtained from the individual ASEs on each A 1 , A 2 , . . . , A T ; • IPP scores calculated from the standard ASE, cf. (5.3), based on prediction of the subsequent embeddings X T +1 ,X T +2 , . . . from the time series of aligned 1 individual ASEsX 1 , . . . ,X T ; • IPP scores calculated from COSIE, based on prediction of correction matricesR T +1 ,R T +2 , . . . from the time seriesR 1 , . . . ,R T , where independent models are fitted to the d × d time series corresponding to each entry. The time series models were fit using the function auto_arima in the statistical python library pmdarima, using the corrected AIC criterion to estimate the number of parameters. The results are presented in Figure 3.
The AIP method (4.3) which had the best performance in Figure 2, is significantly improved by the PIP score (5.2) using time series modelling, and it is overall the only method that reaches values of the AUC well above 0.8. Remarkably, the performance of the predicted adjacency matrix method in (5.1) outperforms the results based on most of the other methods, despite the issues related to the modelling of binary time series pointed out in Section 5. On the other hand, the improvements obtained using the COSIE-based scores and the IPP score (5.3) seem to be less significant compared to the Embedding method -ASE of the collapsed adjacency matrix -standard ASE -omnibus ASE -COSIE embedding

Combination method -average (A) -prediction (P)
Inner product -inner product of the combination (IP-) -combination of the inner products (-IP) Link prediction method e.g. AIP scores, omnibus ASE (average of inner products obtained from the omnibus ASE embedding) Figure 1: Flowchart summarising the construction of a link prediction method using the techniques in Sections 4 and 5.  two other methods. This aspect will also be confirmed on real data examples in the next section. In general, it is clear from the plots in Figure 3 that adding temporal dynamics to the network via time series modelling is beneficial for link prediction purposes. In particular, including edge specific information from the time series of estimated link probabilities, or from the binary time series of links, has significantly improved link prediction.

Logistic dynamic network model
Next, the effect of the dynamic component on the prediction is evaluated. A directed dynamic graph with n = 100 and T = 100 is simulated, assuming where b ij is a baseline, such that b ij ∼ Uniform(−6.9, 0) independently for all pairs (i, j), implying v ij1 ∈ (0.001, 0.5). Furthermore, θ ∈ R is a trend parameter common to all the possible edges, and c ij ∈ {−1, 1} is the sign of the trend on each edge, such that P(c ij = 1) = P(c ij = −1) = 1/2. Note that if θ = 0, the graph does not have any dynamics, whereas if |θ| increases, the graph dynamics also increases. Note that,  θ ∈ {0, 0.025, 0.05, 0.075}, and the AIP and PIP scores obtained from standard ASE trained on the first T adjacency matrices are calculated for prediction of the last T − T network snapshots. The results are then compared with the AIP scores with standard ASE, sequentially updated when new network snapshots become available, used for a 1-step ahead prediction of the next snapshot. If the network has a relevant dynamic component, the difference between the AUC obtained from the sequential and non-sequential AIP scores increases over time, because the network structure changes and the non-sequential scores cannot capture such evolution.
On the other hand, the difference between the sequential AIP scores and the PIP scores should not show an increasing trend over time, since the time dynamics is taken into account via time series modelling. Figure 4 plots the time series of differences between the sequential AIP scores and the corresponding non-sequential scores (Figure 4a), and the difference between the sequential AIP scores and the PIP scores (Figure 4b), for four different values of θ. The plot demonstrates that in presence of a strong dynamic component, the sequential scores outperform non-sequential scores over time, as expected. On the other hand, the PIP scores perform similarly to the sequential scores, and the trend in the differences seem to disappear, up to fluctuations: this is overall remarkable, since the PIP scores are used for up to (T − T )-steps ahead predictions of matrices of scores based only on the initial T adjacency matrices, whereas the sequential scores also use the last T − T adjacency matrices sequentially for 1-step ahead predictions.

Santander bikes
Santander Cycles is a self-service cycle hire scheme operated in central London. Data about usage of the bikes are periodically released by Transport for London 2 . In this example, data from 7 March, 2018 to 19 March, 2019 were used, for a total of T = 378 days. Each bike sharing station is considered as a node, and an undirected edge (i, j, t) is drawn if at least one journey between the stations i and j is completed on day t. The total number of docking stations in London is n = 840. The daily graphs are fairly dense, with an average edge density of approximately 10% across the T networks. The first T = 250 graphs are used as training set.

Averaged scores
Initially, the methods compared are four of the techniques used to produce Figure 2: • ASE of the averaged adjacency matrix, • AIP and IPA scores calculated from the standard ASE, • IPA scores calculated from COSIE. For the Santander Cycles network, the results are reported in Figure 5 for d = 10. In Figure 5, the performance of the classification procedure drops around day 294. This corresponds to Christmas day, which has a different behaviour compared to non-festive days. It is also interesting to point out that COSIE methods tends to perform better on weekdays than weekends, whereas the other methods more accurately predict the links on weekends compared to weekdays. The results of the link prediction procedure in Figure 5 suggest that the data might not have a long term trend, but only a seasonal component, since the performance does not significantly decrease over time, and the parameters obtained using a training 2 The data are freely available at https://cycling.data.tfl.gov.  set of size T = 250 seem to reliably predict the structure of the adjacency matrix even at T = 378. Overall, this example seems to confirm that the method of AIP scores (4.3) based on standard ASE has the best performance for link prediction purposes when time dynamics are not included.

Comparison with alternative methods
To provide further comparison, the methods proposed in Section 4 are also compared in Figure 6 to other methods used for link prediction in the literature. In order to demonstrate that the proposed methodology could be readily extended to any embedding technique, the embeddings were calculated from each of the adjacency matrices A 1 , . . . , A T , and prediction scores were obtained using the AIP methodology, akin to (4.3). The alternative methods considered in this section are:  mated from T independent logistic regression models with response A ijt and d predictors of the formx it x jt . The link probabilities are then combined using the AIP method (4.3), and used to predict connections in the last T − T observed graphs. • Predictive scores calculated from three methods specifically developed for representation learning of dynamic graphs: the Deep Embedding Auto-Encoder Method for Dynamic Graphs (DynGEM, Goyal et al., 2017), the dynamic graph2vec autoencoder recurrent neural network model (DyG2V-AERNN, Goyal et al., 2020) 4 , and the Dynamic Self-Attention Network (DySAT, Sankar et al., 2020) 5 . The methods were run using the default implementation of the software packages, setting the output embedding dimension to d = 10 and the batch size to 100. Link probabilities for the T − T graphs in the test set were calculated using the same procedure described for GraphSAGE, GCN-DGI and WYS-GA. The best performance among the alternative methods is achieved by the NMF scores, which achieve an almost equivalent performance to the AIP scores obtained from standard ASE. Slightly inferior performance is achieved by DyG2V-AERNN and DySAT, despite consistently exceeding 0.85 in AUC. GCN, Jaccard, WYS-GA and GraphSAGE have a slightly worse performance, but still consistently achieve AUC scores exceeding 0.8. All of the proposed methodologies perform better than the Adamic-Adar and DynGEM, which seem to be largely outperformed by spectral embedding methods. Note that representation learning methods shine in particular when applied to large graphs with rich node features (for example, Hamilton et al., 2017), which is clearly not the case for the Santander cycles network. Furthermore, hyperparameter tuning is necessary to obtain a good link prediction performance, whereas RDGP-based methods only require the 4 The models were fitted using the implementation in the python library DynamicGEM (Goyal et al., 2018). 5 The model was fitted using the code in the GitHub repository https: //github.com/aravindsankar28/DySAT. embedding dimension as input, and no further tuning is required.
As discussed in Section 1, it should be noted that the methodologies of AIP and IPA scores proposed in Section 4, and the corresponding PIP and IPP extensions in Section 5, could be applied to any sequence of individual embeddings, not necessarily obtained using ASE, but also with other embedding methods for static networks, for example NMF, as demonstrated in this section. Since one of the main objectives of this paper is comparing different embedding methods based on RDPGs, the focus in subsequent examples will be only on RDPG-based techniques.

Sequential scores
So far, the embeddings learned using the initial T snapshots have been used to predict all the remaining T − T adjacency matrices. In practical applications, it would be necessary to sequentially update the scores when new snapshots become available over time, improving the predictive performance. This is demonstrated in Figure 7, where the AIP scores for standard ASE and the average adjacency matrix scores from Figure 5 are compared with their sequential counterparts, obtained when the model is updated using each new observation A t in the test set. Clearly, updating the scores sequentially is beneficial, especially towards the end of the test set, whereas the difference between the methodologies is negligible in the initial snapshots of the test set. Both methods seem to reliably predict even k-steps ahead network snapshots, since the nonsequential curves in Figure 7 are fairly close to their sequential counterparts. The method of AIP scores based on standard ASE outperforms the scores based on the averaged adjacency matrix, commonly used in the literature, including sequential settings. Furthermore, the non-sequential AIP scores (4.3) also outperform the sequential averaged adjacency matrix. This result is quite remarkable, since the former only use the initial T snapshots of the network for training, whereas the latter is sequentially updated with the snapshots in the test set.

Predicted scores
The performance of the classifiers can be improved using some of the time series model in Section 5. Figure 8a show the results obtained from the prediction of subsequent COSIE correction matrices. The predictive performance is slightly improved by the extended time series models. Again, it is empirically confirmed that adding temporal dynamics is beneficial for the performance of random dot product graph based classifiers.
On the other hand, predicting the subsequent adjacency spectral embeddings from the time series of aligned embed-dingsX 1 ,X 2 , . . . ,X T does not seem to improve the predic- t AUC averaged adjacency matrix, non-sequential averaged adjacency matrix, sequential AIP scores, standard ASE, non-sequential AIP scores, standard ASE, sequential Figure 7: Results for the AIP scores (4.3) and averaged adjacency matrix scores for the Santander Cycles network, with and without sequential updates.
tive performance. The results are presented in Figure 8b, and confirms the findings in Figure 3b, where the improvements on the simulated network were less significant compared to other methods. In this case, the time series models are not able to capture the dynamics of the aligned embeddings, and the predictive performance does not improve in AUC. The limited improvements in the results seem to suggest that the network does not have a strong dynamic component. The tradeoff between performance and the computational effort required to fit multiple independent time series simultaneously, would suggest use of the AIP scores (4.3) based on standard ASE in practical applications.

Los Alamos National Laboratory dataset
The unified host and network dataset (Turcotte et al., 2018) released by the Los Alamos National Laboratory (LANL) consists of a collection of network flow and host event logs generated from their machines running Microsoft Windows. From the host event logs, 90 daily user-authentication bipartite graphs have been constructed, writing A ijt = 1 if the user i initiates a connection authenticating to computer j, on day t. This graph is known as the user -destination IP graph. A total of n 1 = 12,222 users, n 2 = 5,047 hosts, and 85,020 pairs (i, j) are observed, corresponding to approximately 0.137% of all possible links.

Averaged scores and subsampling
The first T = 56 matrices are used as training set. Note that it is computationally difficult to calculate n 1 × n 2 scores for each adjacency matrix, and storing such large dense matrices in memory is also not efficient. Therefore, an estimate of the AUC can be obtained by subsampling the negative class at random from the zeroes in the test set adjacency matrices. Two subsampling techniques are used to construct the negative class for prediction of A t : (1) the negative class is constructed by sampling pairs (i, j) such that A ijt = 0, (2) the negative class contains randomly selected pairs (i, j) such that A ijt = 0, and all pairs (i, j) such that A ijt = 0 and A ijt = 1 for at least 1 value of t ∈ {1, . . . , T }. For simplicity, the two techniques are denoted with the numbers (1) and (2) in Figure 9, which reports the results for the 6 methods considered in Figure 2 in Section 6.1. The former subsampling technique provides an estimate of the ROC curve for the entire matrix, since the scores are sampled at random from the distribution of all scores. On the other hand, the latter method includes in the negative class more elements that tend to have associated high scores, represented by the pairs (i, j, t) such that A ijt = 0 and A ijt = 1 for at least 1 value of t ∈ {1, . . . , T }, giving an unbalanced sampling procedure and therefore a biased estimate of the ROC curve. Clearly, higher AUC scores are obtained using the first subsampling procedure.
Interestingly, in Figure 9a, the COSIE-based AIP scores seem to have the best predictive performance across the different methods. In particular, COSIE scores tend to largely outperform the other methods during weekdays, whereas the performance during weekends seems almost equivalent, and sometimes inferior, to the AIP scores (4.3) based on the standard ASE. On the other hand, in Figure 9b, COSIE scores   have the worst performance among the methods, except a spike on day 62. In Figure 9b, AIP scores (4.3) based on the standard ASE once again give the best predictive performance. The results of Figure 9b are of particular interest since these allow for a comparison of the classifiers on a more challenging negative class compared to Figure 9a. Therefore, it is reasonable to conclude that the AIP scores (4.3) emerge again as the most suitable method for link prediction based on random dot product graphs.

Significance of the difference between methods
It could be argued that the differences between the methodologies do not appear to be significant, and might be due to the subsampling scheme. In order to assess significance of the differences, the subsampling procedure was repeated for M = 100 times, and the corresponding AUCs were calculated. Figure 10 plots the estimated 95% confidence intervals for the difference between the AUCs obtained using different methods. Figure 10a uses the IPA scores calculated with COSIE as reference, and the AUCs are obtained with subsampling (1). Similarly, Figure 10b uses subsampling (2), and the AIP scores calculated with standard ASE are used  as reference. Note that the width of the confidence intervals in Figure 10 is barely visible, since the standard deviations of the AUC scores are < 10 −4 . This is because the number of subsamples is large enough to obtain extremely precise estimates of the AUC. Since the confidence intervals do not contain zero at any of the time points, the pairwise differences between the performance of different methodologies appear to be statistically significant. Similar confidence intervals, and corresponding p-values < 10 −6 were observed for all the comparisons in the current and next sections, suggesting that the subsampling procedure for estimation of the AUC is robust.

Streaming link prediction and sequential scores
For this example, it is also demonstrated how the proposed methods can be extended for streaming applications. For example, the method of combining inner products of individual embeddings is particularly suitable for implementation in a streaming fashion, since the inner products are easily updated on the run when new snapshots of the graph are observed. A demonstration using the method of fixed forgetting factors (Gama et al., 2013) is given here. The matrix of scores S t for prediction of A t+1 is updated in streaming as follows: starting from w 1 = 1 and S 1 =X 1X 1 . The forgetting factor λ ∈ [0, 1] is usually chosen to be close to 1 (Gama et al., 2013). Note that λ = 1 corresponds to the sequential AIP scores in (4.3), calculated as in Figure 7, whereas smaller values of λ give more weight to recent observations. In particular, λ = 0 only gives weight to the most recent snapshot. Schemes similar to (6.2) could be also implemented to update the collapsed adjacency matrix (4.1), obtaining the scores from its ASE at each point in time, or to update the matrix R t in (4) for the COSIE embeddings, or the rotated embeddingŝ X t . The forgetting factor approach might be interpreted as a simplification of the time series scheme proposed in Section 5, where the same weight is given to each edge.
The results for the entire observation period for a range of different values of λ are plotted in Figure 11. The best performance is achieved with the forgetting factor approach with λ ∈ [0.4, 0.8]. The performance for λ = 0 clearly drops around the weekends, since the network has a seasonal component which is not accounted for in the prediction. The difference between the curves for λ = 1 and λ < 1 suggests that the graph has temporal dynamics which is captured by the forgetting factor approach, which down-weights past observations in favour of more recent snapshots of the graph. This impression is confirmed by Figure 12, which shows the sequential and non-sequential AIP scores based on the standard ASE, akin to Figure 7. The predictive performance slightly deteriorates for snapshots that are further away in time, whereas 1-step ahead predictions based on the sequential scores consistently give better results.  Figure 12: Results for the AIP scores (4.3) for the LANL network, with and without sequential updates. AUCs are calculated from ≈ 150,000 links per graph, using subsampling (1).

Predicted scores
The performance can be again improved using the time series models in Section 5. Figure 13 shows the results obtained using the two different subsampling schemes for three methods: • ASE of the averaged and predicted adjacency matrix, • AIP and PIP scores calculated from standard ASE, • IPA and IPP scores calculated from COSIE. From Figures 13a, 13c and 13e, it is evident that the extensions do not improve the performance of the classifier when the subsampling scheme (1) is used. On the other hand, Figures 13b, 13d and 13f show that relevant improvements (especially on day 62) are obtained when the subsampling method (2) is used, which represents a more difficult classification task. Again, the results confirm that the performance of RDPGs for link prediction can be enhanced by time series models.

Imperial College network flow data
The methodologies proposed in Section 4 are also tested on a large computer network, demonstrating that the spectral embedding techniques are scalable to graphs of large size. A directed graph with n = 95,220 has been constructed using network flow data collected at Imperial College London, limiting the connections to internal hosts only. The data have been collected over T = 47 days, between 1st January and 16th February 2020. An edge between a client i and a server j is drawn on day t if the two hosts have connected at least once during the corresponding day. The number of edges ranges from a minimum of 344,565, observed on 1st January, to a maximum of 912,984 on 6th February.
The results are plotted in Figure 14. In this case, the best performance is achieved by the COSIE scores, similarly to the LANL network in Figure 9a. The traditional methodology of the averaged adjacency matrix is also outperformed by the AIP scores, which supports the results obtained in all the previous real data examples. As pointed out in Section 5, fit-ting the time series models on the sequenceR 1 , . . . ,R T of COSIE weighting matrices is inexpensive, and it is therefore possible to scale the time series extension of the IPA scores to a fairly large network. The results of this procedure are plotted in Figure 15, which shows again that the time series extension is beneficial for link prediction purposes.

DBLP dataset
Finally, the methodology is evaluated on a version of the DBLP co-authorship dataset 6 , extensively used in the computer science literature. The DBLP network is undirected, with n = 1,258,753 nodes, corresponding to authors of papers in computer science, for a total of 7,654,336 undirected co-authorship edges over T = 15 years, starting in 2000. An edge between two authors i and j is drawn if they co-authored a paper in year t. The network is extremely sparse, with edge densities ranging from 1.90 · 10 −7 in 2000 to 7.41 · 10 −7 to 2015. The number of co-authorships consistently increases over the years, corresponding to a steady increase in the number of publications in computer science journals and conferences.
The initial T = 10 graphs, from 2000 to 2009, are used as training set, whereas the last T − T graphs, from 2010 to 2014, correspond to the test set. The results are presented in Table 1, for the same models examined in Section 6.4 on the Imperial College network, for the two different subsampling schemes. Unsurprisingly, the AIP scores with standard ASE achieve the best performance. Limited improvement is obtained considering the time series extension on the IPA scores with COSIE.
The performance of the RDPG on the DBLP network is significantly worse than the other examples considered in this section, suggesting that RDPG-based methods might not be the most appropriate technique for this graph. This is because the graph is extremely sparse, and a large number of new nodes appears in the network over time. Such authors, before their first published paper, are represented by rows and columns filled with zeros in the adjacency matrix, acting as disconnected components in the graph. Therefore, learning the embedding for such nodes is particularly difficult for the RDPG, since a limited (or null) history of co-authorships is available.

Discussion and summary of results
The main methodological contribution of this article is an adaptation for temporal link prediction of existing RDPGbased methods for joint inference on multiple graphs. Also, this article proposes techniques to capture the temporal dynamics of the observed graphs, combining the link proba-  Figure 13: Comparison between three of the link prediction models in Figure 9, and their extensions using the methods in Section 5, on the LANL network. AUCs are calculated from ≈ 150,000 links per graph.
bilities and embeddings obtained via spectral methods with time series models. As demonstrated in Section 6.2.2, the proposed methods for combination of individual embeddings, and the extension based on time series modes, can be applied on any embedding method for static graphs, not necessarily restricted to RDPG models. The proposed methods have been applied on different synthetic and real-world datasets of different complexity, which highlighted the strengths and weaknesses of the proposed methodologies. In general, in most simulated and real-world networks analysed in this work, the AIP scores (4.3) based on standard ASE appear to consistently achieve a very good link prediction performance in terms of AUC scores. Both simulations and applications on real data demonstrated that adding temporal dynamics to the estimated link probabilities via time series modelling is beneficial for link prediction purposes (cf. Sections 6.1, 6.2.4 and 6.3.4). On the other hand, the time series extension is computationally expensive for most RDPG-based models, except the IPP scores based on COSIE scores. The extensions provide significant benefits only if the network presents a strong dynamic component, as demonstrated in the simulation on the logistic dynamic network model in Section 6.1.2.
The application to the Santander bikes data (cf. Section 6.2) highlighted that the random dot product graphs are an excellent option for link prediction for fairly small graphs without node or edge features. In such cases, the simplicity of RDPG-based models appears to overcome the advantages of deep learning methods (cf. Section 6.2.2), which instead shine when applied to large graphs with rich node and edge features (for example, Hamilton et al., 2017). Furthermore, it must be remarked that RDPG-based methods require minimal tun-   Figure 14: Results of the link prediction procedure on the Imperial College NetFlow network. AUCs are calculated from 6.7 million links per graph.     The applications on the LANL, ICL and DBLP networks (cf. Sections 6.3, 6.4 and 6.5) demonstrated that the methodologies of AIP and IPA scores are scalable to fairly large graphs, and a good performance is achieved when the set of nodes is stable over time. It has also been shown that the proposed methodologies, in particular the AIP scores, are well suited to streaming applications (cf. Section 6.3.3). The main limitation of the proposed link prediction framework is its inability to easily deal with new nodes appearing in the network, as exemplified by the application on the DBLP network in Section 6.5.

Conclusion
In this paper, link prediction techniques based on random dot product graphs have been presented, discussed and compared. In particular, link prediction methods based on sequences of embeddings, COSIE models, and omnibus embeddings have been considered. Applications on simulated and real world data have shown that one of the most common approaches used in the literature, the decomposition of a collapsed adjacency matrixÃ, is usually outperformed by other methods for multiple graph inference in random dot product graphs.
Estimating link probabilities with the average of the inner products from sequences of individual embeddings of different snapshots of the graph has given the best performance in terms of AUC scores across multiple datasets. This result is particularly appealing for practical applications: calculating the individual ASEs is computationally inexpensive using algorithms for large sparse matrices, and the method seems particularly suitable for implementation in a streaming fashion, since the average inner product could be easily updated on the run when new snapshots of the graph are observed, as demonstrated in Section 6.3. The methods discussed in the article have then been further extended to include temporal dynamics, using time series models. The extensions have shown improvements over standard random dot product graph based link prediction techniques, especially when the graph exhibits a strong dynamic component. The techniques presented in this article could also be readily extended to any embedding method for static graphs, following the framework for calculating AIP and IPA scores presented in Section 4, and the PIP and IPP extensions in Section 5. Therefore, our methodology is not only applicable to RDPG embeddings, particularly attractive for their theoretical properties and ease of implementation, but also to modern static embedding methods for graph adjacency matrices, for example graph neural network techniques like GCN or GraphSAGE. Overall, this article provides valuable guidelines for practitioners for using random dot product graphs as tools for link prediction in networks, providing insights into the predictive capability of such statistical network models.