Fast Meta-Analytic Approximations for Relational Event Models: Applications to Data Streams and Multilevel Data



Introduction
Social network analysis is the field that studies social structures by investigating human behavior, with the goal of unveiling characteristics that affect actions and their consequences in social interactions (Scott 1988; Wasserman & Faust 1994; Brass 2022). Recent technological developments (e.g., digital communication, online databases) enable researchers to acquire richer and more extensive data about the social interactions between actors, resulting in a more in-depth description of social interaction dynamics and improved predictions across various disciplines. Examples include the study of friendships (Goodreau et al. 2009), social learning in Massive Open Online Courses (Vu et al. 2015), the development of relations within teams (Leenders et al. 2016), inter-hospital patient transfers (Vu et al. 2017), the analysis of microstructures in financial networks (Zappa & Vu 2021), social hierarchies (Redhead & Power 2022), the development of social relations among freshmen (Meijerink-Bosman et al. 2023), and many others.
In relational-event networks, the data consist of discrete instances of interactions among a finite set of actors in continuous time (Butts & Marcum 2017). A collection of these interactions is referred to as a sequence of "relational events". In this perspective, Butts (2008) was the first to propose the relational event model as a basic framework to model social actions. Since then, this model has been explored and expanded in multiple directions. For example, Vu et al. (2011) proposed a model with time-varying parameters, Perry & Wolfe (2013) utilized a partial likelihood approach to model the receiver given the sender, Stadtfeld et al. (2017) and Stadtfeld & Block (2017) built upon this approach and introduced a model which can explain actors' preferences in social interactions, and Mulder & Leenders (2019) investigated the temporal evolution of networks by estimating a model in overlapping intervals.
However, these models suffer from computational issues, mostly associated with large data structures, which limit our ability to learn complex social interaction dynamics from the larger data sets acquired through new technological developments. The data array used to fit these models usually has an M × D × P dimensional structure, where M is the number of events, D is the number of dyads (i.e., directed pairs of actors), and P is the number of predictor variables (e.g., endogenous statistics, exogenous statistics, or interactions thereof). Even in rather common situations, this data array runs into storage problems. For example, if we want to learn about social interaction dynamics among colleagues using email traffic data from the Enron company (Klimt & Yang 2004), there are M = 32,261 relational events (emails in this case) among N = 153 actors, which corresponds to D = N(N − 1) = 23,256 dyads. Furthermore, a typical relational event model would contain about P = 20 predictors. The dimensions of the data array would then be M × D × P = 32,261 × 23,256 × 20, containing approximately 1.5 × 10^10 elements. On a regular desktop computer, however, the maximal memory capacity would already be reached after the first M = 4,000 events. In another paper, Brandes et al. (2009) considered four political networks in which the number of countries/regions (N) varied from 200 to 700 and the relational event sequences comprised 20,000 to 171,000 political events. Thus, while such data dimensions are quite common in practice, the data cannot be stored in memory on standard desktop computers. Therefore, relational event models cannot be fitted to such data to learn about social interaction dynamics in temporal networks.
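To see how quickly these dimensions exceed the memory of a desktop machine, the arithmetic for the Enron example above can be reproduced in a few lines (Python is used here purely for illustration; the paper's implementation is in R):

```python
# Back-of-envelope memory check for the Enron example:
# a full M x D x P array of double-precision statistics.
M = 32_261          # events (emails)
N = 153             # actors
D = N * (N - 1)     # directed dyads
P = 20              # predictor variables

elements = M * D * P
gigabytes = elements * 8 / 1e9   # 8 bytes per double

print(f"{elements:.2e} elements, ~{gigabytes:.0f} GB")
```

Roughly 120 GB would be needed to hold the dense array, which is why only a few thousand events fit in the memory of a regular desktop computer.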
In this paper, we provide a solution that allows relational event models to be routinely fitted to such data formats on regular computers. We shall focus on two different scenarios of relational-event data analyses. First, we treat the case of streaming relational event data. A data stream consists of a set of observations that is continuously augmented by the arrival of new (batches of) data points (Ippel et al. 2019a,b). For instance, in Brandes et al. (2009) events are updated daily, whereas in Boschee et al. (2015) the relational event sequence is brought up to date weekly. These examples make explicit the need for modeling relational events in a real-time fashion. Second, we treat the case of relational event data with hierarchical structures using multilevel relational event models. Betancourt & Girolami (2015) discuss at great length the issues associated with the fundamental structure of multilevel models and how parameter dependencies slow down model estimation. For relational event models, those problems are compounded by the fact that the unwieldy form of the likelihood prevents modelers from taking full advantage of non-centered reparameterizations (Carpenter et al. 2017).
The solution that we present extends methodological ideas from the meta-analysis literature to the relational event modeling framework. Meta-analyses are used to combine the information (estimates) from multiple scientific studies using specific summary statistics (e.g., estimates and error (co)variances) computed from the separate studies, to obtain accurate estimates on a global level (Raudenbush & Bryk 1985; Sutton & Abrams 2001; Higgins et al. 2009; Borenstein et al. 2011). In the case of data streams, where relational events are observed in an iterative manner, the batches of relational events are considered to be separate 'studies' in the meta-analysis. Thus, relational event models are estimated separately per batch, and subsequently the separate estimates and error covariance matrices are combined into one global estimate using a fixed-effects meta-analytic model. This fixed-effects model assumes that the effects are constant across studies (e.g., Gronau et al. 2017), similar to a standard relational event model, where the effect is assumed to remain unchanged throughout the event stream. Thus, by treating subsequent batches of our relational event data stream as "independent studies", the fixed-effects meta-analytic model can properly approximate the full relational event model based on the entire sequence. In the case of multilevel data, a random-effects meta-analytic model is used to approximate the multilevel relational-event model. The multilevel model is rooted in the assumption of underlying heterogeneity among the independent networks; hence, the random-effects model is a natural choice for analyzing these data (Van Der Hofstad 2009). Moreover, a mixed-effects model can be developed by combining the fixed- and random-effects meta-analytic models. Therefore, the multilevel model can be implemented by combining a structure identical to the fixed-effects model for data streams with a random-effects model.
Simulation studies were conducted to assess how these approaches can considerably reduce the computational time needed for fitting multilevel models in this class while, at the same time, providing results comparable to the exact model. Moreover, we develop a Gibbs sampler algorithm for the data stream case, where the avoidance of revisiting the past allows for faster inferential updates. The methodology presented in this paper is implemented in the R package 'remx', and the code developed to carry out the estimation of these models is available on GitHub (https://github.com/TilburgNetworkGroup/remx). An empirical analysis of social interaction behavior using corporate email communications (Klimt & Yang 2004) and an empirical study of social interactions among political actors (Boschee et al. 2015) are performed to illustrate the models in action with real-world data.
Finally, it is important to note that other efforts have been made to improve the speed of fitting relational event models. The most notable one uses case-control sampling to avoid using the entire data matrix (Lerner et al. 2013; Vu et al. 2015). In this methodology, besides the observed dyad for a given event, a subset of non-observed dyads is randomly drawn from the risk set for each event. The results indicated that such methods can yield accurate estimates. This method, however, was developed specifically for the case of a single relational event sequence. It is yet unclear how to extend this sampling technique to relational event data streams, where new (batches of) relational events pour in on a regular basis, or to multilevel relational event data. The proposed meta-analytic approximations, on the other hand, can straightforwardly be applied to the case of a single large relational event sequence. Furthermore, case-control sampling techniques are not readily available in existing R packages for temporal social network analysis, whereas 'remx' is available as a free R package.
The remainder of this text is structured as follows: Section 2 presents the relational event model framework; Section 3 briefly discusses a few of its computational challenges; Section 4 introduces the meta-analytic approximations; Section 5 contains the results of synthetic data studies; Section 6 displays applications with empirical data; and Section 7 concludes the text by briefly restating the main points.
2 Relational event modeling

Basic relational event models
The relational event model is used to analyze a temporal social network represented by an ordered sequence of events, which are characterized by time-stamped interactions among a finite set of actors (Butts 2008;Butts & Marcum 2017).
We represent each event by e = (s, r, t), where t is the time when the interaction took place and s and r are indices representing the actors who took part in the interaction. In a directed network, we call s the sender and r the receiver. Considering a network of N actors, events are assumed to be randomly sampled from the set of all pairs of actors that are at risk at time t; this set is called the risk set, denoted by R(t). Without loss of generality, in this paper we assume that all pairs of actors are at risk at every point in time. Thus, assuming we observe M events in a time window [0, τ) ⊂ R+, a relational event sequence is formally defined as

E = {e_m = (s_m, r_m, t_m) : (s_m, r_m) ∈ R(t_m); 0 < t_1 < t_2 < . . . < t_M < τ},

where t_m is the time of the m-th event, and s_m and r_m are the actors involved in the event at time t_m. The relational event model focuses on modeling the rates of interaction, λ_sr(t), in the social network. Butts (2008) considers the inter-event times, ∆_m = t_m − t_{m−1}, as exponentially distributed and assumes λ_sr(t) to be constant between events, resulting in a piece-wise constant exponential model (Friedman 1982). The survival function of this model is given by S_sr(∆_m) = exp{−(t_m − t_{m−1}) λ_sr(t_m)}. Therefore, the likelihood has the following form:

p(E | β, X) = ∏_{m=1}^{M} [ λ_{s_m r_m}(t_m) ∏_{(s,r) ∈ R(t_m)} exp{−(t_m − t_{m−1}) λ_sr(t_m)} ],   (1)

where β is the parameter vector and X is a matrix with covariates. The rates of interaction are assumed to have a Cox regression form, λ_sr(t | E) = exp{x'_sr(t) β} (Cox 1972), where x_sr(t) is a vector of P predictor variables from the matrix X for the directed pair of sender s and receiver r. The predictor variables can include exogenous variables (such as actors' attributes) and endogenous statistics, which summarize the part of the event history until time t that is relevant to explain the social interaction between actors (such as inertia, reciprocity, incoming shared partners, turn-taking, etc.; Butts 2008; Leenders et al. 2016), as well as potential interactions between endogenous and exogenous variables. Note that these time-varying endogenous statistics need to be updated after each event. As social interaction behavior can be driven by complex social processes, relational event models may contain many different potentially important predictors. For example, Karimova et al. (2023) considered a relational event model with 103 predictor variables to explain and predict the voice loops during NASA's famous (but disastrous) Apollo 13 mission to the moon. Thus, the number of predictor variables (which may need to be updated after every event) can easily become very large.
In case only the order in which the events occurred is known (so the exact timing t_m is not observed), the likelihood of the sequence of observed dyads (s_m, r_m) becomes

p(E | β, X) = ∏_{m=1}^{M} λ_{s_m r_m}(t_m) / Σ_{(s,r) ∈ R(t_m)} λ_sr(t_m).   (2)

Throughout this paper, when data come from the relational event model, we shall write E ∼ REM(β, X). Depending on whether the exact timing is known, either the temporal or the ordinal REM can be used.
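For concreteness, the ordinal likelihood (2) can be computed directly from per-event covariates and coefficients. The following is a minimal pure-Python sketch (the function name and data layout are ours, for illustration only; in practice packages such as remstimate do this efficiently in compiled code):

```python
import math

def ordinal_rem_loglik(events, stats, beta):
    """Log-likelihood of the ordinal REM (equation 2).

    events : list of observed dyads, one (s, r) per event
    stats  : stats[m][(s, r)] -> list of P covariates x_sr(t_m)
             for every dyad in the risk set at event m
    beta   : list of P coefficients
    """
    loglik = 0.0
    for m, dyad in enumerate(events):
        # log-rates x_sr(t_m)' beta for every dyad in the risk set at t_m
        logrates = {d: sum(b * x for b, x in zip(beta, xs))
                    for d, xs in stats[m].items()}
        # log of the normalizing sum over the risk set
        denom = math.log(sum(math.exp(lr) for lr in logrates.values()))
        loglik += logrates[dyad] - denom
    return loglik
```

With β = 0 every dyad in the risk set is equally likely, so each event contributes log(1/|R(t_m)|) to the log-likelihood, which provides a quick sanity check of an implementation.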
Finally, it is important to note that both the ordinal and the temporal relational event models can be written as specific Poisson regression models (see also [REF]). This important property facilitates classical estimation with software such as glm in R; details are provided in Appendix A. Moreover, for a Bayesian analysis of these models, researchers can make use of the R packages relevent (Butts 2008) and remstimate (Arena et al. 2022), which are specifically designed for relational event analyses using endogenous network predictors (see also Meijerink-Bosman et al. 2021). Another option is the R package rstanarm (Goodrich et al. 2022), but then users need to specify the model by hand. These Bayesian packages support a range of algorithms, from Hamiltonian Monte Carlo to Bayesian importance sampling. Finally, note that for a Bayesian analysis a prior distribution needs to be specified for the coefficients. A common default choice is a noninformative flat prior (see also Vieira et al. 2023).

Relational event models for data streams
As relational-event data are often collected by means of digital communication applications, it is common that relational-event histories are sequentially collected in batches over time. We denote the M_ℓ events observed in batch ℓ by E(ℓ), and the event sequence of all batches combined until batch ℓ by E(1, . . ., ℓ). The likelihood until event batch ℓ can be written as the likelihood of the data before batch ℓ multiplied by the likelihood of the new batch, i.e.,

p(E(1, . . ., ℓ) | β, X) = p(E(1, . . ., ℓ − 1) | β, X) × p(E(ℓ) | β, X),

for ℓ = 2, 3, . . ., where the likelihood of event batch ℓ is equal to either (1) or (2) for the temporal REM or the ordinal REM, respectively, by substituting E(ℓ) for E. The goal is then to obtain the fitted model for all batches E(1, . . ., ℓ) by updating the fitted model based on E(1, . . ., ℓ − 1) from the previous step with the newly observed batch E(ℓ).
This would allow researchers to update the model on the fly as new batches of relational events are pouring in.
Even though it would be possible to set the batch size M_ℓ equal to the number of events recorded at each update (e.g., all events observed per day, as in the data from Brandes et al. 2009), it may be preferable to consider somewhat larger batches, for two reasons. First, the model would otherwise need to be updated relatively often even though the estimates would remain largely unchanged. Second, when using the meta-analytic approximation, as will be discussed later, the events in one batch serve as one 'study' from a meta-analytic perspective.
The meta-analytic literature recommends including studies of sufficient sample size so that the obtained estimates are reliable. For this reason, we explore the influence of the batch size in more detail later in this paper.
As there are currently no R packages specifically designed for relational event data streams, social network researchers need to rely on the same software as for a single relational event sequence, as discussed in the previous section. Thus, every time a new batch of relational events is observed, the new events need to be appended to the entire sequence observed so far, and the entire (updated) sequence must then be used to re-estimate the model.

Multilevel relational event models
Multilevel relational event models can be used to study the variation of network effects across different mutually independent clusters (DuBois et al. 2013). We will refer to each relational event history in this setting as an "event cluster". The likelihood of the K independent event clusters is defined as the probability of the events conditional on the cluster-specific effects, β_k, and the effects common across clusters, denoted by ψ, i.e.,

p(E | β_1, . . ., β_K, ψ) = ∏_{k=1}^{K} p(E_k | β_k, ψ),   (3)

where E = {E_1, E_2, . . ., E_K}, with E_k the events in cluster k, and M_k, R_k(t), and τ_k the number of events, the risk set, and the end of the observation period for cluster k, for k = 1, . . ., K, respectively. The rate parameter of events from actor s towards actor r in cluster k can be written as

λ_{k,sr}(t) = exp{x'_{k,sr}(t) β_k + z'_{k,sr}(t) ψ},   (4)

where z_{k,sr}(t) is the vector of covariates of the common effects across clusters (e.g., see Vieira et al. 2023). In this context, the effects β_k are assumed to be independent and identically distributed across clusters, following a normal distribution, β_k ∼ N(µ, Σ), where µ is the mean effect in the population and Σ is the covariance matrix. Then, β_k contains the random effects, which are cluster-specific and sampled from a population distribution of event cluster effects, as in a standard multilevel model (Gelman 2006a). It is common to write the multilevel model as

Level 1: E_k | β_k, ψ ∼ REM(β_k, ψ, X_k),
Level 2: β_k ∼ N(µ, Σ),   (5)

where either the temporal or the ordinal model can be used on the first level, depending on whether the exact timing is known. The main advantage of the multilevel structure is the automatic borrowing of information about network effects across event clusters. For instance, if an event cluster has a small sample size, so that there is little information to estimate the cluster-specific effect accurately, the estimate will be pooled towards the grand mean across all relational event clusters.
To fit multilevel relational event models, we can again make use of the Poisson regression formulation of the model and thus use lme4 (Bates et al. 2014) for model fitting. Another possibility would be to use rstanarm for a Bayesian analysis. The computational burden of this method, however, is immense. For this reason, Vieira et al. (2022) considered only a subset of 15 classrooms of relational event sequences rather than the entire data set of 153 classrooms, which was not computationally feasible.

Computational challenges
Currently, two crucial bottlenecks limit the applicability of REMs for routine use on personal computers to study temporal social networks: memory storage and computational complexity.Both aspects are discussed in this section.

Memory storage
The risk set for event m comprises all pairs of actors for which it would be possible to observe an event at that point in time. In the most general case, with a network of N actors, the size of the risk set equals N(N − 1) for directed relational events (if all dyads are at risk). For instance, Brandes et al. (2013) fit a relational event model to a data set with 168 actors belonging to ethnic groups or international organizations and 217,000 hostile/cooperative events.
The usual implementation of models with time-varying network statistics results in data objects that can easily become too large to be stored in the working memory of standard computers. This becomes even more problematic when a researcher has access to relational event data streams, which potentially grow indefinitely: as more data pour in, the data matrix grows ever larger. For instance, the event data between countries and regions (Schrodt et al. 1994) considered by Brandes et al. (2009) are updated on a regular (e.g., daily) basis. Eventually, it becomes infeasible to store all predictor variables for all dyads for all events in memory in order to fit relational event models to the entire event sequence.
In multilevel data, multiple networks are often analyzed in one step, which, again, requires storing many large data objects in memory at the same time. In this case, assuming one wants to estimate P effects across K networks consisting of N_k actors each, we would need to store Σ_{k=1}^{K} M_k × N_k(N_k − 1) × P elements in working memory. In practice, where relational event data are collected using digital technologies, the memory of most personal computers would be exhausted and relational event model fitting would become infeasible.

Computational complexity
For single relational event histories, the number of computations necessary for the survival part of the likelihood in equation (1) and for the denominator in equation (2) grows approximately with the square of the number of actors. As a consequence, for moderately large networks, an enormous number of operations is needed to compute the likelihood function. Therefore, if a researcher keeps observing social network interactions between the actors, this task easily becomes computationally infeasible. Moreover, given the software currently available in R, the relational event model would need to be estimated in one step based on the entire event history, and therefore the model would need to be re-estimated on the entire updated event sequence after every newly observed event (or batch), which is quite problematic from a computational point of view.
For multilevel relational event data, computing the survival part of the likelihood function already requires Σ_{k=1}^{K} N_k(N_k − 1) operations (see equation 3). Moreover, the cluster-specific effects are tied together in a multidimensional structure represented by the population distribution. Hence, fitting this model requires the estimation of a large number of parameters. Betancourt & Girolami (2015) discussed the computational challenges of this type of model, particularly with respect to the dependency among the parameters: small changes in µ and Σ result in drastic alterations of the population distribution, which makes model estimation a very complicated task.
Hence, given the number of operations needed to compute the likelihood function and the issues related to dependencies among parameters, fitting the relational event models previously described is a tedious and computationally expensive process. This is especially the case because most social network researchers do not have access to supercomputers but rely on their personal computers, typically using R. Moreover, researchers who want to analyze large relational event data from a Bayesian perspective face even larger computational hurdles, since the algorithms typically used for Bayesian analyses rely on iterative methods (e.g., Markov chain Monte Carlo) that usually require computing the likelihood a very large number of times.

Meta-analytic approximations
In this section, we present meta-analytic approximations for fitting relational event models. These approximations can be used to fit models to relational event data streams, to a single large event sequence, and to multilevel relational event data. We present these methods from both frequentist and Bayesian perspectives, showing that our approximation approach works for either modeling choice. First, meta-analytic approximations are presented for estimating a relational event model for event streams in batches, as described in Section 2.2. Second, we discuss meta-analytic approximations for the multilevel relational event models of Section 2.3.

Frequentist meta-analytic approximation
Following the terminology of a meta-analysis, the observed relational events in the ℓ-th batch can be viewed as a 'study'. A relational event model can then be fitted to this ℓ-th study, resulting in a study-specific estimate, β̂(ℓ), with error covariance matrix Ω(ℓ). Subsequently, the independent estimates are considered pseudo-data, which can be pooled together in what the meta-analysis literature calls a fixed-effects meta-analytic model, i.e.,

β̂(ℓ) ∼ N(β, Ω(ℓ)),   (6)

where the normal approximation follows from large-sample theory. Now consider the situation where ℓ batches have been observed, resulting in a pooled estimate denoted by β̄(ℓ) with error covariance matrix Ω̄(ℓ).
Next, the (ℓ + 1)-th batch is observed with approximate likelihood (6), and thus, following multivariate Gaussian theory, the updated estimate and error covariance matrix are given by

Ω̄(ℓ + 1) = ( Ω̄(ℓ)^{−1} + Ω(ℓ + 1)^{−1} )^{−1},
β̄(ℓ + 1) = Ω̄(ℓ + 1) ( Ω̄(ℓ)^{−1} β̄(ℓ) + Ω(ℓ + 1)^{−1} β̂(ℓ + 1) ),

for ℓ = 1, 2, 3, . . ., where β̄(1) = β̂(1) and Ω̄(1) = Ω(1). These formulas allow updating the estimates of the relational event coefficients and their uncertainty after every newly observed batch. It is easy to see that the mean vector and covariance matrix based on the first ℓ + 1 batches can also be written in the following non-iterative forms:

Ω̄(ℓ + 1) = ( Σ_{j=1}^{ℓ+1} Ω(j)^{−1} )^{−1},
β̄(ℓ + 1) = Ω̄(ℓ + 1) Σ_{j=1}^{ℓ+1} Ω(j)^{−1} β̂(j).

The steps to update the relational event model when observing batches of events in a streaming setup are summarized in Algorithm 1, which is implemented in the remx package.
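The updating scheme of Algorithm 1 amounts to inverse-variance weighting, which is easy to sketch. Below is a scalar (one-coefficient) Python illustration with made-up batch estimates; it checks that the iterative updates reproduce the non-iterative pooled form (in the multivariate case the reciprocals become matrix inverses):

```python
def fe_update(beta_bar, omega_bar, beta_hat, omega_hat):
    """One fixed-effects meta-analytic update (scalar sketch of
    Algorithm 1): combine the running pooled estimate with a new
    batch estimate by inverse-variance weighting."""
    prec_new = 1.0 / omega_bar + 1.0 / omega_hat
    omega_new = 1.0 / prec_new
    beta_new = omega_new * (beta_bar / omega_bar + beta_hat / omega_hat)
    return beta_new, omega_new

# Hypothetical (estimate, error variance) pairs for three batches.
batches = [(0.52, 0.04), (0.47, 0.05), (0.55, 0.03)]
beta_bar, omega_bar = batches[0]
for b, w in batches[1:]:
    beta_bar, omega_bar = fe_update(beta_bar, omega_bar, b, w)

# Non-iterative form: precision-weighted average over all batches.
prec = sum(1.0 / w for _, w in batches)
pooled = sum(b / w for b, w in batches) / prec
assert abs(beta_bar - pooled) < 1e-12
assert abs(omega_bar - 1.0 / prec) < 1e-12
```

The equivalence of the two forms is what makes the streaming variant attractive: only the current pooled estimate and its covariance need to be kept in memory, never the past batches.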

Bayesian meta-analytic approximation
Here we discuss a Bayesian meta-analytic implementation of the relational event model when observing streams of event batches over time. The Bayesian method requires the specification of a prior distribution, denoted by p(β). The posterior of the parameter vector based on the ℓ-th batch can be obtained via Bayes' theorem, i.e.,

p(β | E(ℓ)) ∝ p(E(ℓ) | β) p(β).

Samples from the posterior distribution can be obtained via simulation techniques (e.g., Markov chain Monte Carlo). The Bayesian approach to relational event analysis suffers from the same issues as the frequentist approach (Section 3), with the added disadvantage of being considerably slower.
We propose the following approximate Bayesian solution. We use a multivariate normal prior for the coefficients, β ∼ N(µ_0, Σ_0). As a default setup, a noninformative flat prior can be constructed using zero means together with a diagonal covariance matrix with huge diagonal elements; in this case, the posterior is completely determined by the likelihood. Furthermore, following Bayesian large-sample theory, Gaussian approximations are again used to approximate the posterior based on every batch. The posterior based on the first batch can then be written as

p(β | E(1)) ≈ N(β; β̄(1), Ω̄(1)),

where N(β; β̄, Ω̄) denotes a Gaussian distribution for β with mean vector β̄ and covariance matrix Ω̄. The concept of Bayesian updating, where the current posterior serves as the prior when observing new data, can be applied directly to the setting of relational event data streams. Mathematically, the update can be written as

p(β | E(1, . . ., ℓ + 1)) ∝ p(E(ℓ + 1) | β) p(β | E(1, . . ., ℓ)) ≈ N(β; β̄(ℓ + 1), Ω̄(ℓ + 1)),

for ℓ = 1, 2, 3, . . ., where the formulas for the mean and covariance matrix can be found in Algorithm 1. Again, the posterior mean and covariance matrix can be written in the non-iterative forms of the previous subsection. It is interesting to note that the resulting meta-analytic approximations based on the Bayesian approach are equivalent to their classical counterparts (Section 4.1.1) when a noninformative prior is used in the Bayesian approach.
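A one-coefficient Python sketch of this Bayesian updating (illustration only; the function below is ours and not part of remx) makes the equivalence with frequentist pooling visible: with a sufficiently vague Gaussian prior, sequential conjugate updates land on the same answer as inverse-variance pooling.

```python
def gaussian_update(mu0, var0, est, var):
    """Conjugate Gaussian update: prior N(mu0, var0) combined with an
    approximate Gaussian likelihood N(est, var) for one coefficient."""
    post_var = 1.0 / (1.0 / var0 + 1.0 / var)
    post_mu = post_var * (mu0 / var0 + est / var)
    return post_mu, post_var

# Hypothetical (estimate, error variance) pairs for two batches.
batches = [(0.52, 0.04), (0.47, 0.05)]

# Vague prior: zero mean, huge variance, so the data dominate.
mu, var = 0.0, 1e8
for est, v in batches:
    mu, var = gaussian_update(mu, var, est, v)

# Frequentist fixed-effects pooling of the same two batches.
prec = sum(1.0 / v for _, v in batches)
pooled = sum(e / v for e, v in batches) / prec
assert abs(mu - pooled) < 1e-4
```

The prior simply acts as a "batch zero": with an informative prior, its precision is added to the pool exactly like one more study.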

Multilevel relational event data
Let us now turn to the case where K independent networks have been observed. The following methods can be applied to perform approximate multilevel analyses of relational event history data. Unlike the fixed-effects meta-analytic method for the streaming data scenario, for multilevel relational event data we propose a random-effects meta-analytic method, which allows the coefficients to vary across event clusters and information to be borrowed across clusters, in the same spirit as ordinary multilevel models.

Classical multilevel meta-analytic approximation
Following the terminology of a meta-analysis, the k-th event cluster now serves as the k-th 'study'. As in a meta-analysis, a relational event model is first fitted to each event cluster separately, and large-sample theory is used to obtain a multivariate normal approximation of the effects in event cluster k. Subsequently, the meta-analytic mixed-effects approximation of the multilevel relational event model in (5) can be written as

β̂_k ∼ N(µ_β + δ_k, Ω_k),   δ_k ∼ N(0, Σ),   (12)

where β̂_k is the maximum-likelihood estimate of the coefficients in cluster k and Ω_k is the error covariance matrix of the coefficients in cluster k. Thus, the parameters to be estimated are δ_k, µ_β, and Σ, where µ_β is the vector of fixed effects (common across all clusters), δ_k are the random effects for cluster k (which quantify the cluster-specific deviations from the fixed effects), and Σ is the random-effects covariance matrix, which quantifies the (co)variability of the cluster-specific deviations across clusters. The multilevel estimators δ̂_k, µ̂_β, and Σ̂ can be obtained with an optimization method such as the Newton-Raphson algorithm (Ypma 1995). An important property of δ̂_k is the shrinkage effect, which pulls the independent estimate towards the grand mean µ_β when there is considerable uncertainty about the cluster-specific estimate in the error covariance matrix. Therefore, clusters can borrow strength from other event clusters to obtain a shrunken estimate in the multilevel step. This is similar to the James-Stein estimator, which has been proven to be more efficient than the independent maximum-likelihood estimator when the data are nested in event clusters (Efron & Morris 1977; James & Stein 1992). This model can be fitted with the R packages metafor (Viechtbauer 2010) and mixmeta (Sera et al. 2019).

Bayesian multilevel meta-analytic approximation
The model described in the previous subsection is sufficient when it is realistic to assume that all coefficients vary across clusters. However, there are cases in multilevel relational event analyses where some coefficients are not heterogeneous across clusters but are, in fact, constant across clusters (Vieira et al. 2022). To accommodate this, the fully heterogeneous random-effects meta-analytic model in equation (12) can be generalized to a mixed-effects model in which the vector of cluster-specific estimates is centered at the cluster-varying effects µ_β + δ_k stacked with the constant effects ψ,

(β̂_k', ψ̂_k')' ∼ N( ((µ_β + δ_k)', ψ')', Ω_k ),

where ψ̂_k denotes the estimate of the constant effects obtained from cluster k. A Bayesian MCMC algorithm is proposed for fitting this mixed-effects meta-analytic model. A uniform prior is specified for the joint distribution of ψ and µ_β, i.e., p(ψ, µ_β) ∝ 1. Next, to sample δ_k, we combine the Gaussian pseudo-likelihood β̂_k ∼ N(µ_β + δ_k, Ω_k) with the random-effects distribution δ_k ∼ N(0, Σ). Hence, the conditional distribution of δ_k is given by

δ_k | β̂_k, µ_β, Σ ∼ N( (Ω_k^{−1} + Σ^{−1})^{−1} Ω_k^{−1} (β̂_k − µ_β), (Ω_k^{−1} + Σ^{−1})^{−1} ).

Finally, following Huang & Wand (2013), we define a matrix Half-t prior for the covariance matrix Σ. This prior is specifically designed for random-effects covariance matrices and is preferred over the inverse-Wishart, because the inverse-Wishart induces a degree of informativeness that affects posterior inferences (Gelman 2006b). The matrix Half-t, on the other hand, induces Half-t priors on the standard deviations on the diagonal of the covariance matrix.
In addition, for a specific hyperparameter choice, it results in uniform priors on (−1, 1) for the correlations between the random effects. Thus, this distribution enables noninformative priors for all standard deviation and correlation parameters. The prior is given by

Σ | α_1, . . ., α_P ∼ Inverse-Wishart( η + P − 1, 2η diag(1/α_1, 1/α_2, . . ., 1/α_P) ),
α_i ∼ Inverse-Gamma(1/2, 1/d_i^2), for i = 1, 2, . . ., P,

where diag(1/α_1, 1/α_2, . . ., 1/α_P) denotes a diagonal matrix with diagonal entries (1/α_1, 1/α_2, . . ., 1/α_P), and η ∈ R+ and d_i ∈ R+ are positive hyperparameters. Setting η = 2 leads to U(−1, 1) priors on all correlation parameters. Moreover, as in Gelman (2006b), large values of d_i result in vague prior distributions on the standard deviations. An advantage of this approach is that the Bayesian multilevel model is less prone to the degeneracy issues that have plagued the classical multilevel model, such as Σ being non-positive definite on the boundary of the parameter space (Chung et al. 2015). Algorithm 2 details the steps to estimate this model.
The MCMC algorithm is implemented in the new R package remx.

Synthetic data studies
In this section, we examine the performance of the approximation algorithms. Our objective is to investigate (i) the accuracy of the estimates as a function of the batch or cluster size (in comparison to the exact models), (ii) the computation time as a function of the sample size, and (iii) the efficiency of the multilevel estimators.

Data streams
We generated a relational event history with N = 25 actors, M = 5000 events, and P = 14 effects (including 10 endogenous statistics). In this experiment, we emulated the desired streaming effect by gradually increasing the number of events in the network. However, the same framework can be applied to large networks. The increments were made in batches of 30, 50, 100, 150, 200, 300, and 500 events until the maximum of 5000 was reached. The objective was to compare the results from the meta-analytic approximations discussed in Subsection 4.1 with the exact model. Therefore, every new batch would augment the sequence of relational events, constituting a new partition for the meta-analytic approximation and an entirely new sequence for the exact model. For model fitting, we used the R packages remstimate for the exact model (Arena et al. 2022), metafor for the frequentist meta-analytic approximation (Viechtbauer 2010), and rstan for both the exact and approximate Bayesian methods (Carpenter et al. 2017).

Parameter recovery:
The estimation process starts with two batches, for all batch sizes. Figure 1 shows the comparison between models for increments of 50, 200, and 500 events. The red, blue, and black lines represent, respectively, the exact model, the frequentist approximation, and the Bayesian approximation. For inertia and reciprocity, the size of the batch makes little difference, and the estimates of the approximations are always close to those of the exact model. The same behavior is observed for all the statistics used in this simulation and is therefore omitted to keep the presentation of the results as concise as possible. The intercept, however, seems to absorb the bias when the size of the batch is small (e.g., 50 events). The estimates of the meta-analytic approximations move in the direction of the exact model as the size of the partitions increases.
Computational time: Finally, Figure 2 shows the running times for each model. The exact model (dotted red line) is the only one that displays an upward linear trend as the number of events grows, because the whole model needs to be re-estimated as new events are observed. The meta-analytic approximation, in contrast, requires only an estimate for the new batch, followed by the pooling step in the fixed-effect model. This reinforces the idea that fitting the exact model eventually becomes infeasible as the sequence grows larger and larger.
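The streaming update behind this speed-up is plain inverse-variance pooling: each batch contributes only its estimate and standard error, and two running sums replace any recomputation on old events. A hypothetical sketch (class and variable names are ours, not from the remx package):

```python
import numpy as np

class StreamingFixedEffectMeta:
    """Fixed-effect (inverse-variance) pooling of per-batch REM estimates.
    Each new batch contributes only its own estimate and standard error,
    so past events never need to be revisited."""
    def __init__(self, P):
        self.sum_w = np.zeros(P)    # running sum of weights 1/v_b
        self.sum_wb = np.zeros(P)   # running sum of weighted estimates

    def update(self, beta_hat, se):
        w = 1.0 / se**2             # weight = inverse sampling variance
        self.sum_w += w
        self.sum_wb += w * beta_hat

    @property
    def estimate(self):
        return self.sum_wb / self.sum_w

    @property
    def se(self):
        return np.sqrt(1.0 / self.sum_w)

# two hypothetical batches with a single coefficient
pool = StreamingFixedEffectMeta(P=1)
pool.update(np.array([0.50]), np.array([0.10]))
pool.update(np.array([0.30]), np.array([0.10]))
# equal standard errors give the simple average of the two batch estimates
print(pool.estimate)
```

Because only `sum_w` and `sum_wb` are stored, memory use stays constant no matter how long the stream runs.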

Multilevel models
In this experiment, K = 30 independent networks were generated. The number of events was gradually increased in each network from M_k = 50 up to M_k = 5000, for k = 1, . . ., K. Then, P = 6 effects (inertia, reciprocity, outgoing two-paths, incoming two-paths, psABXA, psABXB) plus an intercept were fitted to those event clusters using the multilevel models described in Subsection 4.2. The goal was to check whether the approximations performed similarly to the exact models. The Bayesian exact model and approximation were fit using rstan (Carpenter et al. 2017). The exact frequentist model was fit using lme4 (Bates et al. 2014), writing it as a Poisson regression (see Section 2). The frequentist approximation was fit with metafor (Viechtbauer 2010). We evaluate the performance of these models along four criteria: recovery of mean effects, strength of shrinkage, efficiency of the estimator, and computational time.

Recovery of mean effects:
We compare the estimators of the four models to see how well they recover the true mean effect that generated our data sets. Figure 3 shows the recovery for three covariates across different sample sizes. There is a difference between the estimators for the approximation and the exact models for small sample sizes.
However, they clearly converge as the number of events grows, going in the direction of the true mean effect.
Estimation of random effects: Shrinkage is defined as the difference between the cluster-specific effects in the independent estimates (e.g., maximum likelihood estimates) and the estimates from the multilevel model. Thus, as the number of events (or the amount of information) in each cluster increases, we expect the shrinkage to gradually decrease towards zero. Figure 4 displays the results for the four models for three different covariates. As expected, the approximations produce a very similar degree of shrinkage as the exact models. For the efficiency of the estimator, we use the mean squared error (MSE) as a measure of the efficiency of the cluster-specific effect estimator. If β̂ is an estimator for β, then the MSE is defined as MSE(β̂) = E[(β̂ − β)²] = Var(β̂) + Bias(β̂)². Thus, if an estimator is unbiased, the MSE reduces to its variance. Therefore, when comparing two estimators, the one with the smaller MSE is more efficient. Figure 5 shows the comparison between the MSE of the multilevel estimator for the four models and the independent estimator (MLE). It is clear that the efficiency of the estimator in the approximation is comparable to that of the exact model. Moreover, we obtain some evidence of the advantage of the multilevel approach: the MSE of the multilevel estimator dominates that of the independent estimator (MLE) in all cases for small- to moderate-sized samples.
In some cases (e.g.reciprocity), even when we have as many as 2000 events per network, there is still a considerable difference between the efficiency of the multilevel and the independent estimator.
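This efficiency gain can be reproduced in a small simulation: shrinking each cluster's independent estimate towards the pooled mean trades a little bias for a large reduction in variance. The following illustrative sketch uses known variance components and arbitrary values, so it mirrors the mechanism rather than the paper's exact simulation design.

```python
import numpy as np

rng = np.random.default_rng(1)
K, tau2, v = 30, 0.25, 1.0   # clusters, between- and within-cluster variance
reps = 2000
mse_mle, mse_ml = 0.0, 0.0
for _ in range(reps):
    beta = rng.normal(0.5, np.sqrt(tau2), K)       # true cluster effects
    beta_hat = rng.normal(beta, np.sqrt(v))        # independent (MLE) estimates
    mu_hat = beta_hat.mean()                       # pooled mean effect
    B = v / (v + tau2)                             # shrinkage factor
    beta_tilde = (1 - B) * beta_hat + B * mu_hat   # multilevel (shrunk) estimates
    mse_mle += np.mean((beta_hat - beta) ** 2)
    mse_ml += np.mean((beta_tilde - beta) ** 2)
# the shrinkage estimator has a clearly smaller average MSE
print(mse_mle / reps, mse_ml / reps)
```

As the within-cluster variance v shrinks (more events per network), B goes to zero and the two estimators coincide, which matches the convergence seen in Figure 5.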

Running time:
Figure 6 shows the comparison between running times for the four models across different sample sizes. The exact models take longer in every single case. The exact Bayesian model, for example, displays seemingly exponential growth in running time. For 1000 events, the exact Bayesian model took 3 days and 4 hours to run; the exact frequentist model took 5 hours and 41 minutes; the Bayesian approximation took 31 seconds; and the frequentist approximation took 16 seconds. Thus, for large multilevel data, it does not seem feasible to run the exact model, especially when multiple models need to be compared before deciding on a final model.

Empirical applications
In this section, all network statistics were computed using remstats (Meijerink-Bosman et al. 2021). The Bayesian approximations and the Bayesian exact models were estimated using Stan (Carpenter et al. 2017). The frequentist approximations were fitted using metafor (Viechtbauer 2010) and mixmeta (Sera et al. 2019). The frequentist exact models were estimated with remstimate (Arena et al. 2022) and lme4 (Bates et al. 2014). All experiments were run on a laptop with an Intel(R) Core(TM) i7-8665U processor with 4 cores and 16 GB of RAM.

Data streams
In this experiment, we use a data set of email communications among employees of the now-defunct company Enron (Klimt & Yang 2004). Following Perry & Wolfe (2013), only events with 5 or fewer receivers were included in the analysis. Events with multiple receivers were broken up into dyadic interactions. The network analyzed consists of 153 actors and contains 32,261 events. The actors are divided across the 'legal' department (25), the 'trading' department (60), and 'other' departments (68). They are almost evenly split between junior (79) and senior (74) levels, and the majority are male (112).
We emulate the streaming effect by starting with a small portion of the sequence and then gradually increasing the number of events in batches. We start with 2,000 events and augment the sequence in batches of 1,000. The objective of this experiment is to illustrate the computational challenges of the exact model and to develop insight into the extent to which the approximation models solve these issues. The model contains 20 covariates (inertia, reciprocity, psABAY, psABBY, psABXA, psABXB, out-degree sender, out-degree receiver, in-degree sender, in-degree receiver, outgoing two-path, incoming two-path, outgoing shared partners, incoming shared partners, recency rank receiver, same gender, same seniority, same department, same gender and different department, different seniority and different department) plus an intercept. Thus, the full data matrix is a 3-dimensional object with dimensions 32,261 × 23,256 × 21. We evaluate the memory usage of both models (exact and approximation) and compare parameter estimates.
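To see why the exact model is problematic, consider the size of a dense statistics array of these dimensions stored as 8-byte doubles. This is a back-of-the-envelope sketch; the actual software may store the array differently.

```python
# Rough memory footprint of the dense Enron statistics array:
# events x dyads-in-risk-set x (covariates + intercept), 8-byte doubles.
events, dyads, stats = 32_261, 23_256, 21
n_entries = events * dyads * stats
gib = n_entries * 8 / 2**30      # bytes -> GiB, roughly 117 GiB
print(f"{gib:.1f} GiB")          # far beyond the 16 GB of RAM available
```

The approximation never materializes this full array: each batch only needs the rows for its own events, which keeps the footprint roughly constant across the stream.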
Memory usage: Figure 7 shows the differences in memory usage between the exact model and the approximation.
The exact model displays a linear increase in the amount of memory used, whereas the approximation consumes about the same amount of memory throughout. The red dot in the plot marks the memory limit of the laptop we used. For the exact model, after the second batch (3,000 events, less than 10% of the total sequence), the R console issued the message "Error: cannot allocate vector of size 14.6 Gb". Hence, the shaded area in Figure 7 represents a region where no resources are available to fit the exact model on the machine we used to conduct our research. The approximation, however, fits in memory without problems.
Parameter estimates: Figure 8 shows the evolution of the estimates along the event stream. Both models, Bayesian (black line) and frequentist (blue line), present virtually the same estimates, to the point where it becomes impossible to distinguish them in the plot. One interesting aspect of Figure 8 is that, for some effects, the behavior seen in the simulation study in Section 5 is observed: the estimated effect starts out somewhat wiggly and then becomes approximately constant, indicating some stability of that effect in the network. For other effects, however, this constant behavior is never observed, which suggests that those effects might be changing over time. Thus, this model can also capture temporal variation in the effects that would simply be lost when the constant relational event model is used. Table 1 shows parameter estimates, 95% intervals for both models, and the widths of the intervals.
These estimates are final: they contain information from all 32,261 events. Most estimates are identical, as already shown in the plots; differences appear only from the third decimal place. Finally, statistical significance is very similar between the models.

Multilevel data
To illustrate the multilevel models, we use the Integrated Crisis Early Warning System (ICEWS) data set from the Harvard Dataverse (Boschee et al. 2015). These data consist of interactions between political actors collected from news articles. Here, we consider the events in the year 2021. This data set covers 147 countries, where India has the largest sample with 31,513 events and Samoa the smallest with just 62 events. Figure 9 shows that the distribution of events across these networks is highly skewed.
We fit a random-effects model with 15 covariates (inertia, reciprocity, psABAY, psABBY, psABXY, psABXA, psABXB, out-degree sender, out-degree receiver, in-degree sender, in-degree receiver, outgoing shared partners, incoming shared partners, recency rank sender, and recency rank receiver) plus an intercept to these data. First, an experiment is designed in which the size of the data set is gradually increased: we varied the number of networks, actors, and covariates. Finally, the shrinkage effect and the parameter estimates for the full data set are analyzed.
Data set size: Table 2 shows the results for simultaneously varying K, N, and P in our model. Each model was left running for 12 hours to check whether it could be fit within a reasonable amount of time. Green marks indicate that the model ran within this time limit; red crosses indicate that it did not. The results show that the exact Bayesian model does not even run in time for a relatively small data set with K = 50, N = 15, and P = 6. The exact frequentist model fails to run within 12 hours for a moderately large data set with K = 100, N = 25, and P = 11. These results clearly show that for large multilevel data it is computationally infeasible to run the exact model, regardless of whether the model is Bayesian or frequentist. The approximations, however, run in a fraction of the time assigned to all models. For K = 147, N = 30, and P = 16, the Bayesian approximation runs in approximately 1 hour, whereas the frequentist approximation runs in about 40 minutes.
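The frequentist approximation pools the per-network estimates with a standard random-effects meta-analysis. For a single covariate, this amounts to the DerSimonian-Laird method-of-moments estimator sketched below; the inputs are made up for illustration, and metafor implements this along with several alternative between-study variance estimators.

```python
import numpy as np

def dersimonian_laird(beta_hat, se):
    """Random-effects pooling of per-network estimates (one covariate):
    method-of-moments estimate of the between-network variance tau^2,
    then inverse-variance weighting with weights 1/(v_k + tau^2)."""
    v = se**2
    w = 1.0 / v
    mu_fe = np.sum(w * beta_hat) / np.sum(w)      # fixed-effect pooled mean
    Q = np.sum(w * (beta_hat - mu_fe) ** 2)       # heterogeneity statistic
    K = len(beta_hat)
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (Q - (K - 1)) / c)            # truncated at zero
    w_re = 1.0 / (v + tau2)                       # random-effects weights
    mu_re = np.sum(w_re * beta_hat) / np.sum(w_re)
    return mu_re, tau2, np.sqrt(1.0 / np.sum(w_re))

# hypothetical per-network estimates and standard errors for one effect
beta_hat = np.array([0.9, 1.2, 0.7, 1.5, 1.1])
se = np.array([0.2, 0.3, 0.25, 0.4, 0.2])
mu, tau2, se_mu = dersimonian_laird(beta_hat, se)
```

Because only K estimates and standard errors enter the pooling step, its cost is trivial compared with refitting a joint multilevel model over all events.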
Shrinkage effect: Shrinkage is the difference between the independent estimate (MLE) of a cluster-specific effect and the estimate provided by the multilevel model. The estimates presented were obtained using the complete data set and the largest model, K = 147, N = 30, and P = 16. Figure 10 shows shrinkage results of the cluster-specific effects for a few covariates for both approximations. They exhibit very similar behavior, with smaller sample sizes showing larger shrinkage. The green dot represents Samoa, the blue dot India, and the red line is zero, meaning that points lying on the line display no shrinkage. As expected, the dot representing India, the largest network, always lies on top of the red line.
Parameter estimates: Table 3 displays estimates for the random-effects means and variances for the Bayesian and frequentist approximations. The estimates were obtained for the largest setting K = 147, N = 30, and P = 16.
As expected, the estimates from both models are very similar, both in terms of direction (positive or negative) and size. There is a slight difference in the intervals, which is due to differences in model specification (the Bayesian model has a Student's t likelihood for the random effects). The participation shifts all have negative effects, which means that, on average, they decrease the rates of communication. The largest positive effect is recency rank sender, µ_rrankSend = 1.65, which means that being the most recent sender of an event makes an actor more likely to send the next event, on average. Finally, most variance parameters are small, indicating low random-effect variability.

Discussion and conclusion
The analysis of relational event history data using relational event models has been plagued by computational issues due to memory limitations and computational complexity. This is, in particular, the case for streams of relational events and multilevel (or clustered) relational event history data, but also for large relational event sequences. In this paper, we introduced modeling approximations for relational event data streams and multilevel relational event data. The proposed approximations are based on statistical techniques borrowed from the meta-analytic literature. In the case of data streams, newly observed batches of events are treated as new 'studies'. In the case of multilevel relational event data, each independent relational event sequence is treated as a 'study'. We then use meta-analysis to combine these 'studies' and produce inferences. The relational event model for data streams is based on the assumption of constant effects, which we aim to estimate as new batches of relational events pour in. Thus, a fixed-effect meta-analytic model is employed for relational event data streams; both classical and Bayesian methods were proposed for this purpose. The multilevel relational event model is based on the assumption of underlying heterogeneity among the independent networks, which makes a random-effects meta-analytic model the ideal approach. When all coefficients are assumed to differ across clusters, a classical random-effects meta-analytic approximation was proposed. When certain coefficients are assumed to be constant across clusters, a Bayesian meta-analytic approximation was proposed using noninformative priors, which can be used in a routine fashion. The goal of our approximations is to make the estimation of relational event models feasible on standard desktop computers, which is currently not possible for the empirical relational event data considered in this paper. The algorithms developed have been implemented in the R package remx, which is publicly available.
For the data stream case, the network grows larger over time, and the interest is in updating the inferences as new batches of events are observed. We provided a framework for these updates that relies only on newly observed events and does not require any (re-)computations on previously observed events. By avoiding the need to revisit past data points, the framework allows tremendously faster updating of the model and, at the same time, does not overload computer memory. We have shown that this approach approximates the relational event model very well for network effects, but it seems to need large batches of events in order to properly approximate the model intercept. In addition, this model is also able to capture time variation in the covariate effects that would be lost when using the constant-effect relational event model.
For the multilevel case, the number of networks and/or their sizes can become too large to fit an exact multilevel model in a reasonable time, or the data may no longer fit within the practical memory constraints of many computers.
We alleviate these issues by using the independent point estimates and uncertainty measures from each network separately as pseudo-data points. We then fit an approximate multilevel model that runs in a fraction of the time needed to fit a full multilevel relational event model. We showed that this simpler model behaves similarly to the exact model in terms of the properties of the estimators, parameter recovery, and shrinkage behavior.
The meta-analytic approximation can also be used for fitting relational event models to very large relational event sequences of, say, millions of events. Such large data sets are commonly observed in practice, but are at the same time very problematic to analyze using the currently available (naive) approaches for relational event models. To use the proposed methodology, the large relational event sequence should be divided into batches, which are again treated as different 'studies', similar to the case of data streams. Next, a relational event model is fit to the separate batches, and the resulting fitted models are combined using the proposed meta-analytic approximation method. For future research, it would be interesting to see how this method compares to previous approaches that handle big relational event sequences using case-control sampling (Lerner et al. 2013; Vu et al. 2015). This comparison falls outside the scope of the current paper.
Finally, other interesting future research directions include extending the proposed fast approximate methods to incorporate time-varying effects (Vu et al. 2011; Mulder & Leenders 2019). Moreover, the proposed meta-analysis approaches can be extended to actor-oriented relational event models (Stadtfeld et al. 2017; Vieira et al. 2022), which are helpful for separately modeling the behavioral choices of senders and receivers in social interactions.
For piece-wise constant exponential survival distributions, survival data can be modeled as Poisson regressions (Holford 1980; Laird & Olivier 1981). This also applies to relational event data, facilitating the fitting of relational event models by means of available computer software for generalized (mixed-)linear models. Thus, we write the REM as a Poisson regression as follows. Proof. Assume M events are observed and the risk set contains D dyads, where y_{s_m r_m} = 1 if dyad (s, r) is observed at event m and zero otherwise. As a result, the factorial term in the Poisson likelihood is always equal to one, since 1! = 1.
Hence, by adding the inter-event time as an offset in the event rate, the relational event model can be written as a Poisson regression model. An advantage of writing the relational event model as a Poisson regression is that we can handle relational event data for which the exact order of events within shorter periods is unavailable. For example, relational event data between countries stored by digital news media sometimes only record the events that occurred on a given day, without reporting the exact timing or order within that day (Brandes et al. 2009).
Using the Poisson regression formulation, the observed dyads within each day are set to y_sr = 1 while the other dyads are set to y_sr = 0, so we do not need to impose arbitrary event times that would cause bias.
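The Poisson formulation can be sketched directly: stack one row per dyad per event slot, use the log inter-event time as an offset, and maximize the resulting Poisson likelihood. The toy illustration below uses simulated data and a generic optimizer, not the REM software used in the paper.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)

# Toy design: one row per dyad per event slot, an intercept plus two
# hypothetical statistics, and a log inter-event-time offset.
n_rows = 4000
X = np.column_stack([np.ones(n_rows), rng.normal(size=(n_rows, 2))])
offset = np.log(rng.exponential(1.0, n_rows))   # log inter-event times
true_beta = np.array([-2.0, 0.8, -0.5])
y = rng.poisson(np.exp(X @ true_beta + offset))

def negloglik(beta):
    """Negative Poisson log-likelihood with offset; the constant
    log-factorial term is dropped since it does not affect the argmax
    (in the exact REM formulation y is 0/1, so it is zero anyway)."""
    eta = X @ beta + offset
    return -(y @ eta - np.exp(eta).sum())

beta_hat = minimize(negloglik, np.zeros(3), method="BFGS").x
# beta_hat is close to the generating values (-2.0, 0.8, -0.5)
```

In practice this is exactly what handing the model to a GLM routine (e.g. a Poisson family with an offset term) does, which is why lme4 can fit the multilevel version.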
In the multilevel setting, the likelihoods of the independent event clusters are multiplied, including a cluster-specific indicator, where M_k and R_k represent, respectively, the number of events and the risk set of cluster k.

B Estimates for the multilevel meta-analysis model
The model is defined as in Subsection 4.2, and the parameters to estimate are µ_β, δ_k, and Σ. Assuming a model with K networks and P covariates, the likelihood is the product of the cluster-specific contributions. To find the maximum likelihood estimates, we take the logarithm of the likelihood and differentiate it with respect to the parameters of interest.
The estimating equations for δ_k, µ_β, and Σ follow by setting the corresponding partial derivatives of the log-likelihood to zero.

C Meta-analytic approximation for multilevel relational event data

Under standard maximum likelihood theory, the estimates obtained from the separate sequences can be viewed as draws from the sampling distribution of the estimator, where θ̂_k is the estimated vector of parameters for sequence k and θ_k is the true vector of parameters for that sequence.
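The sampling model underlying this approximation can be written compactly. A sketch in standard meta-analytic notation, where V_k denotes the estimated covariance matrix of θ̂_k (the symbol V_k is our notation, assumed here for illustration):

```latex
\hat{\theta}_k \mid \theta_k \sim \mathcal{N}(\theta_k, V_k), \qquad
\theta_k \sim \mathcal{N}(\mu_\beta, \Sigma)
\;\;\Longrightarrow\;\;
\hat{\theta}_k \sim \mathcal{N}(\mu_\beta,\; V_k + \Sigma),
\qquad
L(\mu_\beta, \Sigma) \propto \prod_{k=1}^{K}
\mathcal{N}\!\bigl(\hat{\theta}_k \,\big|\, \mu_\beta,\; V_k + \Sigma\bigr).
```

Marginalizing over the true cluster effects is what turns the multilevel likelihood into a product of K multivariate normal densities, which is cheap to evaluate regardless of how many events each sequence contains.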

Figure 2: Comparison of running times for the fixed-effect model. Sequences were incremented with batches of 50, 200, and 500 events.

Figure 5: Comparison of mean squared error between the multilevel estimator and the independent estimator.

Figure 6: Comparison of running times for the four multilevel models.
Figure 7: Memory usage of the exact model and the meta-analytic approximation.

Figure 9: Distribution of events across the networks of the ICEWS data set.
2009) uses a data set with 202 political actors and 304,000 aggressive/cooperative events. They specified a model with 17 covariates. The risk set for this model has 202 × (202 − 1) = 40,602 possible dyads, so the data matrix is a 3-dimensional structure with dimensions 304,000 × 40,602 × 17. Perry & Wolfe (2013) use a network of corporate email communications with 156 actors, 21,635 events, and 30 covariates. The size of the risk set is 24,180, resulting in a data matrix with dimensions 21,635 × 24,180 × 30. Lerner et al. (

Table 1: Comparison of the Bayesian and frequentist approximations fitted to the Enron data set. These results are for the last batch, containing information from all 32,261 events. Numbers outside parentheses are point estimates (maximum likelihood estimate and posterior mean); numbers in parentheses are 95% intervals.