Compound Poisson Models for Weighted Networks with Applications in Finance

We develop a modelling framework for estimating and predicting weighted network data. The edge weights in weighted networks often arise from aggregating some individual relationships between the nodes. Motivated by this, we introduce a modelling framework for weighted networks based on the compound Poisson distribution. To allow for heterogeneity between the nodes, we use a regression approach for the model parameters. We test the new modelling framework on two types of financial networks: a network of financial institutions in which the edge weights represent exposures from trading Credit Default Swaps and a network of countries in which the edge weights represent cross-border lending. The compound Poisson Gamma distributions with regression fit the data well in both situations. We illustrate how this modelling framework can be used for predicting unobserved edges and their weights in an only partially observed network. This is for example relevant for assessing systemic risk in financial networks.


Introduction
We provide a modelling framework that can be used to estimate and predict weighted network data.
The edge weights in weighted networks often arise from aggregating some individual relationships between the nodes. For example, they can represent trades between financial institutions in trading networks, see e.g. Gandy & Veraart (2019) for a network of financial exposures arising from trading financial derivatives, or they can represent the supply of goods or services between different sectors in the economy modelled as an input-output network, see e.g. Acemoglu et al. (2012). Other applications arise for example in transport networks where the weights can represent the number of passengers travelling, see e.g. Barrat et al. (2004), or in networks representing co-authorship in scientific publications, see also Barrat et al. (2004), where the weights are a measure that accounts for the number of joint papers written in co-author networks. Motivated by this, we introduce a modelling framework for weighted directed networks based on the compound Poisson distribution.
We are interested in these weights and not just the topology of the underlying network, because in many applications the weights are fundamental for the behaviour of processes that can be observed on these networks. For example, in the 2007-2008 financial crisis, the interconnections between the financial institutions served as transmission channels for stress and losses that led to significant feedback and amplification mechanisms with severe consequences for the real economy. The magnitude of these losses is fundamentally linked to the weights of the edges in the network. This is clear from many studies on systemic risk in financial networks such as models looking at solvency contagion (Eisenberg & Noe, 2001; Rogers & Veraart, 2013), contagion caused by marking-to-market effects (Veraart, 2020+), fire sales (Cifuentes et al., 2005; Greenwood et al., 2015; Capponi & Larsson, 2015; Cont & Wagalath, 2016; Cont & Schaanning, 2017) or liquidity contagion (Lee, 2013); see also Glasserman & Young (2016) and Capponi (2016). Since the weights typically aggregate individual relationships, compound Poisson based models are a natural choice: each edge weight arises from a random number of individual items whose sizes are themselves random.
Another feature of weighted networks is that they are heterogeneous. Financial networks are a prime example. Some nodes are strongly connected with a large number of trading partners, whereas others only trade with a small number of counterparties. In transport networks, we see similar effects.
E.g., if the nodes are the cities and the weights are available seats on non-stop flights between two cities per day as in Barrat et al. (2004), these networks are strongly heterogeneous.
We take account of this heterogeneity by allowing the nodes in the network to have individual characteristics, which we call fitness, with the interpretation that a larger fitness leads to a larger number of edges.
We model these fitness parameters using a regression framework (Section 2). In particular, we model some characteristics of the compound Poisson Gamma distribution (such as its mean, which represents the mean weight between two nodes in the network) as a suitable function of a fitness parameter that is associated with every node. As a result, both the existence of an edge and its weight are influenced by the fitness parameters of the two nodes between which the edge is formed. This enables us to reproduce several stylised facts of financial networks.
We apply the new model class to two different types of financial network data (Section 3): First, we consider networks that describe exposures based on a special type of financial derivative (Credit Default Swaps). Second, we consider networks that describe international lending relationships between financial institutions. We fit some models of our new model class to the empirical financial network data and find in general that they fit the data well. In particular, we find that in most cases the compound Poisson models that model both the expectation of the Poisson random variable and the expectation of the Gamma distribution via separate regression models perform best.
As an application, we show how the modelling framework can be used to predict unobserved parts of a larger network. For that, we take the empirical networks as given and assume that a subset of the edges is no longer observable. We fit several models from our framework to the observable part of the network and use the results to predict the unobserved edges. For the Credit Default Swap data we find that a model which only uses one regression for the mean of the Poisson distribution performs best.
The Credit Default Swap data exhibit a rather traditional monotonic relationship between strengths and degrees in the network. For the international lending network the relationship between strengths and degrees is no longer monotonic. In this case we find a clear advantage of using a model with both a regression for the mean of the Poisson distribution and a separate regression for the mean of the Gamma distribution. This type of analysis, namely predicting unobserved parts of a network, could be incorporated into a macro-prudential stress test for assessing systemic risk in partially observed financial networks.

Related literature
Network models have been developed for a wide range of applications, for example in biology, information science and economics. The seminal model by Erdős & Rényi (1959) (henceforth ER) considers a network of n nodes and assumes that every pair of nodes is connected with probability p ∈ [0, 1].
To account for properties of empirical networks, a wide selection of models has been suggested, see Albert & Barabási (2002); Newman (2003, 2010) for overviews. The existing literature that analyses financial network data mainly focuses on the corresponding adjacency matrix or on the degree distribution. Financial network data have been studied for various countries, e.g. Austria (Boss et al., 2004), Brazil (Cont et al., 2010), Germany (Upper & Worms, 2004), Italy (Iori et al., 2008), Mexico (Martinez-Jaramillo et al., 2014), the Netherlands (in 't Veld & van Lelyveld, 2014) and the UK (Wells, 2004). Papers that do consider the weights of the network usually focus on the tail of the weights and find heavy tails; for example, Boss et al. (2004) fit a power law to the tails of the weights in a liabilities network and Cont et al. (2010) fit a Pareto law to the tails of the exposures in an interbank market. The focus on adjacency matrices and degree distributions is also evident in the literature on core-periphery financial networks (Craig & Von Peter, 2014; in 't Veld & van Lelyveld, 2014; Fricke & Lux, 2015), as well as in the literature on reconstructing financial networks from partial information; see Gandy & Veraart (2017) for an overview of network reconstruction methods in finance and a proposal for modelling the weights conditional on link existence.
A huge variety of fitness models for financial networks has been considered in the literature, see e.g. Jacobs & Clauset (2014) for a classification of network models which also contains the main ideas underlying what we refer to as fitness models. Fitnesses are also sometimes referred to as sociability parameters (Caron & Fox, 2017) or capacities (Norros & Reittu, 2006); see also Gandy & Veraart (2017) for a literature review. The statistics literature considers these fitness models in the context of graphons, which are functions of two variables (fitnesses) determining the link existence probabilities between any two nodes; they are the defining objects for exchangeable random graphs, see e.g. Lovász (2012); Orbanz & Roy (2015); Wolfe & Olhede (2013).
The majority of fitness models use the fitnesses only to model the existence of the edges in a network but not their weights. To the best of our knowledge, Gandy & Veraart (2017) is the only model that uses a fitness approach to model both the existence and the weight of an edge in a (financial) network. This is also what we suggest in this paper. In contrast to the model considered in Gandy & Veraart (2017) we can allow for a wider class of models for the distribution of the edge weights. This is because in the present paper we fit a network model to observed network data and do not try to reconstruct a network from observed aggregates of the network. The statistical inference for the former problem seems to be more easily tractable than for the latter, which allows us to consider a wider class of probability distributions for the financial network.
Compound Poisson models for networks have been considered before but in a slightly different context. For example, Ranola et al. (2010) propose a multigraph model in which each pair of nodes is connected by a Poisson number of edges. The mean number of edges is chosen to be the product of two fitnesses. They provide a maximum likelihood estimation approach to estimate the fitnesses and apply the model "to real data on neuronal connections, interacting genes in radiation hybrids, interacting proteins in a literature curated database, and letter and word pairs in seven Shakespearean plays" (Ranola et al., 2010, p. 2004). They note that "In practice many graphs are derived from multigraphs. To simplify analysis, the multiple edges between two nodes of a multigraph are collapsed to a single edge" (Ranola et al., 2010, p. 2004). While this approach also uses a Poisson distribution to model the number of edges and considers fitnesses to model the mean, it does not consider weighted edges as we do. Furthermore, our models allow for a more general dependence of the mean on the fitness parameters. Norros & Reittu (2006) propose a model in which first a fitness (referred to as capacity) is independently drawn for each node. Then, the number of edges between any two vertices is modelled using a Poisson distribution that depends on the fitnesses. The main result of the paper is on the existence of a giant component in this graph. Again, in contrast to our model, this model does not consider weighted edges. The fitnesses are assumed to be random variables in the first step of the random graph generation mechanism. We assume that they are fixed (but unknown) real numbers that have to be estimated, for which we use a regression framework. We could allow for random fitnesses in our model as well but leave this for future research.
Exponential random graph models (Holland & Leinhardt, 1981; Park & Newman, 2004) are another popular approach for statistical inference of networks. While these models do not consider weighted edges, there are some proposals for extensions to weighted random graphs, see for example the generalized exponential random graph model (GERGM) by Wilson et al. (2017), who specify a joint distribution for an exponential family of graphs with edge weights. They provide a Metropolis-Hastings method to estimate the model and apply it to several real-world networks, one of which is an international lending network of the type that we consider in our empirical study.
In our models, we develop stochastic (probabilistic) models for random weighted graphs (the financial networks), so the random object is the graph itself. This is different from the field of probabilistic graphical models (Koller & Friedman, 2009) and from the field of high-dimensional random graph estimation (Meinshausen & Bühlmann, 2006), where graphs are used to help describe dependencies between components of a multivariate random variable. There the graph is not an (observable) random object; it is a property of the random object.

Compound Poisson models

Definitions
In the following, we introduce a new model class for weighted and directed graphs consisting of a fixed number n ∈ N of nodes. Furthermore, we assume that the edges are modelled as random variables. A network consisting of n ∈ N nodes is given by a matrix L = (L ij ) i,j∈{1,2,...,n} , where the L ij are random variables modelling the weight of the directed edge from node i to node j. A weight of 0 indicates that the corresponding edge is not present. This definition of a network allows for at most one weighted directed edge between two nodes. In practice, these weights are often aggregates of several individual relationships between the nodes, which motivates our model choice.
We propose using a compound Poisson Gamma distribution for these weights, with parameters given by a regression model. A compound Poisson Gamma distribution can be defined via the random variable
$$X = \sum_{\nu=1}^{N} S_\nu,$$
where $N \sim \mathrm{Poisson}(\lambda)$ and $S_\nu \sim \mathrm{Gamma}(\alpha, \mu_S)$, $\nu = 1, \ldots, N$, are independent, where $\mathrm{Poisson}(\lambda)$ is the Poisson distribution with mean $\lambda$ and $\mathrm{Gamma}(\alpha, \mu_S)$ is the Gamma distribution with shape parameter $\alpha$ and mean $\mu_S$ [1]. Then $\mathrm{Var}(S_\nu) = \mu_S^2/\alpha$. This can be seen as a special case of the so-called Tweedie distribution (Jorgensen, 1987; Dunn & Smyth, 2005), which is usually parametrised via its mean $\mu$ and parameters $\phi$, $p$ such that $E[X] = \mu$ and
$$\mathrm{Var}(X) = \phi \mu^{p}$$
[2]. Our network $L = (L_{ij})_{1 \le i,j \le n}$ will be modelled as independent random variables having compound Poisson Gamma distributions, with parameters defined via a regression [3]. We will propose two ways of doing this: the first (CPNet1) will model $\mu_{ij} := E[L_{ij}]$ via regression and the second (CPNet2) will model both the mean of $N$, i.e. $\lambda$, and the mean of $S_\nu$, i.e. $\mu_S$, via regression. The numbers 1 and 2 in the names of CPNet1 and CPNet2 indicate how many regressions are embedded in the model.
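The definition of the compound Poisson Gamma random variable can be illustrated with a short simulation. The following is a minimal sketch using only the Python standard library; the function names and the parameter values (λ = 0.8, α = 2, µ_S = 3) are illustrative assumptions and not taken from the empirical study. It compares the empirical mean and the empirical frequency of zeros with the theoretical values λµ_S and exp(−λ).

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw N ~ Poisson(lam) by counting unit-rate exponential arrivals in [0, lam]."""
    count, elapsed = 0, rng.expovariate(1.0)
    while elapsed <= lam:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def sample_cpg(lam, alpha, mu_s, rng):
    """Draw X = S_1 + ... + S_N with N ~ Poisson(lam), S_nu ~ Gamma(shape alpha, mean mu_s)."""
    n_terms = sample_poisson(lam, rng)
    # random.gammavariate takes (shape, scale); a mean of mu_s means scale = mu_s / alpha.
    return sum(rng.gammavariate(alpha, mu_s / alpha) for _ in range(n_terms))


def cpg_moments(lam, alpha, mu_s):
    """Theoretical P(X = 0), mean and variance of the compound Poisson Gamma distribution."""
    return math.exp(-lam), lam * mu_s, lam * mu_s ** 2 * (1 + alpha) / alpha


rng = random.Random(1)  # fixed seed so that the sketch is reproducible
draws = [sample_cpg(0.8, 2.0, 3.0, rng) for _ in range(50_000)]
p_zero, mean, variance = cpg_moments(0.8, 2.0, 3.0)
empirical_mean = sum(draws) / len(draws)
empirical_zero = sum(d == 0 for d in draws) / len(draws)
print("mean:", empirical_mean, "vs", mean)
print("P(X = 0):", empirical_zero, "vs", p_zero)
```

The point mass at zero is what allows the same distribution to describe both absent edges and positive edge weights.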
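The parameters of CPNet1 are chosen as follows. The shape parameter of the Gamma distribution is a fixed constant $\alpha$. As mentioned before, we would like to define the overall mean via regression; thus we want to achieve $E[L_{ij}] = \mu_{ij}$ for given $\mu_{ij}$. That leaves flexibility on how to define the means of the Poisson and Gamma part of the distribution. We resolve this by imposing a second moment condition, namely $\mathrm{Var}(L_{ij}) = \phi \mu_{ij}^{(\alpha+2)/(\alpha+1)}$ for a fixed constant $\phi > 0$. This ensures that every element of $L$ will follow a Tweedie distribution with mean $\mu_{ij}$, dispersion $\phi$ and power parameter $(\alpha+2)/(\alpha+1) \in (1, 2)$.
Definition 2.1 (CPNet1). Let $n, p \in \mathbb{N}$, $X \in \mathbb{R}^{n \times p}$, $\beta \in \mathbb{R}^{p}$, $\alpha, \phi \in (0, \infty)$ and $l : \mathbb{R}^2 \to (0, \infty)$. Then we say that the matrix $L$ has a Compound Poisson Gamma Network regression model for the mean (CPNet1) if for all $i, j \in \{1, \ldots, n\}$,
$$L_{ij} = \sum_{\nu=1}^{N_{ij}} S^{\nu}_{ij}, \qquad N_{ij} \sim \mathrm{Poisson}(\lambda_{ij}), \qquad S^{\nu}_{ij} \sim \mathrm{Gamma}(\alpha, \mu^S_{ij}),$$
where all $N_{ij}$ and $S^{\nu}_{ij}$ are independent and
$$\mu_{ij} = l(f_i, f_j), \qquad f_i = \sum_{k=1}^{p} X_{ik}\beta_k, \qquad \lambda_{ij} = \frac{\alpha+1}{\phi\alpha}\,\mu_{ij}^{\alpha/(\alpha+1)}, \qquad \mu^S_{ij} = \frac{\phi\alpha}{\alpha+1}\,\mu_{ij}^{1/(\alpha+1)}.$$
In the above, $X_{ij}$, $i \in \{1, \ldots, n\}$, $j \in \{1, \ldots, p\}$, are the elements of the design matrix. The variable $f_i$ can be interpreted as "fitness" of node $i$.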
We refer to l in the definition above as a link function. Examples for link functions are l(x, y) = exp(x + y), l(x, y) = max(exp(x), exp(y)) and l(x, y) = exp(x) + exp(y). We would usually choose link functions that are monotonically non-decreasing in each of their arguments. This then implies that higher values of the fitnesses imply higher means of the corresponding compound Poisson distributions.
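[1] In particular, the probability density function of $S_\nu$ is given by $f(x) = \frac{(\alpha/\mu_S)^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\alpha x/\mu_S}$ for $x > 0$.
[2] The parameters are related via $\mu = \lambda\mu_S$, $p = \frac{\alpha+2}{\alpha+1}$ and $\phi = \frac{(\alpha+1)\mu^{2-p}}{\alpha\lambda}$. Then, indeed $E[X] = \mu$ and $\mathrm{Var}(X) = \lambda\mu_S^2\,\frac{1+\alpha}{\alpha} = \phi\mu^{p}$.
[3] We might have additional information on the network, for example that no self-loops exist. If that is the case we set $L_{ii} = 0$ for all $i \in \{1, \ldots, n\}$.
Example 2.2 (CPNet1F model). One example of a model that falls into the CPNet1 model class is the model that we refer to as CPNet1F model, which we will use in our empirical analysis later. It is defined by setting $p = n$, $X = I_n \in \mathbb{R}^{n \times n}$ and $l : \mathbb{R}^2 \to (0, \infty)$ with $l(x, y) = \exp(x + y)$.
Hence, it has $n + 2$ parameters given by the vector $\theta = (\beta_1, \ldots, \beta_n, \alpha, \phi) \in \mathbb{R}^n \times (0, \infty)^2$. In this model, the fitness parameter satisfies $f_i = \beta_i$ and the overall mean of the edge from $i$ to $j$ is given by
$$\mu_{ij} = \exp(\beta_i + \beta_j).$$
The parameter of the Poisson distribution is then given by
$$\lambda_{ij} = \frac{\mu_{ij}}{\mu^S_{ij}} = \frac{\alpha+1}{\phi\alpha}\,\bigl(\exp(\beta_i + \beta_j)\bigr)^{\alpha/(\alpha+1)},$$
the shape parameter of the Gamma distribution is given by $\alpha$ and the mean of the Gamma distribution is given by $\mu^S_{ij} = \frac{\phi\alpha}{\alpha+1}\bigl(\exp(\beta_i + \beta_j)\bigr)^{1/(\alpha+1)}$, so that $\lambda_{ij}\mu^S_{ij} = \mu_{ij}$ and $\mathrm{Var}(L_{ij}) = \phi\mu_{ij}^{(\alpha+2)/(\alpha+1)}$. Hence, we see that both the mean of the Poisson and the mean of the Gamma distributions are controlled simultaneously by the fitness parameters $(\beta_1, \ldots, \beta_n)$ and the parameters $\alpha$ and $\phi$.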
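The mapping from the CPNet1F parameters to the Poisson and Gamma means can be sketched as follows. This is only an illustration under a stated assumption: it uses the standard Tweedie relations, i.e. Var(L_ij) = φ µ_ij^p with power p = (α+2)/(α+1), to split the overall mean µ_ij = exp(β_i + β_j) into a Poisson mean and a Gamma mean. The function name and the numerical values are hypothetical.

```python
import math


def cpnet1f_parameters(beta, alpha, phi):
    """Return matrices (mu, lam, mu_s) for l(x, y) = exp(x + y) and X = identity.

    Assumption: Tweedie parametrisation Var(L_ij) = phi * mu_ij ** power
    with power = (alpha + 2) / (alpha + 1).
    """
    n = len(beta)
    power = (alpha + 2) / (alpha + 1)
    mu = [[math.exp(beta[i] + beta[j]) for j in range(n)] for i in range(n)]
    # Poisson mean lambda_ij and Gamma mean mu^S_ij implied by mean and variance.
    lam = [[(alpha + 1) / (phi * alpha) * m ** (2 - power) for m in row] for row in mu]
    mu_s = [[phi * alpha / (alpha + 1) * m ** (power - 1) for m in row] for row in mu]
    return mu, lam, mu_s


alpha, phi = 2.0, 1.5                      # hypothetical shape and dispersion
beta = [0.4, -0.3, 1.1]                    # hypothetical fitnesses
mu, lam, mu_s = cpnet1f_parameters(beta, alpha, phi)
print("mean of edge 0 -> 2:", mu[0][2])
print("Poisson mean:", lam[0][2], "Gamma mean:", mu_s[0][2])
```

A larger fitness increases both the Poisson mean and the Gamma mean, which is why CPNet1F ties link probabilities and weights to the same parameters.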
Next we consider CPNet2, which is a model in which both the mean of the Poisson distribution and the mean of the Gamma distribution are modelled separately via regression. The shape parameter α of the Gamma distribution is again a fixed constant.
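Definition 2.3 (CPNet2). Let $n, p_N, p_S \in \mathbb{N}$, $\alpha \in (0, \infty)$ and, for $k \in \{N, S\}$, let $X^k \in \mathbb{R}^{n \times p_k}$, $\beta^k \in \mathbb{R}^{p_k}$ and $l^k : \mathbb{R}^2 \to (0, \infty)$. Then a network $L$ consisting of $n$ nodes follows a Compound Poisson Gamma network model with links on lambda and the mean of the Gamma distribution (CPNet2) if $L$ is given by
$$L_{ij} = \sum_{\nu=1}^{N_{ij}} S^{\nu}_{ij}, \qquad N_{ij} \sim \mathrm{Poisson}(\lambda_{ij}), \qquad S^{\nu}_{ij} \sim \mathrm{Gamma}(\alpha, \mu^S_{ij}),$$
where all $N_{ij}$ and $S^{\nu}_{ij}$ are independent and
$$\lambda_{ij} = l^N(f^N_i, f^N_j), \qquad \mu^S_{ij} = l^S(f^S_i, f^S_j), \qquad f^k_i = \sum_{m=1}^{p_k} X^k_{im}\beta^k_m, \quad k \in \{N, S\}.$$
In the above, $X^k_{ij}$, $i \in \{1, \ldots, n\}$, $j \in \{1, \ldots, p_k\}$, $k \in \{N, S\}$, are the elements of the design matrices. Examples for link functions are as above. The variables $f^N_i$ and $f^S_i$ can be interpreted as fitnesses of node $i$, one affecting the Poisson part of the model, the other the Gamma part of the model.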
Example 2.4 (CPNet2FPG model). We introduce the CPNet2FPG as an example of a CPNet2 model. It has fitness-based parameters on both the Poisson and the Gamma part of the model, i.e. $p_N = n$, $X^N = I_n$, $p_S = n$, $X^S = I_n$, $l^S(x, y) = l^N(x, y) = \exp(x + y)$. It thus has $2n + 1$ parameters, namely $2n$ fitness parameters and the shape parameter $\alpha$ of the Gamma distribution. In particular, the fitness parameters for the Poisson distribution are given by $f^N_i = \beta^N_i$, $i \in \{1, \ldots, n\}$. Hence, the mean of the Poisson distribution used to model the edge from $i$ to $j$ is given by $\exp(\beta^N_i + \beta^N_j)$. Furthermore, the fitness parameters for the Gamma distribution are given by $f^S_i = \beta^S_i$, $i \in \{1, \ldots, n\}$. Hence, the mean of the Gamma distribution for the edge from $i$ to $j$ is given by $\exp(\beta^S_i + \beta^S_j)$. The parameters for the Poisson distribution are different from the parameters used for the Gamma distribution. This will enable us to model the existence of edges independently of the weights of the edges as we will discuss later.
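To make the CPNet2FPG example concrete, the following minimal sketch simulates one network from given fitness vectors. It relies only on the Python standard library; the function names, the exclusion of self-loops and all numerical values are illustrative assumptions.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw N ~ Poisson(lam) by counting unit-rate exponential arrivals in [0, lam]."""
    count, elapsed = 0, rng.expovariate(1.0)
    while elapsed <= lam:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def simulate_cpnet2fpg(beta_n, beta_s, alpha, rng):
    """Simulate a weight matrix with lambda_ij = exp(bN_i + bN_j), mu^S_ij = exp(bS_i + bS_j)."""
    n = len(beta_n)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # assumption: no self-loops, so L_ii = 0
            lam = math.exp(beta_n[i] + beta_n[j])
            mu_s = math.exp(beta_s[i] + beta_s[j])
            count = sample_poisson(lam, rng)
            # gammavariate takes (shape, scale); scale = mean / shape.
            weights[i][j] = sum(rng.gammavariate(alpha, mu_s / alpha) for _ in range(count))
    return weights


rng = random.Random(7)
beta_n = [0.5, -1.0, -1.0, -2.0]   # hypothetical Poisson fitnesses (drive link existence)
beta_s = [-1.0, 2.0, 0.0, 1.0]     # hypothetical Gamma fitnesses (drive weight sizes)
network = simulate_cpnet2fpg(beta_n, beta_s, alpha=2.0, rng=rng)
for row in network:
    print([round(w, 2) for w in row])
```

Because the two fitness vectors are separate, a node can have few links that are nevertheless large, which is not possible when one fitness drives both parts.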

Motivation behind the choice of compound Poisson distributions
One motivation behind the compound Poisson models (Definitions 2.1 and 2.3) is that many weighted networks consist of multiple directed edges between the nodes and these are then aggregated to obtain one network with at most one directed edge between each ordered pair of nodes. For example, consider a network of bilateral exposures on individual CDS. Each bilateral exposure consists in fact of several separate transactions, as described in Peltonen et al. (2014).
Another motivation comes from the fact that many financial networks do not automatically net exposures between counterparties. Using the compound Poisson distribution independently for both possible directional exposures allows for exposures in both directions, as well as for no exposure at all between counterparties.
Furthermore, our basic framework has enough flexibility to match important features of the link existence distribution and of the exposure distribution. The compound Poisson Gamma distribution has three parameters (one for the Poisson part and two for the Gamma part), thus enabling us to match three properties such as the probability of no link, as well as the mean and the variance of the exposure.
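The claim that three parameters allow three properties to be matched can be made explicit. Using P(X = 0) = exp(−λ), E[X] = λµ_S and Var(X) = λµ_S²(1+α)/α, the parameters can be recovered from a target no-link probability, mean and variance. The sketch below is an illustration only; the function name and the target values are hypothetical.

```python
import math


def match_three_properties(p_zero, mean, variance):
    """Solve for (lam, alpha, mu_s) from P(X = 0), E[X] and Var(X)."""
    if not 0.0 < p_zero < 1.0:
        raise ValueError("the no-link probability must lie strictly between 0 and 1")
    lam = -math.log(p_zero)                 # from P(X = 0) = exp(-lam)
    mu_s = mean / lam                       # from E[X] = lam * mu_s
    ratio = variance / (lam * mu_s ** 2)    # equals (1 + alpha) / alpha
    if ratio <= 1.0:
        raise ValueError("variance too small for a compound Poisson Gamma distribution")
    alpha = 1.0 / (ratio - 1.0)
    return lam, alpha, mu_s


# Hypothetical targets: 45% of edges absent, mean exposure 2.4, variance 10.8.
lam, alpha, mu_s = match_three_properties(0.45, 2.4, 10.8)
print(lam, alpha, mu_s)
```

The check on the ratio reflects that the variance of a compound Poisson Gamma variable always exceeds λµ_S², so not every triple of targets is attainable.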

Interpretation as fitness models
These new models were inspired by the classical fitness models (see Subsection 1.1) that assign fitnesses to every node, which then determine the link existence probabilities for every edge. We, however, take a broader view by considering a general regression framework that enables us to characterise more general features of the random graph. In particular, our regression framework incorporates fitness models as special cases but with the additional feature that fitnesses are used to characterise properties of the weights of edges in addition to the existence of edges.
To see how CPNet1 can be interpreted as a classical fitness model (in which no regression is used to determine the fitness parameter), we can set $X = I_n$, where $I_n$ is the $n \times n$ identity matrix in CPNet1. Then, $f_i = \sum_{j=1}^{p} X_{ij}\beta_j = \beta_i$ for all $i \in \{1, \ldots, n\}$. Hence, the overall mean of $L_{ij}$ is given by $\mu_{ij} = l(f_i, f_j) = l(\beta_i, \beta_j)$, which can be interpreted as a fitness model for the mean of the weighted edges where $\beta_i$, $i \in \{1, \ldots, n\}$, are the fitnesses.
Similarly, we can set $X^N = X^S = I_n$ in CPNet2. Then, $l^N(f^N_i, f^N_j) = l^N(\beta^N_i, \beta^N_j)$ can be interpreted as a fitness model for the mean of the Poisson distribution where $\beta^N_i$, $i \in \{1, \ldots, n\}$, are the fitnesses, and $l^S(f^S_i, f^S_j) = l^S(\beta^S_i, \beta^S_j)$ can be interpreted as a fitness model for the mean of the Gamma distribution with fitnesses $\beta^S_i$, $i \in \{1, \ldots, n\}$. Both CPNet1 and CPNet2 could be extended to give every node an in-fitness and an out-fitness.
For example, in CPNet2, we could have 4 instead of 2 design matrices, i.e. for $k \in \{N, S\}$ and $d \in \{\text{in}, \text{out}\}$ we have $X^{k,d} \in \mathbb{R}^{n \times p_{k,d}}$ and corresponding fitnesses
$$f^{k,d}_i = \sum_{m=1}^{p_{k,d}} X^{k,d}_{im}\beta^{k,d}_m.$$
Similarly, one could define an extension of CPNet1 with in-fitness and out-fitness.
As discussed in our literature review, fitness models have been studied before and it has been shown that they can also be used to construct degree distributions with heavy tails, see e.g. Gandy & Veraart (2017). These results carry over to our class of compound Poisson models since, as one can see from the formulae for the link existence probabilities (1), one can model a wide range of link behaviour with an appropriate choice of fitness parameters and link functions l.

Expected degrees and strengths
Next, we derive formulae for the existence and non-existence of edges, for the expected in- and out-degrees and the expected in- and out-strengths in the new models. In particular, we show that only the parameters of the Poisson distribution determine the link existence probabilities of the edges (together with the link function $l$ in CPNet1 or $l^N$ in CPNet2). The distribution used for the individual $S^{\nu}_{ij}$ only matters for the actual weights along the edges and these weights are then also influenced by the parameters of the Poisson distribution.
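Proposition 2.5. Let $L$ be CPNet1 as in Definition 2.1 with Poisson parameters $\lambda_{ij}$, and let $\tilde{L}$ be CPNet2 as in Definition 2.3 with Poisson parameters $\tilde{\lambda}_{ij} = l^N(f^N_i, f^N_j)$ and Gamma means $\tilde{\mu}^S_{ij} = l^S(f^S_i, f^S_j)$.
Then, for any $i, j \in \{1, \ldots, n\}$,
1. the probability for the non-existence and the existence of a directed edge from $i$ to $j$ is given by
$$P(L_{ij} = 0) = e^{-\lambda_{ij}}, \quad P(L_{ij} > 0) = 1 - e^{-\lambda_{ij}}, \quad P(\tilde{L}_{ij} = 0) = e^{-\tilde{\lambda}_{ij}}, \quad P(\tilde{L}_{ij} > 0) = 1 - e^{-\tilde{\lambda}_{ij}}; \qquad (1)$$
2. the expected in- and out-degrees are given by
$$E\Bigl[\sum_{j} \mathbb{1}\{L_{ji} > 0\}\Bigr] = \sum_{j} \bigl(1 - e^{-\lambda_{ji}}\bigr), \qquad E\Bigl[\sum_{j} \mathbb{1}\{L_{ij} > 0\}\Bigr] = \sum_{j} \bigl(1 - e^{-\lambda_{ij}}\bigr), \qquad (2)$$
and analogously for $\tilde{L}$ with $\tilde{\lambda}$ in place of $\lambda$;
3. the expected in- and out-strengths are given by
$$E\Bigl[\sum_{j} L_{ji}\Bigr] = \sum_{j} l(f_j, f_i), \quad E\Bigl[\sum_{j} L_{ij}\Bigr] = \sum_{j} l(f_i, f_j), \quad E\Bigl[\sum_{j} \tilde{L}_{ji}\Bigr] = \sum_{j} \tilde{\lambda}_{ji}\tilde{\mu}^S_{ji}, \quad E\Bigl[\sum_{j} \tilde{L}_{ij}\Bigr] = \sum_{j} \tilde{\lambda}_{ij}\tilde{\mu}^S_{ij}. \qquad (3)$$
If self-loops are excluded, the terms with $j = i$ are omitted from the sums. The results follow directly from the definition of the new models and properties of compound Poisson distributions and therefore we omit the proof.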
When comparing the expected strengths to the degrees, i.e., formula (3) to (2), we see the main difference between CPNet1 and CPNet2. In CPNet1 the same model parameters determine the magnitude of the degrees and the strengths. In CPNet2 there are additional model parameters $f^S_i$, $i \in \{1, \ldots, n\}$, and the link function $l^S$ that influence the strengths of the nodes but not the degrees. Hence, if there is no clear monotonic relationship between strengths and degrees, this can be captured with the model class CPNet2. We will discuss this in more detail in our empirical case study.
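The difference between degrees and strengths in CPNet2 can be illustrated numerically. The sketch below computes expected degrees and strengths from a matrix of Poisson means and a matrix of Gamma means, using P(L_ij > 0) = 1 − exp(−λ_ij) and E[L_ij] = λ_ij µ^S_ij. The function name, the exclusion of self-loops and the two-node input are assumptions made for illustration.

```python
import math


def expected_degrees_and_strengths(lam, mu_s):
    """Expected out-/in-degrees and out-/in-strengths; self-loops are excluded (assumption)."""
    n = len(lam)
    out_degree, in_degree = [0.0] * n, [0.0] * n
    out_strength, in_strength = [0.0] * n, [0.0] * n
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            link_probability = 1.0 - math.exp(-lam[i][j])   # P(L_ij > 0)
            mean_weight = lam[i][j] * mu_s[i][j]            # E[L_ij]
            out_degree[i] += link_probability
            in_degree[j] += link_probability
            out_strength[i] += mean_weight
            in_strength[j] += mean_weight
    return out_degree, in_degree, out_strength, in_strength


# Two nodes with identical Poisson means but different Gamma means (hypothetical values).
lam = [[0.0, 1.0], [1.0, 0.0]]
mu_s = [[0.0, 5.0], [0.5, 0.0]]
out_deg, in_deg, out_str, in_str = expected_degrees_and_strengths(lam, mu_s)
print("expected out-degrees:", out_deg)
print("expected out-strengths:", out_str)
```

In this toy input both nodes have the same expected out-degree while their expected out-strengths differ by a factor of ten, which is the kind of non-monotonic pattern that a single fitness cannot produce.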

Special cases: Erdős-Rényi and core-periphery model
Both CPNet1 and CPNet2 reduce to the classical Erdős-Rényi random graph model for the existence of edges for special choices of the model parameters. Indeed, in CPNet1, if we set all parameters $f_i$ to the same value, say $x$, then from (1) we see that all link existence probabilities are identical and hence the CPNet1 model reduces to the classical ER model for the existence of the edges. The same holds for CPNet2 if all parameters $f^N_i$ are set to the same value. We can also reproduce a core-periphery structure with our new model classes. One could, for example, choose two fitnesses $x_{\text{core}} \ge x_{\text{periphery}}$ and assign all nodes $i$ in the core the fitness $f_i = x_{\text{core}}$ and all nodes $i$ in the periphery the fitness $f_i = x_{\text{periphery}}$. This can be achieved by setting $\beta_1 = x_{\text{core}}$, $\beta_2 = x_{\text{periphery}}$, $p_k = 2$, $X^k_{i1} = 1$ if $i$ is in the core and 0 otherwise, and $X^k_{i2} = 0$ if $i$ is in the core and 1 otherwise. Then for any function $l^k$ that is non-decreasing in both of its arguments, one would obtain the highest probability for existing edges between two members of the core and the lowest between two members of the periphery. From (2) it is also clear that nodes in the core would have higher expected in- and out-degrees compared to nodes in the periphery. This approach could be generalised by considering possibly more than two types of vertices as in the stochastic block models for random graphs.
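The two special cases can be checked with a few lines of code. The sketch below builds the two-column indicator design matrix for a core-periphery structure, computes the fitnesses f = Xβ and evaluates the link probabilities for the Poisson part with link function exp(x + y). The function names, the node labels and the fitness values are hypothetical.

```python
import math


def core_periphery_fitness(is_core, x_core, x_periphery):
    """Fitnesses f = X beta for the two-column indicator design matrix."""
    design = [[1.0, 0.0] if core else [0.0, 1.0] for core in is_core]
    beta = [x_core, x_periphery]
    return [sum(x * b for x, b in zip(row, beta)) for row in design]


def link_probabilities(fitness):
    """P(edge i -> j) = 1 - exp(-lambda_ij) with lambda_ij = exp(f_i + f_j)."""
    return [[1.0 - math.exp(-math.exp(fi + fj)) for fj in fitness] for fi in fitness]


# Nodes 0 and 1 form the core, nodes 2 and 3 the periphery (hypothetical example).
fitness = core_periphery_fitness([True, True, False, False], x_core=0.5, x_periphery=-2.0)
prob = link_probabilities(fitness)
print("core-core:", prob[0][1])
print("core-periphery:", prob[0][2])
print("periphery-periphery:", prob[2][3])

# Equal fitnesses give one common link probability, as in the Erdős-Rényi model.
er_prob = link_probabilities([0.3, 0.3, 0.3])
print("ER link probability:", er_prob[0][1])
```

Adding further indicator columns to the design matrix would give more than two node types, in the spirit of a stochastic block model.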

Possible applications of the models
Our modelling framework can be used to deal with missing information in network models. An example is a situation in which a financial network is only partially observed and one would like to fill in the remaining parts. In contrast to the literature on network reconstruction, see e.g. Gandy & Veraart (2017), we do not assume that the row and column sums of the network matrix L are observed and the individual entries need to be estimated. Instead, we have a situation in mind in which the row and column sums are not observable but some individual entries of the matrix are observable. In such a situation one could fit our new model class to the available data and predict the missing entries from the fitted model. We will demonstrate how this can be done in our empirical case study. An alternative application would be that one observes a network in the past (on one or several occasions) and fits the new model class to these observations. One then uses these results to predict a network in the future.
Alternatively, one might be in a situation that one observes a network that is related to a network of interest, e.g., a derivative exposure network corresponding to Credit Default Swap exposures written on a given reference entity is observed (for example where the reference entity is a UK company) but one is interested in the same type of network written on a different reference entity (for example a non-UK company) and would like to make predictions about this network.
All these possible application areas could arise in the context of macro-prudential stress testing for systemic risk analysis in financial networks. To be able to conduct a macro-prudential stress test one needs to consider the financial system as a whole and analyse potential feedback and amplification mechanisms between the market participants. Often, the connections that give rise to such feedback mechanisms are not fully observable and therefore one will need to rely on statistical and simulation methods to deal with the missing information. This is where our compound Poisson model class can be used. In Gandy & Veraart (2017) it was demonstrated how a network reconstruction method can be used in a macro-prudential stress test if the network of interest is not fully observable. As mentioned before, in these papers the assumption was that the network matrix itself was not observable but its row and column sums were. Here we assume that a subset of the network is observable, and we use the subset to estimate a statistical model that will then be used to predict the missing edges in the original network that is not fully observable.

Empirical case studies
We will now fit the new class of compound Poisson models to two different data sets of financial networks. The first data set contains financial networks representing exposures due to financial derivatives and the second data set contains financial networks representing cross-border lending activities. In addition to the compound Poisson models with regression introduced in this paper, we will compare the fit to some alternative models for financial networks. We compare the performance of the models in-sample in Section 3.4 (using the Akaike information criterion (AIC)) and out-of-sample in Section 3.5 (using cross-validation).
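For the in-sample comparison, the Akaike information criterion penalises the maximised log-likelihood by the number of parameters, AIC = 2k − 2 log L. The sketch below uses the parameter counts stated in Examples 2.2 and 2.4 (n + 2 for CPNet1F and 2n + 1 for CPNet2FPG); the function names and the log-likelihood values are purely hypothetical placeholders and are not results from the data.

```python
def aic(log_likelihood, n_parameters):
    """Akaike information criterion; smaller values indicate a better trade-off."""
    return 2 * n_parameters - 2 * log_likelihood


def parameter_count(model, n_nodes):
    """Number of parameters of the two fitness-based example models."""
    counts = {"CPNet1F": n_nodes + 2, "CPNet2FPG": 2 * n_nodes + 1}
    return counts[model]


n_nodes = 107
# Hypothetical maximised log-likelihoods, for illustration only.
log_likelihoods = {"CPNet1F": -5200.0, "CPNet2FPG": -5050.0}
for model, log_lik in log_likelihoods.items():
    k = parameter_count(model, n_nodes)
    print(model, "parameters:", k, "AIC:", aic(log_lik, k))
```

The larger model is preferred only if its gain in log-likelihood outweighs the penalty for roughly n additional fitness parameters.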

Data description: derivative exposure network
First, we consider a data set that contains a snapshot of roughly 134,000 outstanding positions in Credit Default Swaps (CDS) referencing 89 different UK institutions, taken in the second half of 2011. We will refer to them as CDS data. The data come from the Depository Trust & Clearing Corporation's (DTCC) Trade Information Warehouse (TIW) and were supplied to us by the Bank of England with anonymized counterparties. These data were also considered in Gandy & Veraart (2019). As described there, these data record, for each reference entity, both counterparties of a position (buyer and seller) and the notional amount. We only consider positions for which the notional amounts are quoted in EUR. The notional amount "represents the par amount of credit protection bought or sold, equivalent to debt or bond amounts, and is used to derive the coupon payment calculations for each payment period and the recovery amounts in the event of a default" (DTTC, 2015, p.3). From these data, we construct for each UK reference entity a network between buyers and sellers describing the total outstanding positions in credit default swaps referencing this particular institution. This leads to 89 networks in total. Sometimes, for a given reference entity, a pair of buyer and seller is listed more than once, which corresponds to outstanding positions for different maturities.
For these cases, we just add up all the multiple entries to obtain the total weight for such an edge. In the following we consider (an arbitrary selection of) 5 of these networks, which we refer to as CDS A, where A ∈ {1, . . . , 5}. Table 1 provides some summary statistics for these five networks. Figure 1 contains a plot of one of these networks consisting of 107 nodes. The network matrix has been normalised such that the sum of all entries of the matrix equals 1. We see that there is a strong clustering of exposures in the lower right corner representing mainly exposures between dealers in this network. This network represents a very typical financial network exhibiting some core-periphery structure. In the following we consider some more descriptive statistics to understand some properties of the network. Figure 2 shows the empirical cumulative distribution functions of the in- and out-degrees (left) and the in- and out-strengths (right). In general, we find that this network appears to be symmetric with almost no difference between the in- or out-degrees and the in- or out-strengths.
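The aggregation and normalisation steps described above can be sketched as follows. Positions are summed per ordered pair of counterparties and the resulting weights are rescaled to sum to 1. The record layout (source, target, notional), the direction of the edges and the toy positions are assumptions for illustration; the actual data format is not specified here.

```python
from collections import defaultdict


def build_network(positions):
    """Aggregate (source, target, notional) records into normalised edge weights."""
    totals = defaultdict(float)
    for source, target, notional in positions:
        totals[(source, target)] += notional     # add up multiple entries per pair
    grand_total = sum(totals.values())
    # Normalise so that the sum of all entries equals 1.
    return {edge: weight / grand_total for edge, weight in totals.items()}


# Hypothetical positions: the pair (A, B) appears twice, e.g. for two maturities.
positions = [("A", "B", 10.0), ("A", "B", 5.0), ("B", "C", 5.0)]
edges = build_network(positions)
print(edges)
```

This aggregation of several individual positions into one edge weight is exactly the mechanism that the compound Poisson structure is meant to mirror.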
To understand the relationship between strengths and degrees, we consider Figure 3, which shows a scatter plot of the strengths against the degrees. There is a clear tendency for nodes with high degrees to also have high strengths. We fit a simple linear model to the observed total strengths (in- plus out-strengths) using an intercept $\beta_0$ and slope parameter $\beta_1$, where we use the total degree as explanatory variable. In particular, we set $\text{strength}_i = \beta_0 + \beta_1\,\text{degree}_i + \varepsilon_i$, $i \in \{1, \ldots, n\}$, where $\varepsilon_i$ is the error term. The regression line is also included in Figure 3 and we see that the linear relationship seems to describe the data reasonably well. Hence, for this data set, a model that associates higher weights with more links seems to be appropriate. We will see that this can be achieved by our compound Poisson models.
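The descriptive regression of total strength on total degree can be reproduced for any weight matrix with a few lines of code. The sketch below computes total degrees and strengths and fits the line by ordinary least squares; the function names and the 3 × 3 toy matrix are hypothetical and unrelated to the CDS data.

```python
def total_degrees_and_strengths(weights):
    """Total (in + out) degree and total (in + out) strength of every node."""
    n = len(weights)
    degree = [
        sum(weights[i][j] > 0 for j in range(n)) + sum(weights[j][i] > 0 for j in range(n))
        for i in range(n)
    ]
    strength = [
        sum(weights[i]) + sum(weights[j][i] for j in range(n))
        for i in range(n)
    ]
    return degree, strength


def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    s_xx = sum((a - x_mean) ** 2 for a in x)
    s_xy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    slope = s_xy / s_xx
    return y_mean - slope * x_mean, slope


# Hypothetical weight matrix; entry [i][j] is the weight of the edge from i to j.
weights = [[0.0, 2.0, 1.0], [0.0, 0.0, 3.0], [4.0, 0.0, 0.0]]
degree, strength = total_degrees_and_strengths(weights)
intercept, slope = fit_line(degree, strength)
print("degrees:", degree, "strengths:", strength)
print("intercept:", intercept, "slope:", slope)
```

A clearly positive slope with little scatter, as in the CDS data, suggests that a single fitness per node may already capture both degrees and strengths.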
Similar monotonic relationships between strengths and degrees have been found in other networks.
For example, Barrat et al. (2004) find in an analysis of a world-wide airport network (in which nodes represent airports, and the weighted edges represent the number of available seats on direct flights between these airports) that the average strength of a node with degree $k$ increases with the degree in proportion to $k^b$ for some parameter $b$.

Data description: international lending network
As the second example, we consider data that the Bank for International Settlements (BIS) collects as part of its locational banking statistics (LBS). We will refer to them as LBS data. These data are publicly available. They contain information on claims and liabilities of financial institutions aggregated on a country level. From these data, we chose the 38 countries that report their financial activities to the BIS, see Table 2.
In the following, we investigate the relationship between strength and degree in the LBS data. Figure 5 shows the empirical cumulative distribution functions of the in- and out-degrees (left) and the in- and out-strengths (right). Similarly to the results for the CDS data, we find that this network appears to be quite symmetric. The in-strengths seem to be quite similar to the out-strengths and the same holds for the in- and out-degrees. In contrast to the CDS data, the (in-/out-) degrees now seem to exhibit a different pattern compared to the (in-/out-) strengths. In particular, as one can see from the empirical cumulative distribution functions, the in- and out-degree distributions appear to be bimodal, which is not the case for the distributions of the in- and out-strengths. To illustrate this difference further, we again look at a scatter plot, now of the log-strengths against the degrees, in Figure 6. We fit a regression line, which still exhibits a positive slope, indicating that there is still some tendency for nodes with higher strengths to also have higher degrees, albeit with considerable scatter around the regression line. We clearly see that there are some nodes which have very high strengths but rather low degrees and nodes that have rather high degrees but low strengths.
On the one hand, there are four countries whose total strength is greater than or equal to the median strength of all countries while their total degree is less than or equal to the median degree over all countries (China, Germany, Japan and Singapore). These countries have rather low degrees despite their high strengths. On the other hand, there are four countries whose total strength is less than or equal to the median strength but whose total degree is greater than or equal to the median degree (Finland, Philippines, South Africa and South Korea).
These countries have high degrees despite their small strengths. Hence, according to this informal "outlier" criterion, 8/38 ≈ 21% of the nodes are outliers. The same analysis for the CDS network reveals only 11/107 ≈ 10% outliers. Hence, the LBS data have different features compared to the CDS data. In particular, ordering the nodes according to their strength does not coincide with ordering the nodes according to their degree. To be able to fit this type of behaviour, we need a model class that is flexible enough to accommodate at least a partial separation of the weight of an edge from the existence of an edge. In the following, we show how this can be achieved with the compound Poisson regression models.
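The informal median-based criterion can be written down in a few lines. The sketch below is a hypothetical Python helper (not the authors' code) that flags nodes with high strength but low degree, and nodes with low strength but high degree, relative to the respective medians; the toy vectors are invented for illustration.

```python
import numpy as np

def median_outliers(degree, strength):
    """Flag nodes whose strength and degree fall on opposite sides of the medians.

    Returns two boolean masks:
      high_s_low_d: strength >= median strength and degree <= median degree
      low_s_high_d: strength <= median strength and degree >= median degree
    A node lying exactly on both medians would appear in both masks.
    """
    degree = np.asarray(degree, dtype=float)
    strength = np.asarray(strength, dtype=float)
    med_d, med_s = np.median(degree), np.median(strength)
    high_s_low_d = (strength >= med_s) & (degree <= med_d)
    low_s_high_d = (strength <= med_s) & (degree >= med_d)
    return high_s_low_d, low_s_high_d

# Toy data (illustrative only): six nodes.
degree_toy = [1, 2, 3, 4, 5, 6]
strength_toy = [10, 1, 2, 30, 40, 3]
hs_ld, ls_hd = median_outliers(degree_toy, strength_toy)
share = np.mean(hs_ld | ls_hd)   # proportion of nodes flagged by either rule
```

The share reported in the text corresponds to the number of flagged nodes divided by the number of nodes.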

Models in the comparisons
We now list the models that we consider in our comparisons for the empirical case study. Since our model classes CPNet1 and CPNet2 are very flexible, we select several models that fall within these two classes. In addition, we consider some modelling approaches that do not fall within the classes CPNet1 and CPNet2 but are natural alternatives to the compound Poisson approach.
The first three models are models for homogeneous networks. The other models allow for differences between the nodes by introducing fitness parameters.
1. ERE uses an Erdős-Rényi network for the link existence probability and then has weights following an exponential distribution. Formally, its model parameter is $\theta = (p, \lambda) \in [0, 1] \times (0, \infty)$ and, independently for each edge, $P(L_{ij} > 0) = p$ and $L_{ij} \mid L_{ij} > 0 \sim \text{Exponential}(\lambda)$. This model has been used in Gandy & Veraart (2017) as an a priori model in a Bayesian framework for network reconstruction. This model is not part of the CPNet1 or CPNet2 class.

4. CPNet1F is a fitness-based model from the CPNet1 family of models. It uses the fitness in the regression of the overall mean. To be precise, it sets $p_N = n$, $X = I_n$ and uses the link function $l(x, y) = \exp(x + y)$. It has $n + 2$ parameters.

5. CPNet2FP is a CPNet2 model with fitness-based parameters on the Poisson part of the model only, i.e. $p_N = n$, $X_N = I_n$, $p_S = 1$, $X_S = (1, \ldots, 1)^\top$. It uses the link functions $l_S(x, y) = l_N(x, y) = \exp(x + y)$. The distribution of the Gamma part of the model is only controlled by a one-dimensional parameter for the mean and by the shape parameter. It has $n + 2$ parameters in total.
6. CPNet2FG uses the regression on the Gamma part of CPNet2 only. It has $n + 2$ parameters and uses the following settings: $p_N = 1$, $X_N = (1, \ldots, 1)^\top$, $p_S = n$, $X_S = I_n$, $l_S(x, y) = l_N(x, y) = \exp(x + y)$.

7. CPNet2FPG is a CPNet2 model with fitness-based parameters on both the Poisson and the Gamma part of the model, i.e. $p_N = n$, $X_N = I_n$, $p_S = n$, $X_S = I_n$, $l_S(x, y) = l_N(x, y) = \exp(x + y)$. It thus has $2n + 1$ parameters.
8. CPNet2FPGmax is the same as CPNet2FPG but with a different link function on the Poisson part, namely $l_N(x, y) = \max(\exp(x), \exp(y))$. It also has $2n + 1$ parameters.
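To make the role of the design matrix and the link function in the fitness-based models more concrete, the following Python sketch builds a matrix of edge-level parameters from node-level linear predictors. It assumes that the parameter for the pair (i, j) is obtained by applying the link function to the i-th and j-th entries of the product of design matrix and coefficient vector; this reading, and all names in the code, are assumptions made for illustration rather than a transcription of the model definitions.

```python
import numpy as np

def link_exp_sum(x, y):
    """Link used by most models in the list: l(x, y) = exp(x + y)."""
    return np.exp(x + y)

def link_max(x, y):
    """Alternative Poisson link of CPNet2FPGmax: l(x, y) = max(exp(x), exp(y))."""
    return np.maximum(np.exp(x), np.exp(y))

def edge_parameters(X, beta, link):
    """Edge-level parameters from node-level predictors (assumed form, see lead-in).

    X    : (n, p) design matrix; X = I_n gives one fitness per node,
           X = column of ones gives a single common parameter.
    beta : (p,) coefficient vector.
    link : function applied to the predictors of the two end nodes.
    """
    eta = X @ beta                            # one linear predictor per node
    return link(eta[:, None], eta[None, :])   # (n, n) matrix; entry (i, j) = link(eta_i, eta_j)

# Toy example with three nodes and one fitness per node (X = I_3).
beta_toy = np.log(np.array([1.0, 2.0, 3.0]))
M_sum = edge_parameters(np.eye(3), beta_toy, link_exp_sum)
M_max = edge_parameters(np.eye(3), beta_toy, link_max)
```

With the exponential-sum link the edge parameter is the product of the two exponentiated fitnesses, whereas with the max link it is driven by the larger of the two.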
All models are implemented in R (R Core Team, 2018). All models but GlmF are fitted by maximising the likelihood with a general-purpose optimiser (optim). The likelihoods of the Tweedie, CPNet1 and CPNet2 models are evaluated using the methods developed in Dunn & Smyth (2005). GlmF is fitted with the glm function available in R.
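For intuition about what such a likelihood evaluation involves, the sketch below computes the log-density of a compound Poisson-Gamma random variable by a naive truncated series over the Poisson count. This is not the algorithm of Dunn & Smyth (2005), which is considerably more careful numerically; the parameterisation (Poisson mean `mu`, Gamma `shape` and `scale`) and the truncation level `kmax` are assumptions chosen for illustration.

```python
import math

def cp_gamma_logdensity(y, mu, shape, scale, kmax=200):
    """Log-density of Y = G_1 + ... + G_N with N ~ Poisson(mu), G_k ~ Gamma(shape, scale).

    For y == 0 this is the log of the point mass P(N = 0) = exp(-mu).
    For y > 0 the density is sum_{k>=1} P(N = k) * GammaPdf(y; k * shape, scale),
    truncated at kmax terms and evaluated with the log-sum-exp trick.
    """
    if y < 0 or mu <= 0 or shape <= 0 or scale <= 0:
        raise ValueError("need y >= 0 and positive parameters")
    if y == 0:
        return -mu
    terms = []
    for k in range(1, kmax + 1):
        log_pois = -mu + k * math.log(mu) - math.lgamma(k + 1)
        a = k * shape   # a sum of k Gamma(shape, scale) variables is Gamma(k * shape, scale)
        log_gam = (a - 1) * math.log(y) - y / scale - math.lgamma(a) - a * math.log(scale)
        terms.append(log_pois + log_gam)
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))

def link_probability(mu):
    """P(Y > 0) = 1 - exp(-mu): an edge exists exactly when the Poisson count is positive."""
    return -math.expm1(-mu)
```

The second function shows why the Poisson mean alone determines whether an edge is present, while the Gamma part only affects the size of a positive weight.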

In-sample results
We now assess the fit of the models in the empirical case studies. Table 3 gives the Akaike information criterion (AIC) of the models, which is given by −2l + 2k, where l is the maximised log-likelihood of the model and k is the number of parameters in the model. To ease comparisons, we have subtracted the AIC of the basic ERE model for all datasets. Smaller numbers indicate a better fit.
In addition to the data from the case studies, two simulated networks are included (ER8 and ER50); these are simulated from the ERE model with 8 and 50 nodes, with p = 0.3 and λ = 0.2. As expected, the true underlying model (ERE) performs best. For the networks from the case studies the picture is different.
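A simulated benchmark of this kind can be sketched as follows. The Python code below draws a network from the ERE model, fits the two ERE parameters in closed form and computes the AIC as $-2l + 2k$ with $k = 2$. It assumes that $\lambda$ is the rate of the exponential distribution and that the diagonal (self-loops) is excluded; neither assumption is confirmed by the text, and the function names are hypothetical.

```python
import numpy as np

def simulate_ere(n, p, lam, rng):
    """Simulate an n x n weight matrix from the ERE model (assumed: lam is a rate, no self-loops)."""
    exists = rng.random((n, n)) < p
    np.fill_diagonal(exists, False)
    weights = rng.exponential(scale=1.0 / lam, size=(n, n))
    return np.where(exists, weights, 0.0)

def ere_fit_aic(L):
    """Closed-form maximum likelihood fit of ERE on the off-diagonal entries, plus its AIC.

    Assumes at least one positive and at least one zero off-diagonal entry.
    """
    w = L[~np.eye(L.shape[0], dtype=bool)]
    pos = w[w > 0]
    m, k_pos = w.size, pos.size
    p_hat = k_pos / m              # share of existing edges
    lam_hat = k_pos / pos.sum()    # reciprocal of the mean positive weight
    loglik = (k_pos * np.log(p_hat) + (m - k_pos) * np.log(1.0 - p_hat)
              + k_pos * np.log(lam_hat) - lam_hat * pos.sum())
    aic = -2.0 * loglik + 2 * 2    # two free parameters: p and lam
    return p_hat, lam_hat, loglik, aic

rng = np.random.default_rng(0)
L_sim = simulate_ere(50, 0.3, 0.2, rng)
p_hat, lam_hat, loglik, aic = ere_fit_aic(L_sim)
```

Relative AIC values of the kind reported in Table 3 would then be obtained by subtracting this ERE value from the AIC of each competing model fitted to the same network.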
We find that ERG and Tweedie outperform ERE. However, compared to the fitness-based models, their performance is relatively poor.
The models CPNet1F, CPNet2FP and CPNet2FG all have one fitness parameter per node and additionally two free parameters. For the LBS data, CPNet1F, which models the overall mean in a regression, seems to be slightly better in most cases than modelling only the mean of the Poisson distribution (CPNet2FP). For the CDS data, modelling the mean of the Poisson distribution (CPNet2FP) is slightly better than modelling the overall mean (CPNet1F). In both cases, the results for modelling only the mean of the Gamma distribution (CPNet2FG) are the worst.

Cross-validation results
Next, we use a cross-validation approach to compare the performance of the compound Poisson models.
We partition the elements of the network matrix into 10 folds (roughly equally sized; all elements belong to exactly one fold) and they stay in these folds for the duration of the analysis. We fit the model using the data for 9 folds and then compute the log-likelihood (using the fitted model) of the remaining fold. We repeat the process for all folds and average the results. Hence, every observation is used to fit the model 9 times and is used to test the fit exactly once. Table 4 presents the average log-likelihood in the testing fold. For the simulated data sets (ER8 and ER50), the true underlying model (ERE) does best, as we would expect. For the CDS data sets, the CPNet2FP model seems to be doing best. For the LBS data set, one of the models with fitness parameters on both the Poisson and the Gamma part does best, either CPNet2FPG or CPNet2FPGmax. For the CDS data sets, the CPNet1F model seems to be doing badly.
Table 5 is also based on cross-validation, but with a different error criterion. We simulate from the fitted model 100 times and then report the average accuracy (the proportion of elements that were correctly predicted as present or not present). The table reports the results in percent. Generally speaking, models that allow for a fitness parameter in the Poisson part of the model (such as CPNet2FP, CPNet2FPG or CPNet2FPGmax) are doing best. This is not surprising since in (1) we have seen that the link existence probabilities are directly determined by the fitnesses associated with the Poisson distribution.
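The element-wise cross-validation with the log-likelihood criterion can be sketched in Python as follows. For brevity the sketch uses the simple ERE model, whose maximum likelihood estimates are available in closed form, in place of the compound Poisson models; the exclusion of diagonal entries, the random fold assignment and all function names are assumptions made for illustration.

```python
import numpy as np

def ere_fit(w):
    """Closed-form ERE estimates (p, lam) from a vector of matrix entries."""
    pos = w[w > 0]
    p = np.clip(pos.size / w.size, 1e-9, 1.0 - 1e-9)   # keep log(p), log(1 - p) finite
    lam = pos.size / pos.sum() if pos.size > 0 else 1.0
    return p, lam

def ere_loglik(w, p, lam):
    """ERE log-likelihood of the entries in w under parameters (p, lam)."""
    pos = w[w > 0]
    n_zero = w.size - pos.size
    return (n_zero * np.log1p(-p)
            + pos.size * (np.log(p) + np.log(lam)) - lam * pos.sum())

def assign_folds(n_elements, n_folds, rng):
    """Assign each element to exactly one fold; fold sizes differ by at most one."""
    return rng.permutation(n_elements) % n_folds

def cv_loglik(L, n_folds=10, seed=0):
    """Average held-out log-likelihood over the folds."""
    rng = np.random.default_rng(seed)
    w = L[~np.eye(L.shape[0], dtype=bool)]    # off-diagonal entries (assumption)
    fold = assign_folds(w.size, n_folds, rng)  # fixed for the whole analysis
    scores = []
    for f in range(n_folds):
        p, lam = ere_fit(w[fold != f])                      # fit on the other folds
        scores.append(ere_loglik(w[fold == f], p, lam))     # score the held-out fold
    return float(np.mean(scores))
```

To compare several models as in Table 4, the same fold assignment would be reused for every model, with `ere_fit` and `ere_loglik` replaced by the corresponding fitting and likelihood routines.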

Conclusion
We have introduced a new model class for directed and weighted random graphs with a fixed number of nodes in which each edge has a compound Poisson distribution for its weight. We have proposed different regression approaches to model features of the compound Poisson distribution. When fitting the new models to empirical network data we found that in most cases the compound Poisson models that model both the expectation of the Poisson random variable and the expectation of the Gamma distribution via separate regression models performed best (measured in terms of their AIC), i.e., the CPNet2 model class is preferable to the CPNet1 model class which itself is preferable to more basic Erdős-Rényi-type models.
In our tests on using these models for predicting subnetworks of a larger network, we found that for the CDS data, which exhibit a more traditional monotonic relationship between strengths and degrees, the CPNet2 model class in which only one regression was used for the mean of the Poisson distribution performed best. For the LBS data, in which the relationship between strengths and degrees is no longer monotonic, we found clear advantages of using the CPNet2 model with both a regression for the