Topologies of complex networks are known to exhibit non-trivial heterogeneous properties such as heavy-tailed degree distributions (Barabási and Albert 1999) and assortative mixing (Newman 2002), to name a few. A similar non-trivial network heterogeneity is captured by the phenomenon of the friendship paradox, which was first studied by Scott Feld in the context of social networks (Feld 1991). It states that the average number of friends of individuals’ friends in a social network is higher than the average number of friends of the individuals themselves. The phenomenon results from a sampling bias that causes nodes to be counted in proportion to their degree, and it extends beyond popularity to other individual traits, such as the amount of content in social networks (Hodas et al. 2013) and the number of coauthors, citations and publications in scientific collaboration networks (Eom and Jo 2014). This extension is known as the generalized friendship paradox and has been attributed to a correlation between these desirable traits and degree (Eom and Jo 2014; Fotouhi et al. 2014), although it has been proven to exist in the absence of a significant correlation as well (Momeni and Rabbat 2016). The phenomenon has a measurable impact on people’s psyches, because people tend to evaluate their success in a given trait by comparison with the social network around them (Bollen et al. 2017; Jackson 2016); the overshadowing effect of one collection of nodes on another is therefore significant. The friendship paradox has been studied in both online (Bagrow et al. 2017; Bollen et al. 2017; Hodas et al. 2013) and offline social networks (Feld 1991). The phenomenon is described as a global, network-wide characteristic, so it is natural to ask whether the friendship paradox also holds at a more local level, e.g., at the level of individual nodes in a network (Hodas et al. 2013; Momeni and Rabbat 2016).
In this work, we mathematically quantify the friendship paradox so as to understand its strength both locally and globally in the network.

Inspired by the friendship paradox, we first study a local network metric for a node, which we call the “friendship index” (FI). For a vertex v, it is defined as \(FI(v) = \frac {\sum _{(v, u) \in E}{d_{u}}}{d_{v}^{2}}\), which is the ratio of the average degree of the neighbors of the vertex to its own degree. FI captures two significant characteristics related to the friendship paradox. First, it captures the ‘direction of influence’, i.e., it indicates whether, in accordance with the friendship paradox, the vertex is outperformed by its neighbors in popularity (FI > 1), or whether it is one of the vertices for which the paradox does not hold because it is more popular than its neighbors (FI < 1). Second, it quantifies the ‘discrepancy’ compared to a scenario where the friendship paradox does not exist: FI values further from 1 indicate an extreme manifestation of the paradox, whereas FI values closer to 1 indicate that the vertex experiences the paradox less severely. Notice that this measure is truly local: a node’s FI value is determined entirely by its own degree and the degrees of its neighbors, and the structure of the network as a whole does not influence it. This distinguishes FI from local assortativity (Thedchanamoorthy et al. 2014; Piraveenan et al. 2010), a node metric whose value is normalized by the values of the other nodes in the network. Having defined the local metric, we propose three ways of aggregating the indices over all vertices to quantify the strength of the network-wide friendship paradox: the arithmetic mean, the log of the geometric mean, and the harmonic mean. We explore these measures theoretically and experimentally, noting their ability to capture network characteristics. We compare and contrast them and ultimately find value in the arithmetic mean and the log of the geometric mean, while discarding the harmonic mean because drastically different graphs have equal harmonic means.
Consistent with the nature of the friendship paradox, we prove lower bounds on these aggregates showing that the arithmetic mean of the FIs is always at least 1 and the log of their geometric mean is always at least 0, meaning that nodes feel outperformed by their neighbors on average. Because of these bounds, a larger aggregate value indicates a stronger occurrence of the friendship paradox. Unlike FI, whose distance from 1 can be in either the positive or negative direction, the aggregate measures only increase with the strength of the paradox. We explore real-world networks categorized by type, and find some consistencies in our aggregate measures within the individual categories.

A number of existing works have analysed the friendship paradox from different perspectives. The comparison of a node’s degree with both the mean and the median of its neighbors’ degrees has been used in the literature, primarily as a binary measure of whether a node is more or less popular than its neighbors (Jo and Eom 2014; Momeni and Rabbat 2016; Momeni and Rabbat 2018; Lee et al. 2019), while some (Hodas et al. 2013; Jackson 2016) have also used its absolute value to indicate the severity of the paradox. Hodas et al. (2013) use the binary measure based on the mean of neighbors’ degrees to investigate the friendship paradox in the Twitter network. Jo and Eom (2014) characterized the probability that the paradox holds for individual nodes, and studied its behavior for network models with tunable degree-degree and degree-attribute correlations. A significant finding was that this probability may depend on the assortativity of the network, which acts as a strong motivation for our work. Momeni and Rabbat (2016) used both the mean-based and median-based binary measures to investigate the prevalence of the friendship paradox in the Twitter network. Lee et al. (2019) studied three perception models of the friendship paradox, namely mean-based, median-based and fraction-based binary measures aggregated across the entire network. These measures were analysed in configuration models with tunable assortativity, and the impact of the perception model on opinion formation was studied. Our contribution is to investigate the measure based on the mean of neighbors’ degrees as a network metric, by studying its properties in popular network models and illustrating its relationship with local and global measures of assortativity.

The phenomenon of assortativity, or assortative mixing, captures the preference of a network’s nodes to attach to others that are similar in some way. In this work, we focus on degree assortativity (Newman 2002), where similarity is measured in terms of the nodes’ degrees. Although assortativity was proposed as a global measure, local versions have also been proposed more recently (Piraveenan et al. 2010; Thedchanamoorthy et al. 2014). It is worth mentioning that these local assortativity measures are examples of node measures that are not truly local, because they are influenced by the global assortativity of the graph. Jo and Eom (2014), Lee et al. (2019), and Momeni and Rabbat (2018) study networks that vary in assortativity and explore its impact on the friendship paradox. In this work, we further investigate the relationship between the friendship paradox and assortative mixing by analysing the relationship between the FIs of nodes and their local assortativity measures. We observe that FI values close to 1.0 indicate strong local assortativity, while FI values further from 1.0 indicate weak assortativity. We also explore the relationship between our network-wide aggregate measures and network assortativity. Similar to the results for the local measures, extreme aggregate values indicate disassortativity and moderate values indicate assortativity. These consistencies between the measures are revealed through theoretical arguments and experimental results. We conclude that the friendship index captures some information about assortative mixing in the network. We consider canonical graphs and common graph models and find that our aggregate measures are well-behaved functions that smoothly reflect small adjustments in the graphs, whereas network assortativity varies far less smoothly. The present work is an extension of our previous work (Pal et al. 2018); here we present complete and detailed proofs of results partially proven in Pal et al. (2018), along with new theoretical findings on the proposed metrics in network models such as Erdos-Renyi and Barabasi-Albert graphs. Additionally, a more thorough simulation study on network models is presented here.

The paper is organized as follows. In the “Preliminary” section, we define the local and global notions of the friendship index and assortativity. These metrics are then studied for various canonical graphs such as regular graphs, star graphs and complete bipartite graphs. In the “Theoretical results on friendship index” section, we highlight some theoretical results on the friendship index, followed by experimental results on network models and real networks in the “Experimental results” section.


Friendship index

Consider a graph \(G=(V, E)\), with V and \(E \subseteq V \times V\) being the sets of nodes and edges, respectively. For each node \(i \in V\), let \(d_{i}\) denote the degree of node i and \(S_{i}\) denote the sum of i’s neighbors’ degrees. Therefore, we have

$$S_{i} = \sum_{v \in N_{i}} d_{v},$$

where \(N_{i}\) denotes the set of neighbors of node i. The friendship index (FI) of node i is defined as

$$\begin{array}{@{}rcl@{}} FI(i) &= \frac{S_{i}}{ d_{i}^{2}}, \text{ if}\ d_{i} >0 \\ &= 1, \text{ if}\ d_{i} = 0. \end{array} $$

The friendship index of a node is the average degree of its neighbors normalized by its own degree. Therefore, the FI of a node will be less than 1 if its own degree is larger in comparison to its neighbors’ degrees, and greater than 1 otherwise.
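As a concrete illustration, the definition can be computed directly from an adjacency structure. The following is a minimal Python sketch, assuming a simple undirected graph stored as a dictionary from each node to the set of its neighbors; `build_graph` and `friendship_index` are hypothetical helper names, not part of any library.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

# Assumed representation: node -> set of neighbors (simple undirected graph).
Graph = Dict[Hashable, Set[Hashable]]


def build_graph(nodes: Iterable[Hashable],
                edges: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    """Build an adjacency-set dictionary from a node list and an edge list."""
    adj: Graph = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def friendship_index(adj: Graph, i: Hashable) -> float:
    """FI(i) = S_i / d_i^2, with FI(i) = 1 for an isolated node."""
    d_i = len(adj[i])
    if d_i == 0:
        return 1.0
    s_i = sum(len(adj[v]) for v in adj[i])  # S_i: sum of the neighbors' degrees
    return s_i / d_i ** 2


# Star with center 0 and four leaves (n = 5).
star = build_graph(range(5), [(0, k) for k in range(1, 5)])
print(friendship_index(star, 0), friendship_index(star, 1))  # 0.25 4.0
```

On the five-node star, the center gets FI = 1/4 (it is more popular than its neighbors) and every leaf gets FI = 4 (it is outperformed by its only neighbor).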

Another way to understand the friendship index is through the uniform and neighborhood node samplers (Leskovec and Faloutsos 2006; Momeni and Rabbat 2018). For a graph G=(V, E), we define the following node sampling techniques:

a) Uniform sampler ρ - A uniformly sampled node is chosen uniformly at random among the nodes in the graph. Therefore, \(\rho \sim \mathcal {U}(V)\).

b) Edge sampler σ - An edge is sampled uniformly at random, then one of its endpoints is chosen uniformly at random. For a uniformly chosen edge \((\sigma _{1},\sigma _{2}) \sim \mathcal {U}(E)\), \(\sigma \sim \mathcal {U}(\sigma _{1},\sigma _{2})\).

c) Neighborhood sampler μ - First define \(\mu (i) \sim \mathcal {U}(N_{i})\) to be a node sampled uniformly at random among the neighbors of node i. If node i has no neighbors, then set μ(i)=i. Next, we define μ(ρ) to be a node that is sampled uniformly at random from the neighbors of a uniformly sampled node ρ. Operationally, first sample \(\rho \sim \mathcal {U}(V)\) and then \(\mu \sim \mathcal {U}(N_{\rho })\); however, if \(N_{\rho }=\emptyset\), then set μ=ρ.

We can equivalently define the friendship index as follows,

$$\begin{array}{*{20}l} FI(i) &= \frac{\mathbb{E}[{d_{\mu(i)}}]}{ d_{i}}, \text{ if}\ d_{i} >0 \\ &= 1, \text{ if}\ d_{i} = 0. \end{array} $$

It is easy to see that the friendship index of every node in a regular graph is 1, because all degrees are the same. FI is a local network metric concerning a particular node. We can either aggregate the local measure to obtain a global measure or work directly with the local FIs \(\{FI(i),\ i \in V\}\).
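The expectation form can be sanity-checked by simulation: repeatedly draw μ(i), average \(d_{\mu(i)}\), and divide by \(d_{i}\). The sketch below is illustrative only; it assumes integer node labels in an adjacency-set dictionary, and `fi_exact` and `fi_monte_carlo` are hypothetical names. With enough samples the estimate should approach the exact value \(S_{i}/d_{i}^{2}\).

```python
import random
from typing import Dict, Set

Graph = Dict[int, Set[int]]


def fi_exact(adj: Graph, i: int) -> float:
    """FI(i) = S_i / d_i^2 (1 for isolated nodes)."""
    d_i = len(adj[i])
    return 1.0 if d_i == 0 else sum(len(adj[v]) for v in adj[i]) / d_i ** 2


def fi_monte_carlo(adj: Graph, i: int, num_samples: int,
                   rng: random.Random) -> float:
    """Estimate FI(i) = E[d_mu(i)] / d_i using the neighborhood sampler mu(i)."""
    d_i = len(adj[i])
    if d_i == 0:
        return 1.0  # mu(i) = i for an isolated node; FI is 1 by convention
    neighbors = sorted(adj[i])  # fixed order so that a seeded rng is reproducible
    total = sum(len(adj[rng.choice(neighbors)]) for _ in range(num_samples))
    return (total / num_samples) / d_i


# Node 0 has neighbors of degree 4 and 1, so FI(0) = (4 + 1) / 2**2 = 1.25.
g = {0: {1, 2}, 1: {0, 3, 4, 5}, 2: {0}, 3: {1}, 4: {1}, 5: {1}}
print(fi_exact(g, 0), fi_monte_carlo(g, 0, 20000, random.Random(0)))
```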

The local measures of the friendship paradox can be extended to the entire network in different ways. The following global notions of FI can be defined for a given graph G:

  1.

    Arithmetic FI (AFI) - We define AFI as the average, or arithmetic mean, of all the local FIs

    $$ AFI(G) = \frac{1}{|V|}\sum_{i \in V} FI(i) $$

    Observe that

    $$\begin{array}{*{20}l} AFI(G) &= \frac{1}{|V|} \sum_{i \in V} \frac{ \mathbb{E}[d_{\mu(i)}]}{d_{i}} = \mathbb{E}\left[ \frac{d_{\mu(\rho)}}{d_{\rho}} \right] \end{array} $$

    with the caveat that \(\frac {\mathbb {E}[d_{\mu (i)}]}{d_{i}}\) is set to 1 if \(d_{i}=0\) (correspondingly, the ratio inside the expectation is set to 1 when \(d_{\rho}=0\)). Therefore, AFI is the expected ratio between the degrees of a neighborhood sampled node μ(ρ) and that of the uniformly sampled node ρ.

  2.

    Geometric FI (GFI) - This metric is defined as the logarithm of the geometric mean of all the local FIs

    $$GFI(G) = \frac{1}{|V|} \sum_{v \in V} \log FI(v) $$

    While the linearity of expectation allows the alternative representation of AFI through the notion of sampling, we do not have an equivalent expression for GFI. This makes AFI more mathematically tractable in the analysis of network models than GFI or binary measures based on individual nodes, which involve Heaviside step functions. However, GFI has other benefits over AFI that will become apparent later.

  3.

    Harmonic FI (HFI) - This metric is defined as the harmonic mean of all the local FIs

    $$HFI(G) = \frac{1}{\frac{1}{|V|} \sum_{v \in V} \frac{1}{FI(v)} } $$

    Again for HFI, we do not have an alternative representation like (4).
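A minimal sketch of the three aggregates, assuming an adjacency-set dictionary and the natural logarithm for GFI (changing the base only rescales GFI). The names `local_fis`, `afi`, `gfi` and `hfi` are hypothetical. The five-node star serves as a quick check.

```python
import math
from typing import Dict, Set

Graph = Dict[int, Set[int]]


def local_fis(adj: Graph) -> Dict[int, float]:
    """FI(i) = S_i / d_i^2 for every node, with FI = 1 for isolated nodes."""
    fis = {}
    for i, nbrs in adj.items():
        d_i = len(nbrs)
        fis[i] = 1.0 if d_i == 0 else sum(len(adj[v]) for v in nbrs) / d_i ** 2
    return fis


def afi(adj: Graph) -> float:
    """Arithmetic mean of the local FIs."""
    fis = local_fis(adj).values()
    return sum(fis) / len(fis)


def gfi(adj: Graph) -> float:
    """Logarithm of the geometric mean of the local FIs (natural log assumed)."""
    fis = local_fis(adj).values()
    return sum(math.log(f) for f in fis) / len(fis)


def hfi(adj: Graph) -> float:
    """Harmonic mean of the local FIs (every FI is positive)."""
    fis = local_fis(adj).values()
    return len(fis) / sum(1.0 / f for f in fis)


star5 = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(afi(star5), hfi(star5))  # 3.25 1.0
```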

We will give theoretical results on AFI and experimental results on both AFI and GFI. In what follows we explain the relative advantages of considering AFI and GFI.

Global and local assortativity

Newman (2002) defines assortativity as a measure of the similarity of degree in adjacent vertices. Newman formally quantifies assortativity as the Pearson correlation coefficient of the degrees of the two vertices attached to every edge. This can be expressed as

$$ r = \frac{\mathbb{E}[d_{\sigma_{1}} d_{\sigma_{2}}] - \mathbb{E}[d_{\sigma}]^{2}}{\mathbb{E}[d_{\sigma}^{2}] - \mathbb{E}[d_{\sigma}]^{2}} $$

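where \((\sigma_{1},\sigma_{2})\) is a uniformly sampled edge and σ is a node sampled according to the edge sampler. Assortativity is thus based on edge sampling, whereas FI is based on uniform and neighborhood sampling; this difference in how nodes are sampled introduces inherent differences between the friendship index and assortativity.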
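The edge-sampling form of r can be evaluated exactly by listing every edge in both orientations, which is equivalent to drawing a uniform edge and orienting it at random. The sketch below assumes an adjacency-set dictionary and returns 1 when every edge endpoint has the same degree, following the convention the text adopts for regular graphs; `degree_assortativity` is a hypothetical name.

```python
from typing import Dict, Set

Graph = Dict[int, Set[int]]


def degree_assortativity(adj: Graph) -> float:
    """Newman's r as the Pearson correlation of degrees across edge endpoints.

    Listing every undirected edge in both orientations is equivalent to sampling
    an edge uniformly and orienting it at random, so the first coordinate of a
    pair is distributed like the edge-sampled node sigma.
    """
    pairs = [(len(adj[u]), len(adj[v])) for u in adj for v in adj[u]]
    if not pairs:
        raise ValueError("r is undefined for a graph without edges")
    m2 = len(pairs)  # 2|E|
    mean = sum(a for a, _ in pairs) / m2        # E[d_sigma]
    second = sum(a * a for a, _ in pairs) / m2  # E[d_sigma^2]
    joint = sum(a * b for a, b in pairs) / m2   # E[d_sigma1 d_sigma2]
    var = second - mean ** 2
    if var == 0:
        return 1.0  # all endpoints share one degree: r = 1 by convention
    return (joint - mean ** 2) / var


path4 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(round(degree_assortativity(path4), 6))  # -0.5
```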

The assortativity can also be defined in terms of the network degree distribution p, the excess or remaining degree distribution q, and the link distribution \(e_{j,k}\). Note that p is the degree distribution of a uniformly sampled node ρ. The distribution q is that of the excess degree (the degree minus one) of a node reached through the edge sampler, given by

$$ q(k) = \frac{(k+1)p(k+1)}{\sum_{j=1}^{| V |} j p(j)}. $$

We also define \(e_{j,k}\) as the joint probability distribution of the remaining degrees of the two nodes at either end of a randomly chosen link. We can equivalently define assortativity as

$$ r = \frac{1}{\sigma_{q}^{2}} \left[ \sum_{jk} jk \left(e_{j,k} - q(j)q(k) \right) \right] $$

where \(\sigma_{q}\) is the standard deviation of the distribution q.
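The same r can also be obtained from q and \(e_{j,k}\). The following sketch (hypothetical helper names, adjacency-set dictionary assumed) builds both distributions from a graph and evaluates the double sum as written; on the path with four nodes it gives q(0)=1/3, q(1)=2/3 and r=−1/2.

```python
from collections import Counter
from typing import Dict, Set, Tuple

Graph = Dict[int, Set[int]]


def excess_degree_distribution(adj: Graph) -> Dict[int, float]:
    """q(k) = (k+1) p(k+1) / sum_j j p(j), where p is the degree distribution."""
    n = len(adj)
    p = {k: c / n for k, c in Counter(len(nb) for nb in adj.values()).items()}
    mean_degree = sum(k * pk for k, pk in p.items())
    return {k - 1: k * pk / mean_degree for k, pk in p.items() if k > 0}


def link_distribution(adj: Graph) -> Dict[Tuple[int, int], float]:
    """e_{j,k}: joint distribution of remaining degrees at the two ends of a link."""
    counts = Counter((len(adj[u]) - 1, len(adj[v]) - 1) for u in adj for v in adj[u])
    total = sum(counts.values())  # 2|E|: each link counted in both orientations
    return {jk: c / total for jk, c in counts.items()}


def assortativity_from_q(adj: Graph) -> float:
    """r = (1 / sigma_q^2) * sum_{j,k} j k (e_{j,k} - q(j) q(k))."""
    q = excess_degree_distribution(adj)
    e = link_distribution(adj)
    mu_q = sum(k * qk for k, qk in q.items())
    var_q = sum(k * k * qk for k, qk in q.items()) - mu_q ** 2
    total = 0.0
    for j in q:
        for k in q:
            total += j * k * (e.get((j, k), 0.0) - q[j] * q[k])
    return total / var_q


path4 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(excess_degree_distribution(path4), assortativity_from_q(path4))
```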

While this definition of assortativity is at the global level, several efforts have been made to quantify the assortativity of the network at a local level (Piraveenan et al. 2010; Thedchanamoorthy et al. 2014). Specifically, we first consider the definition of local assortativity presented by Piraveenan et al. (2010), which we call the p-local assortativity. It is defined as

$$ r_{P}(i) = \frac{d_{i} \left((d_{i}-1) \bar{k} - \mu_{q}^{2} \right)}{2 |E| \sigma_{q}^{2}}, \ i \in V $$

where \(\bar {k}\) is the average remaining degree of the node’s neighbors, and \(\mu_{q}\) and \(\sigma_{q}\) are the mean and standard deviation of the distribution q. It also follows that the p-local assortativities of all the nodes sum to the network assortativity r.

Note that \(r_{P}(i)\) is positive and large if \((d_{i}-1)\bar {k} \gg \mu _{q}^{2}\). This happens when a high-degree node is connected to other nodes with relatively high degree, or when a node with degree in a medium range has neighbors with large degrees. Conversely, \(r_{P}(i)\) is negative and large in magnitude if \((d_{i}-1)\bar {k} \ll \mu _{q}^{2}\), which can happen when a low-degree node is connected to other low-degree nodes. Such a node is locally assortative, so its local assortativity should be positive; the p-local assortativity, however, is negative.
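A minimal sketch of \(r_{P}(i)\) exactly as given in the formula above, assuming an adjacency-set dictionary; `p_local_assortativity` is a hypothetical name, and \(\mu_{q}\), \(\sigma_{q}^{2}\) are computed from the remaining degrees of edge-sampled nodes. The last example illustrates the issue just discussed: a node of degree one has \(d_{i}-1=0\), so its value is negative even when its only neighbor also has degree one.

```python
from typing import Dict, Set

Graph = Dict[int, Set[int]]


def p_local_assortativity(adj: Graph) -> Dict[int, float]:
    """r_P(i) = d_i ((d_i - 1) k_bar_i - mu_q^2) / (2|E| sigma_q^2), as written in the text.

    k_bar_i is the average remaining degree (degree - 1) of i's neighbors; mu_q and
    sigma_q^2 are the mean and variance of the remaining degree of an edge-sampled node.
    """
    # One entry per (node, incident edge) pair: the remaining degree of sigma.
    remaining = [len(adj[u]) - 1 for u in adj for _ in adj[u]]
    two_m = len(remaining)
    mu_q = sum(remaining) / two_m
    var_q = sum(x * x for x in remaining) / two_m - mu_q ** 2
    out = {}
    for i, nbrs in adj.items():
        d_i = len(nbrs)
        if d_i == 0:
            out[i] = 0.0  # isolated nodes contribute nothing
            continue
        k_bar = sum(len(adj[v]) - 1 for v in nbrs) / d_i
        out[i] = d_i * ((d_i - 1) * k_bar - mu_q ** 2) / (two_m * var_q)
    return out


path4 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(round(sum(p_local_assortativity(path4).values()), 6))  # -0.5, the global r

# An isolated edge (0-1) next to a star: node 0 and its neighbor both have degree 1.
g = {0: {1}, 1: {0}, 2: {3, 4, 5, 6}, 3: {2}, 4: {2}, 5: {2}, 6: {2}}
print(p_local_assortativity(g)[0] < 0)  # True
```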

We define another notion of local assortativity due to Thedchanamoorthy et al. (2014), which has been argued to be closer to the fundamental notion of assortativity compared to that proposed by Piraveenan et al. (2010). This concurs with our analysis in the previous paragraph that the p-local assortativity does not always capture what a local measure of assortativity should. For these reasons, we only show results for the local assortativity due to Thedchanamoorthy et al.

The local assortativity measure is based on the ‘average neighbor difference’, a direct indicator of a node’s disassortativity, given by

$$ \delta_{i} = \frac{1}{d_{i}} \sum_{v \in N_{i}} |d_{i} - d_{v}|, $$

which is then scaled by the sum of the neighbor differences across all the nodes, to obtain the normalized neighbor differences \(\bar {\delta _{i}}\). We therefore have,

$$\bar{\boldsymbol{\delta}_{i}} = \frac{\boldsymbol{\delta}_{i} }{ \sum_{j \in V} \boldsymbol{\delta}_{j} }. $$

The final assortativity of a node, which we call the T-local assortativity (TLA) is obtained by

$$ r_{T}(i) = \lambda - \bar{ \delta_{i}} $$

where \(\lambda = \frac {r+1}{|V|}\), so that the local assortativities of all the nodes sum to the global assortativity r.
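A minimal sketch of the T-local assortativity under two assumptions that the formulas leave open: an isolated node is given an average neighbor difference of 0, and when all neighbor differences vanish (as in a regular graph) the normalized differences are taken to be 0. The function names are hypothetical. The local values should sum to r.

```python
from typing import Dict, Set

Graph = Dict[int, Set[int]]


def degree_assortativity(adj: Graph) -> float:
    """Pearson correlation of endpoint degrees over both orientations of every edge."""
    pairs = [(len(adj[u]), len(adj[v])) for u in adj for v in adj[u]]
    m2 = len(pairs)
    mean = sum(a for a, _ in pairs) / m2
    var = sum(a * a for a, _ in pairs) / m2 - mean ** 2
    if var == 0:
        return 1.0  # regular graphs: r = 1 by the convention used in the text
    return (sum(a * b for a, b in pairs) / m2 - mean ** 2) / var


def t_local_assortativity(adj: Graph) -> Dict[int, float]:
    """r_T(i) = (r + 1) / |V| - delta_i / sum_j delta_j."""
    delta = {}
    for i, nbrs in adj.items():
        d_i = len(nbrs)
        # Average neighbor difference; taken as 0 for isolated nodes (assumption).
        delta[i] = sum(abs(d_i - len(adj[v])) for v in nbrs) / d_i if d_i else 0.0
    total = sum(delta.values())
    lam = (degree_assortativity(adj) + 1) / len(adj)
    # If every delta_i is 0 (e.g. regular graphs), normalized values are taken as 0.
    return {i: lam - (delta[i] / total if total else 0.0) for i in adj}


star5 = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(t_local_assortativity(star5))  # every node gets -0.2, i.e. -1/n
```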

Observe that if \(\delta_{i}\) is small, then the degrees of node i and its neighbors are very close, so the local measure FI(i) should be close to 1. However, if \(\delta_{i}\) is large, then at least one neighbor of i has a degree very different from that of i, and FI(i) is expected to be far from 1. If FI(i) is much less than 1, then the degree of i exceeds its neighbors’ degrees on average; if it is greater than 1, then the degree of i is less than its neighbors’ degrees on average. Therefore FI captures the ‘direction of influence’ in scenarios of local disassortativity. If the TLA at node i is large (close to its maximum value λ), i.e., the degree of node i is similar to its neighbors’ degrees, then FI(i) should be close to 1; and if the TLA is small, i.e., the degree of node i is significantly different from its neighbors’ degrees, then FI(i) is expected to be away from 1. Therefore, |FI(i)−1| is expected to be anticorrelated with TLA, i.e., as |FI(i)−1| increases, TLA is expected to decrease.

Results for special graphs

  1.

    d-regular graphs - Since all nodes have the same degree d, FI=1 for all nodes. Therefore AFI=1, GFI=0 and HFI=1. A clique of n nodes is an (n−1)-regular graph, so the result also holds for cliques. Therefore, for d-regular graphs, where the friendship paradox does not occur, AFI and GFI attain their minimum values (see Theorems 1 and 2). Also, the assortativity is r=1 for regular graphs by definition (the Pearson coefficient itself is undefined when all degrees are equal), and, taking the normalized neighbor differences (which are 0/0 here) to be 0, all local assortativity values are positive and equal \(\left (r_{T}=\frac {2}{n}\right)\).

  2.

    Star graphs - Consider a star graph with n nodes. The FI of the center of the star is \(\frac {1}{n-1}\), and the FI of every leaf is n−1. These are, respectively, the minimum and maximum possible FI values for a graph with the given number of nodes: since every neighbor of i has degree between 1 and n−1, we have \(d_{i} \le S_{i} \le (n-1)d_{i}\), so \(\frac{1}{n-1} \le \frac{1}{d_{i}} \le FI(i) \le \frac{n-1}{d_{i}} \le n-1\). Thus the friendship paradox is rather severe for a star graph. We calculate \(AFI=n-2+\frac {1}{n-1}\), \(GFI=\frac {n-2}{n} \log (n-1)\) and HFI=1. An advantage of considering GFI is that the maximum and minimum possible values of FI have the same absolute value on the logarithmic scale. Surprisingly, the HFI of a star is the same as that of a d-regular graph. The assortativity is r=−1, while the local assortativities \(r_{T}\) are all equal to \(-\frac{1}{n}\), summing to r.

  3.

    Complete bipartite graphs - Consider a complete bipartite graph with x nodes on one side and y nodes on the other. This is a generalization of the star graph (x=1, y=n−1). Let us denote this graph as Bipartite(x, y), with the set of x nodes on one side denoted X and the set of y nodes on the other side denoted Y. For a vertex \(v \in X\), \(FI(v)=\frac {x}{y}\), and for \(v \in Y\), \(FI(v) = \frac {y}{x}\). Therefore, if x>y, then nodes in X have FI>1 and nodes in Y have FI<1. We have \(AFI = 1 + \frac {(x-y)^{2}}{xy}\), which is strictly greater than 1 whenever \(x \neq y\). Furthermore, \(\frac {\partial AFI}{\partial y} = \frac {y^{2}-x^{2}}{xy^{2}}\), implying that \(\frac {\partial AFI}{\partial y}>0\) for y>x and \(\frac {\partial AFI}{\partial y}<0\) for y<x. Therefore, with x kept constant, the AFI increases as y is varied away from x. We also have \(GFI=\frac {1}{x+y} \log \left (\frac {x^{x} y^{y}}{x^{y} y^{x}} \right) = \frac{(y-x)\log (y/x)}{x+y} >0\) for \(x \neq y\), since y−x and log(y/x) have the same sign. We calculate \(\frac {\partial GFI}{\partial y} = \frac {2xy \log \left (\frac {y}{x} \right) + \left (y^{2} - x^{2}\right) }{(y+x)^{2}y}\), which again implies that the GFI increases as y is varied away from x. Therefore, both AFI and GFI are well-behaved for the class of complete bipartite graphs, in that both measures increase as y is varied away from x, indicating a more prominent friendship paradox. Interestingly, for \(x \neq y\) the assortativity is r=−1, while for x=y it is r=1 by definition, indicating that the variation in assortativity is rather sharp. We also compute HFI=1 for all positive values of x and y.

    We find HFI to be equal to 1 for widely different classes of graphs, ranging from d-regular graphs to star graphs. Therefore, in later sections we only continue the investigation of AFI and GFI, owing to their better-behaved nature. Also, from the star graph, we observed that on the logarithmic scale the maximum and minimum values of FI are equidistant from 0, the value of log FI at which the friendship paradox is not observed. Therefore, in the “Experimental results” section, we study the correlation between \(|\log FI|\) and TLA, because \(|\log FI|\) better captures the deviation from the scenario where the friendship paradox does not occur.
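The closed forms for stars and complete bipartite graphs can be checked numerically. This sketch assumes adjacency-set dictionaries and the natural logarithm; `aggregates`, `complete_bipartite` and `star` are hypothetical helper names. It builds the graphs and computes AFI, GFI and HFI from the local FIs, so the results can be compared with the expressions above.

```python
import math
from typing import Dict, Set, Tuple

Graph = Dict[int, Set[int]]


def aggregates(adj: Graph) -> Tuple[float, float, float]:
    """Return (AFI, GFI, HFI) computed from the local FIs (natural log for GFI)."""
    fis = []
    for i, nbrs in adj.items():
        d_i = len(nbrs)
        fis.append(1.0 if d_i == 0 else sum(len(adj[v]) for v in nbrs) / d_i ** 2)
    n = len(fis)
    return (sum(fis) / n,
            sum(math.log(f) for f in fis) / n,
            n / sum(1.0 / f for f in fis))


def complete_bipartite(x: int, y: int) -> Graph:
    """Bipartite(x, y): nodes 0..x-1 form X, nodes x..x+y-1 form Y."""
    adj: Graph = {v: set() for v in range(x + y)}
    for u in range(x):
        for v in range(x, x + y):
            adj[u].add(v)
            adj[v].add(u)
    return adj


def star(n: int) -> Graph:
    """Star with n nodes, i.e. Bipartite(1, n - 1)."""
    return complete_bipartite(1, n - 1)


afi_, gfi_, hfi_ = aggregates(complete_bipartite(2, 5))
print(round(afi_, 6), round(hfi_, 6))  # 1.9 1.0
```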

Theoretical results on friendship index

A lower bound on global measures of FI

We first state a simple result that will be used in this section.

Lemma 1

For any numbers \(x, y \geq 1\) and monotonically increasing functions \(f,g: \mathbb {R} \rightarrow \mathbb {R}\) such that \(f(x) \geq 0\) and \(g(x) > 0\) for \(x \geq 1\), we have

$$ \frac{f(x)}{g(y)} + \frac{f(y)}{g(x)} \geq \frac{f(x)}{g(x)} + \frac{f(y)}{g(y)} $$


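Proof. Let x, y≥1. Since f and g are both monotonically increasing, \(f(x)-f(y)\) and \(g(x)-g(y)\) have the same sign, and \(g(x)g(y)>0\); hence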

$$\begin{array}{*{20}l} &\left(\frac{f(x)}{g(y)} + \frac{f(y)}{g(x)} \right) - \left(\frac{f(x)}{g(x)} + \frac{f(y)}{g(y)} \right) \\ &= \frac{\left(f(x) - f(y) \right) \left(g(x) - g(y) \right)}{g(x)g(y)} \geq 0. \end{array} $$

With \(f(x)=x\) and \(g(x)=x^{2}\), the above lemma leads to the following corollary.

Corollary 1

For any edge e(i, j), we have \(\frac {d_{i}}{d_{j}^{2}}+ \frac {d_{j}}{d_{i}^{2}} \ge \frac {1}{d_{i}}+\frac {1}{d_{j}}\)

Theorem 1

For all graphs G, AFI(G)≥1. AFI(G)=1 only for the class of graphs where each connected component is a regular graph.


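Proof. Let n=|V|. Each isolated node contributes exactly 1 to the sum defining AFI, so it suffices to prove the claim for graphs without isolated nodes. For such a graph G, we have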

$$\begin{array}{*{20}l} AFI(G) &= \frac{1}{n} \sum_{i \in V} \frac{S_{i}}{d_{i}^{2} } \\ &= \frac{1}{n} \sum_{i \in V} \sum_{j \in N_{i}} \frac{d_{j}}{d_{i}^{2}} \end{array} $$

Observe that for each edge (i, j), the term \(\frac {d_{j}}{d_{i}^{2}}\) appears when summing over i and \(\frac {d_{i}}{d_{j}^{2}}\) appears when summing over j. Therefore, rather than the double sum in (13), we can have

$$\begin{array}{*{20}l} AFI(G) &= \frac{1}{n} \sum_{(i,j) \in E} \left(\frac{d_{j}}{d_{i}^{2}} + \frac{d_{i}}{d_{j}^{2}} \right) \\ &\geq \frac{1}{n} \sum_{(i,j) \in E} \left(\frac{1}{d_{i}} + \frac{1}{d_{j}} \right) \text{ (Using Corollary 1)} \end{array} $$
$$\begin{array}{*{20}l} &= \frac{1}{n} \sum_{i \in V} \frac{d_{i}}{d_{i}} = 1 \end{array} $$

where the last step follows by noticing that, for each node i, the term \(\frac {1}{d_{i}}\) appears for exactly \(d_{i}\) edges. Also, equality in (14) holds if and only if both endpoints of every edge have the same degree, since the difference in Lemma 1 vanishes only when \(d_{i}=d_{j}\); that is, all nodes in each connected component have the same degree. □
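Theorem 1 and the edge-wise rewriting used in its proof can be spot-checked numerically. The sketch below assumes adjacency-set dictionaries and a G(n, p) generator written only for this illustration (`random_graph`, `afi_by_nodes` and `afi_by_edges` are hypothetical names). It computes AFI node-wise and edge-wise; random graphs should give AFI ≥ 1, while a graph whose components are all regular, even with different degrees, should give exactly 1. A check of this kind is not a proof; it only guards against slips in the algebra.

```python
import random
from typing import Dict, Set

Graph = Dict[int, Set[int]]


def random_graph(n: int, p: float, rng: random.Random) -> Graph:
    """G(n, p): each of the n(n-1)/2 possible edges is present independently with prob. p."""
    adj: Graph = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def afi_by_nodes(adj: Graph) -> float:
    """(1/n) sum_i S_i / d_i^2, with FI = 1 for isolated nodes."""
    total = 0.0
    for i, nbrs in adj.items():
        d_i = len(nbrs)
        total += 1.0 if d_i == 0 else sum(len(adj[v]) for v in nbrs) / d_i ** 2
    return total / len(adj)


def afi_by_edges(adj: Graph) -> float:
    """Edge form from the proof: (1/n) sum_{(i,j) in E} (d_j/d_i^2 + d_i/d_j^2)."""
    total = float(sum(1 for nbrs in adj.values() if not nbrs))  # isolated nodes add 1 each
    for i in adj:
        for j in adj[i]:
            if i < j:  # visit each undirected edge once
                d_i, d_j = len(adj[i]), len(adj[j])
                total += d_j / d_i ** 2 + d_i / d_j ** 2
    return total / len(adj)


g = random_graph(30, 0.1, random.Random(42))
print(afi_by_nodes(g), afi_by_edges(g))
```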

With \(f(x)=\log x\) and \(g(x)=x\), Lemma 1 leads to the following corollary.

Corollary 2

For any edge e(i, j), we have \(\frac {\log d_{i}}{d_{j}}+ \frac {\log d_{j}}{d_{i}} \ge \frac {\log d_{i}}{d_{i}}+\frac {\log d_{j}}{d_{j}}\)

Theorem 2

For all graphs G, GFI(G)≥0. GFI(G)=0 only for the class of graphs where each connected component is a regular graph.


Proof. As in the proof of Theorem 1, isolated nodes contribute \(\log 1 = 0\) and can be ignored. Using the expression for GFI, we obtain for a graph G

$$\begin{array}{*{20}l} GFI(G) &= \frac{1}{n} \sum_{i \in V} \log \frac{S_{i}}{d_{i}^{2}} \\ &= \frac{1}{n} \sum_{i \in V} \log \frac{1}{d_{i}} + \frac{1}{n} \sum_{i \in V} \log \frac{S_{i}}{d_{i}} \\ &= \frac{1}{n} \sum_{i \in V} \log \frac{1}{d_{i}} + \frac{1}{n} \sum_{i \in V} \log \left(\frac{1}{d_{i}} \sum_{j \in N_{i}} d_{j} \right). \end{array} $$

Applying Jensen’s inequality to the concave logarithm in (16), we obtain

$$\begin{array}{*{20}l} GFI(G) &\geq \frac{1}{n} \sum_{i \in V} \log \frac{1}{d_{i}} + \frac{1}{n} \sum_{i \in V} \sum_{j \in N_{i}} \frac{1}{d_{i}} \log d_{j} \\ &= \frac{1}{n} \sum_{i \in V} \log \frac{1}{d_{i}} + \frac{1}{n} \sum_{(i,j) \in E} \left(\frac{1}{d_{i}} \log d_{j} + \frac{1}{d_{j}} \log d_{i} \right) \\ &\geq \frac{1}{n} \sum_{i \in V} \log \frac{1}{d_{i}} + \frac{1}{n} \sum_{(i,j) \in E} \left(\frac{1}{d_{i}} \log d_{i} + \frac{1}{d_{j}} \log d_{j} \right) \text{ (Using Corollary 2)} \\ &= \frac{1}{n} \sum_{i \in V} \log \frac{1}{d_{i}} + \frac{1}{n} \sum_{i \in V} \log d_{i} = 0 \end{array} $$

where the last step is obtained by noticing that for each node i, the term \(\frac {1}{d_{i}} \log d_{i}\) appears for exactly di edges. Again, equality occurs if and only if both nodes on either side of an edge have the same degree. □
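The chain of inequalities in this proof can be spot-checked numerically: for any graph, GFI should be at least the Jensen lower bound, which in turn should be nonnegative. The sketch below assumes adjacency-set dictionaries and the natural logarithm; `random_graph` and `gfi_and_jensen_bound` are hypothetical names written only for this illustration.

```python
import math
import random
from typing import Dict, Set, Tuple

Graph = Dict[int, Set[int]]


def random_graph(n: int, p: float, rng: random.Random) -> Graph:
    """G(n, p) with independent edges, used only to generate test inputs."""
    adj: Graph = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def gfi_and_jensen_bound(adj: Graph) -> Tuple[float, float]:
    """Return (GFI, J), where J is the right-hand side after applying Jensen's inequality.

    Isolated nodes contribute log 1 = 0 to GFI and are skipped in both sums.
    """
    n = len(adj)
    gfi = 0.0
    bound = 0.0
    for i, nbrs in adj.items():
        d_i = len(nbrs)
        if d_i == 0:
            continue
        s_i = sum(len(adj[j]) for j in nbrs)
        gfi += math.log(s_i / d_i ** 2)
        # log(1/d_i) + (1/d_i) sum_j log d_j  <=  log(S_i / d_i^2), by concavity of log
        bound += -math.log(d_i) + sum(math.log(len(adj[j])) for j in nbrs) / d_i
    return gfi / n, bound / n


print(gfi_and_jensen_bound(random_graph(40, 0.08, random.Random(7))))
```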

Theorem 1 has been proven by Jackson (2016, Lemma 1), while Theorem 2 is a new result. Theorems 1 and 2 show that both AFI and GFI attain their minimum values for the class of regular graphs, where the friendship paradox does not occur. For any graph with at least one non-regular connected component, AFI and GFI are strictly greater than their minimum values, suggesting the occurrence of the friendship paradox due to an imbalance between neighbor degrees. This is in agreement with the observation of Feld (1991), who noted that the mean number of friends of friends is always greater than or equal to the mean number of friends, i.e.,

$$ \frac{\sum_{i} d_{i}^{2}}{ \sum_{i} d_{i}} \geq \frac{1}{n} \left(\sum_{i} d_{i} \right) $$

with equality taking place in (18) only for regular graphs where the empirical degree distribution has zero variance. This demonstrates a well-behaved property for both of the proposed global metrics.
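Feld’s inequality has a direct sampling interpretation: the left-hand side is the mean degree of an edge-sampled node σ, which picks each node with probability proportional to its degree, while the right-hand side is the mean degree of a uniformly sampled node ρ. The sketch below (hypothetical helper names, adjacency-set dictionary assumed) computes both quantities.

```python
from typing import Dict, Set

Graph = Dict[int, Set[int]]


def mean_degree(adj: Graph) -> float:
    """Mean degree of a uniformly sampled node rho: (1/n) sum_i d_i."""
    return sum(len(nb) for nb in adj.values()) / len(adj)


def mean_degree_edge_sampled(adj: Graph) -> float:
    """Mean degree of an edge-sampled node sigma: sum_i d_i^2 / sum_i d_i.

    Node u is listed once per incident edge (d_u times), so averaging over the
    list weights each node in proportion to its degree.
    """
    endpoint_degrees = [len(adj[u]) for u in adj for _ in adj[u]]
    return sum(endpoint_degrees) / len(endpoint_degrees)


star5 = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(mean_degree_edge_sampled(star5), mean_degree(star5))  # 2.5 1.6
```

For the five-node star, friends have 2.5 friends on average while individuals have only 1.6, and the two sides coincide only when all degrees are equal.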

Erdos-Renyi graphs

We study the behavior of the friendship index for the class of Erdos-Renyi (ER) graphs (Erdos and Rényi 1960), a basic graph model for capturing random connections in networks. Consider a sequence of ER graphs \(\{\mathbb {G}_{1},\mathbb {G}_{2},\ldots \}\), such that \(\mathbb {G}_{n}=(V_{n},\mathbb {E}_{n})\), with \(V_{n}\) being the vertex set {1,2,…,n} and \(\mathbb {E}_{n}\) being the random edge set in which every edge occurs independently with probability p. We use the following notation. For nodes i and j, let \(\{i \sim j\}\) denote the event that i is connected to j. Let \(D_{n,i}\) and \(\mathcal {N}_{n,i}\) denote the degree and the set of neighbors of node i in \(\mathbb {G}_{n}\), and let \(S_{n,i}\) denote the sum of the degrees of its neighbors. In a random graph setting, the expected friendship index of node i in \(\mathbb {G}_{n}\) can be expressed as

$$ \mathbb{E}[{{FI}_{n}(i)}] = \mathbb{E}\left[{\frac{S_{n,i}}{D_{n,i}^{2}} \boldsymbol{1}[{D_{n,i} > 0}] + \boldsymbol{1}[{D_{n,i}=0}}]\right]. $$

Although we will primarily focus on the expected behavior of the FI of a node, it is worth noting that the AFI exhibits the same behavior because, by the exchangeability of the nodes in an ER graph, every node has the same expected FI:

$$\begin{array}{*{20}l} \mathbb{E}[{{AFI}_{n}}] = \mathbb{E}\left[{\frac{1}{n} \sum_{v \in V} {FI}_{n}(v)}\right] = \mathbb{E}[{{FI}_{n}(i)}], \ i \in V. \end{array} $$

with \(AFI_{n}\) being the AFI of graph \(\mathbb {G}_{n}\).

Case 1: np constant

Considering the first term in (19), we have

$$\begin{array}{*{20}l} \mathbb{E}\left[\frac{S_{n,i}}{D_{n,i}^{2}} \boldsymbol{1}[D_{n,i} > 0]\right] &= \mathbb{E}\left[\boldsymbol{1}[D_{n,i}>0] \frac{1}{D_{n,i}^{2}} \sum_{j \in \mathcal{N}_{n,i}} D_{n,j}\right] \\ &= \mathbb{E}\left[\boldsymbol{1}[D_{n,i}>0] \frac{1}{D_{n,i}^{2}} \mathbb{E}\left[\sum_{j \in \mathcal{N}_{n,i}} D_{n,j}\ \bigg|\ D_{n,i} = d_{i}\right]\right] \\ &= \mathbb{E}\left[\boldsymbol{1}[D_{n,i}>0] \frac{1}{D_{n,i}^{2}} \sum_{j \in \mathcal{N}_{n,i}} \mathbb{E}\left[1 + \sum_{k \in V \setminus \{i,j\}} \boldsymbol{1}[j \sim k]\ \bigg|\ D_{n,i} = d_{i}\right]\right] \\ &= \mathbb{E}\left[\boldsymbol{1}[D_{n,i}>0] \frac{1}{D_{n,i}^{2}} \sum_{j \in \mathcal{N}_{n,i}} \mathbb{E}\left[1 + \sum_{k \in V \setminus \{i,j\}} \boldsymbol{1}[j \sim k]\right]\right] \\ &= (1+ (n-2)p)\,\mathbb{E}\left[\frac{1}{D_{n,i}}\boldsymbol{1}[D_{n,i} >0]\right]. \end{array} $$

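In the fourth step, the conditioning is dropped because, for \(j \in \mathcal{N}_{n,i}\), the indicators \(\boldsymbol{1}[j \sim k]\) with \(k \notin \{i,j\}\) are independent of the edges incident to i. As \(n \to \infty\), \(D_{n,i} \xrightarrow{d} \Pi(c)\) under the regime \(np \to c\), where Π(c) denotes a Poisson random variable with mean c. We know that if \(X_{n} \xrightarrow{d} X\), then \(\mathbb {E}[f(X_{n})] \rightarrow \mathbb {E}[f(X)]\) for bounded Lipschitz functions f. The mapping \(f: x \mapsto \frac {1}{x}\boldsymbol{1}[x>0],\ x \in \{0,1,2,\ldots\}\), is bounded by 1 and Lipschitz, because \(|f(x)-f(y)| \leq |x-y|\). Using this result in (21), together with \((n-2)p \to c\) and \(\mathbf{P}[D_{n,i}=0] \to \mathbf{P}[\Pi(c)=0]\), we obtain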

$$\begin{array}{*{20}l} \mathbb{E}[{{FI}_{n}(i)}] \to (1+c) \mathbb{E}\left[{\frac{1}{\Pi(c)} \boldsymbol{1}[{\Pi(c) > 0}}]\right] + \mathbf{P}[{\Pi(c) = 0}] \end{array} $$

under the regime \(np \to c\), with c>0. We denote the limit by

$$\begin{array}{*{20}l} \beta(c) &= (1+c) \mathbb{E}\left[{ \frac{1}{\Pi(c)} \boldsymbol{1}[{\Pi(c) > 0}}]\right]+ \mathbf{P}[{\Pi(c) = 0}] \\ &= (1+c) \sum_{d=1}^{\infty} \frac{1}{d} e^{-c} \frac{c^{d}}{d!} + e^{-c}. \end{array} $$
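Although β(c) has no closed form, the series converges quickly and is easy to evaluate. The sketch below computes β(c) by accumulating Poisson probabilities and, for comparison, the exact finite-n value \((1+(n-2)p)\,\mathbb{E}[D^{-1}\boldsymbol{1}[D>0]] + \mathbf{P}[D=0]\) with \(D \sim \text{Binomial}(n-1, p)\) and p=c/n, which is what the derivation above gives before taking the limit. The function names and the truncation tolerance are our own choices for this illustration.

```python
import math


def beta(c: float, tol: float = 1e-16) -> float:
    """beta(c) = (1 + c) * sum_{d>=1} (1/d) e^{-c} c^d / d! + e^{-c}, series truncated."""
    pmf = math.exp(-c)  # Poisson(c) probability of d = 0
    series = 0.0
    d = 0
    while True:
        d += 1
        pmf *= c / d  # Poisson(c) probability of d
        series += pmf / d
        if d > c and pmf < tol:  # past the mode and the terms are negligible
            break
    return (1 + c) * series + math.exp(-c)


def expected_fi_er(n: int, c: float) -> float:
    """Exact E[FI_n(i)] in G(n, p) with p = c/n:
    (1 + (n-2)p) E[1[D>0]/D] + P[D=0], where D ~ Binomial(n-1, p)."""
    p = c / n
    pmf = (1 - p) ** (n - 1)  # P[D = 0]
    p_zero = pmf
    series = 0.0
    for d in range(1, n):
        pmf *= (n - d) / d * p / (1 - p)  # Binomial(n-1, p) probability of d
        series += pmf / d
    return (1 + (n - 2) * p) * series + p_zero


for c in (0.5, 2.0, 10.0):
    print(c, beta(c), expected_fi_er(10_000, c))
```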

Simple bounds on the friendship index: Using (22), we first bound the limiting FI, since the expression cannot be computed in closed form.

Lemma 2

For sequence of Erdos-Renyi graphs \(\{\mathbb {G}_{n}, n=1,2,\ldots \}\)

$$ \max \left(1, e^{-c} \left[1+(1+c)\left(c + \frac{c^{2}}{4} \right) \right] \right) \leq \lim_{n \rightarrow \infty} \mathbb{E}[FI_{n}(i)] \leq 1 +c -c e^{-c} $$

under the regime \(np \to c\).


Proof. Using the limiting expression of FI, we obtain for c>0

$$\begin{array}{*{20}l} & (1+c) \mathbb{E}\left[{ \frac{1}{\Pi(c)} \boldsymbol{1}[{\Pi(c) > 0}}]\right] + \mathbf{P}[{\Pi(c) = 0}] = (1+c) \sum_{d=1}^{\infty} \frac{1}{d} e^{-c} \frac{c^{d}}{d!} + e^{-c} \\ &\leq (1+c) \sum_{d=1}^{\infty} e^{-c} \frac{c^{d}}{d!} + e^{-c} = 1+c-c e^{-c}. \end{array} $$

For the lower bound, we keep only the first two terms (d=1 and d=2) of the infinite sum:

$$\begin{array}{*{20}l} (1+c)\mathbb{E}\left[{ \frac{1}{\Pi(c)} \boldsymbol{1}[{\Pi(c) > 0}}]\right]+ \mathbf{\text{P}}[{\Pi(c) = 0}]&\geq (1+c) \left(c e^{-c} + \frac{c^{2}}{4} e^{-c} \right) + e^{-c}, \end{array} $$

and also note that \(\mathbb {E}[{{FI}_{n}(i)}] \geq 1\) since \(\mathbb {E}[{{FI}_{n}(i)}] = \mathbb {E}[{{AFI}_{n}}] \geq 1\) using Theorem 1. □
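The bounds of Lemma 2 can be checked against the series for β(c). The sketch below (hypothetical names; β evaluated by truncating the series) prints both bounds and β(c) for a few values of c. Note that the lower bound already exceeds 1 for small c; at c=0.1 it is about 1.007.

```python
import math
from typing import Tuple


def beta(c: float, tol: float = 1e-16) -> float:
    """beta(c) = (1 + c) * sum_{d>=1} (1/d) e^{-c} c^d / d! + e^{-c}, series truncated."""
    pmf = math.exp(-c)
    series = 0.0
    d = 0
    while True:
        d += 1
        pmf *= c / d
        series += pmf / d
        if d > c and pmf < tol:
            break
    return (1 + c) * series + math.exp(-c)


def lemma2_bounds(c: float) -> Tuple[float, float]:
    """Lower and upper bounds of Lemma 2 on the limit beta(c)."""
    lower = max(1.0, math.exp(-c) * (1 + (1 + c) * (c + c ** 2 / 4)))
    upper = 1 + c - c * math.exp(-c)
    return lower, upper


for c in (0.1, 0.5, 1.0, 2.0, 5.0, 20.0):
    lo, hi = lemma2_bounds(c)
    print(f"c={c:5.1f}  {lo:.4f} <= {beta(c):.4f} <= {hi:.4f}")
```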

Theorem 3

For a sequence of Erdos-Renyi graphs \(\{ \mathbb {G}_{n}, n=1,2,\ldots \}\) under the regime \(np \to c\), and any node i, we have (a) \(\lim_{n \to \infty } \mathbb {E}[FI_{n}(i)] \rightarrow 1\) as \(c \to 0\); (b) \(\lim_{n \rightarrow \infty } \mathbb {E}[FI_{n}(i)] \rightarrow 1\) as \(c \to \infty\); (c) there exists δ>0 such that for all c∈(0,δ), \(\lim_{n \rightarrow \infty } \mathbb {E}[FI_{n}(i)]>1\).


Proof. Part (a) of Theorem 3 follows from the upper bound of Lemma 2, which tends to 1 as \(c \to 0\), together with the lower bound \(\mathbb {E}[FI_{n}(i)]= \mathbb {E}[AFI_{n}]\geq 1\).

Lemma 2 does not help in determining the behavior of β for large c, because the gap between the upper and lower bounds grows as c increases. We therefore study the behavior of \(\beta : \mathbb {R} \rightarrow \mathbb {R}\) analytically. For c∈(0,C), with C>0, the series defining β(c) converges uniformly on the specified range, so it can be differentiated term by term. Differentiating β(c) with respect to c yields

$$\begin{array}{*{20}l} \beta^{\prime}(c) &= -c \sum_{d=1}^{\infty} \frac{1}{d} e^{-c} \frac{c^{d}}{d!} + \frac{1+c}{c} \left(e^{c} - 1 \right) e^{-c} - e^{-c} \\ &= - \frac{c}{1+c} \beta(c) + \left(1 + \frac{1}{c} \right) (1-e^{-c}) - \frac{e^{-c}}{1+c} \end{array} $$

For c large and β(c)>1, we have β′(c)<0. Furthermore, for c large, β′(c)≈−β(c)+1, implying that \({\lim }_{c \rightarrow \infty } \beta (c) = 1\). This leads to Theorem 3(b).

Part (c) follows from the lower bound in Lemma 2, which is greater than 1 for c∈(0,δ) for some δ>0: expanding it around c=0 gives \(1 + \frac{3}{4}c^{2} + O(c^{3})\). □
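The term-by-term derivative can be checked numerically. The sketch below (illustrative names, not from the original work) implements the first line of the expression for β′(c) and compares it with a central finite difference of β; it also shows β(c) decreasing towards 1 for large c, as Theorem 3(b) states. The step size h = 1e-5 and the test points are arbitrary choices.

```python
import math


def series_s(c: float) -> float:
    """S(c) = sum_{d>=1} (1/d) e^{-c} c^d / d!, summed in log space."""
    s, d = 0.0, 1
    while True:
        term = math.exp(d * math.log(c) - c - math.lgamma(d + 1)) / d
        s += term
        if d > c and term < 1e-17:
            break
        d += 1
    return s


def beta(c: float) -> float:
    return (1 + c) * series_s(c) + math.exp(-c)


def beta_prime(c: float) -> float:
    # first line of the derivative: -c S(c) + ((1 + c)/c)(e^c - 1)e^{-c} - e^{-c},
    # with (e^c - 1)e^{-c} rewritten as 1 - e^{-c}
    return -c * series_s(c) + (1 + c) / c * (1 - math.exp(-c)) - math.exp(-c)


def central_difference(f, c: float, h: float = 1e-5) -> float:
    return (f(c + h) - f(c - h)) / (2 * h)


for c in (0.5, 2.0, 10.0, 50.0):
    print(f"c = {c:4}: beta = {beta(c):.5f}, "
          f"beta' = {beta_prime(c):+.6f}, finite difference = {central_difference(beta, c):+.6f}")
```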

Therefore, the result indicates that for small c, the friendship paradox is weak when measured in terms of AFI. This is probably because the graph is so sparse that it consists mostly of isolated nodes and isolated edges, and the paradox does not occur for either of these components (their FI equals 1). The same holds for large c, because the graph becomes much denser. Moreover, Theorem 3(c) shows that the limit of the expected AFI is strictly greater than 1 for c∈(0,δ), so the friendship paradox is observed even in large Erdos-Renyi graphs.

Note that the average degree of the ER graph in this regime is (n−1)p, which converges to c. Theorem 3 can thus also be read as implying that in random ER graphs, the friendship paradox becomes weaker as the average degree of the graph becomes very small or very large. For c large, the degrees of every node and of its neighbors grow in such a way that the friendship paradox is not observed at any chosen node. This is intuitive if we recall that FI is defined as the ratio between the expected degree of a uniformly random neighbor of a node and the node's own degree. If the degrees get large in an ER graph, where the degree rvs are uncorrelated, then the sampling bias is expected to be small, which is exactly what we observe.
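A small simulation can illustrate Theorem 3 at finite n. The sketch below is an illustration, not the paper's experimental code: it samples G(n, c/n) with the Python standard library (n = 1000 and the seed are arbitrary choices), computes FI(i) as the sum of neighbor degrees divided by \(D_{i}^{2}\), with FI = 1 for an isolated node as in the FI expression of this section, and averages to obtain AFI for a few values of c.

```python
import random


def erdos_renyi(n: int, p: float, seed: int) -> list[set[int]]:
    """Sample G(n, p) as adjacency sets (simple O(n^2) loop over node pairs)."""
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def friendship_index(adj: list[set[int]], i: int) -> float:
    """Mean neighbour degree divided by D_i; defined as 1 for an isolated node."""
    d = len(adj[i])
    if d == 0:
        return 1.0
    return sum(len(adj[j]) for j in adj[i]) / d**2


def afi(adj: list[set[int]]) -> float:
    """Arithmetic mean of the friendship indices over all nodes."""
    return sum(friendship_index(adj, i) for i in range(len(adj))) / len(adj)


n = 1000
afi_by_c = {c: afi(erdos_renyi(n, c / n, seed=1)) for c in (0.05, 1.0, 4.0, 30.0)}
for c, value in afi_by_c.items():
    print(f"np = {c:5}: AFI = {value:.3f}")
```

With a single sample per value of c the numbers fluctuate somewhat; averaging over several seeds would tighten them.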

Case 2: p constant

We also address the question of whether the friendship paradox is observed when the probability of connection p is kept constant as n goes to infinity. We return to the expression of the friendship index

$$ {FI}_{n}(i) = \left(\frac{1}{D_{n,i}^{2}} \sum_{j \in \mathcal{N}_{n,i}} D_{n,j} \right) \boldsymbol{1}[{D_{n,i} > 0}] + \boldsymbol{1}[{D_{n,i}= 0]} $$

As n gets large, the second term in (28) behaves as

$$ \boldsymbol{1}[{D_{n,i} = 0}] \longrightarrow_{n} 0 \ a.s. $$

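We define the edge rvs \(\chi_{i,j}\), where \(\chi_{i,j}=1\) if the edge exists between nodes i and j, and \(\chi_{i,j}=0\) otherwise. Using \(D_{n,j} = 1 + \sum_{\ell \in V \setminus \{ i,j \}} \chi_{\ell,j}\) for every neighbor j of i, we write the first term in (28) as follows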

$$\begin{array}{*{20}l} &\left(\frac{1}{D_{n,i}^{2}} \sum_{j \in \mathcal{N}_{n,i}} D_{n,j} \right) \boldsymbol{1}[{D_{n,i}> 0] } \\ &= \frac{1}{D_{n,i}^{2}} \sum_{j \in \mathcal{N}_{n,i}} \left[ 1 + \sum_{\ell \in V \setminus \{ i,j \}} \chi_{\ell,j} \right] \boldsymbol{1}[{ D_{n,i} >0}] \\ &= \frac{1}{D_{n,i}} \boldsymbol{1}[{ D_{n,i} > 0}] + \left[\frac{1}{D_{n,i}^{2}} \sum_{j \in \mathcal{N}_{n,i}} \sum_{\ell \in V \setminus \{ i,j \}} \chi_{\ell,j} \right]\boldsymbol{1}[{ D_{n,i} > 0}] \\ &= \frac{1}{D_{n,i}} \boldsymbol{1}[{ D_{n,i} > 0}]+ \left[ \frac{n}{D_{n,i}} \cdot \frac{1}{n D_{n,i}} \sum_{j \in \mathcal{N}_{n,i}} \Omega_{i,j} \right] \boldsymbol{1}[{ D_{n,i} > 0}], \end{array} $$

where we have defined,

$$\Omega_{i,j} = \sum_{\ell \in V \setminus \{ i,j \}} \chi_{\ell,j},$$

as the excess degree of node j with respect to the edge (i, j).
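The decomposition above is an exact algebraic identity for every node with \(D_{n,i}>0\), which is easy to check mechanically. The sketch below, which is not from the original work, samples a small G(n, p) graph (n = 60, p = 0.1 and the seed are arbitrary choices), computes FI(i) both directly and as \(1/D_{n,i} + (1/D_{n,i}^{2})\sum_{j}\Omega_{i,j}\), and compares the two with exact rational arithmetic.

```python
import random
from fractions import Fraction


def random_adjacency(n: int, p: float, seed: int) -> list[list[int]]:
    """chi[u][v] = 1 if the edge (u, v) is present in a sample of G(n, p), else 0."""
    rng = random.Random(seed)
    chi = [[0] * n for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                chi[u][v] = chi[v][u] = 1
    return chi


def fi_direct(chi: list[list[int]], i: int) -> Fraction:
    """(1 / D_i^2) * sum of the degrees of i's neighbours, as an exact rational."""
    nbrs = [j for j in range(len(chi)) if chi[i][j]]
    return Fraction(sum(sum(chi[j]) for j in nbrs), len(nbrs) ** 2)


def fi_decomposed(chi: list[list[int]], i: int) -> Fraction:
    """1/D_i + (1/D_i^2) * sum_j Omega_{i,j}, with Omega_{i,j} = sum_{ell != i, j} chi[ell][j]."""
    n = len(chi)
    nbrs = [j for j in range(n) if chi[i][j]]
    omega = [sum(chi[ell][j] for ell in range(n) if ell not in (i, j)) for j in nbrs]
    return Fraction(1, len(nbrs)) + Fraction(sum(omega), len(nbrs) ** 2)


chi = random_adjacency(60, 0.1, seed=7)
non_isolated = [i for i in range(60) if any(chi[i])]
mismatches = [i for i in non_isolated if fi_direct(chi, i) != fi_decomposed(chi, i)]
print(f"{len(non_isolated)} non-isolated nodes checked, mismatches: {mismatches}")
```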

Lemma 3

$$ \frac{ \sum_{j \in \mathcal{N}_{n,i}} \Omega_{i,j}}{(n-2) D_{n,i}} \boldsymbol{1}\left[{ D_{n,i} > 0}\right]\longrightarrow_{n} p~a.s. $$


We use the Borel-Cantelli lemma to show the a.s. convergence result. For ease of exposition, the detailed proof is deferred to Appendix “Proof of lemma 3”. □

Furthermore, we have

$$ \frac{n}{D_{n,i}} \boldsymbol{1}[{ D_{n,i} > 0}] \longrightarrow_{n} \frac{1}{p}~a.s. $$


$$ \frac{1}{D_{n,i}} \boldsymbol{1}[{D_{n,i}>0}] \longrightarrow_{n} 0~a.s. $$

Applying the convergence results (31)-(33) to (30) yields

$$ \left(\frac{1}{D_{n,i}^{2}} \sum_{j \in \mathcal{N}_{n,i}} D_{n,j} \right) \boldsymbol{1}[{D_{n,i} > 0}] \longrightarrow_{n} 1~a.s. $$

which together with (29) leads to Theorem 4.

Theorem 4

For a sequence of Erdos-Renyi graphs \(\{\mathbb {G}_{n}, n=1,2,\ldots \}\)

$$ {FI}_{n}(i) \longrightarrow_{n} 1~a.s. $$

under the regime p constant.

Using the bounded convergence theorem, we can also prove convergence in expectation. Details are provided in Appendix “Proof of theorem 5”.

Theorem 5

For a sequence of Erdos-Renyi graphs \(\{\mathbb {G}_{n}, n=1,2,\ldots \}\)

$$ \mathbb{E}[{{FI}_{n}(i)}] \longrightarrow_{n} 1 $$

under the regime p constant. Therefore,

$$\mathbb{E}[{AFI}_{n}] \longrightarrow_{n} 1.$$

Theorems 4 & 5 together imply that the friendship paradox is not observed for large ER graphs when p is kept constant. This suggests that the degree imbalances in every node’s neighborhood vanish for large graphs under this regime. One can show that with p held constant, the degrees of individual nodes diverge in the large graph limit. The intuition for this result is similar to that for the np constant regime with c large, although the two regimes are very different mathematically.
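The p-constant regime can be illustrated the same way. In this sketch (not the authors' code; p = 0.1, the graph sizes and the seed are arbitrary choices), the ratio of Lemma 3 is computed at node 0 using \(\Omega_{i,j} = D_{n,j} - 1\) for a neighbor j of i, since \(D_{n,j}\) counts the edge to i plus the edges counted by \(\Omega_{i,j}\). The AFI should approach 1 and the ratio should approach p as n grows.

```python
import random


def erdos_renyi(n: int, p: float, seed: int) -> list[set[int]]:
    """Sample G(n, p) as adjacency sets (simple O(n^2) loop over node pairs)."""
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def afi(adj: list[set[int]]) -> float:
    """Mean over nodes of FI(i): neighbour-degree sum / D_i^2, or 1 if D_i = 0."""
    total = 0.0
    for nbrs in adj:
        d = len(nbrs)
        total += 1.0 if d == 0 else sum(len(adj[j]) for j in nbrs) / d**2
    return total / len(adj)


def lemma3_ratio(adj: list[set[int]], i: int) -> float:
    """sum_{j in N(i)} Omega_{i,j} / ((n - 2) D_i), with Omega_{i,j} = D_j - 1."""
    n, d = len(adj), len(adj[i])
    if d == 0:
        return 0.0  # the indicator 1[D_i > 0] is zero
    return sum(len(adj[j]) - 1 for j in adj[i]) / ((n - 2) * d)


p = 0.1
stats = {}
for n in (100, 400, 1600):
    adj = erdos_renyi(n, p, seed=3)
    stats[n] = (afi(adj), lemma3_ratio(adj, 0))
    print(f"n = {n:4}: AFI = {stats[n][0]:.4f}, Lemma 3 ratio at node 0 = {stats[n][1]:.4f}")
```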

Barabasi-Albert graphs

We analyze the behavior of the AFI in Barabasi-Albert graphs (1999), a well-known model for understanding power-law behavior in networks. The sequence of BA graphs \(\{ \mathbb {G}_{0}, \mathbb {G}_{1}, \ldots \}\) is such that \(\mathbb {G}_{0}\) is the initial graph, and \(\mathbb {G}_{t+1}\) is obtained from \(\mathbb {G}_{t}\) by adding a node labelled t+1 and an edge between t+1 and a node \(Q_{t}\) chosen from \(\mathbb {G}_{t}\) preferentially on the basis of degree.

The joint degree distribution \(p_{T}(k,\ell)\) is defined as the joint probability of having an edge between two nodes with degrees k and ℓ in graph \(\mathbb {G}_{T}\). This can be written as

$$ p_{T}(k,\ell) = \frac{1}{T} \sum\limits_{t=1}^{T} \mathbf{P}[{D_{T,Q_{t}}=k, D_{T,t} = \ell}] $$

where \(D_{T,t}\) and \(D_{T,Q_{t}}\) are the degrees of nodes t and \(Q_{t}\) in graph \(\mathbb {G}_{T}\), respectively.

The limiting joint degree distribution exists and is given by (Fotouhi and Rabbat 2013)

$$ {\lim}_{T \to \infty} p_{T}(k,\ell) = p(k,\ell) = \frac{4}{k(k+1)\ell(\ell+1)} \left[ 1-6\frac{\binom{k+\ell-2}{\ell-1}}{\binom{k+\ell+2}{\ell+1}} \right]. $$
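The formula can be evaluated exactly with rational arithmetic. The sketch below (function names are illustrative) checks three properties that follow directly from the expression: it is symmetric in k and ℓ, it is nonnegative with \(p(1,1)=0\), and for ℓ=1 it reduces to \(\frac{2}{d(d+1)}\left[1-\frac{12}{(d+2)(d+3)}\right]\), the form used in the proof of Theorem 6.

```python
from fractions import Fraction
from math import comb


def p_joint(k: int, ell: int) -> Fraction:
    """Limiting joint degree distribution p(k, ell) of the BA model, evaluated exactly."""
    prefactor = Fraction(4, k * (k + 1) * ell * (ell + 1))
    ratio = Fraction(comb(k + ell - 2, ell - 1), comb(k + ell + 2, ell + 1))
    return prefactor * (1 - 6 * ratio)


def p_joint_ell_1(d: int) -> Fraction:
    """The ell = 1 specialisation used in the proof of Theorem 6."""
    return Fraction(2, d * (d + 1)) * (1 - Fraction(12, (d + 2) * (d + 3)))


print(p_joint(1, 1), p_joint(2, 2), p_joint(3, 1), p_joint_ell_1(3))  # 0 2/45 1/10 1/10
```

The value \(p(1,1)=0\) matches the fact that, with one edge per new node, two degree-1 nodes cannot be joined by an edge (outside the initial graph).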

Let \({AFI}_{T}\) denote the AFI of the graph \(\mathbb {G}_{T}\). We prove that its expected value diverges as T grows.

Theorem 6

For a sequence of Barabasi-Albert graphs \(\{\mathbb {G}_{T}, T=1,2,\ldots \}\), the expected value of the AFI, \(\mathbb {E}[{{AFI}_{T}}]\), diverges as \(T \to \infty\).


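The following proof uses the fact that the sum of the FIs over all nodes is at least the sum of the FIs of the nodes with degree 1. Moreover, if node t has degree 1, its only neighbor is \(Q_{t}\), so its FI equals \(D_{T,Q_{t}}\). Using these observations, we have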

$$\begin{array}{*{20}l} \mathbb{E}[{{AFI}_{T}}] &\geq \frac{1}{T} \sum\limits_{t=1}^{T} \mathbb{E}\left[ D_{T,Q_{t}} \boldsymbol{1}[D_{T,t}=1] \right] \\ &= \frac{1}{T} \sum_{t=1}^{T} \sum_{d=1}^{T} d \cdot \mathbf{P}[{D_{T,Q_{t}}=d,D_{T,t}=1}] \end{array} $$

We continue by exchanging the order of the finite double sum and, since all terms are nonnegative, keeping only the degrees d≤M for some fixed positive integer M (with T≥M)

$$\begin{array}{*{20}l} \mathbb{E}[{{AFI}_{T}}] &\geq \sum_{d=1}^{T} \frac{1}{T} \sum_{t=1}^{T} d \cdot \mathbf{P}[{D_{T,Q_{t}}=d,D_{T,t}=1}] \\ & \geq \sum_{d=1}^{M} d \frac{1}{T} \sum_{t=1}^{T} \mathbf{P}[{D_{T,Q_{t}}=d,D_{T,t}=1}] \\ &= \sum_{d=1}^{M} d p_{T} (d,1) \end{array} $$

Using (38) with k=d and ℓ=1, and letting T go to infinity (the sum over d is finite, so the limit can be taken term by term), we obtain

$$\begin{array}{*{20}l} \liminf_{T \to \infty} \mathbb{E}\left[{{AFI}_{T}}\right] &\geq \sum_{d=1}^{M} d \cdot \frac{2}{d(d+1)} \left[ 1 - \frac{12}{(d+2)(d+3)} \right] \\ &= \sum_{d=1}^{M} \frac{2}{d+1} \left[ 1 - \frac{12}{(d+2)(d+3)} \right] \end{array} $$

Observe that the sum in (41) diverges as M is taken to infinity, since its terms behave like 2/(d+1) for large d, as in the harmonic series. Since M is arbitrary, the result follows. □
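The divergence of the lower bound is slow, which is worth seeing numerically. The sketch below (an illustration; the function name is hypothetical) evaluates the partial sums of the final bound for increasing M. They grow like 2 ln M up to an additive constant, so each tenfold increase in M adds roughly 2 ln 10 ≈ 4.6.

```python
import math


def lower_bound_partial_sum(M: int) -> float:
    """sum_{d=1}^{M} 2/(d+1) * (1 - 12/((d+2)(d+3))), the lower bound obtained above."""
    return sum(2 / (d + 1) * (1 - 12 / ((d + 2) * (d + 3))) for d in range(1, M + 1))


for M in (10, 100, 1_000, 10_000, 100_000):
    print(f"M = {M:>7}: partial sum = {lower_bound_partial_sum(M):7.3f}   "
          f"2 ln M = {2 * math.log(M):7.3f}")
```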

The above result suggests that the friendship paradox keeps growing as the size of the graph grows. This is potentially due to the increasing influence of hubs as the size of the network grows. This is in stark contrast to the behavior of assortativity, which is known to converge to 0 as the size of the BA graph grows (Newman 2002).
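The growth predicted by Theorem 6 can also be observed in simulation. The sketch below is illustrative only: it grows a BA graph with one edge per new node, assuming the initial graph \(\mathbb{G}_{0}\) is a single edge (the text does not fix \(\mathbb{G}_{0}\)), picks the attachment target by sampling uniformly from the list of edge endpoints (which selects a node with probability proportional to its degree), and averages the AFI over five seeds as a crude estimate of \(\mathbb{E}[{AFI}_{T}]\). Here T counts nodes.

```python
import random


def ba_tree(T: int, seed: int) -> list[list[int]]:
    """BA graph with one edge per new node, grown from the single edge (0, 1)."""
    rng = random.Random(seed)
    adj = [[1], [0]]
    endpoints = [0, 1]  # each node appears once per incident edge
    for t in range(2, T):
        q = rng.choice(endpoints)  # degree-proportional choice of Q_t
        adj.append([q])
        adj[q].append(t)
        endpoints.extend((t, q))
    return adj


def afi(adj: list[list[int]]) -> float:
    """Mean over nodes of FI(i): neighbour-degree sum / D_i^2, or 1 if D_i = 0."""
    total = 0.0
    for nbrs in adj:
        d = len(nbrs)
        total += 1.0 if d == 0 else sum(len(adj[j]) for j in nbrs) / d**2
    return total / len(adj)


def mean_afi(T: int, n_seeds: int = 5) -> float:
    return sum(afi(ba_tree(T, seed)) for seed in range(n_seeds)) / n_seeds


for T in (100, 1_000, 10_000, 50_000):
    print(f"T = {T:>6}: mean AFI over 5 runs = {mean_afi(T):.2f}")
```

Individual runs are noisy because a few hubs dominate the AFI, which is why several seeds are averaged.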

Experimental results

Network models

We study three network models: Erdos-Renyi (ER) graphs for modeling random connections; Barabasi-Albert (BA) graphs for modeling power-law behavior; and Watts-Strogatz (WS) graphs, known for capturing small-world networks. We study how global metrics like AFI, GFI, and assortativity behave as the parameters of the models are varied, and connect the experimental results with our theoretical findings on ER and BA graphs from the previous section. We also study the correlation between the local FI metrics and local assortativity (TLA). Specifically, we compute the Pearson, Spearman, and Kendall-Tau correlation coefficients between | logFI| and TLA, because the intuition developed in the “Theoretical results on friendship index” section suggests that | logFI| and TLA should be negatively correlated: any deviation from the scenario where the friendship paradox does not occur is best captured by | logFI|, and this measure is expected to increase as the local assortativity (TLA) decreases.
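The three correlation coefficients can be computed without external libraries. The sketch below is a generic implementation, not the authors' code: Pearson's r, Spearman's ρ as the Pearson correlation of tie-averaged ranks, and Kendall's τ-b with the standard tie correction. The T-local assortativity itself is defined earlier in the paper and is not reproduced here, so the example uses hypothetical per-node values.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # undefined (division by zero) if x or y is constant


def average_ranks(values: list[float]) -> list[float]:
    """Ranks 1..n; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x: list[float], y: list[float]) -> float:
    """Kendall's tau-b from an O(n^2) pass over all pairs, with the usual tie correction."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    pairs_not_tied_in_x = concordant + discordant + tied_y_only
    pairs_not_tied_in_y = concordant + discordant + tied_x_only
    return (concordant - discordant) / math.sqrt(pairs_not_tied_in_x * pairs_not_tied_in_y)


# Hypothetical per-node values, for illustration only.
abs_log_fi = [0.0, 0.1, 0.4, 0.9, 1.2]
tla = [0.8, 0.5, 0.1, -0.3, -0.6]
print(pearson(abs_log_fi, tla), spearman(abs_log_fi, tla), kendall_tau_b(abs_log_fi, tla))
```

The division by zero for constant inputs mirrors the instability noted below for very dense ER graphs, where both | logFI| and TLA approach 0.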

Erdos-Renyi graphs (1960). In Fig. 1, we show the AFI, GFI and assortativity plots for n=100,200,300,400,500 and varying 0≤p≤1. We observe that the friendship paradox is minimal for both small and large values of p, with the global FI indices reaching a maximum somewhere in between. The intuition behind this finding is that, as the probability of connection p is varied with n fixed, three regimes occur: (a) for small values of p, the graph mostly consists of isolated nodes and edges, which leads to low values of AFI and GFI; (b) for large values of p, the graph is very close to a clique, which again exhibits a weak friendship paradox and therefore low values of the global FI metrics; (c) the global FI metrics reach their maxima for intermediate values of p. From Theorems 3 & 4, we know that the friendship paradox occurs in the np constant regime (at least for small c) and does not occur for p constant. Our experiments show that the friendship paradox is observed in an intermediate range of p, which shrinks and moves closer to 0 as the size of the graph grows. This corresponds to the np constant regime, where the friendship paradox is known to occur. Furthermore, we observe that for any fixed value of p large enough that the realized graphs are dense (p>0.04 in Fig. 1), increasing the graph size n reduces the AFI. This agrees with Theorem 5, which states that the expected AFI converges to 1 as n grows large with p fixed.
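A minimal sketch of the kind of sweep behind Fig. 1, assuming the same FI convention as in the theory (FI = 1 for isolated nodes). The graph size n = 300, the grid of p values and the seed are illustrative choices rather than the exact settings of the figure; the AFI should be close to 1 at both ends of the grid and clearly larger in between.

```python
import random


def erdos_renyi(n: int, p: float, seed: int) -> list[set[int]]:
    """Sample G(n, p) as adjacency sets (simple O(n^2) loop over node pairs)."""
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def afi(adj: list[set[int]]) -> float:
    """Mean over nodes of FI(i): neighbour-degree sum / D_i^2, or 1 if D_i = 0."""
    total = 0.0
    for nbrs in adj:
        d = len(nbrs)
        total += 1.0 if d == 0 else sum(len(adj[j]) for j in nbrs) / d**2
    return total / len(adj)


n = 300
sweep = {p: afi(erdos_renyi(n, p, seed=11)) for p in (0.0005, 0.002, 0.01, 0.05, 0.2, 0.5)}
for p, value in sweep.items():
    print(f"p = {p:<6} (np = {n * p:6.2f}): AFI = {value:.3f}")
```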

Fig. 1

Results on AFI, GFI and assortativity of Erdos-Renyi graphs shown for varying values of 0<p<0.1 and n=100,200,300,400,500. Local FI distribution shown through a heatplot for varying 0<p<0.1 along the y-axis

On the other hand, assortativity is observed to be very close to 0 for almost all values of n and p. Thus, even for the class of ER graphs, the global FIs behave somewhat differently from assortativity. We also observe from the heatplot that the local FI distribution peaks at 1 for all values of p; the peak is very prominent for low values of p, becomes less prominent as p increases, and becomes prominent again as p increases even further. This agrees with the plots of the global metrics AFI and GFI. In Fig. 4, we show the correlation coefficients between the | logFI| values and the T-local assortativities of individual nodes. We observe that as p increases, the correlation coefficients start from large negative values close to −1, then decrease in magnitude, and then approach −1 again as p is increased further. This implies that for sparse and dense ER graphs (p<0.8), | logFI| is strongly anticorrelated with TLA. This can be explained by our previous plots, which show that the friendship paradox is strong only for a range of values of p, while it is weak for small and large values of p. Our interpretation is that for the particular range of p values where the friendship paradox exists, the anticorrelation between | logFI| and TLA is at its weakest. However, when p is increased beyond 0.8, the correlation coefficients become unstable because both the | logFI| and the TLA values approach 0.

Barabasi-Albert graphs (Barabási and Albert 1999). In Fig. 2, we observe that the AFI and GFI increase with the number of nodes n for a fixed number of edges m, which agrees with Theorem 6, and decrease with increasing m for fixed n. The intuition is that with m fixed, higher values of n lead to a stronger friendship paradox, because the disparity between the degrees of the hubs and those of the leaves, or low-degree nodes, becomes greater. This is evident from star graphs, whose global FIs increase with graph size. Increasing m with n fixed leads to lower global FIs, because the low-degree nodes now have larger degrees to begin with. The assortativity approaches 0 with increasing m for reasons similar to those for the FIs. However, unlike the FIs, assortativity also approaches 0 with increasing n, because of the limiting behavior of BA graphs. The heatplot of the local FI distribution indicates a peak at 1, in agreement with the plots of AFI and GFI.
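The star-graph example can be made concrete. In this sketch (illustrative, not the authors' code), GFI is taken to be the geometric mean of the FI values, as described in the concluding remarks. For a star on n nodes, each leaf has FI = n−1 and the center has FI = 1/(n−1), so AFI = ((n−1)² + 1/(n−1))/n and GFI = (n−1)^((n−2)/n); both grow with n.

```python
import math


def star_graph(n: int) -> list[list[int]]:
    """Node 0 is the centre; nodes 1..n-1 are leaves attached to it."""
    return [list(range(1, n))] + [[0] for _ in range(1, n)]


def friendship_indices(adj: list[list[int]]) -> list[float]:
    """FI(i) = neighbour-degree sum / D_i^2, or 1 for an isolated node."""
    out = []
    for nbrs in adj:
        d = len(nbrs)
        out.append(1.0 if d == 0 else sum(len(adj[j]) for j in nbrs) / d**2)
    return out


def afi_gfi(adj: list[list[int]]) -> tuple[float, float]:
    fi = friendship_indices(adj)
    afi = sum(fi) / len(fi)                                  # arithmetic mean
    gfi = math.exp(sum(math.log(v) for v in fi) / len(fi))   # geometric mean
    return afi, gfi


for n in (5, 50, 500):
    afi, gfi = afi_gfi(star_graph(n))
    print(f"star with n = {n:3}: AFI = {afi:9.3f}, GFI = {gfi:8.3f}")
```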

Fig. 2

Results on friendship indices and assortativity of Barabasi-Albert graphs for varying values of n and m. Results on AFI, GFI and assortativity of Barabasi-Albert graphs shown for varying values of 0<n<10000 and m=1,2,3,4,5. Local FI distribution shown through a heatplot for varying 0<n<10000 along the y-axis

We also show in Fig. 4 the correlation coefficients between | logFI| and TLA for BA graphs with n=1000 and varying m. We observe the two measures to be anticorrelated, with both small and large values of m leading to stronger anticorrelation.

Watts-Strogatz graphs (1998). In Fig. 3 we show the behavior of AFI and assortativity as the parameters of the network models such as the size of the graph n, probability of rewiring p or average degree k are changed. We observe that with graph size fixed at n=200 nodes, increasing p increases the AFI; and, increasing k for fixed p reduces the AFI. The reason for this behavior is that with increasing p, the graphs become more heterogeneous and hence the friendship paradox strengthens. Also, fixing p and increasing k reduces the AFI because larger average degree weakens the effect of the friendship paradox. We also observe that increasing n keeping p and k fixed increases the AFI which saturates beyond a certain point.
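A minimal sketch of the WS experiment, assuming a rewiring scheme in the spirit of the standard construction: start from a ring lattice in which each node links to its k/2 nearest neighbors on each side, then rewire each lattice edge (u, u+j) with probability p to a uniformly random new endpoint, avoiding self-loops and duplicate edges. The values n = 200, k = 4, the p grid and the seed are illustrative. At p = 0 the graph is regular, so every FI equals 1 and AFI is exactly 1; rewiring makes degrees unequal and pushes the AFI above 1, while the number of edges stays n·k/2.

```python
import random


def watts_strogatz(n: int, k: int, p: float, seed: int) -> list[set[int]]:
    """Ring lattice (k/2 neighbours per side) with each lattice edge rewired with probability p."""
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    for u in range(n):
        for j in range(1, k // 2 + 1):
            v = (u + j) % n
            adj[u].add(v)
            adj[v].add(u)
    for j in range(1, k // 2 + 1):
        for u in range(n):
            v = (u + j) % n
            if rng.random() >= p:
                continue
            w = rng.randrange(n)
            # redraw until w is neither u nor already a neighbour of u
            while w == u or w in adj[u]:
                if len(adj[u]) >= n - 1:  # u is adjacent to every node: keep the edge
                    break
                w = rng.randrange(n)
            else:  # runs only if no break occurred: replace edge (u, v) by (u, w)
                adj[u].discard(v)
                adj[v].discard(u)
                adj[u].add(w)
                adj[w].add(u)
    return adj


def afi(adj: list[set[int]]) -> float:
    """Mean over nodes of FI(i): neighbour-degree sum / D_i^2, or 1 if D_i = 0."""
    total = 0.0
    for nbrs in adj:
        d = len(nbrs)
        total += 1.0 if d == 0 else sum(len(adj[j]) for j in nbrs) / d**2
    return total / len(adj)


n, k = 200, 4
afi_by_p = {p: afi(watts_strogatz(n, k, p, seed=5)) for p in (0.0, 0.05, 0.2, 0.4)}
for p, value in afi_by_p.items():
    print(f"p = {p:4}: AFI = {value:.4f}")
```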

Fig. 3

Results on friendship indices and assortativity of Watts-Strogatz graphs for varying values of mean degree k and rewiring probability p. Result on AFI of Watts-Strogatz graphs for fixed n=200, varying values of 0<p<0.4 and k=2,4,6,8,10; and for fixed p=0.3, 0.6 and k=21, varying graph size 22<n<200. Result on assortativity of Watts-Strogatz graphs for two fixed graph sizes n=100, 400, varying 0<p<0.1 and k=2,4,6,8,10

From Fig. 3, we also observe that increasing p reduces the assortativity, while for a fixed p, increasing k brings the assortativity closer to 0. The reason for this observation is similar to that stated in the previous paragraph: increasing p makes the graph more heterogeneous and leads to the formation of hubs, while fixing p and increasing k reduces the influence of hubs. We also observe that for a larger value of n, the graphs become disassortative for smaller values of p, the reason being that hubs form for smaller values of p in larger graphs. Also, Fig. 4 shows a strong anticorrelation between | logFI| and TLA for the different parameter settings considered.

Fig. 4

Correlation coefficients between | logFI| and T-local assortativity for Erdos-Renyi graphs with varying probability of connection 0<p<1; Barabasi-Albert graphs with varying edge density 2≤m≤50; and Watts-Strogatz with varying rewiring probability 0≤p≤1 and average degree k=2,4,6,8,10

While we study the behavior of local and global metrics on network models, we also observe that the FI is related to assortativity in that | logFI| is anti-correlated to TLA for most regimes in the three network models being studied. This strengthens the argument that the phenomenon of friendship paradox and assortative mixing are related.

Real world networks

Description of networks. We consider Social (S) networks like the Hamsterster, Brightkite, Douban, Gowalla and Hyves datasets; Human Social (HS) networks of Jazz musicians and the Zachary karate club; Human contact (HC) networks at the ACM Hypertext conference held in Turin and the INFECTIOUS:STAY AWAY exhibit in Dublin, both in 2009; Computer (C) networks formed by Route views and the Internet topology; and Infrastructure (I) networks like the Power grid and the Euroroad datasets. All of the network datasets were taken from the Koblenz network collection (Kunegis 2013).

Study on the local metrics. In Figs. 5 and 6, we show the scatter plot between the local FI values and the T-local assortativities for the Hamsterster and the EuroRoad networks respectively. We observe that when T-local assortativity is positive the logFI is close to 0, while for negative local assortativity the logFI could be either more negative or positive. This gives rise to a bell-shaped curve between local assortativity and logFI, and negative correlation between local assortativity and | logFI| as observed in the scatter plots. Therefore, if the local assortativity is positive then the friendship paradox would be low, leading to logFI being close to 0. However, if the local assortativity is negative, then there would be a significant occurrence of the friendship paradox, thereby leading to either more negative or positive values of logFI, and hence higher | logFI|. This justifies the significant negative correlation between TLA and | logFI| as observed in Table 1.

Fig. 5

Scatter plot between T-local assortativity and local FI for the Hamsterster network

Fig. 6

Scatter plot between T-local assortativity and local FI for the Euro Road network

Table 1 Assortativity and FIs on real networks and the correlations between their local measures

Study on the global metrics. We consider 13 real networks of different types. We report basic network parameters such as the numbers of nodes and edges, the global assortativity r, and the proposed metrics AFI and GFI. We also report the Pearson correlation coefficient \(\rho_{P}\) and Spearman’s rank correlation coefficient \(\rho_{s}\) between the absolute value of the local FI measure, | logFI|, and the local assortativity \(r_{T}\).

We observe that while most of the social networks have negative assortativity, all of them exhibit the friendship paradox very strongly. On the other hand, the human social and contact networks exhibit the friendship paradox only weakly, partly because these networks are small. Computer networks like Route views and the Internet topology show negative assortativity and a strong friendship paradox, while infrastructure networks show positive assortativity and a weak friendship paradox. Experimental results from Momeni and Rabbat (2018) also show that networks with high assortativity (the Collaboration network in Figure 2 of their paper) have many nodes that do not exhibit the friendship paradox, while most nodes exhibit the paradox in networks with negative assortativity (Friendster). Lee et al. (2019) also observed that the median-based and the fraction-based perception models correlate negatively with assortativity, which does not quite hold for the mean-based model. Note that the aggregate measures used by Lee et al. (2019) are based on binary measures corresponding to each node, which is substantially different from our setting. As argued previously, we also observe that local assortativity is strongly negatively correlated with | logFI| for most of the networks considered.

Concluding remarks

We first propose a metric called the friendship index that captures the phenomenon of the friendship paradox locally, from the perspective of an individual node. We then aggregate these metrics to capture the network-level friendship paradox globally. The arithmetic mean of the FI values, AFI, is found to have operational significance in sampling, and is also found to be mathematically suited for the analysis of network models. On the other hand, the geometric mean, GFI, is found to better adjust the range of FI values about the FI value for which the friendship paradox does not occur. We lower bound the global metrics, AFI and GFI, and show that the lower bound is achieved only for the class of regular graphs, where the friendship paradox does not occur, thereby showing that the proposed aggregate measures are well-behaved. We theoretically demonstrate that large random (ER) graphs with average degree kept constant exhibit the friendship paradox. However, if the connection probability is kept constant, then large ER graphs no longer exhibit the paradox. The results indicate that the sampling bias disappears for large random graphs as the average degree of nodes gets very large or very small. We also theoretically show that the paradox exists for the BA graph and gets stronger as the size of the graph grows. This behavior is very different from that of global assortativity, which is known to converge to 0 for both the ER and BA graphs. This is because the friendship index measures the sampling bias by choosing a random neighbor instead of a random node, whereas assortativity only compares the degrees of the two endpoints of a random edge. Nevertheless, the two phenomena are not unrelated.

In fact, the local FI measure can shed light on the local assortativity of the network. Experimental results on network models and real world networks suggest that the local FI measures and local assortativity (TLA) are closely related. High values of TLA would mean that the node is connected to similar nodes and the | logFI| measure would be close to 0, while small values of TLA would mean that the node is connected to dissimilar nodes and | logFI| would be greater. More broadly, the FI measure captures the imbalance between a node and its neighbors’ degrees along with the direction of imbalance, while assortativity only serves as a measure for indicating imbalance between two nodes connected by a random edge. In conclusion, although the friendship paradox and assortativity measures have very different functional forms due to their widely different motivations, they are nonetheless related to each other. Future work could focus on finding theoretical relationships between the two concepts in general graphs, or in certain classes of graphs where the analysis is tractable.


Proof of lemma 3

To prove Lemma 3, we use the Borel-Cantelli lemma (Billingsley 2008) which we state below.

Lemma 4

Let \(E_{1}, E_{2}, \ldots\) be a sequence of events in some probability space. If the sum of the probabilities of \(\{E_{n}\}\) is finite,

$$\sum_{n=1}^{\infty} \mathbf{P}[{E_{n}}] < \infty$$

then the probability that infinitely many of them occur is 0, i.e.,

$$\mathbf{P}\left[\limsup_{n \rightarrow \infty} E_{n}\right] = \mathbf{P}\left[\bigcap_{n=1}^{\infty} \bigcup_{k \geq n} E_{k}\right]=0$$

For a fixed ε>0 and n=1,2,…, we define the following event,

$$E_{n, \epsilon} = \left\{ \left| \frac{\sum_{j \in \mathcal{N}_{n,i}} \Omega_{i,j}}{(n-2) D_{n,i}}\boldsymbol{1}[{D_{n,i} > 0}] - p \right| > \epsilon \right\}. $$

We bound the probabilities of these events and apply the Borel-Cantelli lemma to prove a.s. convergence.

Applying Markov’s inequality to the fourth power of the deviation, we have the upper bound

$$\begin{array}{*{20}l} \mathbf{P}[{E_{n, \varepsilon}}] &= \mathbf{P}\left[{\left| \frac{\sum_{j \in \mathcal{N}_{n,i}} \Omega_{i,j}}{(n-2) D_{n,i}}\boldsymbol{1}[{D_{n,i} > 0}] - p \right| > \varepsilon}\right] \\ &\leq \frac{1}{\varepsilon^{4}} \mathbb{E}\left[{\left(\frac{\sum_{j \in \mathcal{N}_{n,i}} \Omega_{i,j}}{(n-2) D_{n,i}}\boldsymbol{1}[{D_{n,i} > 0}] - p \right)^{4}}\right] \\ & = \frac{1}{\varepsilon^{4}} \mathbb{E}\left[{\left(\frac{\sum_{j \in \mathcal{N}_{n,i}} \Omega_{i,j}}{(n-2) D_{n,i}} - p \right)^{4} \boldsymbol{1}[{D_{n,i} > 0}}]\right] + \frac{p^{4}}{\varepsilon^{4}} \mathbf{P}[{D_{n,i}=0}] \\ & = \frac{1}{\varepsilon^{4}} \mathbb{E}\left[{\frac{1}{n^{4} D_{n,i}^{4}} \left(\sum_{j \in \mathcal{N}_{n,i}} \Omega_{i,j} -(n-2) D_{n,i} p \right)^{4} \boldsymbol{1}[{D_{n,i} > 0}}]\right] + \frac{p^{4}}{\varepsilon^{4}} (1-p)^{n-1} \end{array} $$

The second term in (42) goes to 0 exponentially fast. The first term can be written as

$$\begin{array}{*{20}l} &\frac{1}{\varepsilon^{4}} \mathbb{E}\left[{ \frac{1}{n^{4} D_{n,i}^{4}} \left(\sum_{j \in \mathcal{N}_{n,i}} \Omega_{i,j} -(n-2) D_{n,i} p \right)^{4} \boldsymbol{1}[{D_{n,i} > 0}}]\right] \\ &= \frac{1}{\varepsilon^{4}} \mathbb{E}\left[{ \frac{1}{n^{4} D_{n,i}^{4}} \left(\sum_{j \in \mathcal{N}_{n,i}} \left[ \Omega_{i,j} -(n-2) p \right] \right)^{4} \boldsymbol{1}[{D_{n,i} > 0}}]\right] \\ &= \frac{1}{\varepsilon^{4}} \mathbb{E}\left[{\frac{1}{n^{4} D_{n,i}^{4}} \mathbb{E}\left[{ \left(\sum_{j \in N_{i}} \left[ \Omega_{i,j} -(n-2) p \right] \right)^{4} \bigg| \mathcal{N}_{n,i} = N_{i}}\right] \boldsymbol{1}[{D_{n,i} > 0]}}\right]. \end{array} $$

We work with the term in the inner expectation,

$$\begin{array}{*{20}l} &\mathbb{E}\left[{ \left(\sum_{j \in N_{i}} \left[ \Omega_{i,j} -(n-2) p \right] \right)^{4} \bigg| \mathcal{N}_{n,i} = N_{i}}\right] \\ &= \mathbb{E} \Bigg[ \sum_{q \in N_{i}}\sum_{r \in N_{i}}\sum_{s \in N_{i}}\sum_{t \in N_{i}} \left(\Omega_{i,q} -(n-2) p \right)\left(\Omega_{i,r} -(n-2) p \right) \\ & \hspace{3cm} \times \left(\Omega_{i,s} -(n-2) p \right)\left(\Omega_{i,t} -(n-2) p \right) \bigg| \mathcal{N}_{n,i} = N_{i} \Bigg] \\ &= \sum_{q \in \mathcal{N}_{n,i}}\sum_{r \in \mathcal{N}_{n,i}}\sum_{s \in \mathcal{N}_{n,i}}\sum_{t \in \mathcal{N}_{n,i}} \mathbb{E} \Bigg[ \left(\Omega_{i,q} -(n-2) p \right)\left(\Omega_{i,r} -(n-2) p \right) \\ & \hspace{3cm} \times \left(\Omega_{i,s} -(n-2) p \right)\left(\Omega_{i,t} -(n-2) p \right) \bigg| \mathcal{N}_{n,i} = N_{i} \Bigg] \\ &= \sum_{q \in \mathcal{N}_{n,i}}\sum_{r \in \mathcal{N}_{n,i}}\sum_{s \in \mathcal{N}_{n,i}}\sum_{t \in \mathcal{N}_{n,i}} \mathbb{E} \Bigg[ \left(\Omega_{i,q} -(n-2) p \right)\left(\Omega_{i,r} -(n-2) p \right) \\ & \hspace{3cm} \times \left(\Omega_{i,s} -(n-2) p \right)\left(\Omega_{i,t} -(n-2) p \right) \Bigg] \end{array} $$

For a given choice of q, r, s, t, we define

$$\Omega^{\prime}_{i,z} = \sum_{\ell \neq i,q,r,s,t} \chi_{z,\ell}, \ \text{for}~ z =q,r,s,t $$

The first term in the product can be written as

$$\left(\left(\Omega^{\prime}_{i,q} - (n-5)p \right) + \left(\chi_{q,r} - p \right) + \left(\chi_{q,s} - p \right) + \left(\chi_{q,t} - p \right) \right).$$

Observe that the terms \((\chi_{q,z} - p)\), z=r, s, t, are independent of the terms \(\left(\Omega^{\prime}_{i,z} - (n-5)p\right)\), z=r, s, t. Therefore, such terms do not contribute. Products of terms of the form \((\chi_{q,z} - p)\) are bounded by a constant. Therefore, we only need to consider the product of the terms \(\left(\Omega^{\prime}_{i,z} - (n-5)p\right)\), z=q, r, s, t. Several types of products have to be considered separately: (a) for q, r, s, t all distinct, the contribution is 0 because all the terms are mutually independent and have zero mean; (b) for q=r with r, s, t distinct, the terms \(\left(\Omega^{\prime}_{i,z} - (n-5)p\right)\) for z=s, t are independent of the term for z=q and have zero mean, so such products do not contribute either; (c) for q=r≠s=t, the two terms \(\left(\Omega^{\prime}_{i,z} - (n-5)p\right)^{2}\), z=q, s, do contribute, as discussed below; (d) for q=r=s≠t, the term \(\left(\Omega^{\prime}_{i,t} - (n-5)p\right)\) is independent of the other three terms and has zero mean, so there is again no contribution; (e) for q=r=s=t, the contribution of such terms needs to be considered.

Using the above insights and continuing from (44), we have

$$\begin{array}{*{20}l} &\mathbb{E}\left[{ \left(\sum_{j \in N_{i}} \left[ \Omega_{i,j} -(n-2) p \right] \right)^{4} \bigg| \mathcal{N}_{n,i} = N_{i}}\right] = \sum_{q \in \mathcal{N}_{n,i}} \mathbb{E}\left[{\left(\Omega^{\prime}_{i,q} -(n-5) p \right)^{4}}\right] \\ & \hspace{2cm} + \sum_{q \in \mathcal{N}_{n,i}} \sum_{s \in \mathcal{N}_{n,i}} \mathbb{E}\left[{ \left(\Omega^{\prime}_{i,q} -(n-5) p \right)^{2}}\right] \mathbb{E}\left[{ \left(\Omega^{\prime}_{i,s} -(n-5) p \right)^{2}}\right] \end{array} $$

We write

$$\begin{array}{*{20}l} \mathbb{E}\left[{\left(\Omega^{\prime}_{i,q} -(n-5) p \right)^{2}}\right] &\approx \mathbb{E}\left[{ \left(\sum_{\ell \neq i,q,s} \left(\chi_{q,\ell} - p\right) \right)^{2}}\right] \\ &= \sum_{\ell \neq i,q,s} \mathbb{E}\left[{\left(\chi_{1,2} - p \right)^{2}}\right] \leq n c_{0}, \end{array} $$


$$\begin{array}{*{20}l} &\mathbb{E}\left[{\left(\Omega^{\prime}_{i,q} -(n-5) p \right)^{4}}\right] \approx \mathbb{E}\left[{\left(\sum_{\ell \neq i,q,s} \left(\chi_{q,\ell} - p\right) \right)^{4}}\right] \\ &= \sum_{\ell \neq q,s} \sum_{\ell' \neq q,s,\ell} \mathbb{E}\left[{\left(\chi_{q,\ell} - p \right)^{2}}\right] \mathbb{E}\left[{\left(\chi_{q,\ell'} - p \right)^{2}}\right]+\sum_{\ell \neq q,s} \mathbb{E}\left[{\left(\chi_{q,\ell} - p \right)^{4}}\right] \\ &\approx n^{2} c_{0}^{2} + n c_{1} \end{array} $$

where \(c_{0} = \mathbb {E}\left [{\left (\chi _{1,2} - p \right)^{2}}\right ]\), and \( c_{1} = \mathbb {E}\left [{\left (\chi _{1,2} - p \right)^{4}}\right ] \). Substituting (46) and (47) into (45), we obtain

$$\begin{array}{*{20}l} &\mathbb{E}\left[{\left(\sum_{j \in N_{i}} \left[ \Omega_{i,j} -(n-2) p \right] \right)^{4} \bigg| \mathcal{N}_{n,i} = N_{i}}\right] = D_{n,i}^{2} n^{2} c_{0}^{2} + D_{n,i} \left(n^{2} c_{0}^{2} +n c_{1} \right) \end{array} $$

Substituting (48) into (43), and hence into (42), we obtain

$$\begin{array}{*{20}l} \mathbf{P}[{E_{n,\varepsilon}}] &\leq \frac{1}{\varepsilon^{4}} \mathbb{E}\left[\frac{1}{n^{4} D_{n,i}^{4}} \cdot \left(D_{n,i}^{2} n^{2} c_{0}^{2} + D_{n,i} \left(n^{2} c_{0}^{2} +n c_{1} \right) \right) \boldsymbol{1}\left[D_{n,i}>0\right]\right] + \frac{p^{4}}{\varepsilon^{4}} (1-p)^{n-1} \\ &\leq \frac{2c_{0}^{2}}{ \varepsilon^{4} n^{2}} + \frac{c_{1}}{\varepsilon^{4} n^{3}} + \frac{p^{4}}{\varepsilon^{4}} (1-p)^{n-1} \end{array} $$

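Since this bound decays like \(1/n^{2}\), we have

$$ \sum_{n=1}^{\infty} \mathbf{P}[{E_{n,\varepsilon}}] < \infty, $$

and therefore, by the Borel-Cantelli lemma, with probability 1 only finitely many of the events \(E_{n,\varepsilon}\) occur. Since ε>0 is arbitrary (it suffices to take ε=1/k, k=1,2,…), Lemma 3 follows.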

Proof of theorem 5

We use the bounded convergence theorem (Billingsley 2008) to prove convergence in expectation of the FIs.

Recall that

$$ {FI}_{n}(i) = \left(\frac{1}{D_{n,i}^{2}} \sum_{j \in \mathcal{N}_{n,i}} D_{n,j} \right) \boldsymbol{1}[{D_{n,i} > 0}]+ \boldsymbol{1}[{D_{n,i} = 0}] $$

Since \(\boldsymbol{1}[D_{n,i}=0] \leq 1\), we only consider the first term in (51). Fix \(0<\delta<p\). In order to apply the Chernoff-Hoeffding bound, we consider the event \(\{D_{n,i}<n(p-\delta)\}\). We have the following decomposition

$$\begin{array}{*{20}l} &\mathbb{E}\left[\left(\frac{1}{D_{n,i}^{2}} \sum_{j \in \mathcal{N}_{n,i}} D_{n,j} \right) \boldsymbol{1}[D_{n,i} > 0]\right] \\ &= \mathbb{E}\left[\left(\frac{1}{D_{n,i}^{2}} \sum_{j \in \mathcal{N}_{n,i}} D_{n,j} \right) \boldsymbol{1}[D_{n,i} > 0]\boldsymbol{1}[D_{n,i} \geq n(p-\delta)]\right] \\ & \hspace{2mm} + \mathbb{E}\left[\left(\frac{1}{D_{n,i}^{2}} \sum_{j \in \mathcal{N}_{n,i}} D_{n,j} \right) \boldsymbol{1}\left[D_{n,i} > 0\right] \boldsymbol{1}[D_{n,i} < n(p-\delta)]\right] \end{array} $$

The first term in (52) is upper bounded as

$$ \mathbb{E}\left[\left(\frac{1}{D_{n,i}^{2}} \sum_{j \in \mathcal{N}_{n,i}} D_{n,j} \right) \boldsymbol{1}[D_{n,i} > 0] \boldsymbol{1}[D_{n,i} \geq n(p-\delta)]\right]\leq \frac{1}{n^{2} (p-\delta)^{2}} \cdot n^{2} = \frac{1}{(p-\delta)^{2}}, $$

where the upper bound follows by noting that the sum of the degrees of the neighbors of i is at most \(n^{2}\). Furthermore, we also have the upper bound

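$$\begin{array}{*{20}l} \mathbb{E}\left[\left(\frac{1}{D_{n,i}^{2}} \sum_{j \in \mathcal{N}_{n,i}} D_{n,j} \right) \boldsymbol{1}\left[D_{n,i} > 0\right] \boldsymbol{1}\left[D_{n,i} < n(p-\delta)\right]\right] &\leq n^{2} \mathbf{P}[{D_{n,i} < n (p-\delta)}] \\ & \leq n^{2} e^{-n D(p-\delta \,\|\, p)}, \end{array} $$

where \(D(p-\delta \,\|\, p)\) is the KL divergence between Bernoulli distributions with parameters p−δ and p, and the final step follows from the Chernoff-Hoeffding bound. Since \(n^{2} e^{-n D(p-\delta \| p)} \to 0\) as \(n \to \infty\), there exists a constant \(c_{\delta}\) such that

$$ n^{2} e^{-n D(p-\delta \,\|\, p)} \leq c_{\delta}, \quad n=1,2,\ldots $$

Therefore, using the upper bounds (53) and (55), we obtain

$$ \mathbb{E}\left[{{FI}_{n}(i)} \right]\leq 1 + c_{\delta} + \frac{1}{(p-\delta)^{2}}, \quad n=1,2,\ldots $$

To conclude, note that the second term in (52) is at most \(n^{2} e^{-n D(p-\delta \| p)}\), which vanishes as \(n \to \infty\), and that \(\mathbb{E}\left[\boldsymbol{1}[D_{n,i}=0]\right] = (1-p)^{n-1} \to 0\). The integrand of the first term in (52) is bounded by \(1/(p-\delta)^{2}\) for every n, and it converges to 1 a.s.: the FI term converges to 1 a.s. as shown in the derivation of Theorem 4, and the indicator of \(\{D_{n,i} < n(p-\delta)\}\) converges to 0 a.s. by the Borel-Cantelli lemma, since its probabilities are summable. The bounded convergence theorem therefore shows that the first term in (52) converges to 1, and hence \(\mathbb{E}[{FI}_{n}(i)] \to 1\).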
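The Chernoff-Hoeffding step can be checked numerically in its standard lower-tail form, \(\mathbf{P}[\mathrm{Bin}(m,p) \leq m(p-\delta)] \leq e^{-m D(p-\delta \| p)}\), where the threshold parameter p−δ is the first argument of the divergence. The sketch below compares this bound with the exact binomial tail; p = 0.3, δ = 0.1 and the values of m are illustrative choices. With these values, D(p‖p−δ) is larger than D(p−δ‖p), so the order of the arguments matters for the exponent.

```python
import math


def binomial_lower_tail(m: int, p: float, x: float) -> float:
    """Exact P[Bin(m, p) <= x]."""
    return sum(math.comb(m, r) * p**r * (1 - p) ** (m - r) for r in range(m + 1) if r <= x)


def bernoulli_kl(a: float, b: float) -> float:
    """KL divergence D(a || b) between Bernoulli(a) and Bernoulli(b), for 0 < a, b < 1."""
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))


p, a = 0.3, 0.2  # a plays the role of p - delta with delta = 0.1 (illustrative values)
for m in (20, 50, 100, 200):
    exact = binomial_lower_tail(m, p, m * a)
    bound = math.exp(-m * bernoulli_kl(a, p))
    print(f"m = {m:3}: P[Bin(m, p) <= m a] = {exact:.2e} <= exp(-m D(a||p)) = {bound:.2e}")
print(f"D(a||p) = {bernoulli_kl(a, p):.4f}, D(p||a) = {bernoulli_kl(p, a):.4f}")
```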