Generalizing unweighted network measures to capture the focus in interactions

  • Original Article
Social Network Analysis and Mining

Abstract

Unweighted network measures are commonly used to analyze real-world networks because they are simple and intuitive. This has motivated the search for generalizations of unweighted network measures that take edge weights into account. We propose a new generalization methodology that captures how focused the interactions over edges are. The less focused the interaction (the more uniform the weights over edges), the closer our generalization is to the original unweighted measure. None of the previously developed generalizations captures this aspect of weighted networks. We analyze several real-world networks using our generalizations of the degree and the clustering coefficient. The analysis shows that our generalizations reveal interesting observations.

Notes

  1. A node’s degree is the number of edges incident to the node, while a node’s strength is the summation of weights incident to the node. Section 2 provides the formal definitions.

  2. We use the generalized α-degree as a representative of the state-of-the-art generalizations (Opsahl et al. 2010). The α values of 0.5 and 1.5 were proposed in the original paper.

  3. Other examples include heterophilicity and dyadicity. We describe these measures in further detail later.

  4. The entropy is a measure often used in information theory to quantify the uncertainty of a set of outcomes/events (Shannon 1948). The higher the uncertainty is (which is equivalent to more uniform weights), the higher the value of the entropy.

  5. Note that the quantity \(x\log_2{\frac{1}{x}} \rightarrow 0\) as \(x \rightarrow 0\), and that it equals 0 at \(x=1\).

  6. Because more than one edge can have the same weight.

  7. Other network measures also quantify the strength of connections within a class (community) of nodes, such as the modularity measure (Newman and Girvan 2004).

  8. Note that, particularly for directed graphs, some researchers argued that a clustering signature would be more suitable in distinguishing networks (Ahnert and Fink 2008). In a clustering signature, seven types of directed triangles are counted separately. The effective cardinality can still be used to replace the discrete counts of these triangles. For the purpose of this paper we focus on the simpler, more widely used definition of the clustering coefficient.

  9. Available through http://www-personal.umich.edu/~mejn/netdata/.

  10. Source code adopted from http://www.santafe.edu/~aaronc/powerlaws/.

  11. The datasets are publicly available at http://netkit-srl.sourceforge.net/data.html. In a university network, a node represents a web page, which has a label indicating its type (personal web page, department, etc.). A link from one node to another (directed) means there is at least one URL link from the first node to the other. The weight on the link represents the number of such URLs. In an industry dataset, a node represents a company, which has a label indicating its type (transportation, technology, etc.). A link between two nodes exists if the two companies appear in the same news article. The weight represents how many articles the two companies appeared in.

  12. The percentage of weights that equal one captures the variation in weights more accurately than the standard deviation, which is sensitive to outliers.

  13. It is worth noting that the Newcomb Fraternity dataset (available through UCINET (Borgatti et al. 2002)) is very similar to the example depicted in Fig. 1. The dataset provides snapshots of a dynamic social network over time, but the network is not weighted (only rankings of neighbors are provided).
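Notes 1 and 2 can be illustrated with a minimal sketch. It assumes the generalized α-degree takes the form k·(s/k)^α given by Opsahl et al. (2010), where k is the degree and s is the strength; the function names and the two example nodes are hypothetical and chosen only for illustration. Because this measure depends on the weights only through k and s, two nodes with the same degree and strength receive the same value regardless of how the weight is spread over their edges.

```python
def degree(weights):
    """Number of incident edges (edges with positive weight)."""
    return sum(1 for w in weights if w > 0)


def strength(weights):
    """Sum of the weights of the incident edges."""
    return sum(weights)


def alpha_degree(weights, alpha):
    """Generalized alpha-degree k * (s / k) ** alpha (assumed form, see lead-in)."""
    k = degree(weights)
    s = strength(weights)
    return 0.0 if k == 0 else k * (s / k) ** alpha


# Two illustrative nodes: same degree (3) and strength (15), different focus.
uniform_node = [5, 5, 5]
focused_node = [13, 1, 1]

for alpha in (0.5, 1.5):
    print(alpha,
          alpha_degree(uniform_node, alpha),
          alpha_degree(focused_node, alpha))
```

Setting α = 0 recovers the degree and α = 1 recovers the strength, which is why intermediate values are read as a trade-off between the two.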

Author information

Correspondence to Sherief Abdallah.

Additional information

An earlier version of this paper was presented at the International Workshop on Social Network Analysis (SNAKDD) 2009.

Appendix: Proofs of effective cardinality properties

Theorem 1

The effective cardinality satisfies the three properties described above: the maximum cardinality, the minimum cardinality, and the consistent partial ordering.

Proof

The proof follows from the following three lemmas. □

Lemma 1

The effective cardinality satisfies the maximum cardinality property.

Proof

When all the weights are equal to a constant C we have

$$ \forall e \in E^{\prime}: {\frac{w(e)}{\sum_{o\in E^{\prime}}w(o)}}={\frac{C}{C|E^{\prime}|}}={\frac{1}{|E^{\prime}|}} $$

We then have

$$ \begin{aligned} c(E^{\prime}) &=2^{\sum_{e\in E'}{\frac{1}{|E'|}}\log_2(|E^{\prime}|)}\\ &=2^{\log_2(|E^{\prime}|)} \\ &=|E^{\prime}| \end{aligned} $$

In other words, the cardinality and the effective cardinality of a weighted set of edges are equal when the weights are uniform. The effective cardinality is also maximum in this case, because the exponent is the entropy of the weight probability distribution, which is maximum when weights are uniform over edges. □
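The computation in this proof can be checked numerically. The following is a minimal sketch, assuming the effective cardinality is 2 raised to the entropy (in bits) of the normalized weights, as in the derivation above; the function name `effective_cardinality` and the sample weights are illustrative, and the sketch assumes non-negative weights with a positive total.

```python
import math


def effective_cardinality(weights):
    """Return 2 ** H, where H is the entropy (bits) of the normalized weights."""
    if not weights:
        return 0.0  # empty edge set: zero by definition
    total = sum(weights)  # assumed to be positive
    # Zero weights contribute nothing, since x * log2(1/x) -> 0 as x -> 0.
    probs = [w / total for w in weights if w > 0]
    entropy = sum(p * math.log2(1.0 / p) for p in probs)
    return 2.0 ** entropy


uniform = [7.0] * 5                    # all weights equal to a constant C = 7
skewed = [16.0, 1.0, 1.0, 1.0, 1.0]    # same number of edges, focused weights

print(effective_cardinality(uniform))  # equals |E'| = 5 up to rounding
print(effective_cardinality(skewed))   # strictly smaller than 5
```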

Lemma 2

The effective cardinality satisfies the minimum cardinality property.

Proof

When the set of edges is empty, the effective cardinality is zero by definition. When all weights are zero except for one weight that is greater than zero, the weight probability distribution is deterministic and the entropy is zero; therefore, the effective cardinality is 1. □
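As a worked illustration of a case between the two extremes (the weights 2, 1, 1 are chosen here for illustration and do not come from the paper), three edges with these weights have normalized weights \(\frac{1}{2}, \frac{1}{4}, \frac{1}{4}\), so

$$ \begin{aligned} H(E^{\prime}) &= \tfrac{1}{2}\log_2 2 + \tfrac{1}{4}\log_2 4 + \tfrac{1}{4}\log_2 4 = 1.5 \\ c(E^{\prime}) &= 2^{1.5} \approx 2.83 \end{aligned} $$

which lies strictly between the minimum of 1 and the maximum of \(|E^{\prime}|=3\).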

Lemma 3

The effective cardinality satisfies the consistent partial ordering property.

Proof

Let \(E^{\prime}_1\) and \(E^{\prime}_2\) be two edge sets with the same cardinality, \(|E^{\prime}_1|=|E^{\prime}_2|=n\). Let \(W_1\) and \(W_2\) be the corresponding sets of weights, where \(\sum_{e_1 \in E^{\prime}_1} w(e_1)=\sum_{e_2 \in E^{\prime}_2} w(e_2) = S\) (the total weights are equal). Furthermore, let \(|W_1 \bigcap W_2|=n-2\), \(\{w_{11},w_{12}\} = W_1 - W_2\), and \(\{w_{21},w_{22}\} = W_2 - W_1\), where '−' is the set difference operator (the two sets share the same weights except for two elements in each set), and \(|w_{11}-w_{12}| < |w_{21}-w_{22}|\) (the weights of \(W_1\) are more uniform than the weights of \(W_2\)). To prove that the effective cardinality satisfies the consistent partial ordering property, we need to prove that \(c(E^{\prime}_1)>c(E^{\prime}_2)\).

Without loss of generality, we can assume that \(w_{11} \ge w_{12}\) and \(w_{21} \ge w_{22};\) therefore, \(w_{11}-w_{12} < w_{21}-w_{22}\). We then have

$$ w_{11}+w_{12}=S - \sum_{w \in W_1 \bigcap W_2}w = w_{21}+w_{22} $$

or

$$ {\frac{w_{11}+w_{12}}{S}}=1 - \sum_{w \in W_1 \bigcap W_2}{\frac{w}{S}} = {\frac{w_{21}+w_{22}}{S}} = L $$

therefore

$$ L \ge {\frac{w_{21}}{S}} > {\frac{w_{11}}{S}} \ge {\frac{L}{2}} \ge L-{\frac{w_{11}}{S}} > L-{\frac{w_{21}}{S}} $$

where \(\frac{w_{12}}{S} = L-\frac{w_{11}}{S}\) and \(\frac{w_{22}}{S} = L-\frac{w_{21}}{S}\). By Lemma 4, \(h(L,x)\) is maximized at \(x=\frac{L}{2}\); since it is concave in \(x\), it strictly decreases as \(x\) moves away from \(\frac{L}{2}\). Because \(\frac{w_{21}}{S} > \frac{w_{11}}{S} \ge \frac{L}{2}\), we have \(h(L,\frac{w_{11}}{S}) > h(L,\frac{w_{21}}{S})\), or

$$ -\frac{w_{11}}{S}\lg\left(\frac{w_{11}}{S}\right) - \left(L-\frac{w_{11}}{S}\right)\lg\left(L-\frac{w_{11}}{S}\right) > -\frac{w_{21}}{S}\lg\left(\frac{w_{21}}{S}\right) - \left(L-\frac{w_{21}}{S}\right)\lg\left(L-\frac{w_{21}}{S}\right) $$

Therefore \(H(E^{\prime}_1) > H(E^{\prime}_2)\), because the rest of the entropy terms (corresponding to \(W_1 \bigcap W_2\)) are equal, and consequently \(c(E^{\prime}_1)>c(E^{\prime}_2)\). □
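The ordering argued in this proof can be checked numerically. The sketch below assumes the effective cardinality is 2 raised to the entropy (in bits) of the normalized weights; the shared weights (3, 1) and the fixed pair total of 8 are illustrative values, not data from the paper. Only the gap between the two differing weights changes, so the total weight S and the shared weights stay the same across the sets.

```python
import math


def effective_cardinality(weights):
    """Return 2 ** H, with H the entropy (bits) of the normalized weights.

    Assumes a non-empty list of non-negative weights with a positive total.
    """
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]  # zero weights add nothing
    return 2.0 ** sum(p * math.log2(1.0 / p) for p in probs)


shared = [3.0, 1.0]   # weights common to every set
pair_total = 8.0      # the sum of the two differing weights is held fixed

values = []
for gap in [0.0, 2.0, 4.0, 6.0, 8.0]:
    w1 = (pair_total + gap) / 2
    w2 = (pair_total - gap) / 2
    c = effective_cardinality([w1, w2] + shared)
    values.append(c)
    print(f"gap={gap:.0f}  pair=({w1:.0f}, {w2:.0f})  c={c:.3f}")
```

Under these assumptions the printed values decrease as the gap grows, which matches the claim that the more uniform set has the larger effective cardinality.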

Lemma 4

The quantity \(h(C,x) = -x\lg(x) - (C-x)\lg(C-x)\) is symmetric around and maximized at \(x={\frac{C}{2}}\) for \(C \ge x \ge 0\).

Proof

$$ h\left(C,{\frac{C}{2}}+\delta\right)=-\left({\frac{C}{2}}+\delta\right)\lg\left({\frac{C} {2}}+\delta\right) - \left({\frac{C}{2}}-\delta\right) \lg \left({\frac{C}{2}}-\delta\right) = h\left(C,{\frac{C}{2}}-\delta\right) $$

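Therefore \(h(C,x)\) is symmetric around \(C/2\). Furthermore, \(h(C,x)\) is maximized when

$$ {\frac{\partial h(C,x)}{\partial x}} = 0 = -\lg e -\lg x + \lg e + \lg(C-x) $$

or

$$ \lg x = \lg(C-x) $$

Therefore \(h(C,x)\) is maximized at \(x=C-x = {\frac{C}{2}}\); this stationary point is a maximum because \(\frac{\partial^2 h(C,x)}{\partial x^2} = -\lg e\left(\frac{1}{x}+\frac{1}{C-x}\right) < 0\) for \(0 < x < C\). □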
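A quick numerical check of this lemma, as a sketch: it assumes lg denotes the base-2 logarithm and uses the convention 0·lg 0 = 0 from Note 5. The value C = 0.75, the offsets, and the grid resolution are arbitrary illustrative choices.

```python
import math


def xlgx(x):
    """x * lg(x) with the convention 0 * lg(0) = 0."""
    return 0.0 if x == 0 else x * math.log2(x)


def h(C, x):
    """h(C, x) = -x lg(x) - (C - x) lg(C - x), for 0 <= x <= C."""
    return -xlgx(x) - xlgx(C - x)


C = 0.75

# Symmetry: equal offsets on either side of C/2 give the same value.
for delta in (0.05, 0.1, 0.2, 0.375):
    print(delta, h(C, C / 2 + delta), h(C, C / 2 - delta))

# Maximum: search a uniform grid on [0, C] for the largest value.
grid = [C * i / 100 for i in range(101)]
best = max(grid, key=lambda x: h(C, x))
print(best, h(C, best))
```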

About this article

Cite this article

Abdallah, S. Generalizing unweighted network measures to capture the focus in interactions. Soc. Netw. Anal. Min. 1, 255–269 (2011). https://doi.org/10.1007/s13278-011-0018-8

  • DOI: https://doi.org/10.1007/s13278-011-0018-8
