Abstract
We propose weighted line graphs for overlapping community discovery, in which a node in a network can be assigned to more than one community. For undirected connected networks without self-loops, we construct weighted line graphs by: (1) defining the weights of a line graph based on the weights in the original network, and (2) removing self-loops in weighted line graphs while preserving their properties. After an off-the-shelf node partitioning method is applied to the transformed graph, each node in the original network is assigned the community labels of its adjacent links. Experiments are conducted over both synthetic and real-world networks, and the results indicate that the proposed approach can improve the quality of discovered overlapping communities.
Notes
We also refer to a network as a graph, a node as a vertex, and a link as an edge.
A k-clique-community is defined as the union of all k-cliques that can be reached from each other through a series of adjacent k-cliques (adjacent k-cliques share k − 1 nodes).
The i-th diagonal element of \({\bf D}\) is set to the i-th element of \({\bf k}\).
http://www-personal.umich.edu/~mejn/netdata/ (celegans was converted into an undirected network in the experiment).
The initial degree in the BA model was set to 20.
The initial degree in the BA model was set to 2 (1/10 of the value of 20 used in Step 1).
References
Ahn YY, Bagrow JP, Lehmann S (2010) Link communities reveal multiscale complexity in networks. Nature 466:761–764
Barabási AL, Albert R (1999) Emergence of scaling in random networks. Science 286:509–512
Bhattacharyya P, Garg A, Wu SF (2011) Analysis of user keyword similarity in online social networks. Social Netw Anal Mining 1(3):143–158
Clauset A, Newman MEJ, Moore C (2004) Finding community structure in very large networks. Phys Rev E 70(6):066111
Cormen TH, Leiserson CE, Rivest RL, Stein C (2001) Introduction to algorithms. MIT Press, Cambridge
Dempster A, Laird N, Rubin D (1977) Maximum likelihood from incomplete data via the EM algorithm. J Roy Stat Soc 39(2):1–38
Diestel R (2006) Graph theory. Springer, Berlin
Evans T, Lambiotte R (2009) Line graphs, link partitions, and overlapping communities. Phys Rev E 80(1):016105
Evans T, Lambiotte R (2010) Line graphs of weighted networks for overlapping communities. Eur Phys J B 77:265–272
Gregory S (2009) Finding overlapping communities using disjoint community detection algorithms. In: Complex networks. Springer, Berlin, pp 47–61
Gregory S (2011) Fuzzy overlapping communities in networks. J Stat Mech Theor Exp P02017
Hanneman RA, Shelton CR (2011) Applying modality and equivalence concepts to pattern finding in social process-produced data. Social Netw Anal Mining 1(1):59–72
Harville DA (2008) Matrix algebra from a statistician’s perspective. Springer, Berlin
Kempe D, Kleinberg J, Tardos E (2003) Maximizing the spread of influence through a social network. In: Proceedings of KDD’03, pp 137–146
Mika P (2007) Social networks and the semantic web. Springer, Berlin
Müller M (2007) Information retrieval for music and motion. Springer, Berlin
Nepusz T, Petróczi A, Négyessy L, Bazsó F (2008) Fuzzy communities and the concept of bridgeness in complex networks. Phys Rev E 77:016107
Newman M (2006) Finding community structure using the eigenvectors of matrices. Phys Rev E 74(3):036104
Newman M (2010) Networks: an introduction. Oxford University Press, Oxford
Palla G, Derényi I, Farkas I, Vicsek T (2005) Uncovering the overlapping community structure of complex networks in nature and society. Nature 435:814–818
Pons P, Latapy M (2006) Computing communities in large networks using random walks. J Graph Algorithms 10(2):191–218
Raghavan U, Albert R, Kumara S (2007) Near linear time algorithm to detect community structures in large-scale networks. Phys Rev E 76:036106
Scott J (2011) Social network analysis: developments, advances, and prospects. Social Netw Anal Mining 1(1):21–26
Shen HW, Cheng XQ, Guo JF (2011) Quantifying and identifying the overlapping community structure in networks. J Stat Mech Theor Exp P07042
von Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17(4):395–416
Watts DJ (2003) Small worlds: the dynamics of networks between order and randomness. Princeton University Press, Princeton
Watts DJ (2004) Six degrees: the science of a connected age. W W Norton & Co Inc, New York
Whitney H (1932) Congruent graphs and the connectivity of graphs. Am J Math 54:150–168
Yoshida T (2012) Overlapping community discovery via weighted line graphs of networks. In: Proceedings of PRICAI’12 (LNAI 7458), pp 895–898
Yoshida T (2013) Toward finding hidden communities based on user profile. J Intell Inf Syst (in press)
Zhang S, Wang FS, Zhang XS (2007) Identification of overlapping community structure in complex networks using fuzzy c-means clustering. Phys A 388(8):483–490
Acknowledgments
We express sincere gratitude to the reviewers for their careful reading of the manuscript and for providing valuable suggestions to improve the paper. This work is partially supported by the grant-in-aid for scientific research (No. 24300049) funded by MEXT, Japan, and the Murata Science Foundation.
Appendices
Appendix 1: Proof of Theorem 2
Theorem 2 can be formalized in terms of the adjacency matrices of transformed networks based on the following properties:
Proof
From Eq. (9) in Proposition 1, \({\bf 1}_{m}^{\rm T}{\bf B}^{\rm T} = {\bf k}^{\rm T}\) holds. Furthermore, from Eq. (10), \(2\,{\bf 1}_{m}^{\rm T} = {\bf 1}_{n}^{\rm T}{\bf B}\) holds. Similarly, by utilizing the right-hand side of Eqs. (9) and (10) in Proposition 1, we can prove Eq. (25).
On the other hand, based on Proposition 1, we can prove Eq. (26) and Eq. (27) as follows:
Similarly, by utilizing the right-hand side in Proposition 1, we can prove the right-hand side of Eqs. (26) and (27).
From the left-hand side of Eqs. (26) and (27), we can see that \({\bf 1}_{n}^{\rm T}{\bf A}{\bf 1}_{n} = {\bf 1}_{m}^{\rm T}{\bf E}{\bf 1}_{m} = {\bf 1}_{m}^{\rm T}{\bf E}_{1}{\bf 1}_{m} = 2m\). Similarly, from the right-hand side of Eqs. (26) and (27), \({\bf 1}_{n}^{\rm T}\tilde{\bf A}{\bf 1}_{n} = {\bf 1}_{m}^{\rm T}\tilde{\bf E}{\bf 1}_{m} = {\bf 1}_{m}^{\rm T}\tilde{\bf E}_{1}{\bf 1}_{m} = \sum_{i,j}\tilde{\bf A}_{ij}\). Thus, by defining the adjacency matrix of the transformed network with Eq. (12) or Eq. (13), the sum of the weights in the original network is preserved in the transformed network. □
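The unweighted identity \({\bf 1}_{n}^{\rm T}{\bf A}{\bf 1}_{n} = {\bf 1}_{m}^{\rm T}{\bf E}{\bf 1}_{m} = {\bf 1}_{m}^{\rm T}{\bf E}_{1}{\bf 1}_{m} = 2m\) can be checked numerically on a small example. The following is a minimal sketch, not the implementation used in the paper. It assumes the unweighted forms \({\bf E} = {\bf B}^{\rm T}{\bf D}^{-1}{\bf B}\) and \({\bf E}_{1} = {\bf B}^{\rm T}{\bf D}^{-1}{\bf A}{\bf D}^{-1}{\bf B}\), which mirror the matrix products discussed in Appendix 4; the toy network and all helper names are hypothetical.

```python
from fractions import Fraction

def incidence(n, edges):
    """n x m incidence matrix B: B[i][a] = 1 if node i is an endpoint of link a."""
    B = [[Fraction(0)] * len(edges) for _ in range(n)]
    for a, (u, v) in enumerate(edges):
        B[u][a] = Fraction(1)
        B[v][a] = Fraction(1)
    return B

def matmul(X, Y):
    """Plain dense matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def total(X):
    """Sum of all entries, i.e. 1^T X 1."""
    return sum(sum(row) for row in X)

# Hypothetical toy network: a triangle 0-1-2 with a pendant node 3 attached to node 2.
n = 4
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
m = len(edges)

B = incidence(n, edges)
Bt = transpose(B)
A = [[Fraction(0)] * n for _ in range(n)]
for u, v in edges:
    A[u][v] = A[v][u] = Fraction(1)

k = [sum(row) for row in B]  # node degrees, i.e. B 1_m = k as in Eq. (9)
D_inv = [[Fraction(1) / k[i] if i == j else Fraction(0) for j in range(n)]
         for i in range(n)]

E = matmul(matmul(Bt, D_inv), B)                             # assumed unweighted E
E1 = matmul(matmul(matmul(matmul(Bt, D_inv), A), D_inv), B)  # assumed unweighted E_1

print(total(A), total(E), total(E1))
```

Exact rational arithmetic is used so that the three totals can be compared for equality without rounding error.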
Appendix 2: Proof of Theorem 3
The properties (1) to (3) in Theorem 3 can be formalized in terms of the corresponding matrix \({\bf N}\) in Eq. (18) as:
We prove that the above properties hold for the matrix \({\bf N}\) for a symmetric square matrix \({\bf M}\) with nonnegative real values.
Proof
From Eq. (16), the diagonal elements of \({\bf M}_{wo}\) are all zeros. Since \({\bf D}_{M}^{1/2}\,{\rm diag}({\bf m}_{wo})^{-1/2}\) is a diagonal matrix, scaling the rows and columns of \({\bf M}_{wo}\) by multiplying it with \({\bf D}_{M}^{1/2}\,{\rm diag}({\bf m}_{wo})^{-1/2}\) from the left and with its transpose from the right does not change its diagonal elements. Thus, since the diagonal elements of \({\bf D}_{M}^{1/2}\,{\rm diag}({\bf m}_{wo})^{-1/2}\,{\bf M}_{wo}\,{\rm diag}({\bf m}_{wo})^{-1/2}\,{\bf D}_{M}^{1/2}\) are also zeros, Eq. (28) holds.
Since \({\bf M}\) is symmetric, \({\bf M}_{wo}\) in Eq. (16) is also symmetric. Multiplying it by the diagonal matrix \({\bf D}_{M}^{1/2}\,{\rm diag}({\bf m}_{wo})^{-1/2}\) from the left and by its transpose from the right preserves the symmetry of the matrix. Thus, since both the first and second terms in Eq. (18) are symmetric matrices, Eq. (29) holds.
For a diagonal matrix \({\bf D} \in \mathbb{R}^{\ell \times \ell}\), \({\bf D}{\bf 1}_{\ell} = {\bf d} = {\bf 1}_{\ell} \odot {\bf d}\), where the vector \({\bf d}\) is the row sum of \({\bf D}\), and \(\odot\) stands for the Hadamard product (element-wise product) (Harville 2008). Thus, since \({\rm diag}({\bf m}_{wo})^{-1/2}\,{\bf D}_{M}^{1/2}\) is a diagonal matrix, the following holds:
where \({\bf d}_{D_{M}}\) is the row sum of \({\bf D}_{M}\) in Eq. (15).
Equation (31) follows as above. Since \({\bf m}_{wo}\) is the row sum of \({\bf M}_{wo}\) in Eq. (17), \({\rm diag}({\bf m}_{wo})^{-1/2}\,{\bf m}_{wo} = {\bf m}_{wo}^{1/2}\) in Eq. (32), and Eq. (33) follows based on the definition of the Hadamard product. Finally, based on the above property of diagonal matrices and the Hadamard product, Eq. (30) follows.
Furthermore, for the first term in Eq. (18), \({\bf M}_{wo}{\bf 1}_{\ell} = {\bf M}{\bf 1}_{\ell} - {\bf D}_{M}{\bf 1}_{\ell} = {\bf M}{\bf 1}_{\ell} - {\bf d}_{D_{M}}\). Thus, by summing \({\bf M}{\bf 1}_{\ell} - {\bf d}_{D_{M}}\) and \({\bf d}_{D_{M}}\), Eq. (30) follows.
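The decomposition used in the last step can be illustrated on a small matrix. The sketch below does not construct \({\bf N}\) in Eq. (18). It only assumes, as the proof text suggests, that \({\bf D}_{M}\) is the diagonal part of \({\bf M}\) and that \({\bf M}_{wo} = {\bf M} - {\bf D}_{M}\). The example matrix and the helper name `remove_diagonal` are hypothetical.

```python
def remove_diagonal(M):
    """Split M into its off-diagonal part M_wo and the vector d of diagonal entries.

    Assumption: D_M is the diagonal part of M, so M_wo = M - D_M.
    """
    size = len(M)
    d = [M[i][i] for i in range(size)]
    M_wo = [[0.0 if i == j else M[i][j] for j in range(size)] for i in range(size)]
    return M_wo, d

# Hypothetical symmetric, nonnegative matrix with self-loops on the diagonal.
M = [[0.5, 1.0, 0.5],
     [1.0, 1.0, 0.0],
     [0.5, 0.0, 1.5]]

M_wo, d = remove_diagonal(M)
m_wo = [sum(row) for row in M_wo]    # row sums of M_wo
row_sum_M = [sum(row) for row in M]  # row sums of M

# Row sums of M split into the off-diagonal part plus the removed diagonal.
recombined = [a + b for a, b in zip(m_wo, d)]
print(m_wo, d, recombined)
```

Any self-loop removal that preserves the row sums of \({\bf M}\) has to redistribute the weight in `d` over the off-diagonal entries, which is the role of the scaled term in Eq. (18).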
Appendix 3: Proof of Corollary 4
Corollary 4 can be formalized in terms of the following properties of the adjacency matrices:
Proof
Since the adjacency matrices \(\tilde{\bf E}\) and \(\tilde{\bf E}_{1}\) satisfy the condition in Theorem 3, by substituting these matrices as \({\bf M}\) in Eq. (18), we can construct the corresponding matrices \(\tilde{\bf F}\) and \(\tilde{\bf F}_{1}\). From Eq. (30), \({\bf 1}_{m}^{\rm T}\tilde{\bf F}={\bf 1}_{m}^{\rm T}\tilde{\bf E}\) and \({\bf 1}_{m}^{\rm T}\tilde{\bf F}_{1}={\bf 1}_{m}^{\rm T}\tilde{\bf E}_{1}\) hold. Since \({\bf 1}_{m}^{\rm T}{\bf E} = 2\,{\bf 1}_{m}^{\rm T}\) and \({\bf 1}_{m}^{\rm T}{\bf E}_{1} = 2\,{\bf 1}_{m}^{\rm T}\) hold from the right-hand side of Eqs. (26) and (27), the right-hand side of Eq. (36) and that of Eq. (37) hold. Based on a similar argument, the left-hand side of Eq. (36) and that of Eq. (37) hold.
As shown in Theorem 2, the properties in Eqs. (36) and (37) indicate that the sum of the weights in the original network is preserved in \(\tilde{\bf F}\) and \(\tilde{\bf F}_{1}\) (also in \({\bf F}\) and \({\bf F}_{1}\)). □
Appendix 4: Complexity analysis
Suppose a simple connected network G contains n nodes and m links, and let \(\langle k \rangle\) be the average degree in G. The time complexity of constructing \(\tilde{\bf E}\) and \(\tilde{\bf E}_{1}\) from G based on the weighted incidence matrix \(\tilde{\bf B}\) is the same as that of \({\bf E}\) and \({\bf E}_{1}\) in Evans and Lambiotte (2009). This is because both approaches define the adjacency matrices based on a similar matrix calculation.
Since each row of \(\tilde{\bf B}^{\rm T}\) contains two nonzero elements and \(\tilde{\bf D}^{-1}\) is a diagonal matrix, multiplication of \(\tilde{\bf B}^{\rm T}\) and \(\tilde{\bf D}^{-1}\) can be done in O(m), and each row of \(\tilde{\bf B}^{\rm T}\tilde{\bf D}^{-1}\) contains two nonzero elements as well.
Since \(\tilde{\bf E}\) in Eq. (12) is based on a link–node–link random walk on G, it can be calculated by considering only \(2 \langle k \rangle\) links for each link in G. Since the calculation of \((\tilde{\bf B}^{\rm T} \tilde{\bf D}^{-1}) \tilde{\bf B}\) can be done in \(O(m \langle k \rangle)\), the time complexity of constructing \(\tilde{\bf E}\) is \(O(m \langle k \rangle)\). Similarly, since \(\tilde{\bf E}_{1}\) in Eq. (13) is based on a link–link–link random walk on G, \(\tilde{\bf E}_{1}\) can be calculated by considering only \(2 \langle k \rangle^{2}\) links for each link in G. The calculation of \((\tilde{\bf B}^{\rm T} \tilde{\bf D}^{-1}){\bf A}\) can be done in \(O(m \langle k \rangle)\), and that of \((\tilde{\bf B}^{\rm T} \tilde{\bf D}^{-1}{\bf A}) (\tilde{\bf D}^{-1} \tilde{\bf B})\) in \(O(m \langle k \rangle^{2})\). Thus, the time complexity of constructing \(\tilde{\bf E}_{1}\) is \(O(m \langle k \rangle^{2})\).
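The locality argument above (each link only interacts with the links that share one of its two endpoints) can be sketched with adjacency lists instead of dense matrices. The code below is an illustrative sketch for the unweighted case \({\bf E} = {\bf B}^{\rm T}{\bf D}^{-1}{\bf B}\), which is an assumption here; the weighted \(\tilde{\bf E}\) of Eq. (12) would replace the unit entries and degrees with the corresponding weighted quantities. The function name `line_graph_E` and the toy edge list are hypothetical.

```python
from collections import defaultdict

def line_graph_E(edges):
    """Sparse E = B^T D^{-1} B as a dict {(a, b): weight} over link indices.

    Assumption: unweighted incidence matrix B and degree matrix D.
    """
    incident = defaultdict(list)  # node -> indices of its incident links
    for a, (u, v) in enumerate(edges):
        incident[u].append(a)
        incident[v].append(a)

    E = defaultdict(float)
    for a, (u, v) in enumerate(edges):
        for node in (u, v):           # every link has exactly two endpoints
            k = len(incident[node])   # degree of the shared node
            for b in incident[node]:  # only links that meet link a at this node
                E[(a, b)] += 1.0 / k
    return dict(E)

# Hypothetical toy network: a triangle 0-1-2 with a pendant node 3 attached to node 2.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
E = line_graph_E(edges)
row_sums = [sum(w for (a, _), w in E.items() if a == i) for i in range(len(edges))]
print(row_sums)
```

The inner loop visits one entry per link incident to each endpoint, so the work per link is the sum of its two endpoint degrees, roughly \(2 \langle k \rangle\) for a network with homogeneous degrees. The diagonal entries `E[(a, a)]` are the self-loops that Sect. 4.2 removes.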
As for the removal of self-loops in Sect. 4.2, let \(\langle k_{M} \rangle\) be the average degree of a network with an adjacency matrix \({\bf M} \in \mathbb{R}_{+}^{\ell \times \ell}\) (e.g., \(\langle k_{M} \rangle\) is \(O(\langle k\rangle)\) in \(\tilde{\bf E}\), and \(O(\langle k\rangle^{2})\) in \(\tilde{\bf E}_{1}\)). The calculation of \({\bf M}_{wo}\) in Eq. (16) can be done in \(O(\ell)\), and that of \({\bf m}_{wo}\) in Eq. (17) in \(O(\ell \langle k_{M} \rangle)\). Since both \({\bf D}_{M}^{1/2}\) and \({\rm diag}({\bf m}_{wo})^{-1/2}\) are diagonal matrices, the calculation of \({\bf D}_{M}^{1/2}\,{\rm diag}({\bf m}_{wo})^{-1/2}\) can be done in \(O(\ell)\). The scaling of \({\bf M}_{wo}\) by multiplying \({\bf D}_{M}^{1/2}\,{\rm diag}({\bf m}_{wo})^{-1/2}\) from left and right needs to be conducted only for \(O(\ell \langle k_{M} \rangle)\) nonzero elements, and the same holds for the addition. Thus, the time complexity of constructing \({\bf N}\) in Eq. (18) is \(O(\ell \langle k_{M} \rangle)\). By substituting m for ℓ, the time complexity of constructing \({\bf F}\) (and \(\tilde{\bf F}\)) is \(O(m \langle k \rangle)\), and that of \({\bf F}_{1}\) (and \(\tilde{\bf F}_{1}\)) is \(O(m \langle k \rangle^{2})\).
On the other hand, since it is necessary to store the adjacency matrices of line graphs in memory, the space complexity is \(O(m \langle k_{M} \rangle),\) where \(\langle k_{M} \rangle\) is the average degree in the constructed line graph. In our approach, allocation of adjacency matrices in memory can become a problem for large networks.
Appendix 5: Construction of synthetic networks
Let C be the number of communities and \(n_{c}\) the number of nodes in a community (so a network has \(n_{c} \times C\) nodes). Let \(w_{u}\) denote the link weight in the overall network, and \(r_{m} > 1\) the weight ratio of the links within communities.
A synthetic network was generated as follows:
Step 1: The overall network with \(n_{c} \times C\) nodes was created with the Barabási–Albert (BA) model. The constructed overall network was rather dense (Footnote 6), and all the link weights were set to a small value \(w_{u}\).
Step 2: A network of \(n_{c}\) nodes was created for each community with the BA model. In this case, the constructed communities were rather sparse (Footnote 7), and all the link weights in the communities were set to \(w_{u} \times r_{m}\).
Step 3: The communities constructed at Step 2 were embedded into the diagonal blocks of the adjacency matrix of the overall network from Step 1. Note that there was no overlap between the embedded diagonal blocks (i.e., embedded communities).
Step 4: For each node i (with degree \(k_{i}\)) in the overall network, another community to which the node did not belong was randomly selected. Then, up to \(k_{i}\) nodes were randomly selected in the selected community. Finally, node i was connected to the selected nodes with link weight \(w_{u} \times r_{m}\), as in Step 2.
The overall dense network with relatively small weights is constructed at Step 1. Sparse communities with large weights at Step 2 are embedded into the overall network at Step 3. In addition, since each node is connected to other nodes in another community at Step 4, the constructed network has an overlapping community structure. In the experiments, the parameters were set as \(w_{u} = 1\) and \(r_{m} = 100\) so that nodes in each community were tightly connected with large weights.
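Steps 1 to 4 can be sketched in a few lines. This is a hedged illustration under stated assumptions, not the generator used in the experiments: the BA model is a simple preferential-attachment routine written for this example, embedding a community in Step 3 is assumed to overwrite the weight of the corresponding block entries, and "up to \(k_{i}\) nodes" in Step 4 is taken to mean exactly \(\min(k_{i}, n_{c})\) nodes. The function names, the `seed` argument, and the dictionary representation are hypothetical.

```python
import random

def ba_edges(n, m0, rng):
    """Barabasi-Albert graph on nodes 0..n-1; each new node attaches to m0 existing nodes."""
    edges, repeated = set(), []
    targets = set(range(m0))
    for new in range(m0, n):
        edges.update((t, new) for t in targets)  # t < new, so pairs are ordered
        repeated.extend(targets)
        repeated.extend([new] * m0)
        targets = set()
        while len(targets) < m0:                 # degree-proportional sampling
            targets.add(rng.choice(repeated))
    return edges

def synthetic_network(C, n_c, m_overall, m_comm, w_u=1.0, r_m=100.0, seed=0):
    """Return {(u, v): weight} with u < v for a network of n_c * C nodes."""
    rng = random.Random(seed)
    n = n_c * C
    W = {}
    # Step 1: dense overall network with small weights w_u.
    for u, v in ba_edges(n, m_overall, rng):
        W[(u, v)] = w_u
    degree = [0] * n
    for u, v in W:
        degree[u] += 1
        degree[v] += 1
    # Steps 2-3: sparse communities with weight w_u * r_m, placed on the diagonal blocks.
    for c in range(C):
        off = c * n_c
        for u, v in ba_edges(n_c, m_comm, rng):
            W[(off + u, off + v)] = w_u * r_m    # assumption: overwrite the block entry
    # Step 4: link each node to nodes of one other, randomly selected community.
    for i in range(n):
        others = [c for c in range(C) if c != i // n_c]
        c = rng.choice(others)
        members = range(c * n_c, (c + 1) * n_c)
        for j in rng.sample(members, min(degree[i], n_c)):  # assumption: exact count
            W[(min(i, j), max(i, j))] = w_u * r_m
    return W

# Small hypothetical instance; the paper's footnotes report initial degrees 20 and 2.
W = synthetic_network(C=3, n_c=10, m_overall=5, m_comm=2)
print(len(W))
```

The sketch requires at least two communities and more nodes than the initial degree at each level. Storing only pairs with u < v keeps the network undirected and free of self-loops by construction.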
Cite this article
Yoshida, T. Weighted line graphs for overlapping community discovery. Soc. Netw. Anal. Min. 3, 1001–1013 (2013). https://doi.org/10.1007/s13278-013-0104-1
DOI: https://doi.org/10.1007/s13278-013-0104-1