
Weighted line graphs for overlapping community discovery

Abstract

We propose weighted line graphs for overlapping community discovery, where a node in a network can be assigned to more than one community. For undirected connected networks without self-loops, we construct weighted line graphs by (1) defining the weights of a line graph based on the weights in the original network, and (2) removing self-loops from the weighted line graph while preserving its properties. An off-the-shelf node partitioning method is then applied to the transformed graph, and each node in the original network is assigned the community labels of its adjacent links. Experiments on both synthetic and real-world networks indicate that the proposed approach can improve the quality of the discovered overlapping communities.



Notes

  1. We also refer to a network as a graph, a node as a vertex, and a link as an edge.

  2. A k-clique-community is defined as a union of all k-cliques that can be reached from each other through a series of adjacent k-cliques (which share k-1 nodes).

  3. The ith diagonal element in D is set to the ith element of \({\bf k}\).

  4. http://www-personal.umich.edu/~mejn/netdata/ (celegans was converted into undirected network in the experiment).

  5. Pascal: http://analytics.ijs.si/~blazf/pvc/data.html; IV’04: http://iv.slis.indiana.edu/ref/iv04contest/.

  6. The initial degree in the BA model was set to 20.

  7. The initial degree in the BA model was set to 2 (one tenth of the value 20 used in Step 1).

References

  • Ahn YY, Bagrow JP, Lehmann S (2010) Link communities reveal multiscale complexity in networks. Nature 466:761–764

  • Barabási AL, Albert R (1999) Emergence of scaling in random networks. Science 286:509–512

  • Bhattacharyya P, Garg A, Wu SF (2011) Analysis of user keyword similarity in online social networks. Social Netw Anal Mining 1(3):143–158

  • Clauset A, Newman MEJ, Moore C (2004) Finding community structure in very large networks. Phys Rev E 70(6):066111

  • Cormen TH, Leiserson CE, Rivest RL, Stein C (2001) Introduction to algorithms. MIT Press, Cambridge

  • Dempster A, Laird N, Rubin D (1977) Maximum likelihood from incomplete data via the EM algorithm. J Roy Stat Soc 39(2):1–38


  • Diestel R (2006) Graph theory. Springer, Berlin

  • Evans T, Lambiotte R (2009) Line graphs, link partitions, and overlapping communities. Phys Rev E 80(1):016105

  • Evans T, Lambiotte R (2010) Line graphs of weighted networks for overlapping communities. Eur Phys J B 77:265–272

  • Gregory S (2009) Finding overlapping communities using disjoint community detection algorithms. In: Complex networks. Springer, Berlin, pp 47–61

  • Gregory S (2011) Fuzzy overlapping communities in networks. J Stat Mech Theor Exp P02017

  • Hanneman RA, Shelton CR (2011) Applying modality and equivalence concepts to pattern finding in social process-produced data. Social Netw Anal Mining 1(1):59–72

  • Harville DA (2008) Matrix algebra from a statistician’s perspective. Springer, Berlin

  • Kempe D, Kleinberg J, Tardos E (2003) Maximizing the spread of influence through a social network. In: Proceedings of KDD’03, pp 137–146

  • Mika P (2007) Social networks and the semantic web. Springer, Berlin

  • Müller M (2007) Information retrieval for music and motion. Springer, Berlin

  • Nepusz T, Petróczi A, Négyessy L, Bazsó F (2008) Fuzzy communities and the concept of bridgeness in complex networks. Phys Rev E 77:016107

  • Newman M (2006) Finding community structure using the eigenvectors of matrices. Phys Rev E 74(3):036104

  • Newman M (2010) Networks: an introduction. Oxford University Press, Oxford

  • Palla G, Derényi I, Farkas I, Vicsek T (2005) Uncovering the overlapping community structure of complex networks in nature and society. Nature 435:814–818

  • Pons P, Latapy M (2006) Computing communities in large networks using random walks. J Graph Algorithms 10(2):191–218

  • Raghavan U, Albert R, Kumara S (2007) Near linear time algorithm to detect community structures in large-scale networks. Phys Rev E 76:036106

  • Scott J (2011) Social network analysis: developments, advances, and prospects. Social Netw Anal Mining 1(1):21–26

  • Shen HW, Cheng XQ, Guo JF (2011) Quantifying and identifying the overlapping community structure in networks. J Stat Mech Theor Exp P07042

  • von Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17(4):395–416

  • Watts DJ (2003) Small worlds: the dynamics of networks between order and randomness. Princeton University Press, Princeton

  • Watts DJ (2004) Six degrees: the science of a connected age. W W Norton & Co Inc, New York

  • Whitney H (1932) Congruent graphs and the connectivity of graphs. Am J Math 54:150–168

  • Yoshida T (2012) Overlapping community discovery via weighted line graphs of networks. In: Proceedings of PRICAI’12 (LNAI 7458), pp 895–898

  • Yoshida T (2013) Toward finding hidden communities based on user profile. J Intell Inf Syst (in press)

  • Zhang S, Wang FS, Zhang XS (2007) Identification of overlapping community structure in complex networks using fuzzy c-means clustering. Phys A 388(8):483–490


Acknowledgments

We express sincere gratitude to the reviewers for their careful reading of the manuscript and for providing valuable suggestions to improve the paper. This work is partially supported by the grant-in-aid for scientific research (No. 24300049) funded by MEXT, Japan, and the Murata Science Foundation.

Author information


Corresponding author

Correspondence to Tetsuya Yoshida.

Appendices

Appendix 1: Proof of Theorem 2

Theorem 2 can be formalized in terms of the adjacency matrices of transformed networks based on the following properties:

$$ {\bf 1}_{m}^{\rm T}{\bf C}=({\bf k} - {\bf 1}_{n})^{\rm T}{\bf B}, \quad {\bf 1}_{m}^{\rm T}\tilde{\bf C}=(\tilde{\bf k} - {\bf 1}_{n})^{\rm T}\tilde{\bf B} $$
(25)
$$ {\bf 1}_{m}^{\rm T}{\bf E}=2 {\bf 1}_m^{\rm T}, \quad {\bf 1}_{m}^{\rm T}\tilde{\bf E}=2 {\bf w}^{\rm T} $$
(26)
$$ {\bf 1}_{m}^{\rm T}{\bf E}_{1}=2 {\bf 1}_m^{\rm T}, \quad {\bf 1}_{m}^{\rm T}\tilde{\bf E}_{1}=2 {\bf w}^{\rm T}. $$
(27)

Proof

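$$ \begin{aligned} {\bf 1}_{m}^{\rm T}{\bf C} &= {\bf 1}_{m}^{\rm T}({\bf B}^{\rm T}{\bf B} - 2{\bf I}_{m})={\bf k}^{\rm T}{\bf B} - 2{\bf 1}_{m}^{\rm T}\\ &= {\bf k}^{\rm T}{\bf B} - {\bf 1}_{n}^{\rm T}{\bf B}=({\bf k} - {\bf 1}_{n})^{\rm T}{\bf B} \end{aligned} $$

From Eq. (9) in Proposition 1, \({\bf 1}_{m}^{\rm T}{\bf B}^{\rm T}={\bf k}^{\rm T}\) holds. Furthermore, from Eq. (10), \(2{\bf 1}_{m}^{\rm T}={\bf 1}_{n}^{\rm T}{\bf B}\) holds. These two identities justify the steps above, which proves the left-hand side of Eq. (25). Similarly, by utilizing the right-hand side of Eqs. (9) and (10) in Proposition 1, we can prove the right-hand side of Eq. (25).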

On the other hand, based on Proposition 1, we can prove Eq. (26) and Eq. (27) as follows:

$$ \begin{aligned} {\bf 1}_{m}^{\rm T}{\bf E} &= {\bf 1}_{m}^{\rm T}{\bf B}^{\rm T}{\bf D}^{-1}{\bf B}={\bf k}^{\rm T}{\bf D}^{-1}{\bf B}={\bf 1}_{n}^{\rm T}{\bf B}=2{\bf 1}_{m}^{\rm T}\\ {\bf 1}_{m}^{\rm T}{\bf E}_{1} &= {\bf 1}_{m}^{\rm T}{\bf B}^{\rm T}{\bf D}^{-1}{\bf A}{\bf D}^{-1}{\bf B}={\bf k}^{\rm T}{\bf D}^{-1}{\bf A}{\bf D}^{-1}{\bf B}\\ &= {\bf 1}_{n}^{\rm T}{\bf A}{\bf D}^{-1}{\bf B}={\bf k}^{\rm T}{\bf D}^{-1}{\bf B}={\bf 1}_{n}^{\rm T}{\bf B}=2{\bf 1}_{m}^{\rm T} \end{aligned} $$

Similarly, by utilizing the right-hand side in Proposition 1, we can prove the right-hand-side of Eqs. (26) and (27).

From the left-hand side of Eqs. (26) and (27), we can see that \({\bf 1}_{n}^{\rm T}{\bf A}{\bf 1}_{n} = {\bf 1}_{m}^{\rm T}{\bf E}{\bf 1}_{m} = {\bf 1}_{m}^{\rm T}{\bf E}_{1}{\bf 1}_{m} = 2m\). Similarly, from the right-hand side of Eqs. (26) and (27), \({\bf 1}_{n}^{\rm T}\tilde{\bf A}{\bf 1}_{n} = {\bf 1}_{m}^{\rm T}\tilde{\bf E}{\bf 1}_{m} = {\bf 1}_{m}^{\rm T}\tilde{\bf E}_{1}{\bf 1}_{m} = \sum_{i,j} \tilde{\bf A}_{ij}\). Thus, by defining the adjacency matrix of the transformed network with Eq. (12) or Eq. (13), the sum of the weights in the original network is preserved in the transformed network. □
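As a numerical illustration of the left-hand sides of Eqs. (26) and (27), the following minimal sketch builds \({\bf E}={\bf B}^{\rm T}{\bf D}^{-1}{\bf B}\) and \({\bf E}_{1}={\bf B}^{\rm T}{\bf D}^{-1}{\bf A}{\bf D}^{-1}{\bf B}\) for a small unweighted graph and checks the column sums. The function names and the example graph are hypothetical and chosen only for illustration; the weighted matrices \(\tilde{\bf E}\) and \(\tilde{\bf E}_{1}\) are not covered because the definition of \(\tilde{\bf B}\) is not reproduced in this appendix.

```python
from fractions import Fraction


def line_graph_matrices(n, links):
    """Return (E, E1) for an unweighted simple graph on nodes 0..n-1.

    Assumes every node has at least one incident link (connected graph).
    """
    m = len(links)
    # Incidence matrix B (n x m) and adjacency matrix A (n x n).
    B = [[0] * m for _ in range(n)]
    A = [[0] * n for _ in range(n)]
    for a, (i, j) in enumerate(links):
        B[i][a] = B[j][a] = 1
        A[i][j] = A[j][i] = 1
    k = [sum(row) for row in B]  # node degrees, k = B 1_m
    # X = D^{-1} B, kept as exact fractions.
    X = [[Fraction(B[i][a], k[i]) for a in range(m)] for i in range(n)]
    # E = B^T D^{-1} B = B^T X
    E = [[sum(B[i][a] * X[i][b] for i in range(n)) for b in range(m)]
         for a in range(m)]
    # E1 = B^T D^{-1} A D^{-1} B = X^T A X (D is diagonal)
    AX = [[sum(A[i][j] * X[j][b] for j in range(n)) for b in range(m)]
          for i in range(n)]
    E1 = [[sum(X[i][a] * AX[i][b] for i in range(n)) for b in range(m)]
          for a in range(m)]
    return E, E1


def column_sums(M):
    """Column sums of a matrix, i.e. the row vector 1^T M."""
    return [sum(col) for col in zip(*M)]


# Hypothetical example: a triangle (0, 1, 2) with a pendant node 3.
links = [(0, 1), (1, 2), (0, 2), (2, 3)]
E, E1 = line_graph_matrices(4, links)
print(column_sums(E), column_sums(E1))
print(sum(map(sum, E)), sum(map(sum, E1)), 2 * len(links))
```

Exact rational arithmetic is used so that the column sums can be compared with 2 without floating-point tolerance.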

Appendix 2: Proof of Theorem 3

The properties (1) to (3) in Theorem 3 can be formalized in terms of the corresponding matrix \({\bf N}\) in Eq. (18) as:

$$ {\rm diag}({\bf N})={\bf 0}_{\ell} $$
(28)
$$ {\bf N}^{\rm T}={\bf N} $$
(29)
$$ {\bf N}{\bf 1}_{\ell}={\bf M}{\bf 1}_{\ell} $$
(30)

We prove that the above properties hold for the matrix N for a symmetric square matrix M with non-negative real values.

Proof

From Eq. (16), the diagonal elements of \({\bf M}_{wo}\) are all zeros. Since \({\bf D}_{M}^{1/2}{\rm diag}({\bf m}_{wo})^{-1/2}\) is a diagonal matrix, scaling the rows and columns of \({\bf M}_{wo}\) by multiplying it with this matrix from both the left and the right (with its transposition) does not change its diagonal elements. Thus, since the diagonals in \({\bf D}_{M}^{1/2}{\rm diag}({\bf m}_{wo})^{-1/2}{\bf M}_{wo}{\rm diag}({\bf m}_{wo})^{-1/2}{\bf D}_{M}^{1/2}\) are also zeros, Eq. (28) holds.

Since \({\bf M}\) is symmetric, \({\bf M}_{wo}\) in Eq. (16) is also symmetric. Multiplying it by the diagonal matrix \({\bf D}_{M}^{1/2}{\rm diag}({\bf m}_{wo})^{-1/2}\) from both the left and the right (with its transposition) preserves symmetry. Thus, since both the first and second terms in Eq. (18) are symmetric matrices, Eq. (29) holds.

For a diagonal matrix \({\bf D} \in \mathbb{R}^{\ell \times \ell}\), \({\bf D}{\bf 1}_{\ell} = {\bf d} = {\bf 1}_{\ell} \odot {\bf d}\), where the vector \({\bf d}\) is the row sum of \({\bf D}\), and \(\odot\) stands for the Hadamard product (element-wise product) (Harville 2008). Thus, since \({\rm diag}({\bf m}_{wo})^{-1/2}{\bf D}_{M}^{1/2}\) is a diagonal matrix, the following holds:

$$ \begin{aligned} & {\bf D}_{M}^{1/2}{\rm diag}({\bf m}_{wo})^{-1/2} {\bf M}_{wo} {\rm diag}({\bf m}_{wo})^{-1/2} {\bf D}_{M}^{1/2} {\bf 1}_{\ell} \\ &\quad = {\bf D}_{M}^{1/2}{\rm diag}({\bf m}_{wo})^{-1/2} {\bf M}_{wo} {\bf 1}_{\ell} \odot ( {\rm diag}({\bf m}_{wo})^{-1/2} {\bf D}_{M}^{1/2} ) \end{aligned} $$
(31)
$$={\bf D}_{M}^{1/2} {\bf m}_{wo}^{1/2} \odot ( {\rm diag}({\bf m}_{wo})^{-1/2} {\bf D}_{M}^{1/2} ) $$
(32)
$$={\bf D}_{M}^{1/2} {\bf 1}_{\ell} \odot {\bf D}_{M}^{1/2} $$
(33)
$$={\bf D}_{M}^{1/2} {\bf D}_{M}^{1/2} {\bf 1}_{\ell} $$
(34)
$$={\bf d}_{D_{M}} $$
(35)

where \({\bf d}_{D_{M}}\) is the row sum of \({\bf D}_{M}\) in Eq. (15).

Equation (31) follows as above. Since \({\bf m}_{wo}\) is the row sum of \({\bf M}_{wo}\) in Eq. (17), \({\rm diag}({\bf m}_{wo})^{-1/2}{\bf m}_{wo} = {\bf m}_{wo}^{1/2}\) in Eq. (32), and Eq. (33) follows based on the definition of the Hadamard product. Finally, based on the above property of diagonal matrices and the Hadamard product, Eq. (30) follows.

Furthermore, for the first term in Eq. (18), \({\bf M}_{wo}{\bf 1}_{\ell} = {\bf M}{\bf 1}_{\ell} - {\bf D}_{M}{\bf 1}_{\ell} = {\bf M}{\bf 1}_{\ell} - {\bf d}_{D_{M}}\). Thus, by summing \({\bf M}{\bf 1}_{\ell} - {\bf d}_{D_{M}}\) and \({\bf d}_{D_{M}}\), Eq. (30) follows.
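The self-loop removal discussed in this proof can be sketched as follows. Eq. (18) itself is not reproduced in this appendix, so the sketch assumes the two-term form suggested by the proof, \({\bf N} = {\bf M}_{wo} + {\bf D}_{M}^{1/2}{\rm diag}({\bf m}_{wo})^{-1/2}{\bf M}_{wo}{\rm diag}({\bf m}_{wo})^{-1/2}{\bf D}_{M}^{1/2}\); the function name `remove_self_loops` and the example matrix are hypothetical. The check at the end covers only the two structural properties, Eqs. (28) and (29).

```python
import math


def remove_self_loops(M):
    """Build N from a symmetric non-negative matrix M (list of lists).

    Assumed form: N = M_wo + S M_wo S with the diagonal matrix
    S = D_M^{1/2} diag(m_wo)^{-1/2}, reconstructed from the proof text.
    """
    size = len(M)
    d = [M[i][i] for i in range(size)]  # diagonal of M, i.e. D_M
    # M_wo: M with its diagonal set to zero.
    M_wo = [[0.0 if i == j else float(M[i][j]) for j in range(size)]
            for i in range(size)]
    m_wo = [sum(row) for row in M_wo]  # row sums of M_wo
    if any(v <= 0 for v in m_wo):
        raise ValueError("every row needs a positive off-diagonal sum")
    s = [math.sqrt(d[i] / m_wo[i]) for i in range(size)]  # diagonal of S
    # N_ij = M_wo_ij + s_i * M_wo_ij * s_j
    return [[M_wo[i][j] * (1.0 + s[i] * s[j]) for j in range(size)]
            for i in range(size)]


# Hypothetical symmetric input with self-loops on the diagonal.
M = [[3.0, 1.0, 0.0],
     [1.0, 1.0, 2.0],
     [0.0, 2.0, 4.0]]
N = remove_self_loops(M)
size = len(N)
print(all(N[i][i] == 0.0 for i in range(size)))                           # Eq. (28)
print(all(N[i][j] == N[j][i] for i in range(size) for j in range(size)))  # Eq. (29)
```

Because the scaling matrix is diagonal, the sketch never forms it explicitly: each off-diagonal entry is simply multiplied by a scalar factor.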

Appendix 3: Proof of Corollary 4

Corollary 4 can be formalized in terms of the following properties of the adjacency matrices:

$$ {\bf 1}_{m}^{\rm T}{\bf F}={\bf 1}_{m}^{\rm T}{\bf E}=2 {\bf 1}_m^{\rm T}, \quad {\bf 1}_{m}^{\rm T}\tilde{\bf F}={\bf 1}_{m}^{\rm T}\tilde{\bf E}=2 {\bf w}^{\rm T} $$
(36)
$$ {\bf 1}_{m}^{\rm T}{\bf F}_{1}={\bf 1}_{m}^{\rm T}{\bf E}_{1}=2 {\bf 1}_m^{\rm T}, \quad {\bf 1}_{m}^{\rm T}\tilde{\bf F}_{1}={\bf 1}_{m}^{\rm T}\tilde{\bf E}_{1}=2 {\bf w}^{\rm T} $$
(37)

Proof

Since the adjacency matrices \(\tilde{\bf E}\) and \(\tilde{\bf E}_{1}\) satisfy the condition in Theorem 3, by substituting these matrices as \({\bf M}\) in Eq. (18), we can construct the corresponding matrices \(\tilde{\bf F}\) and \(\tilde{\bf F}_{1}\). From Eq. (30) and the symmetry in Eq. (29), \({\bf 1}_{m}^{\rm T}\tilde{\bf F}={\bf 1}_{m}^{\rm T}\tilde{\bf E}\) and \({\bf 1}_{m}^{\rm T}\tilde{\bf F}_{1}={\bf 1}_{m}^{\rm T}\tilde{\bf E}_{1}\) hold. Since \({\bf 1}_{m}^{\rm T}\tilde{\bf E}=2{\bf w}^{\rm T}\) and \({\bf 1}_{m}^{\rm T}\tilde{\bf E}_{1}=2{\bf w}^{\rm T}\) hold from the right-hand side of Eqs. (26) and (27), the right-hand side of Eq. (36) and that of Eq. (37) hold. Based on a similar argument with \({\bf E}\) and \({\bf E}_{1}\), the left-hand side of Eq. (36) and that of Eq. (37) hold.

As shown in Theorem 2, the properties in Eqs. (36) and (37) indicate that the sum of the weights in the original network is preserved in \(\tilde{\bf F}\) and \(\tilde{\bf F}_{1}\) (and also in \({\bf F}\) and \({\bf F}_{1}\)). □

Appendix 4: Complexity analysis

Suppose a simple connected network G contains n nodes and m links, and let \(\langle k \rangle\) be the average degree in G. The time complexity of constructing \(\tilde{\bf E}\) and \(\tilde{\bf E}_{1}\) from G based on the weighted incidence matrix \(\tilde{\bf B}\) is the same as that of \({\bf E}\) and \({\bf E}_{1}\) in Evans and Lambiotte (2009), because both approaches define the adjacency matrices through a similar matrix calculation.

Since each row of \(\tilde{\bf B}^{\rm T}\) contains two non-zero elements and \(\tilde{\bf D}^{-1}\) is a diagonal matrix, multiplication of \(\tilde{\bf B}^{\rm T}\) and \(\tilde{\bf D}^{-1}\) can be done in O(m), and each row of \(\tilde{\bf B}^{\rm T}\tilde{\bf D}^{-1}\) contains two non-zero elements as well.

Since \(\tilde{\bf E}\) in Eq. (12) is based on a link–node–link random walk on G, it can be calculated by considering only \(2 \langle k \rangle\) links for each link in G. Since the calculation of \( (\tilde{\bf B}^{\rm T} \tilde{\bf D}^{-1}) \tilde{\bf B} \) can be done in \(O(m \langle k \rangle)\), the time complexity of constructing \(\tilde{\bf E}\) is \(O(m \langle k \rangle)\). Similarly, since \( \tilde{\bf E}_{1} \) in Eq. (13) is based on a link–link–link random walk on G, \( \tilde{\bf E}_{1} \) can be calculated by considering only \(2 \langle k \rangle^{2}\) links for each link in G. The calculation of \( (\tilde{\bf B}^{\rm T} \tilde{\bf D}^{-1}){\bf A} \) can be done in \(O(m \langle k \rangle)\), and that of \( (\tilde{\bf B}^{\rm T} \tilde{\bf D}^{-1}{\bf A}) (\tilde{\bf D}^{-1} \tilde{\bf B}) \) in \(O(m \langle k \rangle^{2})\). Thus, the time complexity of constructing \(\tilde{\bf E}_{1}\) is \(O(m \langle k \rangle^{2})\).

As for the removal of self-loops in Sect. 4.2, let \(\langle k_{M} \rangle\) be the average degree of a network with an adjacency matrix \({\bf M} \in \mathbb{R}_{+}^{\ell \times \ell}\) (e.g., \(\langle k_{M} \rangle\) is \(O(\langle k\rangle)\) in \(\tilde{\bf E}\), and \(O(\langle k\rangle^{2})\) in \(\tilde{\bf E}_{1}\)). The calculation of \({\bf M}_{wo}\) in Eq. (16) can be done in \(O(\ell)\), and that of \({\bf m}_{wo}\) in Eq. (17) in \(O(\ell \langle k_{M} \rangle)\). Since both \({\bf D}_{M}^{1/2}\) and \({\rm diag}({\bf m}_{wo})^{-1/2}\) are diagonal matrices, the calculation of \({\bf D}_{M}^{1/2}{\rm diag}({\bf m}_{wo})^{-1/2}\) can be done in \(O(\ell)\). The scaling of \({\bf M}_{wo}\) by multiplying \({\bf D}_{M}^{1/2}{\rm diag}({\bf m}_{wo})^{-1/2}\) from the left and the right needs to be conducted only for \(O(\ell \langle k_{M} \rangle)\) non-zero elements, and the same applies to the addition. Thus, the time complexity of constructing \({\bf N}\) in Eq. (18) is \(O(\ell \langle k_{M} \rangle)\). By substituting m for \(\ell\), the time complexity of constructing \({\bf F}\) (and \(\tilde{\bf F}\)) is \(O(m \langle k \rangle)\), and that of \({\bf F}_{1}\) (and \(\tilde{\bf F}_{1}\)) is \(O(m \langle k \rangle^{2})\).

On the other hand, since it is necessary to store the adjacency matrices of line graphs in memory, the space complexity is \(O(m \langle k_{M} \rangle),\) where \(\langle k_{M} \rangle\) is the average degree in the constructed line graph. In our approach, allocation of adjacency matrices in memory can become a problem for large networks.
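The sparsity argument above can be illustrated with a small sketch that builds the unweighted matrix \({\bf E}={\bf B}^{\rm T}{\bf D}^{-1}{\bf B}\) without forming dense matrices: it visits each node once and touches only pairs of links that share that node. The function name `sparse_E` and the dict-of-dicts storage are assumptions made for illustration, and the weighted matrix \(\tilde{\bf E}\) is assumed to have the same sparsity pattern. The step counter equals the sum of squared degrees, which is \(2m\langle k \rangle\) for a regular graph and can be larger for skewed degree distributions.

```python
from collections import defaultdict
from fractions import Fraction


def sparse_E(links):
    """Return (E, steps): E as a dict of dicts, steps = inner-loop iterations."""
    incident = defaultdict(list)  # node -> indices of the links touching it
    for a, (i, j) in enumerate(links):
        incident[i].append(a)
        incident[j].append(a)

    E = defaultdict(dict)
    steps = 0
    for node_links in incident.values():
        share = Fraction(1, len(node_links))  # 1 / k_i for this node
        # Every ordered pair of links meeting at this node gains 1 / k_i.
        for a in node_links:
            for b in node_links:
                E[a][b] = E[a].get(b, 0) + share
                steps += 1
    return E, steps


# Hypothetical example: a cycle with 6 nodes and 6 links (all degrees are 2).
links = [(i, (i + 1) % 6) for i in range(6)]
E, steps = sparse_E(links)
nonzeros = sum(len(row) for row in E.values())
print(steps, nonzeros)
```

Only the non-zero entries are stored, so the memory footprint follows the number of adjacent link pairs rather than \(m^{2}\).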

Appendix 5: Construction of synthetic networks

Let \(|C|\) be the number of communities and \(n_{c}\) the number of nodes in a community (a network has \(n_{c} \times |C|\) nodes). Let \(w_{u}\) stand for the link weight in the overall network, and \(r_{m} > 1\) for the weight ratio of the links within communities.

A synthetic network was generated as follows:

Step 1:

The overall network with \(n_{c} \times |C|\) nodes was created with the Barabási–Albert (BA) model. The constructed overall network was rather dense (see Note 6), and all the link weights were set to a small value \(w_{u}\).

Step 2:

A network of \(n_{c}\) nodes was created for each community with the BA model. In this case, the constructed communities were rather sparse (see Note 7), and all the link weights in the communities were set to \(w_{u} \times r_{m}\).

Step 3:

The communities constructed at Step 2 were embedded into the diagonal blocks of the adjacency matrix of the overall network at Step 1. Note that there was no overlap between the embedded diagonal blocks (i.e., embedded communities).

Step 4:

For each node i (with degree \(k_{i}\)) in the overall network, another community to which the node did not belong was randomly selected. Then, up to \(k_{i}\) nodes were randomly selected in the selected community. Finally, the node i was connected to the selected nodes with link weight \(w_{u} \times r_{m}\) as in Step 2.

The overall dense network with relatively small weights is constructed at Step 1. Sparse communities with large weights at Step 2 are embedded into the overall network at Step 3. In addition, since each node is connected to other nodes in another community at Step 4, the constructed network has an overlapping community structure. In the experiments, the parameters were set as \(w_{u} = 1\) and \(r_{m} = 100\) so that nodes in each community were tightly connected with large weights.
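Steps 1 to 4 can be sketched in code as follows. This is a minimal sketch under several assumptions that the text does not fix: the BA generator starts from a small complete graph, "embedding" a community replaces the corresponding diagonal block, the degree used in Step 4 is the Step 1 degree, and "up to" is read as the smaller of that degree and the community size. The function names `ba_links` and `synthetic_network` and all parameter names are hypothetical; the defaults 20 and 2 follow Notes 6 and 7.

```python
import random


def ba_links(n, m0, rng):
    """BA-style links on nodes 0..n-1; each new node attaches to m0 nodes."""
    links, pool = set(), []
    for v in range(1, m0 + 1):  # seed: complete graph on m0 + 1 nodes (assumption)
        for u in range(v):
            links.add((u, v))
            pool += [u, v]
    for v in range(m0 + 1, n):
        targets = set()
        while len(targets) < m0:  # degree-proportional choice via endpoint pool
            targets.add(rng.choice(pool))
        for u in targets:
            links.add((u, v))
            pool += [u, v]
    return links


def synthetic_network(num_comm, n_c, w_u=1, r_m=100,
                      m_overall=20, m_comm=2, seed=0):
    """Return a symmetric weight matrix (list of lists) with n_c * num_comm nodes."""
    n = n_c * num_comm
    if num_comm < 2 or n <= m_overall or n_c <= m_comm:
        raise ValueError("network too small for the requested parameters")
    rng = random.Random(seed)
    W = [[0] * n for _ in range(n)]

    # Step 1: dense overall network with small weight w_u.
    for u, v in ba_links(n, m_overall, rng):
        W[u][v] = W[v][u] = w_u
    degree = [sum(1 for x in row if x) for row in W]  # Step 1 degrees k_i

    # Steps 2 and 3: sparse communities with weight w_u * r_m replace
    # the diagonal blocks (assumed meaning of "embedded").
    for c in range(num_comm):
        off = c * n_c
        for u in range(off, off + n_c):
            for v in range(off, off + n_c):
                W[u][v] = 0
        for u, v in ba_links(n_c, m_comm, rng):
            W[off + u][off + v] = W[off + v][off + u] = w_u * r_m

    # Step 4: connect each node to nodes in one other, randomly chosen community.
    for i in range(n):
        own = i // n_c
        other = rng.choice([c for c in range(num_comm) if c != own])
        members = list(range(other * n_c, (other + 1) * n_c))
        for j in rng.sample(members, min(degree[i], n_c)):
            W[i][j] = W[j][i] = w_u * r_m
    return W


# Small hypothetical instance: 3 communities of 10 nodes each.
W = synthetic_network(3, 10, m_overall=5, m_comm=2, seed=1)
print(len(W), sorted({x for row in W for x in row}))
```

A fixed seed is passed to a private `random.Random` instance so that repeated calls with the same arguments produce the same network.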


About this article

Cite this article

Yoshida, T. Weighted line graphs for overlapping community discovery. Soc. Netw. Anal. Min. 3, 1001–1013 (2013). https://doi.org/10.1007/s13278-013-0104-1


  • DOI: https://doi.org/10.1007/s13278-013-0104-1
