Mining Community Structure with Node Embeddings

Vu, Thuy; Parker, D. Stott

doi:10.1007/978-3-319-51367-6_6

Part of the book series: Lecture Notes in Social Networks (LNSN)

Abstract

We develop node embeddings, a distributed representation of nodes, for large-scale social network applications. We compute embeddings for nodes based on their attributes and links. We show that node embeddings can effectively reflect community structure in networks and thus be useful for a wide range of community-related applications. We consider node embeddings in two different community-related mining tasks.

First, we propose a generic integration of node embeddings for network processing in community detection algorithms. Our strategy aims to re-adjust input networks by adding and trimming links, using embedding-based node distances. We empirically show that the strategy can remove up to 32.16% of the links from the DBLP (computer science literature) citation network, yet improve performance for different algorithms under different evaluation metrics for community detection.

Second, we show that these embeddings can support many community-based mining tasks in social networks, including analyses of community homogeneity, distance, and detection of community connectors (inter-community outliers, actors who connect communities), thanks to the convenient yet efficient computation provided by node embeddings for structural comparisons. Our experimental results include many interesting insights about DBLP. For example, prior to 2013 the best way for research in Natural Language & Speech to gain “best-paper” recognition was to emphasize aspects related to Machine Learning & Pattern Recognition.
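The link re-adjustment strategy mentioned in the abstract (adding and trimming links using embedding-based node distances) can be illustrated with a minimal sketch. The abstract does not state the distance measure or the criteria for adding and trimming, so everything specific here is an assumption made for illustration: cosine distance as the measure, the function names `cosine_distance` and `readjust_links`, and the two hypothetical thresholds `trim_above` and `add_below`. It is not the chapter's actual procedure.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Cosine distance 1 - cos(u, v) between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # assumption: a zero vector is treated as unrelated
    return 1.0 - dot / (norm_u * norm_v)


def readjust_links(edges, embeddings, trim_above, add_below):
    """Return a re-adjusted set of undirected links.

    edges      -- iterable of (node, node) pairs
    embeddings -- dict mapping each node to its embedding vector
    trim_above -- hypothetical threshold: drop links whose endpoints
                  are farther apart than this distance
    add_below  -- hypothetical threshold: add a link between any two
                  nodes closer than this distance
    """
    if add_below > trim_above:
        raise ValueError("add_below must not exceed trim_above")

    adjusted = set()

    # Trimming: keep only links between nodes that are close enough.
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        if cosine_distance(embeddings[a], embeddings[b]) <= trim_above:
            adjusted.add(frozenset((a, b)))

    # Adding: link very similar node pairs. This all-pairs scan is
    # quadratic in the number of nodes, acceptable only for a toy graph.
    for a, b in combinations(sorted(embeddings), 2):
        if cosine_distance(embeddings[a], embeddings[b]) < add_below:
            adjusted.add(frozenset((a, b)))

    return adjusted


# Toy data (invented for illustration, not taken from DBLP).
toy_embeddings = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [0.1, 0.9],
}
toy_edges = [("a", "b"), ("a", "c")]

toy_result = readjust_links(toy_edges, toy_embeddings,
                            trim_above=0.5, add_below=0.05)
print(sorted(tuple(sorted(link)) for link in toy_result))
```

On the toy data, the link between the dissimilar nodes `a` and `c` is trimmed, and a link between the similar nodes `c` and `d` is added. The re-adjusted edge set could then be passed to any community detection algorithm in place of the original network.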

Author information

Authors and Affiliations

Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, 90095, USA
Thuy Vu & D. Stott Parker

Corresponding author

Correspondence to Thuy Vu.

Editor information

Editors and Affiliations

Department of Computer Engineering, Firat University, Elazig, Turkey
Mehmet Kaya
Ministry of Interior, Ankara, Turkey
Özcan Erdoǧan
Department of Computer Science, University of Calgary, Calgary, AB, Canada
Jon Rokne

About this chapter

Cite this chapter

Vu, T., Parker, D.S. (2017). Mining Community Structure with Node Embeddings. In: Kaya, M., Erdoǧan, Ö., Rokne, J. (eds) From Social Data Mining and Analysis to Prediction and Community Detection. Lecture Notes in Social Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-51367-6_6

DOI: https://doi.org/10.1007/978-3-319-51367-6_6
Published: 22 March 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-51366-9
Online ISBN: 978-3-319-51367-6
eBook Packages: Computer Science (R0)
