Abstract
We develop node embeddings, a distributed representation of nodes, for large-scale social network applications. We compute embeddings for nodes based on their attributes and links. We show that node embeddings can effectively reflect community structure in networks and can thus be useful for a wide range of community-related applications. We consider node embeddings in two different community-related mining tasks.
First, we propose a generic integration of node embeddings for network processing in community detection algorithms. Our strategy re-adjusts input networks by adding and trimming links, using embedding-based node distances. We empirically show that the strategy can remove up to 32.16% of the links from the DBLP (computer science literature) citation network, yet improve the performance of different algorithms under different evaluation metrics for community detection.
Second, we show that these embeddings can support many community-based mining tasks in social networks, including analyses of community homogeneity and distance, and detection of community connectors (inter-community outliers: actors who connect communities), thanks to the convenient yet efficient computation that node embeddings provide for structural comparisons. Our experimental results include many interesting insights about DBLP. For example, prior to 2013 the best way for research in Natural Language & Speech to gain “best-paper” recognition was to emphasize aspects related to Machine Learning & Pattern Recognition.
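The link re-adjustment idea in the abstract can be illustrated with a minimal sketch. This is not the chapter's algorithm: the abstract does not specify the distance measure, the thresholds, or how candidate links are chosen, so the cosine distance, the function name `readjust_links`, and the parameters `trim_above` and `add_below` are all assumptions made for illustration. The sketch also assumes every embedding is a nonzero vector.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Cosine distance between two nonzero vectors (an assumed choice of metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def readjust_links(edges, embeddings, trim_above=0.5, add_below=0.05):
    """Return a re-adjusted undirected edge list.

    Hypothetical rule: drop existing links whose endpoints are farther apart
    than `trim_above`, and add links between unlinked node pairs closer than
    `add_below`. Both thresholds are illustrative, not values from the chapter.
    """
    original = {frozenset(e) for e in edges}

    # Trimming step: keep only links between nodes with nearby embeddings.
    kept = set()
    for edge in original:
        a, b = tuple(edge)
        if cosine_distance(embeddings[a], embeddings[b]) <= trim_above:
            kept.add(edge)

    # Adding step: brute-force scan of all pairs (quadratic; toy graphs only).
    added = set()
    for a, b in combinations(sorted(embeddings), 2):
        edge = frozenset((a, b))
        if edge not in original and cosine_distance(embeddings[a], embeddings[b]) < add_below:
            added.add(edge)

    return sorted(tuple(sorted(e)) for e in kept | added)


# Toy data: nodes a, b point one way in embedding space; c, d point another.
toy_embeddings = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [0.1, 0.9],
}
toy_edges = [("a", "c"), ("c", "d")]
print(readjust_links(toy_edges, toy_embeddings))
```

On this toy input the distant link a–c is trimmed and the missing link a–b is added, so the re-adjusted network matches the two groups suggested by the embeddings. A real network would need a nearest-neighbor index rather than the all-pairs scan, and thresholds tuned on the data.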
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Vu, T., Parker, D.S. (2017). Mining Community Structure with Node Embeddings. In: Kaya, M., Erdoǧan, Ö., Rokne, J. (eds) From Social Data Mining and Analysis to Prediction and Community Detection. Lecture Notes in Social Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-51367-6_6
DOI: https://doi.org/10.1007/978-3-319-51367-6_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-51366-9
Online ISBN: 978-3-319-51367-6
eBook Packages: Computer Science, Computer Science (R0)