Abstract
While graph embedding aims to learn low-dimensional node representations that capture the graph topology, word embedding focuses on learning word vectors that encode semantic properties of the vocabulary. The former finds applications in tasks such as link prediction and node classification, while the latter is systematically used in natural language processing. Graph and word embeddings are usually treated as distinct tasks. However, word co-occurrence matrices, widely used to extract word embeddings, can be seen as graphs. Furthermore, most network embedding techniques rely either on a word embedding methodology (Word2vec) or on matrix factorization, which is also widely used for word embedding. These methods are usually computationally expensive and parameter-dependent, and the dimensions of their embedding spaces are not interpretable. To circumvent these issues, we introduce the Lower Dimension Bipartite Graphs Framework (LDBGF), which takes advantage of the fact that any graph can be described as a bipartite graph, even in the case of textual data. This underlying bipartite structure may be explicit, as in coauthor networks. With LDBGF, however, we focus on uncovering latent bipartite structures, for instance in social or word co-occurrence networks, and especially structures that provide more concise and interpretable representations of the graph at hand. We further propose SINr, an efficient implementation of the LDBGF approach that extracts Sparse Interpretable Node Representations by using community structure to approximate the underlying bipartite structure. For graph embedding, our near-linear-time method is the fastest in our benchmark, is parameter-free, and achieves state-of-the-art results on the classical link prediction task. We also show that low-dimensional vectors can be derived from SINr using singular value decomposition. For word embedding, our approach proves to be very efficient on the classical similarity evaluation.
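The abstract describes SINr only at a high level: communities stand in for the latent bipartite structure, and each node is described by how it relates to those communities. The Python sketch below illustrates that idea under explicit assumptions. The partition is supplied as input rather than computed (in practice it could come from a community detection algorithm such as Louvain), and each dimension is taken to be the share of a node's weighted degree that falls in one community. The function name `community_membership_vectors` and this particular normalization are hypothetical illustrations, not necessarily the exact weighting used by SINr.

```python
from collections import defaultdict


def community_membership_vectors(edges, community_of):
    """Hypothetical sketch: sparse node vectors with one dimension per community.

    edges: iterable of (u, v, weight) tuples for an undirected graph.
    community_of: dict mapping each node to a community id (assumed given,
        e.g. by a community detection algorithm such as Louvain).
    Returns {node: {community_id: share of the node's weighted degree}}.
    """
    # For every node, accumulate the edge weight it sends to each community.
    weight_to = defaultdict(lambda: defaultdict(float))
    for u, v, w in edges:
        weight_to[u][community_of[v]] += w
        weight_to[v][community_of[u]] += w

    vectors = {}
    for node, per_community in weight_to.items():
        degree = sum(per_community.values())
        # Normalize so each vector sums to 1; only non-zero entries are stored,
        # which keeps the representation sparse.
        vectors[node] = {c: w / degree for c, w in per_community.items()}
    return vectors


# Usage: two triangles {a, b, c} and {d, e, f} joined by the edge c-d.
edges = [("a", "b", 1.0), ("a", "c", 1.0), ("b", "c", 1.0),
         ("c", "d", 1.0), ("d", "e", 1.0), ("d", "f", 1.0), ("e", "f", 1.0)]
community_of = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
vecs = community_membership_vectors(edges, community_of)
print(vecs["c"])  # {0: 0.6666666666666666, 1: 0.3333333333333333}
```

Because each dimension corresponds to a community, a non-zero entry can be read directly: node `c` sends two thirds of its edges into its own community and one third to the neighboring one. To obtain dense low-dimensional vectors of the kind the abstract mentions, one could stack these vectors into a node-by-community matrix and apply a truncated singular value decomposition (for example with `numpy.linalg.svd`); that step is not shown here.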
Notes
- 4. With two Intel Xeon E5-2660 CPUs (2.20 GHz), 16 cores, and 96 GB of RAM.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Prouteau, T. et al. (2021). SINr: Fast Computing of Sparse Interpretable Node Representations is not a Sin!. In: Abreu, P.H., Rodrigues, P.P., Fernández, A., Gama, J. (eds) Advances in Intelligent Data Analysis XIX. IDA 2021. Lecture Notes in Computer Science(), vol 12695. Springer, Cham. https://doi.org/10.1007/978-3-030-74251-5_26
DOI: https://doi.org/10.1007/978-3-030-74251-5_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-74250-8
Online ISBN: 978-3-030-74251-5
eBook Packages: Computer Science, Computer Science (R0)