SINr: Fast Computing of Sparse Interpretable Node Representations is not a Sin!

Prouteau, Thibault; Connes, Victor; Dugué, Nicolas; Perez, Anthony; Lamirel, Jean-Charles; Camelin, Nathalie; Meignier, Sylvain

doi:10.1007/978-3-030-74251-5_26

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12695)

Included in the following conference series:

International Symposium on Intelligent Data Analysis

Abstract

While graph embedding aims at learning low-dimensional representations of nodes that capture the graph topology, word embedding focuses on learning word vectors that encode semantic properties of the vocabulary. The former finds applications in tasks such as link prediction and node classification, while the latter is systematically considered in natural language processing. Most of the time, graph and word embeddings are treated as distinct tasks. However, word co-occurrence matrices, widely used to extract word embeddings, can be seen as graphs. Furthermore, most network embedding techniques rely either on a word embedding methodology (Word2vec) or on matrix factorization, which is also widely used for word embedding. These methods are usually computationally expensive and parameter-dependent, and the dimensions of the embedding space are not interpretable. To circumvent these issues, we introduce the Lower Dimension Bipartite Graphs Framework (LDBGF), which takes advantage of the fact that all graphs can be described as bipartite graphs, even in the case of textual data. This underlying bipartite structure may be explicit, as in coauthor networks. With LDBGF, however, we focus on uncovering latent bipartite structures, found for instance in social or word co-occurrence networks, and especially structures that provide more concise and interpretable representations of the graph at hand. We further propose SINr, an efficient implementation of the LDBGF approach that extracts Sparse Interpretable Node Representations using community structure to approximate the underlying bipartite structure. In the case of graph embedding, our near-linear time method is the fastest of our benchmark, is parameter-free, and provides state-of-the-art results on the classical link prediction task. We also show that low-dimensional vectors can be derived from SINr using singular value decomposition. In the case of word embedding, our approach proves to be very efficient with respect to the classical similarity evaluation.
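
To make the idea of community-based sparse representations concrete, here is a minimal sketch. It assumes a partition of the nodes into communities is already available, and it weights each node by the share of its weighted degree that links into each community. The function name `sparse_community_vectors`, the toy graph, the partition, and this particular weighting are illustrative assumptions; they are not taken from the paper and may differ from the measures SINr actually uses.

```python
from collections import defaultdict


def sparse_community_vectors(edges, community_of):
    """Map each node to {community: share of its weighted degree linking into it}.

    Illustrative weighting only (an assumption, not necessarily the SINr measure).
    """
    degree = defaultdict(float)                      # weighted degree per node
    into = defaultdict(lambda: defaultdict(float))   # node -> community -> weight
    for u, v, w in edges:
        degree[u] += w
        degree[v] += w
        into[u][community_of[v]] += w
        into[v][community_of[u]] += w
    # A node only gets an entry for communities it is linked to, hence sparsity.
    return {
        node: {c: weight / degree[node] for c, weight in links.items()}
        for node, links in into.items()
    }


# Toy graph: two triangles joined by a single bridge edge (c, d).
edges = [
    ("a", "b", 1.0), ("a", "c", 1.0), ("b", "c", 1.0),
    ("d", "e", 1.0), ("d", "f", 1.0), ("e", "f", 1.0),
    ("c", "d", 1.0),
]
# Hypothetical partition; in practice it would come from a community detection step.
community_of = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

vectors = sparse_community_vectors(edges, community_of)
for node in sorted(vectors):
    print(node, vectors[node])
```

Each dimension of the resulting vectors corresponds to one community, which is what makes the dimensions readable: node `a` only has weight on community 0, while the bridge node `c` splits its weight between both communities.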

Notes

1. https://www.github.com/anthonimes/SINr
2. http://mattmahoney.net/dc/textdata.html and http://www.anc.org/data/oanc/
3. https://git-lium.univ-lemans.fr/vpelloin/svd2vec
4. With two Intel Xeon E5-2660 CPUs at 2.20 GHz: 16 cores, 96 GB RAM.

Author information

Authors and Affiliations

Laboratoire d’Informatique de l’Université du Mans, LIUM, EA 4023, Le Mans Université, Le Mans, France
Thibault Prouteau, Nicolas Dugué, Nathalie Camelin & Sylvain Meignier
Laboratoire des Sciences du Numérique de Nantes, Université de Nantes, Nantes, France
Victor Connes
INSA Centre Val de Loire, LIFO EA 4022, Univ. Orléans, 45067, Orléans, France
Anthony Perez
LORIA, Equipe Synalp, Université de Strasbourg, Strasbourg, France
Jean-Charles Lamirel

Corresponding author

Correspondence to Thibault Prouteau.

Editor information

Editors and Affiliations

University of Coimbra, Coimbra, Portugal
Pedro Henriques Abreu
University of Porto, Porto, Portugal
Pedro Pereira Rodrigues
University of Granada, Granada, Spain
Alberto Fernández
University of Porto, Porto, Portugal
João Gama

About this paper

Cite this paper

Prouteau, T. et al. (2021). SINr: Fast Computing of Sparse Interpretable Node Representations is not a Sin!. In: Abreu, P.H., Rodrigues, P.P., Fernández, A., Gama, J. (eds) Advances in Intelligent Data Analysis XIX. IDA 2021. Lecture Notes in Computer Science, vol 12695. Springer, Cham. https://doi.org/10.1007/978-3-030-74251-5_26

DOI: https://doi.org/10.1007/978-3-030-74251-5_26
Published: 13 April 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-74250-8
Online ISBN: 978-3-030-74251-5
eBook Packages: Computer Science, Computer Science (R0)
