Clique-TF-IDF: A New Partitioning Framework Based on Dense Substructures

D’Elia, Marco; Finocchi, Irene; Patrignani, Maurizio

doi:10.1007/978-3-031-47546-7_27

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14318)

Included in the following conference series:

International Conference of the Italian Association for Artificial Intelligence

Abstract

Natural Language Processing (NLP) techniques are powerful tools for analyzing, understanding, and processing human language with a wide range of applications. In this paper we exploit NLP techniques, combined with Machine Learning clustering algorithms, to find good solutions to a traditional combinatorial problem, namely, the computation of a partition with high modularity of a graph. We introduce a novel framework, dubbed Clique-TF-IDF, for computing a graph partition. Such a framework leverages dense subgraphs of the input graph, modeled as maximal cliques, and characterizes each node in terms of the cliques it belongs to, similarly to a term-document matrix. Our experimental results show that the quality of the partitions produced by algorithm Clique-TF-IDF is comparable with that of the most effective algorithms in the literature. While our focus is on maximal cliques and partitioning algorithms, we believe that this strategy can be generalized to devise AI solutions for a variety of intractable combinatorial problems where some substructures can be efficiently enumerated and exploited.
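The abstract's central idea, describing each node by the maximal cliques it belongs to in the manner of a term-document matrix, can be illustrated with a small sketch. This is one possible reading, not the authors' implementation. It makes several assumptions the abstract does not state: nodes play the role of documents and cliques the role of terms, term frequency is binary, the inverse document frequency of a clique c is log(n/|c|) + 1, and rows are L2-normalized. The names `maximal_cliques`, `clique_tfidf`, `cosine`, and `modularity` are hypothetical helpers introduced for illustration. The clustering step itself is omitted; any off-the-shelf clustering algorithm could be run on the resulting vectors, and the modularity function can then score the partition it returns.

```python
import math
from itertools import combinations


def maximal_cliques(adj):
    """Enumerate maximal cliques with plain Bron-Kerbosch (no pivoting).

    adj maps each node to the set of its neighbours (undirected graph).
    """
    found = []

    def expand(r, p, x):
        if not p and not x:
            found.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found


def clique_tfidf(adj, cliques):
    """Node-by-clique TF-IDF rows (assumed weighting, see lead-in)."""
    n = len(adj)
    vectors = {}
    for v in adj:
        # Binary tf times idf = log(n / |c|) + 1; |c| is the number of
        # nodes ("documents") that contain clique c ("term").
        row = [math.log(n / len(c)) + 1.0 if v in c else 0.0 for c in cliques]
        norm = math.sqrt(sum(w * w for w in row)) or 1.0
        vectors[v] = [w / norm for w in row]
    return vectors


def cosine(a, b):
    """Cosine similarity of two L2-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))


def modularity(adj, partition):
    """Newman modularity of a partition given as a list of node sets."""
    m = sum(len(nb) for nb in adj.values()) / 2
    q = 0.0
    for block in partition:
        inside = sum(1 for u, v in combinations(block, 2) if v in adj[u])
        degree = sum(len(adj[v]) for v in block)
        q += inside / m - (degree / (2 * m)) ** 2
    return q


# Toy graph: two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
cliques = maximal_cliques(adj)
vectors = clique_tfidf(adj, cliques)

print(sorted(sorted(c) for c in cliques))
print(cosine(vectors[0], vectors[1]), cosine(vectors[0], vectors[4]))
print(modularity(adj, [{0, 1, 2}, {3, 4, 5}]))
```

On this toy graph, nodes that share a triangle get similar vectors and nodes in different triangles get orthogonal ones, which is the signal a clustering algorithm would exploit. Real inputs would need a pivoting clique enumerator, since plain Bron-Kerbosch can be slow on larger graphs.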

This research was supported in part by MUR PRIN Projects no. 2022TS4Y3N (EXPAND) and no. 2022ME9Z78 (NextGRAAL).

Author information

Authors and Affiliations

Roma Tre University, Via della Vasca Navale 79, Rome, Italy
Marco D’Elia & Maurizio Patrignani
Luiss Guido Carli, Viale Romania 32, Rome, Italy
Irene Finocchi

Authors

Marco D’Elia
Irene Finocchi
Maurizio Patrignani

Corresponding author

Correspondence to Marco D’Elia.

Editor information

Editors and Affiliations

University of Rome Tor Vergata, Rome, Italy
Roberto Basili
Sapienza University of Rome, Rome, Italy
Domenico Lembo
Roma Tre University, Rome, Italy
Carla Limongelli
National Research Council, Rome, Italy
Andrea Orlandini

About this paper

Cite this paper

D’Elia, M., Finocchi, I., Patrignani, M. (2023). Clique-TF-IDF: A New Partitioning Framework Based on Dense Substructures. In: Basili, R., Lembo, D., Limongelli, C., Orlandini, A. (eds) AIxIA 2023 – Advances in Artificial Intelligence. AIxIA 2023. Lecture Notes in Computer Science, vol 14318. Springer, Cham. https://doi.org/10.1007/978-3-031-47546-7_27

DOI: https://doi.org/10.1007/978-3-031-47546-7_27
Published: 02 November 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-47545-0
Online ISBN: 978-3-031-47546-7
eBook Packages: Computer Science, Computer Science (R0)
