Abstract
Natural Language Processing (NLP) techniques are powerful tools for analyzing, understanding, and processing human language, with a wide range of applications. In this paper we exploit NLP techniques, combined with Machine Learning clustering algorithms, to find good solutions to a traditional combinatorial problem: computing a high-modularity partition of a graph. We introduce a novel framework for computing a graph partition, dubbed Clique-TF-IDF. The framework leverages dense subgraphs of the input graph, modeled as maximal cliques, and characterizes each node in terms of the cliques it belongs to, much as a term-document matrix characterizes documents by the terms they contain. Our experimental results show that the quality of the partitions produced by the Clique-TF-IDF algorithm is comparable with that of the most effective algorithms in the literature. While our focus is on maximal cliques and partitioning algorithms, we believe that this strategy can be generalized to devise AI solutions for a variety of intractable combinatorial problems in which some substructures can be efficiently enumerated and exploited.
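The pipeline outlined in the abstract (enumerate maximal cliques, describe each node by its cliques, cluster the resulting vectors, evaluate modularity) can be illustrated with a minimal sketch. The abstract does not specify the weighting or the clustering algorithm, so the choices below are assumptions made only for illustration: a binary term frequency with a smoothed inverse document frequency, a hypothetical `min_size` filter that ignores cliques smaller than triangles, and a simple similarity-threshold clusterer standing in for the Machine Learning clustering step. It is not the authors' implementation.

```python
import math
from itertools import combinations


def maximal_cliques(adj):
    """Enumerate maximal cliques with Bron-Kerbosch (pivoting variant)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        # Pivot on the vertex with most neighbours among the candidates.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def clique_tfidf(adj, min_size=3):
    """Nodes play the role of documents, maximal cliques the role of terms."""
    # min_size is an assumption of this sketch, not taken from the paper.
    cliques = sorted((c for c in maximal_cliques(adj) if len(c) >= min_size),
                     key=sorted)
    n = len(adj)
    # Binary tf; smoothed idf = 1 + log(n / df), where df is the clique size.
    vectors = {v: [1.0 + math.log(n / len(c)) if v in c else 0.0 for c in cliques]
               for v in sorted(adj)}
    return cliques, vectors


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def cluster(vectors, threshold=0.5):
    """Stand-in clusterer: merge nodes whose cosine similarity >= threshold."""
    parent = {v: v for v in vectors}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in combinations(vectors, 2):
        if cosine(vectors[u], vectors[v]) >= threshold:
            parent[find(u)] = find(v)
    groups = {}
    for v in vectors:
        groups.setdefault(find(v), set()).add(v)
    return sorted(groups.values(), key=min)


def modularity(adj, partition):
    """Q = sum over parts of (internal edges / m) - (total degree / 2m)^2."""
    m = sum(len(nb) for nb in adj.values()) / 2
    q = 0.0
    for part in partition:
        inside = sum(len(adj[v] & part) for v in part) / 2
        degree = sum(len(adj[v]) for v in part)
        q += inside / m - (degree / (2 * m)) ** 2
    return q


# Toy input: two triangles joined by the bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

cliques, vectors = clique_tfidf(adj)
partition = cluster(vectors)
print([sorted(p) for p in partition], round(modularity(adj, partition), 4))
# prints: [[0, 1, 2], [3, 4, 5]] 0.3571
```

On this toy graph each triangle becomes one cluster, and the modularity of that partition is 5/14. On real inputs the number of maximal cliques can be exponential in the number of nodes, so a practical implementation would rely on a dedicated enumeration algorithm and a more robust clustering method.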
This research was supported in part by MUR PRIN Projects no. 2022TS4Y3N (EXPAND) and no. 2022ME9Z78 (NextGRAAL).
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
D’Elia, M., Finocchi, I., Patrignani, M. (2023). Clique-TF-IDF: A New Partitioning Framework Based on Dense Substructures. In: Basili, R., Lembo, D., Limongelli, C., Orlandini, A. (eds) AIxIA 2023 – Advances in Artificial Intelligence. AIxIA 2023. Lecture Notes in Computer Science, vol 14318. Springer, Cham. https://doi.org/10.1007/978-3-031-47546-7_27
DOI: https://doi.org/10.1007/978-3-031-47546-7_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-47545-0
Online ISBN: 978-3-031-47546-7
eBook Packages: Computer Science (R0)