Abstract
We study point-to-point distance estimation in hypergraphs, where the query is parameterized by a positive integer s, which defines the required level of overlap for two hyperedges to be considered adjacent. To answer s-distance queries, we first explore an oracle based on the line graph of the given hypergraph and discuss its limitations: the line graph is typically orders of magnitude larger than the original hypergraph. We then introduce HypED, a landmark-based oracle with a predefined size, built directly on the hypergraph, thus avoiding the materialization of the line graph. Our framework can approximately answer vertex-to-vertex, vertex-to-hyperedge, and hyperedge-to-hyperedge s-distance queries for any value of s. A key observation underlying our framework is that as s increases, the hypergraph becomes more fragmented. We show how this can be exploited to improve the placement of landmarks, by identifying the s-connected components of the hypergraph. For this latter task, we devise an efficient algorithm based on the union-find technique and a dynamic inverted index. We experimentally evaluate HypED on several real-world hypergraphs and demonstrate its versatility in answering s-distance queries for different values of s. Our framework answers such queries in fractions of a millisecond while allowing fine-grained control of the trade-off between index size and approximation error at creation time. Finally, we demonstrate the usefulness of the s-distance oracle in two applications, namely hypergraph-based recommendation and the approximation of the s-closeness centrality of vertices and hyperedges in the context of protein-protein interactions.
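To make the fragmentation observation concrete, the following is a minimal Python sketch of finding s-connected components with union-find and an inverted index. It is not the algorithm devised in the paper: it uses a static index and counts overlaps pair by pair, and the names `UnionFind` and `s_connected_components`, as well as the toy hypergraph, are assumptions made for illustration. In this sketch, a hyperedge with no s-adjacent neighbour is reported as a singleton component.

```python
from collections import defaultdict
from itertools import combinations


class UnionFind:
    """Disjoint-set forest with path compression and union by size."""

    def __init__(self, items):
        self.parent = {x: x for x in items}
        self.size = {x: 1 for x in items}

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # path compression
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def s_connected_components(hyperedges, s):
    """Group hyperedge ids that are linked by chains of overlaps of size >= s.

    `hyperedges` maps a hyperedge id to the set of vertices it contains.
    """
    # Inverted index: vertex -> ids of the hyperedges containing it.
    index = defaultdict(list)
    for eid, vertices in hyperedges.items():
        for v in vertices:
            index[v].append(eid)

    # Count shared vertices for every pair of hyperedges in a posting list.
    overlap = defaultdict(int)
    for posting in index.values():
        for a, b in combinations(sorted(posting), 2):
            overlap[(a, b)] += 1

    # Merge every pair whose overlap reaches the threshold s.
    uf = UnionFind(hyperedges)
    for (a, b), shared in overlap.items():
        if shared >= s:
            uf.union(a, b)

    components = defaultdict(set)
    for eid in hyperedges:
        components[uf.find(eid)].add(eid)
    return sorted(components.values(), key=min)


# Hypothetical toy hypergraph (illustration only).
H = {"e1": {1, 2, 3}, "e2": {2, 3, 4}, "e3": {4, 5}, "e4": {5, 6, 7}}
for s in (1, 2, 3):
    print(s, s_connected_components(H, s))
```

On this toy input the hyperedges form one component for s = 1, three components for s = 2, and four singletons for s = 3, which is the kind of fragmentation that could guide where landmarks are placed.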
Notes
The s-distance is often defined as the length of the shortest s-path. We subtract 1 to align with graph theory conventions where the distance between edges is the number of vertices in a shortest path between them, making adjacent edges connected by a path of length 2 but at distance 1.
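As an illustration of this convention, here is a small Python sketch of the hyperedge-to-hyperedge s-distance computed by breadth-first search. It assumes that the length of an s-path is the number of hyperedges in the sequence, so that the distance is the number of hops between s-adjacent hyperedges. The function name `s_distance` and the example data are hypothetical, and the quadratic neighbour scan is chosen for brevity rather than efficiency.

```python
from collections import deque
from math import inf


def s_distance(hyperedges, source, target, s):
    """Number of hops from `source` to `target`, where each hop joins two
    hyperedges sharing at least s vertices; inf if no s-path exists."""
    if source == target:
        return 0
    dist = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for other, vertices in hyperedges.items():
            # Skip visited hyperedges and those with too small an overlap.
            if other in dist or len(hyperedges[current] & vertices) < s:
                continue
            dist[other] = dist[current] + 1
            if other == target:
                return dist[other]
            queue.append(other)
    return inf


# Hypothetical example: e1 and e3 share only vertex "c".
H = {
    "e1": {"a", "b", "c"},
    "e2": {"b", "c", "d"},
    "e3": {"c", "d", "e"},
}
print(s_distance(H, "e1", "e2", 2))  # adjacent hyperedges
print(s_distance(H, "e1", "e3", 2))  # reached through e2
print(s_distance(H, "e1", "e3", 3))  # no 3-path exists
```

For s = 2 the shortest s-path from e1 to e3 is the sequence e1, e2, e3, which contains three hyperedges and therefore corresponds to distance 2 under the subtract-1 convention.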
An alternative definition of vertex-to-vertex s-distance that is a metric could be considered. Let the dual hypergraph be the one obtained by swapping the roles of vertices and hyperedges: hyperedges become vertices, and each vertex in the original hypergraph becomes a hyperedge that connects all the vertices in the dual that correspond to the hyperedges of the original hypergraph that contain it. We can compute the hyperedge-to-hyperedge s-distance in this dual hypergraph and obtain a metric vertex-to-vertex s-distance in the original hypergraph.
However, note that the vertex-to-vertex s-distance as in Definition 3 and the hyperedge-to-hyperedge s-distance in the dual hypergraph yield distinct results. For instance, s-connected vertices in the hypergraph may be at infinite distance in the dual. In fact, an s-path in the hypergraph is a sequence of hyperedges such that consecutive hyperedges share at least s common vertices, whereas an s-path in the dual is a sequence of vertices such that consecutive vertices belong to at least s common hyperedges.
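The difference between the two notions can be seen on a two-hyperedge example. The Python sketch below builds the dual hypergraph and compares overlaps; the helper names `dual` and `s_adjacent` and the data are assumptions made for illustration only.

```python
from collections import defaultdict


def dual(hyperedges):
    """Swap roles: every original vertex becomes a hyperedge whose members
    are the ids of the original hyperedges that contain it."""
    flipped = defaultdict(set)
    for eid, vertices in hyperedges.items():
        for v in vertices:
            flipped[v].add(eid)
    return dict(flipped)


def s_adjacent(hyperedges, a, b, s):
    """True if two distinct hyperedges share at least s members."""
    return a != b and len(hyperedges[a] & hyperedges[b]) >= s


# Hypothetical hypergraph with two hyperedges overlapping in {b, c}.
H = {"e1": {"a", "b", "c"}, "e2": {"b", "c", "d"}}
D = dual(H)  # {"a": {"e1"}, "b": {"e1", "e2"}, "c": {"e1", "e2"}, "d": {"e2"}}

# In H, e1 and e2 are 2-adjacent, so the hyperedge containing a and the
# hyperedge containing d are joined by a 2-path.
print(s_adjacent(H, "e1", "e2", 2))

# In the dual, the hyperedge for vertex a has a single member, so it cannot
# share 2 members with any other dual hyperedge.
print(any(s_adjacent(D, "a", other, 2) for other in D))
```

Here vertex a lies in e1 and vertex d lies in e2, which are 2-adjacent in the original hypergraph, whereas in the dual the hyperedge corresponding to a is 2-adjacent to nothing, so its hyperedge-to-hyperedge 2-distance to the hyperedge corresponding to d is infinite.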
Both definitions are valid and could be adopted depending on the application at hand. Our framework can also handle this alternative definition of vertex-to-vertex s-distance, by simply applying it to the dual hypergraph. Of course, this leads to a separate oracle that cannot be used to answer hyperedge-to-hyperedge s-distance queries in the original hypergraph. Therefore, henceforth we focus on the vertex-to-vertex s-distance as per Definition 3. This allows us to answer all three types of queries with a single oracle.
In contrast to hyperedges, for \(s > 1\), a vertex v may belong to different s-connected components, as it may be in hyperedges that overlap only in v.
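A small example may help here. The Python sketch below lists the s-connected components that contain a hyperedge incident to a given vertex, using a plain flood fill over hyperedges. The function name `components_of_vertex` and the data are hypothetical and serve only to illustrate the remark.

```python
def components_of_vertex(hyperedges, vertex, s):
    """Return the s-connected components (frozensets of hyperedge ids)
    that contain at least one hyperedge incident to `vertex`."""

    def component(start):
        # Flood fill over hyperedges whose overlap is at least s.
        seen, stack = {start}, [start]
        while stack:
            current = stack.pop()
            for other, members in hyperedges.items():
                if other not in seen and len(hyperedges[current] & members) >= s:
                    seen.add(other)
                    stack.append(other)
        return frozenset(seen)

    return {component(e) for e, members in hyperedges.items() if vertex in members}


# Hypothetical hypergraph: e1 and e3 overlap only in vertex "v".
H = {
    "e1": {"v", "a", "b"},
    "e2": {"a", "b", "c"},
    "e3": {"v", "x", "y"},
    "e4": {"x", "y", "z"},
}
print(len(components_of_vertex(H, "v", 1)))
print(len(components_of_vertex(H, "v", 2)))
```

For s = 1 all four hyperedges form a single component, while for s = 2 the hyperedges split into {e1, e2} and {e3, e4}, and v belongs to both because e1 and e3 overlap only in v.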
About this article
Cite this article
Preti, G., De Francisci Morales, G. & Bonchi, F. Hyper-distance oracles in hypergraphs. The VLDB Journal (2024). https://doi.org/10.1007/s00778-024-00851-2
DOI: https://doi.org/10.1007/s00778-024-00851-2