Abstract
Graph query, pattern mining and knowledge discovery are challenging on large-scale heterogeneous information networks (HINs). State-of-the-art techniques based on path propagation mainly focus on inferring node labels and neighborhood structures. However, entity links in the real world also contain rich hierarchical inheritance relations. For example, a vulnerability in a product version is likely to be inherited from its older versions. Taking advantage of hierarchical inheritance can potentially improve the quality of query results. Motivated by this, we explore hierarchical inheritance relations between entities and formulate the problem of graph query on HINs with hierarchical inheritance relations. We propose a graph query search algorithm that decomposes the original query graph into multiple star queries and applies a star query algorithm to each of them. The candidates returned by each star query are then assembled into the final top-k answer to the original query. To efficiently obtain the graph query result from a large-scale HIN, we design a bound-based pruning technique that uses uniform cost search to prune the search space. We implement our algorithm in Spark GraphX and test its effectiveness and efficiency on synthetic and real-world datasets. Compared with two state-of-the-art graph query algorithms, our algorithm returns more accurate results with competitive performance.
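The two building blocks named in the abstract, star decomposition and uniform cost search with a bound, can be illustrated with a minimal single-machine sketch in Python. This is not the authors' Spark GraphX implementation: the greedy pivot choice, the function names `decompose_into_stars` and `top_k_by_uniform_cost`, the adjacency-list layout, and the lower-is-better path cost are all assumptions made for illustration, and the scoring of hierarchical inheritance relations is omitted because the abstract does not define it.

```python
import heapq
from collections import defaultdict


def decompose_into_stars(query_edges):
    """Greedily split an undirected query graph into star queries.

    Assumption: the next pivot is the node covering the most uncovered
    edges; the paper may select pivots differently. Self-loops are not
    supported. Returns (pivot, sorted_leaves) pairs covering each edge once.
    """
    uncovered = {frozenset(edge) for edge in query_edges}
    stars = []
    while uncovered:
        degree = defaultdict(int)
        for edge in uncovered:
            for node in edge:
                degree[node] += 1
        # Highest uncovered degree wins; ties are broken by node name.
        pivot = min(degree, key=lambda n: (-degree[n], n))
        star_edges = {e for e in uncovered if pivot in e}
        leaves = sorted(next(iter(e - {pivot})) for e in star_edges)
        stars.append((pivot, leaves))
        uncovered -= star_edges
    return stars


def top_k_by_uniform_cost(graph, source, is_match, k):
    """Return up to k (cost, node) matches closest to `source`.

    graph: dict mapping node -> list of (neighbor, non-negative cost).
    Uniform cost search settles nodes in non-decreasing cost order, so
    once k matches are found, the k-th cost bounds every node still on
    the frontier and the rest of the search space can be skipped.
    """
    best = {source: 0.0}
    frontier = [(0.0, source)]
    settled = set()
    results = []
    while frontier and len(results) < k:
        cost, node = heapq.heappop(frontier)
        if node in settled:
            continue  # stale heap entry for an already settled node
        settled.add(node)
        if node != source and is_match(node):
            results.append((cost, node))
        for neighbor, weight in graph.get(node, []):
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(frontier, (new_cost, neighbor))
    return results


# Hypothetical toy inputs, not taken from the paper's datasets.
query = [("a", "b"), ("a", "c"), ("a", "d"), ("d", "e")]
stars = decompose_into_stars(query)

data = {
    "v1": [("v2", 1.0), ("v3", 4.0)],
    "v2": [("v3", 1.0), ("v4", 5.0)],
    "v3": [("v4", 1.0)],
    "v4": [],
}
top2 = top_k_by_uniform_cost(data, "v1", lambda n: n in {"v3", "v4"}, k=2)
```

In this sketch the early exit in the `while` condition plays the role of the bound: with `k=1` the search stops as soon as `v3` is settled and never settles `v4`. A distributed version would need to maintain such a bound across partitions, which this example does not attempt.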
Acknowledgements
This work was supported in part by National Science Foundation Grants CNS-1815412 and CNS-1908536.
About this article
Cite this article
Wu, F., Gao, L. Scalable top-k query on information networks with hierarchical inheritance relations. Distrib Parallel Databases 42, 1–30 (2024). https://doi.org/10.1007/s10619-023-07432-2
DOI: https://doi.org/10.1007/s10619-023-07432-2