
Scalable top-k query on information networks with hierarchical inheritance relations

Distributed and Parallel Databases

Abstract

Graph querying, pattern mining, and knowledge discovery become challenging on large-scale heterogeneous information networks (HINs). State-of-the-art techniques involving path propagation mainly focus on inferring node labels and neighborhood structures. However, entity links in the real world also contain rich hierarchical inheritance relations. For example, the vulnerability of a product version is likely to be inherited from its older versions. Exploiting these hierarchical inheritance relations can potentially improve the quality of query results. Motivated by this, we explore hierarchical inheritance relations between entities and formulate the problem of graph query on HINs with hierarchical inheritance relations. We propose a graph query search algorithm that decomposes the original query graph into multiple star queries and applies a star query algorithm to each of them. Candidates from the star query results are then assembled into the final top-k answer to the original query. To efficiently obtain the graph query result from a large-scale HIN, we design a bound-based pruning technique that uses uniform cost search to prune the search space. We implement our algorithm in Spark GraphX and evaluate its effectiveness and efficiency on synthetic and real-world datasets. Compared with two state-of-the-art graph query algorithms, our algorithm obtains more accurate results with competitive performance.
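The two ideas named in the abstract, star decomposition and pruning through uniform cost search, can be illustrated with a minimal single-machine sketch. It is not the authors' Spark GraphX implementation. The function names, the greedy highest-degree pivot heuristic, the use of edge weights as path costs, and the toy graph are all assumptions made for illustration, and the query graph is assumed to have no self-loops.

```python
import heapq
from collections import defaultdict


def decompose_into_stars(edges):
    """Greedily cover every query edge with stars of the form (pivot, leaves).

    Hypothetical heuristic: repeatedly pick the node with the most
    uncovered edges as the pivot. The paper may decompose differently.
    """
    remaining = {frozenset(e) for e in edges}
    stars = []
    while remaining:
        degree = defaultdict(int)
        for e in remaining:
            for node in e:
                degree[node] += 1
        # Highest uncovered degree wins; ties are broken by node name.
        pivot = min(degree, key=lambda n: (-degree[n], n))
        covered = {e for e in remaining if pivot in e}
        leaves = sorted(next(iter(e - {pivot})) for e in covered)
        stars.append((pivot, leaves))
        remaining -= covered
    return stars


def top_k_by_uniform_cost(graph, source, is_match, k):
    """Return up to k matching nodes closest to source as (cost, node) pairs.

    Uniform cost search pops nodes in non-decreasing cost order, so once
    k matches are found, every unexplored node costs at least as much as
    the k-th match and the rest of the search space can be skipped.
    """
    best = {source: 0.0}
    frontier = [(0.0, source)]
    results = []
    while frontier and len(results) < k:
        cost, node = heapq.heappop(frontier)
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper path was already found
        if node != source and is_match(node):
            results.append((cost, node))
        for neighbor, weight in graph.get(node, []):
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(frontier, (new_cost, neighbor))
    return results


# Toy data (hypothetical): a vulnerability linked to version v1, with
# v2 and v3 reachable through cheap "inherits from" edges.
toy_graph = {
    "cve": [("v1", 1.0)],
    "v1": [("v2", 0.5), ("maker", 2.0)],
    "v2": [("v3", 0.5)],
}
print(decompose_into_stars([("A", "B"), ("A", "C"), ("B", "D"), ("A", "D")]))
print(top_k_by_uniform_cost(toy_graph, "cve", lambda n: n.startswith("v"), 2))
```

In this sketch the early exit on `len(results) < k` plays the role of the bound: the cost of the k-th match is a lower bound on everything still in the queue. A distributed version would need to maintain such a bound across partitions, which the sketch does not attempt.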



Acknowledgements

This work was supported in part by National Science Foundation Grants CNS-1815412 and CNS-1908536.

Author information

Corresponding author

Correspondence to Fubao Wu.


About this article


Cite this article

Wu, F., Gao, L. Scalable top-k query on information networks with hierarchical inheritance relations. Distrib Parallel Databases 42, 1–30 (2024). https://doi.org/10.1007/s10619-023-07432-2


  • DOI: https://doi.org/10.1007/s10619-023-07432-2
