Approximate Iceberg Cube on Heterogeneous Dimensions

  • Dan Yin
  • Hong Gao
  • Zhaonian Zou
  • Jianzhong Li
  • Zhipeng Cai
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9643)

Abstract

Heterogeneous information networks, such as social networks and knowledge graphs, contain multiple types of nodes and edges. A meta-path is a path that connects nodes through a sequence of heterogeneous edges and represents a particular kind of semantic relation among nodes. Meta-paths are an effective mechanism for improving the quality of graph analysis on heterogeneous information networks. This paper presents an iceberg cube framework for heterogeneous information networks based on meta-paths. To the best of our knowledge, no such framework has been proposed before. (1) We use meta-paths to measure the similarities of nodes and prove that the problem is NP-hard. (2) We propose an optimal solution for the strict case and develop a variant of the slice tree to aggregate networks hierarchically. (3) To improve scalability, we provide a general approximate algorithm for fast aggregation, in which random walks on meta-paths are used to measure the similarities. (4) We design two pruning strategies that reduce the search space when the aggregate function is monotonic. (5) Experiments on both real-world and synthetic networks demonstrate the effectiveness and efficiency of the algorithms.
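
As an illustration of how random walks on a meta-path can measure node similarity, the following minimal sketch estimates the probability that a walk constrained to a given meta-path ends at a target node. It is not the paper's algorithm: the toy author/paper network, the node names, and the functions `build_adjacency`, `walk_along_meta_path` and `estimate_similarity` are all hypothetical, and the similarity definition (fraction of sampled walks that end at the target) is an assumption made for this example.

```python
import random
from collections import defaultdict

# Toy heterogeneous network (hypothetical data): authors and papers,
# linked by undirected "writes" edges.
NODE_TYPE = {
    "a1": "Author", "a2": "Author", "a3": "Author",
    "p1": "Paper", "p2": "Paper", "p3": "Paper",
}
EDGES = [("a1", "p1"), ("a2", "p1"), ("a2", "p2"), ("a3", "p2"), ("a3", "p3")]


def build_adjacency(edges):
    """Build an undirected adjacency list from (u, v) pairs."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def walk_along_meta_path(adj, node_type, start, meta_path, rng):
    """Run one random walk whose i-th node must have type meta_path[i].

    Returns the end node, or None if the start type does not match
    or the walk has no neighbour of the required type.
    """
    if node_type[start] != meta_path[0]:
        return None
    current = start
    for wanted in meta_path[1:]:
        candidates = [n for n in adj[current] if node_type[n] == wanted]
        if not candidates:
            return None
        current = rng.choice(candidates)
    return current


def estimate_similarity(adj, node_type, source, target, meta_path,
                        walks=5000, seed=0):
    """Estimate similarity as the fraction of walks from source that end at target."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    hits = sum(
        walk_along_meta_path(adj, node_type, source, meta_path, rng) == target
        for _ in range(walks)
    )
    return hits / walks


if __name__ == "__main__":
    adj = build_adjacency(EDGES)
    co_author = ["Author", "Paper", "Author"]  # meta-path Author-Paper-Author
    for target in ("a2", "a3"):
        score = estimate_similarity(adj, NODE_TYPE, "a1", target, co_author)
        print(f"similarity(a1, {target}) along A-P-A: {score:.3f}")
```

In this toy network a1 and a3 share no paper, so no Author-Paper-Author walk from a1 can reach a3, while roughly half of the walks from a1 end at a2. Note that this estimate is not symmetric in general: a walk from a2 reaches a1 less often, because a2 has more papers to choose from.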

Keywords

Random Walk, Approximate Algorithm, Pruning Strategy, Aggregate Function, Synthetic Network
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Notes

Acknowledgement

This work is supported by the National Grand Fundamental Research 973 Program of China under Grant 2012CB316200 and the National Natural Science Foundation of China under Grants 61190115, 61173023 and 61532015.

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Dan Yin (1), corresponding author
  • Hong Gao (1)
  • Zhaonian Zou (1)
  • Jianzhong Li (1)
  • Zhipeng Cai (2)

  1. Harbin Institute of Technology, Harbin, China
  2. Georgia State University, Atlanta, USA
