Minimized-cost cube query on heterogeneous information networks

Yin, Dan; Gao, Hong; Zou, Zhaonian; Li, Jianzhong

doi:10.1007/s10878-015-9967-6

Published: 22 October 2015

Volume 33, pages 339–364 (2017)

Journal of Combinatorial Optimization

Abstract

The data cube is the foundation of on-line analytical processing (OLAP); it provides users with views of the data from different perspectives and at different granularities. Heterogeneous information networks consist of multiple types of nodes and edges, which represent different semantic relations. With the rapid development of social networks and knowledge graphs, heterogeneous information networks have become increasingly popular. In a heterogeneous information network, a cube is the set of aggregate graphs, and cube queries are required to support OLAP. Existing research mainly studies aggregate graph queries on homogeneous networks and considers only the attributes of nodes. To overcome these limitations, this paper investigates the cube query problem on heterogeneous information networks. (1) A novel cube model for heterogeneous information networks is proposed, which captures both attribute and structure semantics. (2) Because the total number of aggregate graphs is huge, computing and storing all of them costs a great deal of time and storage, so the problem of partial cube materialization on heterogeneous information networks is investigated: given a fixed amount of memory space, select a subset of the aggregate graphs in the cube to materialize so that the cost of computing the whole cube is minimized. This optimization problem is proved to be NP-complete, and there is no \(n^{1-{\epsilon }}\) approximation algorithm unless P \(=\) NP. (3) A greedy algorithm for partial cube materialization is proposed, based on two dependencies between aggregate graphs: attribute dependence and path dependence. (4) Experiments on real-world data sets show that the cube definition is meaningful and that the partial cube materialization algorithm is efficient.
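The partial materialization problem in item (2) can be illustrated with a minimal Python sketch of a generic benefit-per-size greedy heuristic. This is not the algorithm from the paper, whose details are not given in the abstract. The function `greedy_materialize`, the view names, the sizes, and the cost model (a view costs as much as the smallest materialized graph it can be derived from) are all assumptions made for illustration. The `ancestors` map is a hypothetical stand-in for the attribute and path dependencies between aggregate graphs.

```python
from typing import Dict, Set


def greedy_materialize(
    sizes: Dict[str, int],
    ancestors: Dict[str, Set[str]],
    base: str,
    budget: int,
) -> Set[str]:
    """Greedily choose aggregate graphs to materialize within a space budget.

    sizes[v]:     assumed size of aggregate graph v (must be positive)
    ancestors[v]: graphs from which v can be derived (v itself excluded)
    base:         the original network, assumed to be always available
    budget:       space available for graphs other than the base
    """
    chosen = {base}

    def cost(view: str, materialized: Set[str]) -> int:
        # Assumed cost model: size of the smallest materialized source.
        sources = (ancestors[view] | {view}) & materialized
        return min(sizes[s] for s in sources)

    def total_cost(materialized: Set[str]) -> int:
        return sum(cost(v, materialized) for v in sizes)

    remaining = budget
    while True:
        current = total_cost(chosen)
        best, best_ratio = None, 0.0
        for v in sizes:
            if v in chosen or sizes[v] > remaining:
                continue
            # Benefit of adding v: reduction in total cost, per unit of space.
            gain = current - total_cost(chosen | {v})
            ratio = gain / sizes[v]
            if ratio > best_ratio:
                best, best_ratio = v, ratio
        if best is None:
            return chosen
        chosen.add(best)
        remaining -= sizes[best]


# Toy lattice with hypothetical aggregate graphs and sizes.
sizes = {"base": 100, "by_venue": 20, "by_year": 60, "all": 1}
ancestors = {
    "base": set(),
    "by_venue": {"base"},
    "by_year": {"base"},
    "all": {"base", "by_venue", "by_year"},
}
result = greedy_materialize(sizes, ancestors, base="base", budget=25)
print(sorted(result))  # ['all', 'base', 'by_venue']
```

With a budget of 25, the sketch first picks `all` (largest cost reduction per unit of space), then `by_venue`, and skips `by_year` because it no longer fits. A greedy choice of this kind carries no optimality guarantee here, which is consistent with the inapproximability result stated in the abstract.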

Acknowledgments

This work is supported by the National Grand Fundamental Research 973 Program of China under Grant 2012CB316200 and by the National Natural Science Foundation of China under Grants 61190115, 61173023, and 61532015.

Author information

Authors and Affiliations

Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
Dan Yin, Hong Gao, Zhaonian Zou & Jianzhong Li

Corresponding author

Correspondence to Dan Yin.

About this article

Cite this article

Yin, D., Gao, H., Zou, Z. et al. Minimized-cost cube query on heterogeneous information networks. J Comb Optim 33, 339–364 (2017). https://doi.org/10.1007/s10878-015-9967-6

Issue Date: January 2017