Abstract
In this chapter, we introduce the homogeneous graph extraction task, which extracts homogeneous graphs from heterogeneous graphs. In an extracted homogeneous graph, each relation is defined by a line pattern on the heterogeneous graph, and the new attribute values of the relation are computed by user-defined aggregate functions. On large-scale heterogeneous graphs, the key challenges are how to efficiently enumerate the paths matched by the line pattern and how to aggregate values for each pair of vertices from the matched paths. To address these two challenges, we propose a parallel graph extraction framework. The framework compiles the line pattern into a path concatenation plan, which is selected by a cost model. To make aggregate computation efficient, we first classify aggregate functions into distributive, algebraic, and holistic aggregations; we then speed up the distributive and algebraic aggregations by computing partial aggregate values during path enumeration. The experimental results demonstrate the effectiveness of the proposed graph extraction framework.
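The idea of computing partial aggregate values while paths are concatenated can be illustrated with a minimal Python sketch. This is not the chapter's framework: the helper names `segment` and `concat`, the `(total_weight, path_count)` state, and the toy Author/Paper/Venue data are all assumptions made for illustration, and the weight of a path is assumed to be the sum of its edge weights.

```python
from collections import defaultdict


def segment(edges):
    """Turn weighted edges (src, dst, weight) into per-pair partial aggregates.

    The result maps (src, dst) -> (total_weight, path_count).
    """
    seg = defaultdict(lambda: (0.0, 0))
    for u, v, w in edges:
        total, count = seg[(u, v)]
        seg[(u, v)] = (total + w, count + 1)
    return dict(seg)


def concat(left, right):
    """Join two path segments on their shared middle vertex.

    Only the partial aggregates are combined; the individual paths
    are never materialized.
    """
    by_src = defaultdict(list)
    for (v, w), state in right.items():
        by_src[v].append((w, state))

    out = defaultdict(lambda: (0.0, 0))
    for (u, v), (sum_l, cnt_l) in left.items():
        for w, (sum_r, cnt_r) in by_src.get(v, []):
            acc_sum, acc_cnt = out[(u, w)]
            # Every left path through v pairs with every right path from v.
            out[(u, w)] = (acc_sum + sum_l * cnt_r + sum_r * cnt_l,
                           acc_cnt + cnt_l * cnt_r)
    return dict(out)


# Hypothetical toy data for the line pattern Author -> Paper -> Venue.
writes = segment([("a1", "p1", 1.0), ("a1", "p2", 2.0), ("a2", "p2", 3.0)])
published_in = segment([("p1", "v1", 10.0), ("p2", "v1", 20.0)])

extracted = concat(writes, published_in)

# Distributive aggregate: the number of matched paths per vertex pair.
path_count = {pair: cnt for pair, (_, cnt) in extracted.items()}
# Algebraic aggregate: the average path weight, derived from (sum, count).
avg_weight = {pair: total / cnt for pair, (total, cnt) in extracted.items()}
```

In this sketch the count and the average need only a constant-size state per vertex pair. A holistic aggregate such as the median path weight could not be handled this way, because it would need every individual path weight.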
Notes
- 1. A #W[1]-complete problem has no fixed-parameter tractable solution unless every problem in #W[1] is fixed-parameter tractable.
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Shao, Y., Cui, B., Chen, L. (2020). Efficient Parallel Graph Extraction. In: Large-scale Graph Analysis: System, Algorithm and Optimization. Big Data Management. Springer, Singapore. https://doi.org/10.1007/978-981-15-3928-2_5
DOI: https://doi.org/10.1007/978-981-15-3928-2_5
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-3927-5
Online ISBN: 978-981-15-3928-2
eBook Packages: Computer Science, Computer Science (R0)