Efficient Parallel Graph Extraction

  • Chapter
Large-scale Graph Analysis: System, Algorithm and Optimization

Part of the book series: Big Data Management (BIGDM)

Abstract

In this chapter, we introduce the homogeneous graph extraction task, which extracts homogeneous graphs from heterogeneous graphs. In an extracted homogeneous graph, each relation is defined by a line pattern on the heterogeneous graph, and the attribute values of the new relation are computed by user-defined aggregate functions. On large-scale heterogeneous graphs, the extraction problem poses two key challenges: efficiently enumerating the paths matched by the line pattern, and aggregating values for each pair of vertices over the matched paths. To address these two challenges, we propose a parallel graph extraction framework. The framework compiles the line pattern into a path concatenation plan, which is selected by a cost model. To keep aggregate computation efficient, we first classify the aggregate functions as distributive, algebraic, or holistic; we then speed up distributive and algebraic aggregation by computing partial aggregate values during path enumeration. The experimental results demonstrate the effectiveness of the proposed graph extraction framework.
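
To make the task concrete, the following is a minimal sketch in Python, not the chapter's framework. It assumes a hypothetical toy graph with Author and Paper vertices, the line pattern Author-Paper-Author, and a path-count aggregate; the names `labels`, `edges`, and `extract_counts` are illustrative only. Because count is a distributive aggregate, the sketch carries one partial count per (start, current) vertex pair instead of materializing every path. It counts label-matching walks and runs sequentially, so the parallel execution and cost-based plan selection mentioned in the abstract are not modeled.

```python
from collections import defaultdict

# Hypothetical toy heterogeneous graph (an assumption for illustration).
labels = {"a1": "Author", "a2": "Author", "a3": "Author",
          "p1": "Paper", "p2": "Paper"}
edges = [("a1", "p1"), ("a2", "p1"), ("a2", "p2"), ("a3", "p2"), ("a1", "p2")]


def build_adjacency(edge_list):
    """Build an undirected adjacency list from (u, v) pairs."""
    adj = defaultdict(list)
    for u, v in edge_list:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def extract_counts(labels, edge_list, pattern):
    """Return {(start, end): count} for walks whose labels match `pattern`.

    Only partial counts are kept at each step, which is valid because
    count is distributive: partial results can be summed.
    """
    adj = build_adjacency(edge_list)
    # partial[(s, v)] = number of partial matches that start at s and end at v
    partial = {(v, v): 1 for v, lab in labels.items() if lab == pattern[0]}
    for wanted in pattern[1:]:
        extended = defaultdict(int)
        for (start, current), count in partial.items():
            for neighbor in adj[current]:
                if labels[neighbor] == wanted:
                    extended[(start, neighbor)] += count
        partial = extended
    # Drop self-pairs so the extracted homogeneous graph has no self-loops.
    return {(s, t): c for (s, t), c in partial.items() if s != t}


coauthors = extract_counts(labels, edges, ["Author", "Paper", "Author"])
```

In this toy input, the extracted edge (a1, a2) has weight 2 because the two authors share papers p1 and p2. An algebraic aggregate such as average could be handled in the same way by carrying a (sum, count) pair, whereas a holistic aggregate such as median needs all matched values.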

Notes

  1. A #W[1]-complete problem is not believed to have a fixed-parameter tractable solution.

  2. http://giraph.apache.org/.

  3. http://dblp.uni-trier.de/xml/.

  4. http://www.nber.org/patents/.

  5. http://neo4j.com/.

  6. http://docs.scipy.org.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Shao, Y., Cui, B., Chen, L. (2020). Efficient Parallel Graph Extraction. In: Large-scale Graph Analysis: System, Algorithm and Optimization. Big Data Management. Springer, Singapore. https://doi.org/10.1007/978-981-15-3928-2_5

  • DOI: https://doi.org/10.1007/978-981-15-3928-2_5

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-3927-5

  • Online ISBN: 978-981-15-3928-2

  • eBook Packages: Computer Science (R0)
