Big graph search: challenges and techniques

  • Review Article
  • Frontiers of Computer Science

Abstract

Graphs have more expressive power than traditional relational and XML models and are now widely used. At the same time, various applications of social computing create a pressing need for a new search paradigm. In this article, we argue that big graph search fills this gap. We first introduce applications of graph search in various scenarios. We then formalize the graph search problem and analyze graph search from an evolutionary point of view, followed by evidence from both industry and academia. After that, we analyze the difficulties and challenges of big graph search. Finally, we present three classes of techniques for big graph search: query techniques, data techniques, and distributed computing techniques.



Author information

Corresponding author

Correspondence to Chunming Hu.

Additional information

Shuai Ma is a professor in the School of Computer Science and Engineering, Beihang University, China. He obtained two PhDs, one from the University of Edinburgh, UK, in 2010 and one from Peking University, China, in 2004. He was a postdoctoral research fellow in the database group at the University of Edinburgh, and a summer intern at Bell Labs, Murray Hill, USA, in 2008. His research interests include database theory and systems, social data analysis, and data-intensive computing. He is a recipient of the best paper award for VLDB 2010, the Visiting Young Faculty Program of MRSA in 2012, and the best challenge paper award for WISE 2013.

Jia Li is a PhD student in the School of Computer Science and Engineering, Beihang University, China. She obtained her bachelor's degree in computer science from Beihang University in 2012. Her research interests include databases, in particular social data analysis.

Chunming Hu is an associate professor in the School of Computer Science and Engineering, Beihang University, China. He received his PhD degree from Beihang University in 2006. His current research interests include distributed systems, system virtualization, and large-scale data management and processing systems.

Xuelian Lin is currently a lecturer in the School of Computer Science and Engineering, Beihang University, China. He received his PhD degree from Beihang University in 2013. His current research interests include middleware and data processing systems.

Jinpeng Huai is a professor in the School of Computer Science and Engineering at Beihang University, China. He received his PhD in computer science from Beihang University in 1993. He is an academician of the Chinese Academy of Sciences and the vice honorary chairman of the China Computer Federation (CCF). His research interests include big data computing, distributed systems, virtual computing, service-oriented computing, and trustworthiness and security.

About this article

Cite this article

Ma, S., Li, J., Hu, C. et al. Big graph search: challenges and techniques. Front. Comput. Sci. 10, 387–398 (2016). https://doi.org/10.1007/s11704-015-4515-1

  • DOI: https://doi.org/10.1007/s11704-015-4515-1
