SPORE: shortest path overlapped regions and confined traversals towards graph clustering

Applied Intelligence

Abstract

An abundance of structural information has made graph traversals non-trivial. Shortcut construction is one of the techniques used for efficient shortest path (SP) traversals on graphs. However, shortcut construction is computationally intensive, must be performed as a dedicated offline step, and often produces unnecessary auxiliary data, i.e., shortcuts. On medium- to large-scale graphs it can take minutes to hours of computation time, depending on the available computational resources and the complexity of the shortcut construction algorithm. In addition, excessive shortcuts greatly increase the branching factor during SP expansions. These factors make repeated SP queries unsuitable for graph mining tasks. This paper presents Shortest Path Overlapped Region (SPORE), a performance-oriented approach that improves shortcut construction by exploiting SP overlapped regions. Path overlapping has been overlooked by existing shortcut construction systems. SPORE exploits this opportunity by constructing auxiliary shortcuts incrementally from SP trees during traversals, instead of in a separate step. To assess its practical implications, SPORE is applied to a graph clustering task, which requires extensive graph traversals to group similar vertices together. We further propose an optimization strategy that accelerates the clustering process using confined subgraph traversals. A performance evaluation of SPORE on real and synthetic graphs shows execution time gains of up to 40 % while producing an order of magnitude fewer shortcuts than the SegTable approach. Using SPORE across multiple SP computations consistently reduces the latency of the entire clustering process. Furthermore, the confined subgraph traversal scheme improves performance by an order of magnitude on undirected graphs, twice the improvement observed on directed graphs.
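The abstract describes two ideas: building shortcuts incrementally from SP trees during ordinary traversals instead of in an offline step, and confining traversals to a subgraph. The sketch below illustrates the general principle under our own assumptions; it is not the SPORE algorithm, and the class name `ShortcutGraph`, the `min_hops` threshold and the `allowed` parameter are hypothetical. It relies on one well-known property: a shortcut (s, v) whose weight equals the exact SP distance d(s, v) can be added to the graph without changing any shortest-path distance, so later Dijkstra runs can jump across regions that earlier runs already explored.

```python
import heapq
from collections import defaultdict


class ShortcutGraph:
    """Weighted graph that accumulates shortcut edges from SP trees.

    Hypothetical illustration only: the class name, the min_hops policy and
    the `allowed` parameter are assumptions, not the SPORE algorithm.
    """

    def __init__(self):
        self.adj = defaultdict(dict)        # original edges: u -> {v: w}
        self.shortcuts = defaultdict(dict)  # learned edges: u -> {v: d(u, v)}

    def add_edge(self, u, v, w, undirected=True):
        # Keep the lighter edge if (u, v) is added more than once.
        self.adj[u][v] = min(w, self.adj[u].get(v, float("inf")))
        if undirected:
            self.adj[v][u] = min(w, self.adj[v].get(u, float("inf")))

    def _neighbors(self, u, use_shortcuts):
        yield from self.adj[u].items()
        if use_shortcuts:
            yield from self.shortcuts[u].items()

    def shortest_paths(self, source, allowed=None, min_hops=2):
        """Dijkstra from `source`; returns (dist, parent) of the SP tree.

        If `allowed` is a vertex set, the traversal is confined to it, so
        distances are measured inside the induced subgraph. Confined runs
        neither use nor record shortcuts, because a shortcut may pass
        through vertices outside `allowed`.
        """
        confined = allowed is not None
        dist, parent, hops = {source: 0}, {source: None}, {source: 0}
        settled = set()
        heap = [(0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if u in settled:
                continue  # stale heap entry
            settled.add(u)
            for v, w in self._neighbors(u, use_shortcuts=not confined):
                if confined and v not in allowed:
                    continue
                nd = d + w
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    parent[v] = u  # may be reached via a shortcut edge
                    hops[v] = hops[u] + 1
                    heapq.heappush(heap, (nd, v))
        if not confined:
            # dist[v] is exact, so (source -> v, dist[v]) is a valid shortcut:
            # adding it leaves every shortest-path distance unchanged.
            for v in settled:
                if hops[v] >= min_hops:
                    self.shortcuts[source][v] = dist[v]
        return dist, parent
```

For example, on the unit-weight path a-b-c-d with an extra edge a-d of weight 10, an unconfined run from a records the shortcuts a→c and a→d with weights 2 and 3, while a run confined to {a, d} reports a distance of 10 to d because the path through b and c lies outside the subgraph. Storing a shortcut for every settled vertex would inflate the branching factor, which is exactly the cost the abstract warns about; `min_hops` is only a crude stand-in for a real selection policy.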


Figures: Fig. 1 to Fig. 25 (images not included in this preview).

Notes

  1. Social Network Dataset from Stanford Collection: http://snap.stanford.edu/data/

  2. Social Network Dataset from Stanford Collection: http://snap.stanford.edu/data/

  3. Scientific Collaboration Network: http://toreopsahl.com/datasets/newman2001/

  4. Santo Fortunato's Graph Generator: http://santo.fortunato.googlepages.com/inthepress2/

References

  1. Abraham I, Delling D, Fiat A, Goldberg AV, Werneck RF (2013) Highway dimension and provably efficient shortest path algorithms. Tech. rep. MSR-TR-2013-91, Microsoft Research, USA

  2. Abraham I, Delling D, Goldberg AV, Werneck RFF (2012) Hierarchical hub labelings for shortest paths. In: Epstein L, Ferragina P (eds) ESA, Lecture Notes in Computer Science, vol 7501. Springer, pp 24–35. http://dblp.uni-trier.de/db/conf/esa/esa2012.html#AbrahamDGW12

  3. Aggarwal CC, Bhuiyan MA, Hasan MA (2014) Frequent pattern mining algorithms: A survey. In: Aggarwal CC, Han J (eds) Frequent Pattern Mining. Springer International Publishing, pp 19–64. doi:10.1007/978-3-319-07821-2_2

  4. Akiba T, Iwata Y, Yoshida Y (2013) Fast exact shortest-path distance queries on large networks by pruned landmark labeling. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, SIGMOD ’13. ACM, New York, pp 349–360. doi:10.1145/2463676.2465315

  5. Bast H, Funke S, Sanders P, Schultes D (2007) Fast routing in road networks with transit nodes. Science 316(5824):566. doi:10.1126/science.1137521. http://www.mpi-inf.mpg.de/funke/Papers/SCIENCE07/SCIENCE07.pdf

  6. Bollobas B (1998) Modern Graph Theory. Springer. http://www.worldcat.org/isbn/0387984887

  7. Bradley P, Fayyad U, Reina C (1998) Scaling clustering algorithms to large databases. In: Proceedings of the 4th Conference on Knowledge Discovery in Databases, KDD’98. AAAI Press, pp 9–15

  8. Chen HH, Giles CL (2013) ASCOS: an asymmetric network structure context similarity measure. In: Rokne JG, Faloutsos C (eds) ASONAM. ACM, pp 442–449

  9. Cheng H, Zhou Y, Yu JX (2011) Clustering large attributed graphs: A balance between structural and attribute similarities. ACM Trans Knowl Discov Data 5(2):12:1–12:33. doi:10.1145/1921632.1921638

  10. Cohen E, Delling D, Fuchs F, Goldberg AV, Goldszmidt M, Werneck RF (2013) Scalable similarity estimation in social networks: Closeness, node labels, and random edge lengths. In: Proceedings of the First ACM Conference on Online Social Networks, COSN ’13. ACM, New York, pp 131–142. doi:10.1145/2512938.2512944

  11. Cohen S, Kimelfeld B, Koutrika G (2012) A survey on proximity measures for social networks. In: Ceri S, Brambilla M (eds) Search Computing, Lecture Notes in Computer Science, vol 7538. Springer, Berlin Heidelberg, pp 191–206. doi:10.1007/978-3-642-34213-4_13

  12. Cormen TH, Leiserson CE, Rivest RL, Stein C (2001) Introduction to Algorithms, 2nd edn. McGraw-Hill Higher Education

  13. Delling D, Goldberg AV, Pajor T, Werneck RF (2014) Robust exact distance queries on massive networks. Tech. rep. MSR-TR-2014-12, Microsoft Research, USA

  14. Delling D, Sanders P, Schultes D, Wagner D (2009) Engineering route planning algorithms. In: Lerner J, Wagner D, Zweig KA (eds) Algorithmics of Large and Complex Networks. Springer, Berlin, Heidelberg, pp 117–139. doi:10.1007/978-3-642-02094-0_7

  15. Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39(1):1–38

  16. Dijkstra EW (1959) A note on two problems in connexion with graphs. Numer Math 1(1):269–271

  17. van Dongen S (2000) Graph clustering by flow simulation. Ph.D. thesis. University of Utrecht, Utrecht

  18. Farnstrom F, Lewis J, Elkan C (2000) Scalability for clustering algorithms revisited. SIGKDD Explor Newsl 2(1):51–57. doi:10.1145/360402.360419

  19. Gao J, Jin R, Zhou J, Yu JX, Jiang X, Wang T (2011) Relational approach for shortest path discovery over large graphs. Proc VLDB Endow 5(4):358–369. doi:10.14778/2095686.2095694

  20. Geisberger R, Sanders P, Schultes D, Vetter C (2012) Exact routing in large road networks using contraction hierarchies. Transp Sci 46(3):388–404. doi:10.1287/trsc.1110.0401

  21. Goldberg AV, Kaplan H, Werneck RF (2009) Reach for A*: Efficient point-to-point shortest path algorithms. In: The Shortest Path Problem: Ninth DIMACS Implementation Challenge. American Mathematical Society, USA, pp 93–139

  22. Hamerly G (2010) Making k-means even faster. In: Proceedings of the 2010 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics, Philadelphia, pp 130–140. doi:10.1137/1.9781611972801.12

  23. Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Data Min Knowl Discov 8(1):53–87. doi:10.1023/B:DAMI.0000005258.31418.83

  24. Jeh G, Widom J (2002) SimRank: A measure of structural-context similarity. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’02. ACM, New York, pp 538–543. doi:10.1145/775047.775126

  25. Katz L (1953) A new status index derived from sociometric analysis. Psychometrika 18(1):39–43. doi:10.1007/BF02289026

  26. Kriegel HP, Kröger P, Zimek A (2009) Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering. ACM Trans Knowl Discov Data 3(1):1:1–1:58. doi:10.1145/1497577.1497578

  27. Adamic LA, Adar E (2005) How to search a social network. Soc Networks 27(3):187–203. doi:10.1016/j.socnet.2005.01.007

  28. Leskovec J, Chakrabarti D, Kleinberg J, Faloutsos C (2005) Realistic, mathematically tractable graph generation and evolution, using Kronecker multiplication. In: Proceedings of the 9th European Conference on Principles and Practice of Knowledge Discovery in Databases, PKDD’05. Springer, Berlin, Heidelberg, pp 133–145. doi:10.1007/11564126_17

  29. Liben-Nowell D, Kleinberg J (2007) The link-prediction problem for social networks. J Am Soc Inf Sci Technol 58(7):1019–1031. doi:10.1002/asi.20591

  30. Liben-Nowell D, Kleinberg J (2007) The link-prediction problem for social networks. J Am Soc Inf Sci Technol 58(7):1019–1031. doi:10.1002/asi.v58:7

  31. Nawaz W, Lee YK, Lee S (2012) Collaborative similarity measure for intra graph clustering. In: DASFAA Workshops, pp 204–215

  32. Newman MEJ (2001) The structure of scientific collaboration networks. Proc Natl Acad Sci 98(2):404–409. doi:10.1073/pnas.98.2.404. http://www.pnas.org/content/98/2/404.abstract

  33. Ordonez C (2006) Integrating k-means clustering with a relational DBMS using SQL. IEEE Trans Knowl Data Eng 18(2):188–201. doi:10.1109/TKDE.2006.31

  34. Perozzi B, McCubbin C, Beecher S, Halbert J (2013) Scalable graph clustering with pregel. In: Ghoshal G, Poncela-Casasnovas J, Tolksdorf R (eds) Complex Networks IV, Studies in Computational Intelligence, vol 476. Springer, Berlin Heidelberg, pp 133–144. doi:10.1007/978-3-642-36844-8_13

  35. Pradhan A, Mahinthakumar G (2013) Finding all-pairs shortest path for a large-scale transportation network using parallel Floyd-Warshall and parallel Dijkstra algorithms. J Comput Civ Eng 27(3):263–273. doi:10.1061/(ASCE)CP.1943-5487.0000220

  36. Sanders P, Schultes D (2012) Engineering highway hierarchies. J Exp Algorithmics 17:1.6:1.1–1.6:1.40. doi:10.1145/2133803.2330080

  37. Satuluri VM (2012) Scalable clustering of modern networks. Ph.D. thesis. The Ohio State University

  38. Schaeffer SE (2007) Survey: Graph clustering. Comput Sci Rev 1(1):27–64. doi:10.1016/j.cosrev.2007.05.001

  39. Sommer C (2012) Shortest-path queries in static networks. Submitted to ACM Computing Surveys

  40. Theodoridis S, Koutroumbas K (2006) Pattern Recognition, 3rd edn. Academic Press, Inc., Orlando

  41. Tong H, Faloutsos C, Pan JY (2006) Fast random walk with restart and its applications. In: Proceedings of the Sixth International Conference on Data Mining, ICDM ’06. IEEE Computer Society, Washington, pp 613–622. doi:10.1109/ICDM.2006.70

  42. Xu R, Wunsch DI (2005) Survey of clustering algorithms. IEEE Trans Neural Netw 16(3):645–678. doi:10.1109/TNN.2005.845141

  43. Zhu AD, Ma H, Xiao X, Luo S, Tang Y, Zhou S (2013) Shortest path and distance queries on road networks: Towards bridging theory and practice. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, SIGMOD ’13. ACM, New York, pp 857–868. doi:10.1145/2463676.2465277

  44. Zhu AD, Xiao X, Wang S, Lin W (2013) Efficient single-source shortest path and distance queries on large graphs. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’13. ACM, New York, pp 998–1006. doi:10.1145/2487575.2487665

Acknowledgments

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No. 2012R1A2A2A01047478). We thank our colleagues Mr. Mohammad Aazam and Mr. Bilal Amin, as well as the anonymous reviewers, whose comments greatly improved the manuscript.

Author information

Corresponding author

Correspondence to Young-Koo Lee.

About this article

Cite this article

Nawaz, W., Khan, KU. & Lee, YK. SPORE: shortest path overlapped regions and confined traversals towards graph clustering. Appl Intell 43, 208–232 (2015). https://doi.org/10.1007/s10489-014-0637-7

  • DOI: https://doi.org/10.1007/s10489-014-0637-7
