SPORE: shortest path overlapped regions and confined traversals towards graph clustering

Nawaz, Waqas; Khan, Kifayat-Ullah; Lee, Young-Koo

doi:10.1007/s10489-014-0637-7

Published: 30 January 2015

Applied Intelligence, volume 43, pages 208–232 (2015)

Abstract

An abundance of structural information has resulted in non-trivial graph traversals. Shortcut construction is one of the techniques used for efficient shortest path (SP) traversals on graphs. However, shortcut construction is computationally intensive, must run as an exclusive offline step, and often produces unnecessary auxiliary data, i.e., shortcuts. Medium to large-scale graphs can take minutes to hours of computation time, depending on the available computational resources and the complexity of the shortcut construction algorithm. In addition, the branching factor during SP expansions greatly increases due to excessive shortcuts. These factors make repeated SP queries unsuitable for graph mining tasks. This paper presents Shortest Path Overlapped Region (SPORE), a performance-oriented approach that improves shortcut construction by exploiting SP overlapped regions. Path overlapping has been overlooked by shortcut construction systems. SPORE takes advantage of this opportunity by constructing auxiliary shortcuts incrementally, using SP trees during traversals, instead of in an exclusive step. To assess its practical implications, SPORE is applied to a graph clustering task, which requires extensive graph traversals to group similar vertices together. We further suggest an optimization strategy that accelerates the clustering process using confined subgraph traversals. A performance evaluation of SPORE on real and synthetic graphs reveals an execution time gain of up to 40%, with an order of magnitude fewer shortcuts than the SegTable approach. Leveraging SPORE with multiple SP computations consistently reduces the latency of the entire clustering process. Furthermore, the confined subgraph traversal scheme improves performance by an order of magnitude on undirected graphs, which is twice the improvement observed on directed graphs.
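
The abstract does not give SPORE's algorithm, so the following is only a minimal sketch of the general idea it describes, under stated assumptions. It relies on one well-known fact: every sub-path of a shortest path is itself a shortest path, so a shortest-path tree built while answering one query already contains exact distances for many other vertex pairs. The function names (`dijkstra_tree`, `harvest_shortcuts`, `sp_distance`), the dictionary-based shortcut store, and the `max_dist` bound used to confine a traversal are all hypothetical choices made for illustration; they are not taken from the paper.

```python
import heapq

def dijkstra_tree(graph, source, max_dist=float("inf")):
    """Dijkstra from `source` over non-negative edge weights.

    `graph` maps a vertex to a list of (neighbor, weight) pairs.
    Vertices farther than `max_dist` are never expanded, which confines
    the traversal to a subgraph around the source (an assumed, simple
    form of confinement). Returns (dist, parent) for the SP tree.
    """
    dist = {source: 0}
    parent = {source: None}
    heap = [(0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        for v, w in graph.get(u, ()):
            nd = d + w
            if nd > max_dist:
                continue
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, parent

def harvest_shortcuts(dist, parent, shortcuts):
    """Record a shortcut (ancestor, v) for every tree ancestor of v.

    The tree path from an ancestor to v is a sub-path of a shortest
    path, so dist[v] - dist[ancestor] is the exact SP distance.
    """
    for v in dist:
        a = parent[v]
        while a is not None:
            shortcuts[(a, v)] = dist[v] - dist[a]
            a = parent[a]

def sp_distance(graph, shortcuts, s, t):
    """Answer an SP query, reusing and extending the shortcut store."""
    if s == t:
        return 0
    if (s, t) in shortcuts:
        return shortcuts[(s, t)]  # served from an earlier traversal
    dist, parent = dijkstra_tree(graph, s)
    harvest_shortcuts(dist, parent, shortcuts)
    return dist.get(t, float("inf"))

# Small directed example graph (illustrative data only).
graph = {
    "a": [("b", 1), ("c", 4)],
    "b": [("c", 2), ("d", 5)],
    "c": [("d", 1)],
    "d": [],
}
shortcuts = {}
print(sp_distance(graph, shortcuts, "a", "d"))  # runs one traversal
print(sp_distance(graph, shortcuts, "b", "d"))  # answered from the store
```

In this sketch the second query needs no traversal, because the path from `b` to `d` overlaps the shortest path already found from `a` to `d`. Harvesting every ancestor pair is the simplest possible policy and can store many entries on deep trees; a practical system would need a more selective rule.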

Notes

Social Network Dataset from Stanford Collection: http://snap.stanford.edu/data/
Scientific Collaboration Network: http://toreopsahl.com/datasets/newman2001/
Santo Fortunato's Graph Generator: http://santo.fortunato.googlepages.com/inthepress2/

Acknowledgments

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No. 2012R1A2A2A01047478). We thank our colleagues Mr. Mohammad Aazam and Mr. Bilal Amin, and the anonymous reviewers, whose comments greatly improved the manuscript.

Author information

Authors and Affiliations

Department of Computer Engineering, Kyung Hee University, Seocheon-dong, Giheung-gu, Yongin-si, Gyeonggi-do, 446-701, Republic of Korea
Waqas Nawaz, Kifayat-Ullah Khan & Young-Koo Lee

Corresponding author

Correspondence to Young-Koo Lee.

About this article

Cite this article

Nawaz, W., Khan, KU. & Lee, YK. SPORE: shortest path overlapped regions and confined traversals towards graph clustering. Appl Intell 43, 208–232 (2015). https://doi.org/10.1007/s10489-014-0637-7

Issue Date: July 2015