Abstract
The abundance of structural information in graphs has made graph traversals non-trivial. Shortcut construction is one of the techniques used to make shortest path (SP) traversals on graphs efficient. However, shortcut construction is computationally intensive and must be performed as an exclusive, offline step, and it often produces unnecessary auxiliary data, i.e., shortcuts. On medium- to large-scale graphs it can take from minutes to hours, depending on the computational resources available and the complexity of the shortcut construction algorithm. In addition, excessive shortcuts greatly increase the branching factor during SP expansions. These factors make repeated SP queries unsuitable for graph mining tasks. This paper presents Shortest Path Overlapped Region (SPORE), a performance-oriented approach that improves shortcut construction by exploiting SP overlapped regions. Path overlapping has been overlooked by existing shortcut construction systems. SPORE takes advantage of this opportunity by constructing auxiliary shortcuts incrementally, using SP trees during traversals, instead of in an exclusive step. To assess its practical impact, SPORE is applied to a graph clustering task, which requires extensive graph traversals to group similar vertices together. We further propose an optimization strategy that accelerates the clustering process using confined subgraph traversals. A performance evaluation of SPORE on real and synthetic graphs shows an execution time gain of up to 40 % with an order of magnitude fewer shortcuts than the SegTable approach. Applying SPORE to multiple SP computations consistently reduces the latency of the entire clustering process. Furthermore, the confined subgraph traversal scheme improves performance by an order of magnitude on undirected graphs, twice the gain achieved on directed graphs.
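The two ideas in the abstract, reusing SP computations across queries whose paths overlap and confining a traversal to a subgraph, can be illustrated with a small Python sketch. It is not SPORE's published algorithm: instead of SPORE's shortcuts, it caches whole Dijkstra SP trees and answers a query (v, u) directly when v already lies on the cached tree path to u. The class `SPTreeCache`, the helper `confined_distance`, and the adjacency-list format are hypothetical, and non-negative edge weights are assumed.

```python
import heapq

INF = float("inf")


class SPTreeCache:
    """Reuses shortest path (SP) trees across overlapping queries.

    Hypothetical sketch, not SPORE itself: each query source keeps a
    resumable Dijkstra state (its partial SP tree). A later query (v, u)
    is answered without any search when v is an ancestor of u in some
    cached tree, because every subpath of a shortest path is itself a
    shortest path: d(v, u) = d(s, u) - d(s, v). Weights must be >= 0.
    """

    def __init__(self, adj):
        self.adj = adj    # u -> list of (v, weight)
        self.trees = {}   # source -> (dist, parent, settled, heap)

    def _grow(self, s, t):
        """Resume Dijkstra from s until t is settled or no vertex is left."""
        if s not in self.trees:
            self.trees[s] = ({s: 0}, {s: None}, set(), [(0, s)])
        dist, parent, settled, heap = self.trees[s]
        while t not in settled and heap:
            d, u = heapq.heappop(heap)
            if u in settled:
                continue  # stale heap entry
            settled.add(u)
            for v, w in self.adj.get(u, ()):
                if d + w < dist.get(v, INF):
                    dist[v], parent[v] = d + w, u
                    heapq.heappush(heap, (d + w, v))
        return dist[t] if t in settled else INF

    def _overlap_lookup(self, v, u):
        """Return d(v, u) if v lies on a cached SP tree path to u, else None."""
        for dist, parent, settled, _ in self.trees.values():
            if u not in settled or v not in settled:
                continue
            x = u
            while x is not None:  # walk parent pointers back to the root
                if x == v:
                    return dist[u] - dist[v]
                x = parent[x]
        return None

    def distance(self, v, u):
        known = self._overlap_lookup(v, u)
        return known if known is not None else self._grow(v, u)


def confined_distance(adj, s, t, allowed):
    """Dijkstra confined to the subgraph induced by the vertex set `allowed`.

    The result is the shortest distance inside that subgraph, which can be
    larger than the global distance when the true shortest path leaves it.
    """
    if s not in allowed or t not in allowed:
        return INF
    dist, heap, done = {s: 0}, [(0, s)], set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        if u == t:
            return d
        done.add(u)
        for v, w in adj.get(u, ()):
            if v in allowed and d + w < dist.get(v, INF):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return INF


adj = {"a": [("b", 1)], "b": [("c", 2), ("e", 10)], "c": [("d", 1)], "d": [("e", 1)]}
cache = SPTreeCache(adj)
print(cache.distance("a", "e"))  # 5: grows the SP tree rooted at "a"
print(cache.distance("b", "d"))  # 3: read off a's tree, no new search
print(confined_distance(adj, "a", "e", {"a", "b", "e"}))  # 11: must use b -> e
```

Keeping whole SP trees trades memory for query time; as the abstract observes for shortcuts, unchecked auxiliary data has its own cost, so a practical version of this sketch would bound or evict the cached trees.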
Notes
Social Network Dataset from Stanford Collection: http://snap.stanford.edu/data/
Scientific Collaboration Network: http://toreopsahl.com/datasets/newman2001/
Santo Fortunato's Graph Generator: http://santo.fortunato.googlepages.com/inthepress2/
References
Abraham I, Delling D, Fiat A, Goldberg AV, Werneck RF (2013) Highway dimension and provably efficient shortest path algorithms. Tech. rep. MSR-TR-2013-91, Microsoft Research, USA
Abraham I, Delling D, Goldberg AV, Werneck RFF (2012) Hierarchical hub labelings for shortest paths. In: Epstein L, Ferragina P (eds) ESA, Lecture Notes in Computer Science, vol 7501. Springer, pp 24–35. http://dblp.uni-trier.de/db/conf/esa/esa2012.html#AbrahamDGW12
Aggarwal CC, Bhuiyan MA, Hasan MA (2014) Frequent pattern mining algorithms: A survey. In: Aggarwal CC, Han J (eds) Frequent Pattern Mining. Springer International Publishing, pp 19–64. doi:10.1007/978-3-319-07821-2_2
Akiba T, Iwata Y, Yoshida Y (2013) Fast exact shortest-path distance queries on large networks by pruned landmark labeling. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, SIGMOD ’13. ACM, New York, pp 349–360. doi:10.1145/2463676.2465315
Bast H, Funke S, Sanders P, Schultes D (2007) Fast routing in road networks with transit nodes. Science 316(5824):566. doi:10.1126/science.1137521. http://www.mpi-inf.mpg.de/funke/Papers/SCIENCE07/SCIENCE07.pdf
Bollobas B (1998) Modern Graph Theory. Springer. http://www.worldcat.org/isbn/0387984887
Bradley P, Fayyad U, Reina C (1998) Scaling clustering algorithms to large databases. In: Proceedings of the 4th Conference on Knowledge Discovery in Databases, KDD’98. AAAI Press, pp 9–15
Chen HH, Giles CL (2013) ASCOS: an asymmetric network structure context similarity measure. In: Rokne JG, Faloutsos C (eds) ASONAM. ACM, pp 442–449
Cheng H, Zhou Y, Yu JX (2011) Clustering large attributed graphs: A balance between structural and attribute similarities. ACM Trans Knowl Discov Data 5(2):12:1–12:33. doi:10.1145/1921632.1921638
Cohen E, Delling D, Fuchs F, Goldberg AV, Goldszmidt M, Werneck RF (2013) Scalable similarity estimation in social networks: Closeness, node labels, and random edge lengths. In: Proceedings of the First ACM Conference on Online Social Networks, COSN ’13. ACM, New York, pp 131–142. doi:10.1145/2512938.2512944
Cohen S, Kimelfeld B, Koutrika G (2012) A survey on proximity measures for social networks. In: Ceri S, Brambilla M (eds) Search Computing, Lecture Notes in Computer Science, vol 7538. Springer, Berlin Heidelberg, pp 191–206. doi:10.1007/978-3-642-34213-4_13
Cormen TH, Leiserson CE, Rivest RL, Stein C (2001) Introduction to Algorithms, 2nd edn. McGraw-Hill Higher Education
Delling D, Goldberg AV, Pajor T, Werneck RF (2014) Robust exact distance queries on massive networks. Tech. rep. MSR-TR-2014-12, Microsoft Research, USA
Delling D, Sanders P, Schultes D, Wagner D (2009) Engineering route planning algorithms. In: Lerner J, Wagner D, Zweig KA (eds) Algorithmics of Large and Complex Networks. Springer, Berlin, Heidelberg, pp 117–139. doi:10.1007/978-3-642-02094-0_7
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39(1):1–38
Dijkstra EW (1959) A note on two problems in connexion with graphs. Numer Math 1(1):269–271
van Dongen S (2000) Graph clustering by flow simulation. Ph.D. thesis. University of Utrecht, Utrecht
Farnstrom F, Lewis J, Elkan C (2000) Scalability for clustering algorithms revisited. SIGKDD Explor Newsl 2(1):51–57. doi:10.1145/360402.360419
Gao J, Jin R, Zhou J, Yu JX, Jiang X, Wang T (2011) Relational approach for shortest path discovery over large graphs. Proc VLDB Endow 5(4):358–369. doi:10.14778/2095686.2095694
Geisberger R, Sanders P, Schultes D, Vetter C (2012) Exact routing in large road networks using contraction hierarchies. Transp Sci 46(3):388–404. doi:10.1287/trsc.1110.0401
Goldberg AV, Kaplan H, Werneck RF (2009) Reach for A*: Efficient point-to-point shortest path algorithms. In: The Shortest Path Problem: Ninth DIMACS Implementation Challenge. American Mathematical Society, USA, pp 93–139
Hamerly G (2010) Making k-means even faster. In: Proceedings of the 2010 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics, Philadelphia, pp 130–140. doi:10.1137/1.9781611972801.12
Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Data Min Knowl Discov 8(1):53–87. doi:10.1023/B:DAMI.0000005258.31418.83
Jeh G, Widom J (2002) SimRank: A measure of structural-context similarity. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’02. ACM, New York, pp 538–543. doi:10.1145/775047.775126
Katz L (1953) A new status index derived from sociometric analysis. Psychometrika 18(1):39–43. doi:10.1007/BF02289026
Kriegel HP, Kröger P, Zimek A (2009) Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering. ACM Trans Knowl Discov Data 3(1):1:1–1:58. doi:10.1145/1497577.1497578
Adamic LA, Adar E (2005) How to search a social network. Soc Networks 27(3):187–203. doi:10.1016/j.socnet.2005.01.007
Leskovec J, Chakrabarti D, Kleinberg J, Faloutsos C (2005) Realistic, mathematically tractable graph generation and evolution, using Kronecker multiplication. In: Proceedings of the 9th European Conference on Principles and Practice of Knowledge Discovery in Databases, PKDD’05. Springer, Berlin, Heidelberg, pp 133–145. doi:10.1007/11564126_17
Liben-Nowell D, Kleinberg J (2007) The link-prediction problem for social networks. J Am Soc Inf Sci Technol 58(7):1019–1031. doi:10.1002/asi.20591
Nawaz W, Lee YK, Lee S (2012) Collaborative similarity measure for intra graph clustering. In: DASFAA Workshops, pp 204–215
Newman MEJ (2001) The structure of scientific collaboration networks. Proc Natl Acad Sci 98(2):404–409. doi:10.1073/pnas.98.2.404. http://www.pnas.org/content/98/2/404.abstract
Ordonez C (2006) Integrating k-means clustering with a relational DBMS using SQL. IEEE Trans Knowl Data Eng 18(2):188–201. doi:10.1109/TKDE.2006.31
Perozzi B, McCubbin C, Beecher S, Halbert J (2013) Scalable graph clustering with Pregel. In: Ghoshal G, Poncela-Casasnovas J, Tolksdorf R (eds) Complex Networks IV, Studies in Computational Intelligence, vol 476. Springer, Berlin Heidelberg, pp 133–144. doi:10.1007/978-3-642-36844-8_13
Pradhan A, Mahinthakumar G (2013) Finding all-pairs shortest path for a large-scale transportation network using parallel Floyd-Warshall and parallel Dijkstra algorithms. J Comput Civ Eng 27(3):263–273. doi:10.1061/(ASCE)CP.1943-5487.0000220
Sanders P, Schultes D (2012) Engineering highway hierarchies. J Exp Algorithmics 17:1.6:1.1–1.6:1.40. doi:10.1145/2133803.2330080
Satuluri VM (2012) Scalable clustering of modern networks. Ph.D. thesis. The Ohio State University
Schaeffer SE (2007) Survey: Graph clustering. Comput Sci Rev 1(1):27–64. doi:10.1016/j.cosrev.2007.05.001
Sommer C (2012) Shortest-path queries in static networks. Submitted to ACM Computing Surveys
Theodoridis S, Koutroumbas K (2006) Pattern Recognition, 3rd edn. Academic Press, Inc., Orlando
Tong H, Faloutsos C, Pan JY (2006) Fast random walk with restart and its applications. In: Proceedings of the Sixth International Conference on Data Mining, ICDM ’06. IEEE Computer Society, Washington, pp 613–622. doi:10.1109/ICDM.2006.70
Xu R, Wunsch DI (2005) Survey of clustering algorithms. IEEE Trans Neural Netw 16(3):645–678. doi:10.1109/TNN.2005.845141
Zhu AD, Ma H, Xiao X, Luo S, Tang Y, Zhou S (2013) Shortest path and distance queries on road networks: Towards bridging theory and practice. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, SIGMOD ’13. ACM, New York, pp 857–868. doi:10.1145/2463676.2465277
Zhu AD, Xiao X, Wang S, Lin W (2013) Efficient single-source shortest path and distance queries on large graphs. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’13. ACM, New York, pp 998–1006. doi:10.1145/2487575.2487665
Acknowledgments
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No. 2012R1A2A2A01047478). We thank our colleagues Mr. Mohammad Aazam, Mr. Bilal Amin, and the anonymous reviewers, whose comments greatly improved the manuscript.
About this article
Cite this article
Nawaz, W., Khan, KU. & Lee, YK. SPORE: shortest path overlapped regions and confined traversals towards graph clustering. Appl Intell 43, 208–232 (2015). https://doi.org/10.1007/s10489-014-0637-7
DOI: https://doi.org/10.1007/s10489-014-0637-7