Hub Labels on the database for large-scale graphs with the COLD framework

  • Alexandros Efentakis
  • Christodoulos Efstathiades
  • Dieter Pfoser
Article

Abstract

Shortest-path computation on graphs is one of the most well-studied problems in algorithmic theory. An aspect that has only recently attracted attention is the combination of databases with graph algorithms, so-called distance oracles, to answer shortest-path queries on large graphs. To this end, we propose a novel, efficient, pure-SQL framework for answering exact distance queries on large-scale graphs, implemented entirely on an open-source database engine. Our COLD framework (COmpressed Labels on the Database) can answer multiple distance query types (vertex-to-vertex, one-to-many, k-Nearest Neighbors, Reverse k-Nearest Neighbors, Reverse k-Farthest Neighbors and Top-k Range) not handled by previous methods, making it a complete database solution for a variety of practical large-scale graph applications. Our experiments show that COLD outperforms existing approaches (including popular graph databases) in query time and overall efficiency, while requiring significantly less storage space than these methods.
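The abstract names hub labels and pure-SQL distance queries but does not show how a label lookup turns into a query. The following Python sketch illustrates the general hub-label (2-hop label) idea under explicit assumptions. The toy graph, the table names `labels` and `pois`, and all helper functions are hypothetical. Labels are built with pruned landmark labeling (Akiba, Iwata and Yoshida, SIGMOD 2013) as a stand-in for whatever preprocessing COLD actually uses. SQLite replaces PostgreSQL so that the example is self-contained. The sketch shows a vertex-to-vertex distance as a self-join of the label table on the hub column, and a naive kNN as one-to-many distances followed by `ORDER BY ... LIMIT k`. COLD's label compression and its reverse and range query types are not modeled.

```python
import heapq
import sqlite3

# Hypothetical toy undirected graph: (u, v, weight) triples with integer weights.
EDGES = [(0, 2, 1), (1, 2, 2), (2, 3, 3), (3, 4, 1), (1, 4, 7)]


def build_graph(edges):
    """Adjacency lists {vertex: [(neighbor, weight), ...]} for an undirected graph."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))
    return graph


def label_query(labels, s, t):
    """Hub-label distance: minimum of d(s, h) + d(h, t) over hubs h common to both labels."""
    ls, lt = labels[s], labels[t]
    return min((ls[h] + lt[h] for h in ls.keys() & lt.keys()), default=float("inf"))


def pruned_landmark_labels(graph, order):
    """Pruned landmark labeling (a stand-in preprocessing step): one pruned Dijkstra per hub."""
    labels = {v: {} for v in graph}
    for hub in order:
        dist = {hub: 0}
        heap = [(0, hub)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue  # stale heap entry
            if label_query(labels, hub, u) <= d:
                continue  # pair already covered by earlier hubs, so prune here
            labels[u][hub] = d
            for v, w in graph[u]:
                if d + w < dist.get(v, float("inf")):
                    dist[v] = d + w
                    heapq.heappush(heap, (d + w, v))
    return labels


def load_labels(conn, labels):
    """Store labels as rows (vertex, hub, dist) in a plain relational table."""
    conn.execute("CREATE TABLE labels (vertex INTEGER, hub INTEGER, dist INTEGER, "
                 "PRIMARY KEY (vertex, hub))")
    conn.executemany("INSERT INTO labels VALUES (?, ?, ?)",
                     [(v, h, d) for v, lab in labels.items() for h, d in lab.items()])


# Vertex-to-vertex distance: a self-join of the label table on the hub column.
DIST_SQL = """
SELECT MIN(s.dist + t.dist) FROM labels AS s JOIN labels AS t ON s.hub = t.hub
WHERE s.vertex = ? AND t.vertex = ?
"""

# Naive kNN over a hypothetical POI table: one-to-many distances, then ORDER BY ... LIMIT k.
KNN_SQL = """
SELECT t.vertex, MIN(s.dist + t.dist) AS d
FROM labels AS s JOIN labels AS t ON s.hub = t.hub
WHERE s.vertex = ? AND t.vertex IN (SELECT vertex FROM pois)
GROUP BY t.vertex ORDER BY d, t.vertex LIMIT ?
"""

graph = build_graph(EDGES)
order = sorted(graph, key=lambda v: (-len(graph[v]), v))  # degree-descending hub order
labels = pruned_landmark_labels(graph, order)

conn = sqlite3.connect(":memory:")
load_labels(conn, labels)
conn.execute("CREATE TABLE pois (vertex INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO pois VALUES (?)", [(1,), (3,), (4,)])

print(conn.execute(DIST_SQL, (0, 4)).fetchone()[0])  # 5
print(conn.execute(KNN_SQL, (0, 2)).fetchall())      # [(1, 3), (3, 4)]
```

Because the labels satisfy the cover property (for every pair s, t, some shortest path passes through a hub stored in both labels), the `MIN` over the joined rows equals the exact distance. The kNN query scans every POI and is meant only to show the SQL shape, not an efficient query plan.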

Keywords

Shortest paths, Large-scale graphs, kNN, k-nearest neighbor, Reverse k-nearest neighbor, Reverse k-farthest neighbor, Top-k range, One-to-many, Hub labels, Query processing, Databases

Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  1. Institute for the Management of Information Systems (IMIS), Research Center “Athena”, Marousi, Greece
  2. Department of Computer Science and Engineering, European University Cyprus, Engomi, Cyprus
  3. Department of Geography and GeoInformation Science, George Mason University, Fairfax, USA