Advertisement

Indexing the Pickup and Drop-Off Locations of NYC Taxi Trips in PostgreSQL – Lessons from the Road

  • Jia Yu
  • Mohamed Sarwat
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10411)

Abstract

In this paper, we present our experience in indexing the drop-off and pick-up locations of taxi trips in New York City. The paper presents a comprehensive experimental analysis of classic and state-of-the-art spatial database indexing schemes. The paper evaluates a popular spatial tree indexing scheme (i.e., GIST-Spatial), a Block Range Index (BRIN-Spatial) provided by PostgreSQL as well as a new indexing scheme, namely Hippo-Spatial. In the experiments, the paper considers five evaluation metrics to compare and contrast the performance of the three indexing schemes: storage overhead, index initialization time, query response time, maintenance overhead, and throughput. Furthermore, the benchmark takes into account parameters that affect the index performance, which include but is not limited to: data size, spatial query selectivity, and spatial area density, The paper finally analyzes the experimental evaluation results and highlights the key insights and lessons learned. The results emphasize the fact that there is no one size that fits all when it comes to indexing massive-scale spatial data. The results also prove that modern database systems can maintain a lightweight index (in terms of storage and maintenance overhead) that is also fast enough for spatial data analytics applications. The source code for the experiments presented in the paper is available here: https://github.com/DataSystemsLab/hippo-postgresql.

References

  1. 1.
    New york city taxi and limousine commission. http://www.nyc.gov/html/tlc/html/about/trip_record_data.html
  2. 2.
    Aoki, P.M.: Generalizing “search” in generalized search trees. In: Proceedings of the 14th International Conference on Data Engineering, pp. 380–389. IEEE (1998)Google Scholar
  3. 3.
  4. 4.
    Postgis - spatial and geographic objects for postgresql. http://postgis.net
  5. 5.
    Postgresql: a powerful, open source object-relational database system. https://www.postgresql.org/
  6. 6.
    Bontempo, C., Zagelow, G.: The IBM data warehouse architecture. Commun. ACM 41(9), 38–48 (1998)CrossRefGoogle Scholar
  7. 7.
    Earth science data. https://earthdata.nasa.gov
  8. 8.
    Comer, D.: Ubiquitous b-tree. ACM Comput. Surv. CSUR 11(2), 121–137 (1979)MathSciNetCrossRefzbMATHGoogle Scholar
  9. 9.
    Corral, A., Vassilakopoulos, M., Manolopoulos, Y.: Algorithms for joining R-trees and linear region quadtrees. In: Güting, R.H., Papadias, D., Lochovsky, F. (eds.) SSD 1999. LNCS, vol. 1651, pp. 251–269. Springer, Heidelberg (1999). doi: 10.1007/3-540-48482-5_16 CrossRefGoogle Scholar
  10. 10.
    Finkel, R.A., Bentley, J.L.: Quad trees: a data structure for retrieval of composite keys. Acta Inf. 4(1), 1–9 (1974)CrossRefzbMATHGoogle Scholar
  11. 11.
    Fusco, F., Stoecklin, M.P., Vlachos, M.: Net-fli: on-the-fly compression, archiving and indexing of streaming network traffic. VLDB J. 3(1–2), 1382–1393 (2010)Google Scholar
  12. 12.
    Goldstein, J., Ramakrishnan, R., Shaft, U.: Compressing relations and indexes. In: Proceedings of the International Conference on Data Engineering, ICDE, pp. 370–379. IEEE (1998)Google Scholar
  13. 13.
    Guttman, A.: R-trees: a dynamic index structure for spatial searching. In: Proceedings of the ACM International Conference on Management of Data, SIGMOD, pp. 47–57. ACM (1984)Google Scholar
  14. 14.
    Hellerstein, J.M.: Generalized search tree. In: Liu, L., Tamer Özsu, M. (eds.) Encyclopedia of Database Systems, pp. 1222–1224. Springer, US (2009)Google Scholar
  15. 15.
    Hellerstein, J.M., Naughton, J.F., Pfeffer, A.: Generalized search trees for database systems, September 1995Google Scholar
  16. 16.
    Kamel, I., Faloutsos, C.: Hilbert R-tree: an improved R-tree using fractals. In: Proceedings of the International Conference on Very Large Data Bases, VLDB, September 1994Google Scholar
  17. 17.
    Kamel, I., Khalil, M., Kouramajian, V.: Bulk insertion in dynamic R-trees. In: Proceedings of the International Symposium on Spatial Data Handling, SDH, pp. 31–42 (1996)Google Scholar
  18. 18.
    Kornacker, M., Mohan, C., Hellerstein, J.M.: Concurrency and recovery in generalized search trees. ACM SIGMOD Rec. 26, 62–72 (1997). ACMCrossRefGoogle Scholar
  19. 19.
    Lee, M.-L., Hsu, W., Jensen, C.S., Cui, B., Teo, K.L.: Supporting frequent updates in R-trees: a bottom-up approach. In: Proceedings of the International Conference on Very Large Data Bases, VLDB, pp. 608–619, September 2003Google Scholar
  20. 20.
    Samet, H., Webber, R.E.: Storing a collection of polygons using quadtrees. ACM Trans. Graph. TOG 4(3), 182–222 (1985)CrossRefGoogle Scholar
  21. 21.
    Sidirourgos, L., Kersten, M.L.: Column imprints: a secondary index structure. In: Proceedings of the ACM International Conference on Management of Data, SIGMOD, pp. 893–904. ACM (2013)Google Scholar
  22. 22.
    Ślezak, D., Eastwood, V.: Data warehouse technology by infobright. In: Proceedings of the ACM International Conference on Management of Data, SIGMOD, pp. 841–846. ACM (2009)Google Scholar
  23. 23.
    Stonebraker, M., Rowe, L.A.: The design of postgres. In: Proceedings of the ACM International Conference on Management of Data, SIGMOD, pp. 340–355. ACM (1986)Google Scholar
  24. 24.
    Tayeb, J., Ulusoy, Ö., Wolfson, O.: A quadtree-based dynamic attribute indexing method. Comput. J. 41(3), 185–200 (1998)CrossRefzbMATHGoogle Scholar
  25. 25.
    Weiss, R.: A technical overview of the oracle exadata database machine and exadata storage server. Oracle White Paper. Oracle Corporation, Redwood Shores (2012)Google Scholar
  26. 26.
    Xu, X., Han, J., Lu, W.: RT-tree: an improved R-tree indexing structure for temporal spatial databases. In: Proceeding of the International Symposium on Spatial Data Handling, SSDH, pp. 1040–1049, July 1990Google Scholar
  27. 27.
    Yu, J., Moraffah, R., Sarwat, M.: Hippo in action: scalable indexing of a billion New York city taxi trips and beyond. In: Proceedings of the International Conference on Data Engineering, ICDE. IEEE (2017)Google Scholar
  28. 28.
    Jia, Y., Sarwat, M.: Two birds, one stone: a fast, yet lightweight, indexing scheme for modern database systems. Proc. VLDB Endowment 10(4), 385–396 (2016)CrossRefGoogle Scholar
  29. 29.
    Zukowski, M., Heman, S., Nes, N., Boncz, P.: Super-scalar RAM-CPU cache compression. In: Proceedings of the International Conference on Data Engineering, ICDE, pp. 59–59. IEEE (2006)Google Scholar

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. 1.School of Computing, Informatics, and Decision Systems EngineeringArizona State UniversityTempeUSA

Personalised recommendations