Skip to main content
Log in

SQUID: subtrajectory query in trillion-scale GPS database

  • Regular Paper
  • Published:
The VLDB Journal Aims and scope Submit manuscript

Abstract

Subtrajectory query has been a fundamental operator in mobility data management and useful in the applications of trajectory clustering, co-movement pattern mining and contact tracing in epidemiology. In this paper, we make the first attempt to study subtrajectory query in trillion-scale GPS databases, so as to support applications with urban-scale moving users and weeks-long historical data. We develop SQUID as a distributed subtrajectory query processing engine on Spark, with threefold technical contributions. First, we propose compact index and storage layers to handle massive trajectory datasets with trillion-scale GPS points. Second, we leverage hybrid partitioning, together with local indexes that are disk I/O friendly, to facilitate pruning. Third, we devise a novel filter-and-refine query processing framework to effectively reduce the number of trajectories for verification. Our experiments are conducted on huge trajectory datasets with up to 520 billion GPS points. The results validate the compactness of the storage mechanism and the scalability of the distributed query processing framework.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12

Similar content being viewed by others

Notes

  1. https://github.com/Proudc/squid.

  2. https://www.kaggle.com/c/pkdd-15-predict-taxi-service-trajectory-i/data.

  3. https://github.com/davidmoten/rtree

    https://github.com/DanySK/java-quadtree.

References

  1. Agarwal, P. K., Fox, K., Munagala, K., Nath, A., Pan, J., Taylor, E.: Subtrajectory clustering: models and algorithms. In Van den Bussche, J. and Arenas, M. (eds) PODS, 75–87 ACM, (2018)

  2. Aji, A., Wang, F., Vo, H., Lee, R., Liu, Q., Zhang, X., Saltz, J.H.: Hadoop-gis: a high performance spatial data warehousing system over mapreduce. Proc. VLDB Endow. 6(11), 1009–1020 (2013)

    Article  Google Scholar 

  3. Armbrust, M., Xin, R. S., Lian, C., Huai, Y., Liu, D., Bradley, J. K., Meng, X., Kaftan, T., Franklin, M. J., Ghodsi, A., Zaharia, M.: Spark SQL: relational data processing in spark. In SIGMOD Conference, 1383–1394 ACM, (2015)

  4. Bakalov, P., Hadjieleftheriou, M., Keogh, E. J, Tsotras, V. J.: Efficient trajectory joins using symbolic representations

  5. Bakalov, P., Hadjieleftheriou, M., Tsotras, V. J.: Time relaxed spatiotemporal trajectory joins. In GIS 182–191 (2005)

  6. Bakalov, P., Tsotras, V. J.: Continuous spatiotemporal trajectory joins. In GSN, 109–128 (2006)

  7. Chen, L., Özsu, M. T., Oria, V.: Robust and fast similarity search for moving object trajectories. In SIGMOD Conference, pp. 491–502. ACM, (2005)

  8. Chen, L., Gao, Y., Fang, Z., Miao, X., Jensen, C.S., Guo, C.: Real-time distributed co-movement pattern detection on streaming trajectories. Proc. VLDB Endow. 12(10), 1208–1220 (2019)

    Article  Google Scholar 

  9. Dean, J., Ghemawat, S.: Mapreduce: simplified data processing on large clusters. In OSDI, 137–150, USENIX Association, (2004)

  10. Eldawy, A., Mokbel, M. F.: Spatialhadoop: a mapreduce framework for spatial data. In ICDE, 1352–1363 (2015)

  11. Ester, M., Kriegel, H.-P., Sander, J., Xiaowei, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD, 226–231 (1996)

  12. Fan, Q., Zhang, D., Huayu, W., Tan, K.-L.: A general and parallel platform for mining co-movement patterns over large-scale trajectories. Proc. VLDB Endow. 10(4), 313–324 (2016)

    Article  Google Scholar 

  13. Fang, Z., Yunjun G. L., Chen, P. L., Miao, X., Jensen, C. S.: Coming: a real-time co-movement mining system for streaming trajectories. In SIGMOD, 2777–2780 (2020)

  14. Gudmundsson, J., van Kreveld, M. J., Computing longest duration flocks in trajectory data. In GIS, pages 35–42, (2006)

  15. Hadoop. In Encyclopedia of GIS, page 837 (2017)

  16. Hu, G., Shao, J., Liu, F., Wang, Y., Shen, H.T.: If-matching: towards accurate map-matching with information fusion. IEEE Trans. Knowl. Data Eng. 29(1), 114–127 (2017)

    Article  Google Scholar 

  17. Jeung, H., Yiu, M.L., Zhou, X., Jensen, C.S., Shen, H.T.: Discovery of convoys in trajectory databases. Proc. VLDB Endow. 1(1), 1068–1080 (2008)

    Article  Google Scholar 

  18. Keogh, E. J., Pazzani, M. J.: A simple dimensionality reduction technique for fast similarity search in large time series databases. In PADKK, pp. 122–133, (2000)

  19. Li, Y., Li, Y., Gunopulos, D., Guibas, L. J.: Knowledge-based trajectory completion from sparse GPS samples. In GIS, 33:1–33:10 (2016)

  20. Li, Y., Bailey, J., Kulik, L.: Efficient mining of platoon patterns in trajectory databases. Data Knowl. Eng. 100, 167–187 (2015)

    Article  Google Scholar 

  21. Li, Z., Ding, B., Han, J., Kays, R.: Swarm: mining relaxed temporal moving object clusters. Proc. VLDB Endow. 3(1), 723–734 (2010)

    Article  Google Scholar 

  22. Liu, S., Liu, C., Luo, Q., Ni, L. M., Krishnan, R.: Calibrating large scale vehicle trajectory data. In MDM, pp. 222–231. IEEE Computer Society, (2012)

  23. Lou, Y., Zhang, C., Zheng, Y., Xie, X., Wang, W., Huang, Y.: Map-matching for low-sampling-rate GPS trajectories. In GIS, pp. 352–361. ACM, (2009)

  24. Nibali, A., He, Z.: Trajic: an effective compression system for trajectory data. IEEE Trans. Knowl. Data Eng. 27(11), 3138–3151 (2015)

    Article  Google Scholar 

  25. Nutanong, S., Jacox, E.H., Samet, H.: An incremental hausdorff distance calculation algorithm. PVLDB 4(8), 506–517 (2011)

    Google Scholar 

  26. Qi, J., Tao, Y., Chang, Y., Zhang, R.: Packing r-trees with space-filling curves: theoretical optimality, empirical efficiency, and bulk-loading parallelizability. ACM Trans. Database Syst. 45(3), 14:1-14:47 (2020)

    Article  MathSciNet  Google Scholar 

  27. Raskar, R., Schunemann, I., Barbar, R., Vilcans, K., Gray, J., Vepakomma, P., Kapa, S., Nuzzo, A., Gupta, R., Berke, A., Greenwood, D., Keegan, C., Kanaparti, S., Beaudry, R., Stansbury, D., Beatriz B. A., Rishank K., Vitor P., Benedetti, F. M., Alina C., Riddhiman D., Kaushal J., Khahlil L., Greg N., Vitor P., Steve P., Yasaman R., Abhishek S., Greg S., and John W.: Maintaining personal privacy in an epidemic, Apps gone rogue (2020)

  28. Sacharidis, D., Skoutas, D., Skoumas, G.: Continuous monitoring of nearest trajectories. In GIS, 361–370 (2014)

  29. Shang, Zeyuan, Li, Guoliang, Bao, Zhifeng: DITA: distributed in-memory trajectory analytics. In SIGMOD Conference, 725–740 ACM,(2018)

  30. Stonebraker, M., Abadi, D. J., Batkin, A., Chen, X., Cherniack, M., Ferreira, M., Lau, E., Lin, A., Madden, S., O’Neil, E. J., O’Neil, P. E., Rasin, A., Tran, N., Zdonik, S. B.: C-store: a column-oriented DBMS. In VLDB 553–564 ACM, (2005)

  31. Su, H., Zheng, K., Wang, H., Huang, J., Zhou, X.: Calibrating trajectory data for similarity-based analysis. In SIGMOD Conference 833–844 ACM, (2013)

  32. Tampakis, P., Doulkeridis, C., Pelekis, N., Theodoridis, Y.: Distributed subtrajectory join on massive datasets. ACM Trans. Spatial Algorithms Syst. 6(2), 8:1-8:29 (2020)

    Article  Google Scholar 

  33. Tampakis, P., Pelekis, N., Doulkeridis, C., Theodoridis, Y.: Scalable distributed subtrajectory clustering. In IEEE International Conference on Big Data, 950–959 IEEE, (2019)

  34. Tang, B., Yiu, M. L., Mouratidis, K., Wang, K.: Efficient motif discovery in spatial trajectories using discrete fréchet distance. In EDBT, pp. 378–389. OpenProceedings.org, (2017)

  35. Wang, S., Ferhatosmanoglu, H.: Ppq-trajectory: spatio-temporal quantization for querying in large trajectory repositories. Proc. VLDB Endow. 14(2), 215–227 (2020)

    Article  Google Scholar 

  36. Wang, Y., Lim, E.-P., Hwang, S.-Y.: Efficient mining of group patterns from user movement data. Data Knowl. Eng. 57(3), 240–282 (2006)

    Article  Google Scholar 

  37. Xie, D., Li, F., Phillips, J.M.: Distributed trajectory similarity search. Proc. VLDB Endow. 10(11), 1478–1489 (2017)

    Article  Google Scholar 

  38. Xie, D., Li, F., Yao, B., Li, G., Zhou, L., Guo, M.: Simba: efficient in-memory spatial analytics. In SIGMOD Conference, 1071–1085 ACM, (2016)

  39. Yi, B.-K., Jagadish, H. V., Faloutsos, C.: Efficient retrieval of similar time sequences under time warping. In ICDE, pp. 201–208. IEEE Computer Society, (1998)

  40. You, S., Zhang, J., Gruenwald, L.: Large-scale spatial join query processing in cloud. In ICDE Workshops, 34–41 IEEE Computer Society, (2015)

  41. Yu, J., Zhang, Z., Sarwat, M.: Spatial data management in apache spark: the geospark perspective and beyond. GeoInformatica 23(1), 37–78 (2019)

    Article  Google Scholar 

  42. Yuan, H., Li, G.: Distributed in-memory trajectory similarity search and join on road network. In ICDE, 1262–1273 (2019)

  43. Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauly, M., Franklin, M. J., Shenker, S., Stoica, I.: Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In NSDI, 15–28 USENIX Association,(2012)

  44. Zhang, D., Chang, Z., Wu, S., Yuan, Y., Tan, K.L., Chen, G.: Continuous trajectory similarity search for online outlier detection. IEEE Transactions on Knowledge and Data Engineering, 1 (2020)

  45. Zhang, D., Chan, C.-Y., Tan, K.-L.: Processing spatial keyword query as a top-k aggregation query. In SIGIR, 355–364 ACM, (2014)

  46. Zhang, D., Ding, M., Yang, D., Liu, Y., Fan, J.: Trajectory simplification: an experimental study and quality analysis. Proc. VLDB Endow. 11(9), 934–946 (2018)

    Article  Google Scholar 

  47. Zhou, Z., Dou, W., Jia, G., Chunhua, H., Xiaolong, X., Xiaotong, W., Pan, J.: A method for real-time trajectory monitoring to improve taxi service using GPS big data. Inf. Manag. 53(8), 964–977 (2016)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Dingyu Yang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Zhang, D., Chang, Z., Yang, D. et al. SQUID: subtrajectory query in trillion-scale GPS database. The VLDB Journal 32, 887–904 (2023). https://doi.org/10.1007/s00778-022-00777-7

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00778-022-00777-7

Keywords

Navigation