SQUID: subtrajectory query in trillion-scale GPS database

Zhang, Dongxiang; Chang, Zhihao; Yang, Dingyu; Li, Dongsheng; Tan, Kian-Lee; Chen, Ke; Chen, Gang

doi:10.1007/s00778-022-00777-7

Regular Paper
Published: 19 January 2023

Volume 32, pages 887–904, (2023)

The VLDB Journal

Dongxiang Zhang¹,
Zhihao Chang¹,
Dingyu Yang²,
Dongsheng Li³,
Kian-Lee Tan⁴,
Ke Chen¹ &
Gang Chen¹

Abstract

Subtrajectory query is a fundamental operator in mobility data management, with applications in trajectory clustering, co-movement pattern mining, and contact tracing in epidemiology. In this paper, we make the first attempt to study subtrajectory query in trillion-scale GPS databases, so as to support applications with urban-scale moving users and weeks-long historical data. We develop SQUID, a distributed subtrajectory query processing engine on Spark, with three technical contributions. First, we propose compact index and storage layers to handle massive trajectory datasets with trillion-scale GPS points. Second, we leverage hybrid partitioning, together with disk-I/O-friendly local indexes, to facilitate pruning. Third, we devise a novel filter-and-refine query processing framework that effectively reduces the number of trajectories requiring verification. Our experiments are conducted on huge trajectory datasets with up to 520 billion GPS points. The results validate the compactness of the storage mechanism and the scalability of the distributed query processing framework.
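
To make the filter-and-refine idea concrete, the following is a minimal single-machine sketch in Python. It is not SQUID's algorithm: the abstract does not give the query definition, index layout, or distance function, so everything here is an assumption made for illustration. The sketch assumes that a trajectory matches a query if some contiguous window of its points lies point-by-point within Euclidean distance eps of the query points, and it uses a plain uniform grid as the filter. All function names (build_index, filter_candidates, refine, subtrajectory_query) are hypothetical.

```python
from math import floor, hypot

def cell(p, size):
    """Grid cell containing point p for a uniform grid of the given cell size."""
    return (floor(p[0] / size), floor(p[1] / size))

def build_index(trajectories, size):
    """Map each grid cell to the ids of trajectories with a point in it."""
    index = {}
    for tid, pts in trajectories.items():
        for p in pts:
            index.setdefault(cell(p, size), set()).add(tid)
    return index

def filter_candidates(index, query, size):
    """Filter step: keep ids that have a point near every query point.

    With cell size equal to eps, any point within eps of a query point
    must fall in the 3x3 block of cells around that query point's cell,
    so this step never discards a true match (assumed matching rule).
    """
    candidates = None
    for q in query:
        cx, cy = cell(q, size)
        near = set()
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                near |= index.get((cx + dx, cy + dy), set())
        candidates = near if candidates is None else candidates & near
    return candidates or set()

def refine(pts, query, eps):
    """Refine step: exact check of every contiguous window of len(query)."""
    m = len(query)
    for start in range(len(pts) - m + 1):
        if all(
            hypot(pts[start + j][0] - query[j][0],
                  pts[start + j][1] - query[j][1]) <= eps
            for j in range(m)
        ):
            return True
    return False

def subtrajectory_query(trajectories, query, eps):
    """Run the cheap filter first, then verify only the surviving candidates."""
    index = build_index(trajectories, eps)
    candidates = filter_candidates(index, query, eps)
    return sorted(t for t in candidates if refine(trajectories[t], query, eps))

# Toy data (made up for illustration only).
trajectories = {
    "a": [(0, 0), (1, 0), (2, 0), (3, 0)],
    "b": [(0, 5), (1, 5), (2, 5)],          # far away: removed by the filter
    "c": [(2, 0.2), (1, 0.2), (0, 0.2)],    # right cells, wrong order
}
query = [(1, 0.1), (2, 0.1)]
result = subtrajectory_query(trajectories, query, 0.5)
```

On this toy input the filter discards "b" without any distance computation, while "c" survives the filter because it visits the right cells but is rejected by the exact check because it visits them in the opposite order. The filter is deliberately conservative: it may admit false positives, which the refine step removes, but under the assumed matching rule it does not drop true matches.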

Author information

Authors and Affiliations

Zhejiang University, Hangzhou, China
Dongxiang Zhang, Zhihao Chang, Ke Chen & Gang Chen
Alibaba Group, Hangzhou, China
Dingyu Yang
National University of Defense Technology, Changsha, China
Dongsheng Li
National University of Singapore, Singapore, Singapore
Kian-Lee Tan

Corresponding author

Correspondence to Dingyu Yang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, D., Chang, Z., Yang, D. et al. SQUID: subtrajectory query in trillion-scale GPS database. The VLDB Journal 32, 887–904 (2023). https://doi.org/10.1007/s00778-022-00777-7

Received: 16 December 2021
Revised: 15 September 2022
Accepted: 29 December 2022
Published: 19 January 2023
Issue Date: July 2023
DOI: https://doi.org/10.1007/s00778-022-00777-7
