Abstract
Subtrajectory query has been a fundamental operator in mobility data management and useful in the applications of trajectory clustering, co-movement pattern mining and contact tracing in epidemiology. In this paper, we make the first attempt to study subtrajectory query in trillion-scale GPS databases, so as to support applications with urban-scale moving users and weeks-long historical data. We develop SQUID as a distributed subtrajectory query processing engine on Spark, with threefold technical contributions. First, we propose compact index and storage layers to handle massive trajectory datasets with trillion-scale GPS points. Second, we leverage hybrid partitioning, together with local indexes that are disk I/O friendly, to facilitate pruning. Third, we devise a novel filter-and-refine query processing framework to effectively reduce the number of trajectories for verification. Our experiments are conducted on huge trajectory datasets with up to 520 billion GPS points. The results validate the compactness of the storage mechanism and the scalability of the distributed query processing framework.
Similar content being viewed by others
References
Agarwal, P. K., Fox, K., Munagala, K., Nath, A., Pan, J., Taylor, E.: Subtrajectory clustering: models and algorithms. In Van den Bussche, J. and Arenas, M. (eds) PODS, 75–87 ACM, (2018)
Aji, A., Wang, F., Vo, H., Lee, R., Liu, Q., Zhang, X., Saltz, J.H.: Hadoop-gis: a high performance spatial data warehousing system over mapreduce. Proc. VLDB Endow. 6(11), 1009–1020 (2013)
Armbrust, M., Xin, R. S., Lian, C., Huai, Y., Liu, D., Bradley, J. K., Meng, X., Kaftan, T., Franklin, M. J., Ghodsi, A., Zaharia, M.: Spark SQL: relational data processing in spark. In SIGMOD Conference, 1383–1394 ACM, (2015)
Bakalov, P., Hadjieleftheriou, M., Keogh, E. J, Tsotras, V. J.: Efficient trajectory joins using symbolic representations
Bakalov, P., Hadjieleftheriou, M., Tsotras, V. J.: Time relaxed spatiotemporal trajectory joins. In GIS 182–191 (2005)
Bakalov, P., Tsotras, V. J.: Continuous spatiotemporal trajectory joins. In GSN, 109–128 (2006)
Chen, L., Özsu, M. T., Oria, V.: Robust and fast similarity search for moving object trajectories. In SIGMOD Conference, pp. 491–502. ACM, (2005)
Chen, L., Gao, Y., Fang, Z., Miao, X., Jensen, C.S., Guo, C.: Real-time distributed co-movement pattern detection on streaming trajectories. Proc. VLDB Endow. 12(10), 1208–1220 (2019)
Dean, J., Ghemawat, S.: Mapreduce: simplified data processing on large clusters. In OSDI, 137–150, USENIX Association, (2004)
Eldawy, A., Mokbel, M. F.: Spatialhadoop: a mapreduce framework for spatial data. In ICDE, 1352–1363 (2015)
Ester, M., Kriegel, H.-P., Sander, J., Xiaowei, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD, 226–231 (1996)
Fan, Q., Zhang, D., Huayu, W., Tan, K.-L.: A general and parallel platform for mining co-movement patterns over large-scale trajectories. Proc. VLDB Endow. 10(4), 313–324 (2016)
Fang, Z., Yunjun G. L., Chen, P. L., Miao, X., Jensen, C. S.: Coming: a real-time co-movement mining system for streaming trajectories. In SIGMOD, 2777–2780 (2020)
Gudmundsson, J., van Kreveld, M. J., Computing longest duration flocks in trajectory data. In GIS, pages 35–42, (2006)
Hadoop. In Encyclopedia of GIS, page 837 (2017)
Hu, G., Shao, J., Liu, F., Wang, Y., Shen, H.T.: If-matching: towards accurate map-matching with information fusion. IEEE Trans. Knowl. Data Eng. 29(1), 114–127 (2017)
Jeung, H., Yiu, M.L., Zhou, X., Jensen, C.S., Shen, H.T.: Discovery of convoys in trajectory databases. Proc. VLDB Endow. 1(1), 1068–1080 (2008)
Keogh, E. J., Pazzani, M. J.: A simple dimensionality reduction technique for fast similarity search in large time series databases. In PADKK, pp. 122–133, (2000)
Li, Y., Li, Y., Gunopulos, D., Guibas, L. J.: Knowledge-based trajectory completion from sparse GPS samples. In GIS, 33:1–33:10 (2016)
Li, Y., Bailey, J., Kulik, L.: Efficient mining of platoon patterns in trajectory databases. Data Knowl. Eng. 100, 167–187 (2015)
Li, Z., Ding, B., Han, J., Kays, R.: Swarm: mining relaxed temporal moving object clusters. Proc. VLDB Endow. 3(1), 723–734 (2010)
Liu, S., Liu, C., Luo, Q., Ni, L. M., Krishnan, R.: Calibrating large scale vehicle trajectory data. In MDM, pp. 222–231. IEEE Computer Society, (2012)
Lou, Y., Zhang, C., Zheng, Y., Xie, X., Wang, W., Huang, Y.: Map-matching for low-sampling-rate GPS trajectories. In GIS, pp. 352–361. ACM, (2009)
Nibali, A., He, Z.: Trajic: an effective compression system for trajectory data. IEEE Trans. Knowl. Data Eng. 27(11), 3138–3151 (2015)
Nutanong, S., Jacox, E.H., Samet, H.: An incremental hausdorff distance calculation algorithm. PVLDB 4(8), 506–517 (2011)
Qi, J., Tao, Y., Chang, Y., Zhang, R.: Packing r-trees with space-filling curves: theoretical optimality, empirical efficiency, and bulk-loading parallelizability. ACM Trans. Database Syst. 45(3), 14:1-14:47 (2020)
Raskar, R., Schunemann, I., Barbar, R., Vilcans, K., Gray, J., Vepakomma, P., Kapa, S., Nuzzo, A., Gupta, R., Berke, A., Greenwood, D., Keegan, C., Kanaparti, S., Beaudry, R., Stansbury, D., Beatriz B. A., Rishank K., Vitor P., Benedetti, F. M., Alina C., Riddhiman D., Kaushal J., Khahlil L., Greg N., Vitor P., Steve P., Yasaman R., Abhishek S., Greg S., and John W.: Maintaining personal privacy in an epidemic, Apps gone rogue (2020)
Sacharidis, D., Skoutas, D., Skoumas, G.: Continuous monitoring of nearest trajectories. In GIS, 361–370 (2014)
Shang, Zeyuan, Li, Guoliang, Bao, Zhifeng: DITA: distributed in-memory trajectory analytics. In SIGMOD Conference, 725–740 ACM,(2018)
Stonebraker, M., Abadi, D. J., Batkin, A., Chen, X., Cherniack, M., Ferreira, M., Lau, E., Lin, A., Madden, S., O’Neil, E. J., O’Neil, P. E., Rasin, A., Tran, N., Zdonik, S. B.: C-store: a column-oriented DBMS. In VLDB 553–564 ACM, (2005)
Su, H., Zheng, K., Wang, H., Huang, J., Zhou, X.: Calibrating trajectory data for similarity-based analysis. In SIGMOD Conference 833–844 ACM, (2013)
Tampakis, P., Doulkeridis, C., Pelekis, N., Theodoridis, Y.: Distributed subtrajectory join on massive datasets. ACM Trans. Spatial Algorithms Syst. 6(2), 8:1-8:29 (2020)
Tampakis, P., Pelekis, N., Doulkeridis, C., Theodoridis, Y.: Scalable distributed subtrajectory clustering. In IEEE International Conference on Big Data, 950–959 IEEE, (2019)
Tang, B., Yiu, M. L., Mouratidis, K., Wang, K.: Efficient motif discovery in spatial trajectories using discrete fréchet distance. In EDBT, pp. 378–389. OpenProceedings.org, (2017)
Wang, S., Ferhatosmanoglu, H.: Ppq-trajectory: spatio-temporal quantization for querying in large trajectory repositories. Proc. VLDB Endow. 14(2), 215–227 (2020)
Wang, Y., Lim, E.-P., Hwang, S.-Y.: Efficient mining of group patterns from user movement data. Data Knowl. Eng. 57(3), 240–282 (2006)
Xie, D., Li, F., Phillips, J.M.: Distributed trajectory similarity search. Proc. VLDB Endow. 10(11), 1478–1489 (2017)
Xie, D., Li, F., Yao, B., Li, G., Zhou, L., Guo, M.: Simba: efficient in-memory spatial analytics. In SIGMOD Conference, 1071–1085 ACM, (2016)
Yi, B.-K., Jagadish, H. V., Faloutsos, C.: Efficient retrieval of similar time sequences under time warping. In ICDE, pp. 201–208. IEEE Computer Society, (1998)
You, S., Zhang, J., Gruenwald, L.: Large-scale spatial join query processing in cloud. In ICDE Workshops, 34–41 IEEE Computer Society, (2015)
Yu, J., Zhang, Z., Sarwat, M.: Spatial data management in apache spark: the geospark perspective and beyond. GeoInformatica 23(1), 37–78 (2019)
Yuan, H., Li, G.: Distributed in-memory trajectory similarity search and join on road network. In ICDE, 1262–1273 (2019)
Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauly, M., Franklin, M. J., Shenker, S., Stoica, I.: Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In NSDI, 15–28 USENIX Association,(2012)
Zhang, D., Chang, Z., Wu, S., Yuan, Y., Tan, K.L., Chen, G.: Continuous trajectory similarity search for online outlier detection. IEEE Transactions on Knowledge and Data Engineering, 1 (2020)
Zhang, D., Chan, C.-Y., Tan, K.-L.: Processing spatial keyword query as a top-k aggregation query. In SIGIR, 355–364 ACM, (2014)
Zhang, D., Ding, M., Yang, D., Liu, Y., Fan, J.: Trajectory simplification: an experimental study and quality analysis. Proc. VLDB Endow. 11(9), 934–946 (2018)
Zhou, Z., Dou, W., Jia, G., Chunhua, H., Xiaolong, X., Xiaotong, W., Pan, J.: A method for real-time trajectory monitoring to improve taxi service using GPS big data. Inf. Manag. 53(8), 964–977 (2016)
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, D., Chang, Z., Yang, D. et al. SQUID: subtrajectory query in trillion-scale GPS database. The VLDB Journal 32, 887–904 (2023). https://doi.org/10.1007/s00778-022-00777-7
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00778-022-00777-7