A distributed framework for large-scale semantic trajectory similarity join

Tian, Ruijie; Li, Jiajun; Zhang, Weishi; Wang, Fei

doi:10.1007/s11042-023-15236-w

Published: 13 July 2023

Volume 83, pages 16205–16229, (2024)

Multimedia Tools and Applications

Ruijie Tian (ORCID: orcid.org/0000-0001-8913-9057)¹, Jiajun Li¹, Weishi Zhang¹,² & Fei Wang¹,²

Abstract

The similarity join is a common yet expensive operator in large-scale semantic trajectory analytics. In this paper, we propose DFST, an efficient framework for semantic trajectory similarity join in distributed systems. We devise the ITS index and the summary index, which cover the textual, temporal, and spatial domains, and theoretically demonstrate that they effectively prune pairs of dissimilar trajectories. Moreover, DFST supports most existing similarity functions for quantifying the spatial similarity between semantic trajectories. Extensive experiments on real-world datasets show that DFST achieves a 13.6% improvement in join performance over existing semantic trajectory similarity join methods.

Data Availability

The data that support the findings of this study are available from the corresponding author, Ruijie Tian, upon reasonable request.

References

Alarabi L (2017) St-hadoop: a mapreduce framework for big spatio-temporal data. In: Proceedings of the 2017 ACM International conference on management of data. SIGMOD ’17, pp 40–42. Association for computing machinery. https://doi.org/10.1145/3055167.3055181
Alarabi L (2021) Summit: a scalable system for massive trajectory data management 10(3), 2–3. https://doi.org/10.1145/3307599.3307601. Accessed 22 Nov 2021
Belesiotis A, Skoutas D, Efstathiades C, Kaffes V, Pfoser D (2018) Spatio-textual user matching and clustering based on set similarity joins. VLDB J 27(3):297–320. https://doi.org/10.1007/s00778-018-0498-5
Berndt DJ, Clifford J (1994) Using dynamic time warping to find patterns in time series. In: Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining. AAAIWS’94, pp 359–370. AAAI Press
Bhatti UA, Huang M, Wu D, Zhang Y, Mehmood A, Han H (2019) Recommendation system using feature extraction and pattern recognition in clinical care systems. Enterp Inf Syst 13(3):329–351. https://doi.org/10.1080/17517575.2018.1557256
Bhatti UA, Yu Z, Chanussot J, Zeeshan Z, Yuan L, Luo W, Nawaz SA, Bhatti MA, Ain QU, Mehmood A (2022) Local similarity-based spatial–spectral fusion hyperspectral image classification with deep cnn and gabor filtering. IEEE Trans Geosci Remote Sens 60:1–15. https://doi.org/10.1109/TGRS.2021.3090410
Bouros P, Ge S, Mamoulis N (2012) Spatio-textual similarity joins. In: Proceedings of the VLDB Endowment, vol 6, pp 1–12. https://doi.org/10.14778/2428536.2428537
Chen L, Özsu MT, Oria V (2005) Robust and fast similarity search for moving object trajectories. In: Proceedings of the 2005 ACM SIGMOD international conference on management of data. SIGMOD ’05, pp 491–502. Association for Computing Machinery, New York. https://doi.org/10.1145/1066157.1066213
Chen L, Shang S, Jensen CS, Yao B, Kalnis P (2020) Parallel semantic trajectory similarity join. In: 2020 IEEE 36th International Conference on Data Engineering (ICDE), pp 997–1008. https://doi.org/10.1109/ICDE48307.2020.00091
Ding H, Trajcevski G, Scheuermann P, Wang X, Keogh E (2008) Querying and mining of time series data: Experimental comparison of representations and distance measures. Proc. VLDB Endow. 1(2), 1542–1552. https://doi.org/10.14778/1454159.1454226
Ferrante M, Bongiorno C, Shoval N (2019) Similarity of GPS trajectories using dynamic time warping: an application to cruise tourism. In: Crocetta C (ed) Theoretical and applied statistics. Springer proceedings in mathematics & statistics. Springer, Cham, pp 91–101. https://doi.org/10.1007/978-3-030-05420-5_10
Hu H, Li G, Bao Z, Feng J, Wu Y, Gong Z, Xu Y (2016) Top-k spatio-textual similarity join. IEEE Trans Knowl Data Eng 28(2):551–565. https://doi.org/10.1109/TKDE.2015.2485213
Li R, He H, Wang R, Ruan S, He T, Bao J, Zhang J, Hong L, Zheng Y (2021) Trajmesa: a distributed nosql-based trajectory data management system. IEEE Transactions on Knowledge and Data Engineering, 1–1. https://doi.org/10.1109/TKDE.2021.3079880
Liu S, Li G, Feng J (2012) Star-join: spatio-textual similarity join. In: Proceedings of the 21st ACM International conference on information and knowledge management. CIKM ’12, pp 2194–2198. Association for computing machinery. https://doi.org/10.1145/2396761.2398600
Liu S, Li G, Feng J (2014) A prefix-filter based method for spatio-textual similarity join. IEEE Trans Knowl Data Eng 26(10):2354–2367. https://doi.org/10.1109/TKDE.2013.83
de Berg M, Cheong O, van Kreveld M, Overmars M (2008) Computational geometry: algorithms and applications. Springer
Parent C, Spaccapietra S, Renso C, Andrienko G, Andrienko N, Bogorny V, Damiani ML, Gkoulalas-Divanis A, Macedo J, Pelekis N, Theodoridis Y, Yan Z (2021) Semantic trajectories modeling and analysis 45(4), 42–14232. https://doi.org/10.1145/2501654.2501656. Accessed 13 Dec 2021
Rao J, Lin J, Samet H (2014) Partitioning strategies for spatio-textual similarity join. In: Proceedings of the 3rd ACM SIGSPATIAL International workshop on analytics for big geospatial Data. BigSpatial ’14, pp 40–49. Association for computing machinery. https://doi.org/10.1145/2676536.2676542
Shang S, Chen L, Wei Z, Jensen CS, Zheng K, Kalnis P (2018) Parallel trajectory similarity joins in spatial networks. VLDB J 27(3):395–420. https://doi.org/10.1007/s00778-018-0502-0
Shang Z, Li G, Bao Z (2018) Dita: distributed in-memory trajectory analytics. In: Proceedings of the 2018 International conference on management of data. SIGMOD ’18, pp 725–740. Association for computing machinery, New York, NY, USA. https://doi.org/10.1145/3183713.3183743
Ta N, Li G, Xie Y, Li C, Hao S, Feng J (2017) Signature-based trajectory similarity join. IEEE Trans Knowl Data Eng 29(4):870–883. https://doi.org/10.1109/TKDE.2017.2651821
Tampakis P, Doulkeridis C, Pelekis N, Theodoridis Y (2020) Distributed subtrajectory join on massive datasets. ACM Trans Spatial Algo Syst 6(2):1–29. https://doi.org/10.1145/3373642
Toohey K, Duckham M (2015) Trajectory similarity measures. SIGSPATIAL Special 7(1):43–50. https://doi.org/10.1145/2782759.2782767
Vu T, Eldawy A (2018) R-grove: growing a family of r-trees in the big-data forest. In: Proceedings of the 26th ACM SIGSPATIAL International conference on advances in geographic information systems. SIGSPATIAL ’18, pp 532–535. Association for computing machinery. https://doi.org/10.1145/3274895.3274984
Wang X, Mueen A, Ding H, Trajcevski G, Scheuermann P, Keogh E (2013) Experimental comparison of representation methods and distance measures for time series data. Data Min Knowl Discov 26(2):275–309. https://doi.org/10.1007/s10618-012-0250-5
Wang N, Zeng J, Chen M, Zhu S (2020) An efficient algorithm for spatio-textual location matching. Distrib Parallel Databases 38(3):649–666. https://doi.org/10.1007/s10619-020-07289-9
Wang X, Zhang W, Zhang Y, Lin X, Huang Z (2017) Top-k spatial-keyword publish/subscribe over sliding window. VLDB J 26(3):301–326. https://doi.org/10.1007/s00778-016-0453-2
Yuan J, Zheng Y, Xie X, Sun G (2013) T-drive: enhancing driving directions with taxi drivers’ intelligence. IEEE Trans Knowl Data Eng 25(1):220–232. https://doi.org/10.1109/TKDE.2011.200
Zhang Y, Ma Y, Meng X (2014) Efficient spatio-textual similarity join using mapreduce. In: 2014 IEEE/WIC/ACM International joint conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT), vol 1, pp 52–59. https://doi.org/10.1109/WI-IAT.2014.16
Zhang D, Tan K-L, Tung AKH (2013) Scalable top-k spatial keyword search. EDBT ’13, pp 359–370. Association for computing machinery. https://doi.org/10.1145/2452376.2452419
Zheng K, Shang S, Yuan NJ, Yang Y (2013) Towards efficient search for activity trajectories. In: 2013 IEEE 29th International Conference on Data Engineering (ICDE), pp 230–241. https://doi.org/10.1109/ICDE.2013.6544828
Zheng B, Yuan NJ, Zheng K, Xie X, Sadiq S, Zhou X (2015) Approximate keyword search in semantic trajectory database. In: 2015 IEEE 31st International conference on data engineering, pp 975–986. https://doi.org/10.1109/ICDE.2015.7113349
Zheng B, Zheng K, Sharaf MA, Zhou X, Sadiq S (2014) Efficient retrieval of top-k most similar users from travel smart card data. In: 2014 IEEE 15th International conference on mobile data management, vol 1, pp 259–268. https://doi.org/10.1109/MDM.2014.38
Zheng K, Zheng B, Xu J, Liu G, Liu A, Li Z (2017) Popularity-aware spatial keyword search on activity trajectories. World Wide Web 20(4):749–773. https://doi.org/10.1007/s11280-016-0414-0
Yang S, Cheema MA, Lin X, Zhang Y, Zhang W (2017) Reverse k nearest neighbors queries and spatial reverse top-k queries. The VLDB Journal 26:151–176. https://doi.org/10.1007/s00778-016-0445-2

Acknowledgements

This work was supported by the National Key Research and Development Program of China (2020YFF0410947) and the National Natural Science Foundation of China (62103072). Additional funding was provided by the China Postdoctoral Science Foundation (2021M690502) and Fundamental Research Funds for the Central Universities (3132022647).

Author information

Authors and Affiliations

Information Science and Technology College, Dalian Maritime University, Dalian, 116026, Liaoning, China
Ruijie Tian, Jiajun Li, Weishi Zhang & Fei Wang
Key Laboratory of Intelligent Software, Dalian, 116026, Liaoning, China
Weishi Zhang & Fei Wang

Corresponding authors

Correspondence to Weishi Zhang or Fei Wang.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Other distance measures

A.1 Dynamic Time Warping (DTW)

DTW computes the minimum cumulative point-to-point distance over all ways of matching two trajectories [4].

Definition 5 (DTW)

Given two trajectories $\mathcal {T}=\left \lbrace o^{\mathcal {T}}_1,\ldots ,o^{\mathcal {T}}_m\right \rbrace $ and $tr=\left \lbrace o^{tr}_1,\ldots ,o^{tr}_n\right \rbrace $, DTW [4] is computed as follows:

$$ DTW(\mathcal{T},tr)=\begin{cases} \sum_{i=1}^{m} dist(o^{\mathcal{T}}_{i}.p,\,o^{tr}_{1}.p) & \text{if } n=1\\ \sum_{j=1}^{n} dist(o^{\mathcal{T}}_{1}.p,\,o^{tr}_{j}.p) & \text{if } m=1\\ dist(o^{\mathcal{T}}_{m}.p,\,o^{tr}_{n}.p)+\min\big(DTW(\mathcal{T}^{m-1},tr^{n-1}),\ DTW(\mathcal{T}^{m-1},tr),\ DTW(\mathcal{T},tr^{n-1})\big) & \text{otherwise} \end{cases} \tag{16} $$

where $\mathcal {T}^{m-1}$ is the prefix trajectory of $\mathcal {T}$ obtained by removing its last point, and $tr^{n-1}$ is defined analogously.
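The recurrence in (16) can be evaluated bottom-up with a table instead of recursion. The following is a minimal sketch, not the DFST implementation: it assumes 2-D points stored as `(x, y)` tuples and a Euclidean `dist`, and the function names are chosen here for illustration only.

```python
from math import hypot

def dist(p, q):
    """Euclidean distance between two 2-D points (an assumption of this sketch)."""
    return hypot(p[0] - q[0], p[1] - q[1])

def dtw(T, tr):
    """DTW of two non-empty point lists, following the three cases of (16)."""
    m, n = len(T), len(tr)
    # D[i][j] is the DTW value of the first i+1 points of T and the first j+1 points of tr.
    D = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            d = dist(T[i], tr[j])
            if i == 0 and j == 0:
                D[i][j] = d
            elif j == 0:
                # tr prefix has a single point: sum of distances to that point.
                D[i][j] = D[i - 1][0] + d
            elif i == 0:
                # T prefix has a single point: sum of distances to that point.
                D[i][j] = D[0][j - 1] + d
            else:
                D[i][j] = d + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[m - 1][n - 1]
```

The table has m × n entries and each entry takes constant time, so the sketch runs in O(mn) time.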

According to (6) and Definition 5, $DTW(\mathcal {T},tr)$ is never less than Fréchet$(\mathcal {T},tr)$, because DTW sums the point distances along the matching path whereas Fréchet takes only their maximum. For example, for the two trajectories $\mathcal {T}_1$ and $\mathcal {T}_3$ in Fig. 17, $DTW(\mathcal {T}_1,\mathcal {T}_3)=6.41 >$ Fréchet$(\mathcal {T}_1,\mathcal {T}_3)=1.41$. To support DTW, DFST does not need to update ε by accumulating distances when querying the partitions. Similarly, partition/node distance lower bound pruning and summary pruning remain applicable.
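The relation between the two measures can be checked numerically. The sketch below assumes that (6) is the standard discrete Fréchet distance, which shares the DTW recurrence but combines values with a maximum instead of a sum. The helper `path_cost` and the sample points are illustrative and are not taken from Fig. 17.

```python
from math import hypot, inf

def dist(p, q):
    """Euclidean distance between two 2-D points (an assumption of this sketch)."""
    return hypot(p[0] - q[0], p[1] - q[1])

def path_cost(T, tr, combine):
    """Shared dynamic program: combine(d, prev) is a sum for DTW and a max for Fréchet."""
    m, n = len(T), len(tr)
    D = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            d = dist(T[i], tr[j])
            if i == 0 and j == 0:
                D[i][j] = d
                continue
            # Best predecessor among the (up to) three neighbouring cells.
            prev = min(
                D[i - 1][j - 1] if i > 0 and j > 0 else inf,
                D[i - 1][j] if i > 0 else inf,
                D[i][j - 1] if j > 0 else inf,
            )
            D[i][j] = combine(d, prev)
    return D[m - 1][n - 1]

def dtw(T, tr):
    return path_cost(T, tr, lambda d, prev: d + prev)

def frechet(T, tr):
    return path_cost(T, tr, max)

# Two made-up parallel trajectories, one unit apart.
a = [(0, 0), (1, 0), (2, 0)]
b = [(0, 1), (1, 1), (2, 1)]
print(dtw(a, b), frechet(a, b))
```

For any matching path, the sum of non-negative distances is at least their maximum, which is why `dtw(a, b)` cannot fall below `frechet(a, b)`.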

A.2 Edit Distance on Real Sequences (EDR)

Definition 6 (EDR)

Given two trajectories $\mathcal {T}$ and $\mathcal {Q}$ with lengths m and n, and a matching threshold δ ≥ 0, EDR_δ [23] is defined as follows:

$$ EDR_{\delta}(\mathcal{T},\mathcal{Q})=\begin{cases} n & \text{if } m=0\\ m & \text{if } n=0\\ \min\big(EDR_{\delta}(\mathcal{T}^{2,m},\mathcal{Q}^{2,n})+subcost(t_{1},q_{1}),\ EDR_{\delta}(\mathcal{T}^{2,m},\mathcal{Q})+1,\ EDR_{\delta}(\mathcal{T},\mathcal{Q}^{2,n})+1\big) & \text{otherwise} \end{cases} \tag{17} $$

where $\mathcal {T}^{2,m}$ stands for trajectory $ \mathcal {T} $ with its first point removed, and subcost(t,q) = 0 if dist(t,q) ≤ δ; 1 otherwise.
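Definition 6 can likewise be evaluated with a table over suffixes of the two trajectories. This is a minimal sketch under the same assumptions as before (2-D `(x, y)` tuples and Euclidean `dist`); the function names are illustrative and not part of DFST.

```python
from math import hypot

def dist(p, q):
    """Euclidean distance between two 2-D points (an assumption of this sketch)."""
    return hypot(p[0] - q[0], p[1] - q[1])

def edr(T, Q, delta):
    """EDR of two point lists with matching threshold delta, following (17)."""
    m, n = len(T), len(Q)
    # E[i][j] is the EDR value of the suffixes T[i:] and Q[j:].
    E = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m, -1, -1):
        for j in range(n, -1, -1):
            if i == m:
                E[i][j] = n - j          # T suffix is empty: remaining length of Q
            elif j == n:
                E[i][j] = m - i          # Q suffix is empty: remaining length of T
            else:
                subcost = 0 if dist(T[i], Q[j]) <= delta else 1
                E[i][j] = min(
                    E[i + 1][j + 1] + subcost,  # match or substitute the first points
                    E[i + 1][j] + 1,            # skip the first point of T
                    E[i][j + 1] + 1,            # skip the first point of Q
                )
    return E[0][0]
```

When no pair of points lies within δ, every `subcost` is 1 and the function returns max(m, n).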

For the two trajectories $\mathcal {T}_1$ and $\mathcal {T}_3$ in Fig. 17 and δ = 1, we have $EDR_{\delta }(\mathcal {T}_1,\mathcal {T}_3)=2$. To support EDR, we compute the distance from the MBR of each partition to the query trajectory tr. If it exceeds δ, subcost(t,q) is always equal to 1 and $EDR_{\delta }(\mathcal {T},tr)=\max (m,n)$, so we can safely prune this partition.
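The MBR test can be sketched as follows. The sketch assumes that a partition's MBR is an axis-aligned rectangle `(xmin, ymin, xmax, ymax)` and that the distance from the MBR to the query trajectory is the smallest point-to-rectangle distance over the query points. Both choices, and the names `point_to_mbr` and `no_match_possible`, are assumptions made for illustration rather than details given in the paper.

```python
from math import hypot

def point_to_mbr(p, mbr):
    """Minimum Euclidean distance from point p to an axis-aligned rectangle."""
    xmin, ymin, xmax, ymax = mbr
    # Each offset is zero when p lies within the rectangle's extent on that axis.
    dx = max(xmin - p[0], 0.0, p[0] - xmax)
    dy = max(ymin - p[1], 0.0, p[1] - ymax)
    return hypot(dx, dy)

def no_match_possible(mbr, tr, delta):
    """True if every query point is farther than delta from the MBR.

    In that case no point of any trajectory inside the MBR can match a
    point of tr, so subcost is 1 for every pair of points.
    """
    return all(point_to_mbr(q, mbr) > delta for q in tr)

# Hypothetical partition MBR and query trajectory.
partition_mbr = (0.0, 0.0, 1.0, 1.0)
query = [(3.0, 0.5), (4.0, 0.5)]
print(no_match_possible(partition_mbr, query, delta=1.0))
```

The check costs one rectangle distance per query point and needs no access to the trajectories stored in the partition.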

A.3 Longest Common Subsequence Distance (LCSS)

Definition 7 (LCSS)

Given two trajectories $\mathcal {T}$ and $\mathcal {Q}$ with lengths m and n, and a matching threshold δ, LCSS_δ is defined as below [23]:

$$ LCSS_{\delta}(\mathcal{T},\mathcal{Q})=\begin{cases} n & \text{if } m=0\\ m & \text{if } n=0\\ 1+LCSS_{\delta}(\mathcal{T}^{m-1},\mathcal{Q}^{n-1}) & \text{if } dist(t_{m},q_{n})\leq \delta\\ \max\big(LCSS_{\delta}(\mathcal{T}^{m-1},\mathcal{Q}),\ LCSS_{\delta}(\mathcal{T},\mathcal{Q}^{n-1})\big) & \text{otherwise} \end{cases} \tag{18} $$

where $\mathcal {T}^{m-1}$ is the prefix trajectory of $ \mathcal {T} $ with the last point removed.

For the two trajectories $\mathcal {T}_1$ and $\mathcal {T}_3$ in Fig. 17 and δ = 1, we have $LCSS_{\delta }(\mathcal {T}_1,\mathcal {T}_3)=5$. As with EDR, we compute the distance from each partition’s MBR to the query trajectory tr. According to Definition 7, if it exceeds δ, $LCSS_{\delta }(\mathcal {T},tr)$ is always equal to 0, so we can also safely prune this partition.

Appendix B: Comparison with other distance measures

We evaluated DFST’s performance with different distance measures, including Fréchet, DTW, EDR and LCSS (ε = 0.0001); the results are shown in Fig. 18. We observed that: (1) Fréchet was slower than DTW under the same threshold, because DTW sums the values along the whole path from (1, 1) to (m, n) whereas Fréchet takes the maximum value; (2) LCSS was as fast as EDR, because both have time complexity O(mn).

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Tian, R., Li, J., Zhang, W. et al. A distributed framework for large-scale semantic trajectory similarity join. Multimed Tools Appl 83, 16205–16229 (2024). https://doi.org/10.1007/s11042-023-15236-w

Received: 09 August 2022
Revised: 29 September 2022
Accepted: 30 March 2023
Published: 13 July 2023
Issue Date: February 2024
DOI: https://doi.org/10.1007/s11042-023-15236-w
