
A distributed framework for large-scale semantic trajectory similarity join

Multimedia Tools and Applications

Abstract

The similarity join is a common yet expensive operator in large-scale semantic trajectory analytics. In this paper, we propose DFST, an efficient framework for semantic trajectory similarity join in distributed systems. We devise the ITS index and the summary index, which consider the textual, temporal, and spatial domains, and prove theoretically that they can effectively prune dissimilar trajectory pairs. Moreover, DFST supports most existing similarity functions for quantifying the spatial similarity between semantic trajectories. Extensive experiments on real-world datasets show that DFST improves join performance by 13.6% compared with existing semantic trajectory similarity join methods.

Figs. 1–16

Data Availability

The data that support the findings of this study are available from the corresponding author, Ruijie Tian, upon reasonable request.

Notes

  1. https://foursquare.com

  2. https://www.instagram.com

  3. https://twitter.com

  4. https://databank.illinois.edu/datasets/IDB-9610843

References

  1. Alarabi L (2017) St-hadoop: a mapreduce framework for big spatio-temporal data. In: Proceedings of the 2017 ACM International conference on management of data. SIGMOD ’17, pp 40–42. Association for Computing Machinery. https://doi.org/10.1145/3055167.3055181

  2. Alarabi L (2021) Summit: a scalable system for massive trajectory data management 10(3):2–3. https://doi.org/10.1145/3307599.3307601. Accessed 22 Nov 2021

  3. Belesiotis A, Skoutas D, Efstathiades C, Kaffes V, Pfoser D (2018) Spatio-textual user matching and clustering based on set similarity joins. VLDB J 27(3):297–320. https://doi.org/10.1007/s00778-018-0498-5

  4. Berndt DJ, Clifford J (1994) Using dynamic time warping to find patterns in time series. In: Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining. AAAIWS’94, pp 359–370. AAAI Press

  5. Bhatti UA, Huang M, Wu D, Zhang Y, Mehmood A, Han H (2019) Recommendation system using feature extraction and pattern recognition in clinical care systems. Enterp Inf Syst 13(3):329–351. https://doi.org/10.1080/17517575.2018.1557256

  6. Bhatti UA, Yu Z, Chanussot J, Zeeshan Z, Yuan L, Luo W, Nawaz SA, Bhatti MA, Ain QU, Mehmood A (2022) Local similarity-based spatial–spectral fusion hyperspectral image classification with deep cnn and gabor filtering. IEEE Trans Geosci Remote Sens 60:1–15. https://doi.org/10.1109/TGRS.2021.3090410

  7. Bouros P, Ge S, Mamoulis N (2012) Spatio-textual similarity joins. In: Proceedings of the VLDB Endowment, vol 6, pp 1–12. https://doi.org/10.14778/2428536.2428537

  8. Chen L, Özsu MT, Oria V (2005) Robust and fast similarity search for moving object trajectories. In: Proceedings of the 2005 ACM SIGMOD international conference on management of data. SIGMOD ’05, pp 491–502. Association for Computing Machinery, New York. https://doi.org/10.1145/1066157.1066213

  9. Chen L, Shang S, Jensen CS, Yao B, Kalnis P (2020) Parallel semantic trajectory similarity join. In: 2020 IEEE 36th International Conference on Data Engineering (ICDE), pp 997–1008. https://doi.org/10.1109/ICDE48307.2020.00091

  10. Ding H, Trajcevski G, Scheuermann P, Wang X, Keogh E (2008) Querying and mining of time series data: Experimental comparison of representations and distance measures. Proc VLDB Endow 1(2):1542–1552. https://doi.org/10.14778/1454159.1454226

  11. Ferrante M, Bongiorno C, Shoval N (2019) Similarity of GPS trajectories using dynamic time warping: an application to cruise tourism. In: Crocetta C (ed) Theoretical and applied statistics. Springer proceedings in mathematics & statistics. Springer, Cham, pp 91–101. https://doi.org/10.1007/978-3-030-05420-5_10

  12. Hu H, Li G, Bao Z, Feng J, Wu Y, Gong Z, Xu Y (2016) Top-k spatio-textual similarity join. IEEE Trans Knowl Data Eng 28(2):551–565. https://doi.org/10.1109/TKDE.2015.2485213

  13. Li R, He H, Wang R, Ruan S, He T, Bao J, Zhang J, Hong L, Zheng Y (2021) Trajmesa: a distributed nosql-based trajectory data management system. IEEE Trans Knowl Data Eng, pp 1–1. https://doi.org/10.1109/TKDE.2021.3079880

  14. Liu S, Li G, Feng J (2012) Star-join: spatio-textual similarity join. In: Proceedings of the 21st ACM International conference on information and knowledge management. CIKM ’12, pp 2194–2198. Association for Computing Machinery. https://doi.org/10.1145/2396761.2398600

  15. Liu S, Li G, Feng J (2014) A prefix-filter based method for spatio-textual similarity join. IEEE Trans Knowl Data Eng 26(10):2354–2367. https://doi.org/10.1109/TKDE.2013.83

  16. de Berg M, Cheong O, van Kreveld M, Overmars M (2008) Computational geometry: algorithms and applications. Springer

  17. Parent C, Spaccapietra S, Renso C, Andrienko G, Andrienko N, Bogorny V, Damiani ML, Gkoulalas-Divanis A, Macedo J, Pelekis N, Theodoridis Y, Yan Z (2021) Semantic trajectories modeling and analysis 45(4):42:1–42:32. https://doi.org/10.1145/2501654.2501656. Accessed 13 Dec 2021

  18. Rao J, Lin J, Samet H (2014) Partitioning strategies for spatio-textual similarity join. In: Proceedings of the 3rd ACM SIGSPATIAL International workshop on analytics for big geospatial Data. BigSpatial ’14, pp 40–49. Association for Computing Machinery. https://doi.org/10.1145/2676536.2676542

  19. Shang S, Chen L, Wei Z, Jensen CS, Zheng K, Kalnis P (2018) Parallel trajectory similarity joins in spatial networks. VLDB J 27(3):395–420. https://doi.org/10.1007/s00778-018-0502-0

  20. Shang Z, Li G, Bao Z (2018) Dita: distributed in-memory trajectory analytics. In: Proceedings of the 2018 International conference on management of data. SIGMOD ’18, pp 725–740. Association for Computing Machinery, New York. https://doi.org/10.1145/3183713.3183743

  21. Ta N, Li G, Xie Y, Li C, Hao S, Feng J (2017) Signature-based trajectory similarity join. IEEE Trans Knowl Data Eng 29(4):870–883. https://doi.org/10.1109/TKDE.2017.2651821

  22. Tampakis P, Doulkeridis C, Pelekis N, Theodoridis Y (2020) Distributed subtrajectory join on massive datasets. ACM Trans Spatial Algo Syst 6(2):1–29. https://doi.org/10.1145/3373642

  23. Toohey K, Duckham M (2015) Trajectory similarity measures. SIGSPATIAL Special 7(1):43–50. https://doi.org/10.1145/2782759.2782767

  24. Vu T, Eldawy A (2018) R-grove: growing a family of r-trees in the big-data forest. In: Proceedings of the 26th ACM SIGSPATIAL International conference on advances in geographic information systems. SIGSPATIAL ’18, pp 532–535. Association for Computing Machinery. https://doi.org/10.1145/3274895.3274984

  25. Wang X, Mueen A, Ding H, Trajcevski G, Scheuermann P, Keogh E (2013) Experimental comparison of representation methods and distance measures for time series data. Data Min Knowl Discov 26(2):275–309. https://doi.org/10.1007/s10618-012-0250-5

  26. Wang N, Zeng J, Chen M, Zhu S (2020) An efficient algorithm for spatio-textual location matching. Distrib Parallel Databases 38(3):649–666. https://doi.org/10.1007/s10619-020-07289-9

  27. Wang X, Zhang W, Zhang Y, Lin X, Huang Z (2017) Top-k spatial-keyword publish/subscribe over sliding window. VLDB J 26(3):301–326. https://doi.org/10.1007/s00778-016-0453-2

  28. Yuan J, Zheng Y, Xie X, Sun G (2013) T-drive: enhancing driving directions with taxi drivers’ intelligence. IEEE Trans Knowl Data Eng 25(1):220–232. https://doi.org/10.1109/TKDE.2011.200

  29. Zhang Y, Ma Y, Meng X (2014) Efficient spatio-textual similarity join using mapreduce. In: 2014 IEEE/WIC/ACM International joint conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT), vol 1, pp 52–59. https://doi.org/10.1109/WI-IAT.2014.16

  30. Zhang D, Tan K-L, Tung AKH (2013) Scalable top-k spatial keyword search. In: EDBT ’13, pp 359–370. Association for Computing Machinery. https://doi.org/10.1145/2452376.2452419

  31. Zheng K, Shang S, Yuan NJ, Yang Y (2013) Towards efficient search for activity trajectories. In: 2013 IEEE 29th International Conference on Data Engineering (ICDE), pp 230–241. https://doi.org/10.1109/ICDE.2013.6544828

  32. Zheng B, Yuan NJ, Zheng K, Xie X, Sadiq S, Zhou X (2015) Approximate keyword search in semantic trajectory database. In: 2015 IEEE 31st International conference on data engineering, pp 975–986. https://doi.org/10.1109/ICDE.2015.7113349

  33. Zheng B, Zheng K, Sharaf MA, Zhou X, Sadiq S (2014) Efficient retrieval of top-k most similar users from travel smart card data. In: 2014 IEEE 15th International conference on mobile data management, vol 1, pp 259–268. https://doi.org/10.1109/MDM.2014.38

  34. Zheng K, Zheng B, Xu J, Liu G, Liu A, Li Z (2017) Popularity-aware spatial keyword search on activity trajectories. World Wide Web 20(4):749–773. https://doi.org/10.1007/s11280-016-0414-0

  35. Yang S, Cheema MA, Lin X, Zhang Y, Zhang W (2017) Reverse k nearest neighbors queries and spatial reverse top-k queries. VLDB J 26:151–176. https://doi.org/10.1007/s00778-016-0445-2

Acknowledgements

This work was supported by the National Key Research and Development Program of China (2020YFF0410947) and the National Natural Science Foundation of China (62103072). Additional funding was provided by the China Postdoctoral Science Foundation (2021M690502) and Fundamental Research Funds for the Central Universities (3132022647).

Author information

Corresponding authors

Correspondence to Weishi Zhang or Fei Wang.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Other distance measures

A.1 Dynamic Time Warping (DTW)

DTW computes the minimum cumulative distance over all alignments (warping paths) that match the points of two trajectories [4].

Definition 5 (DTW)

Given two trajectories \(\mathcal {T}=\left \lbrace o^{\mathcal {T}}_1,\ldots ,o^{\mathcal {T}}_m\right \rbrace \) and \(tr=\left \lbrace o^{tr}_1,\ldots ,o^{tr}_n\right \rbrace \), DTW [4] is computed as follows.

$$ DTW(\mathcal{T},tr)=\begin{cases} {\sum}_{i=1}^{m}dist(o^{\mathcal{T}}_{i}.p,o^{tr}_{1}.p)&if\ n=1\\ {\sum}_{j=1}^{n}dist(o^{\mathcal{T}}_{1}.p,o^{tr}_{j}.p)&if\ m=1\\ dist(o^{\mathcal{T}}_{m}.p,o^{tr}_{n}.p)+\min\Big(DTW(\mathcal{T}^{m-1},tr^{n-1}),\\ DTW(\mathcal{T}^{m-1},tr),DTW(\mathcal{T},tr^{n-1})\Big)& otherwise \end{cases} $$
(16)

where \(\mathcal {T}^{m-1}\) is the prefix of \(\mathcal {T}\) obtained by removing its last point, and \(tr^{n-1}\) is defined analogously.
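
Evaluating (16) by direct recursion recomputes the same prefix pairs many times; a bottom-up dynamic program computes every prefix pair once in O(mn) time. The following is a minimal Python sketch, assuming points are plain (x, y) tuples and that dist is the Euclidean distance; the names `dist` and `dtw` are illustrative and are not taken from DFST.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def dist(a: Point, b: Point) -> float:
    """Euclidean ground distance between two points (an assumption of this sketch)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def dtw(T: List[Point], tr: List[Point]) -> float:
    """Bottom-up evaluation of the DTW recurrence (16).

    D[i][j] holds DTW between the prefixes of length i and j. Row 0 and
    column 0 are infinite except D[0][0] = 0, which reproduces the base
    cases of (16): with one point on either side, the distances are summed.
    """
    m, n = len(T), len(tr)
    if m == 0 or n == 0:
        raise ValueError("trajectories must be non-empty")
    D = [[math.inf] * (n + 1) for _ in range(m + 1)]
    D[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # Current point distance plus the cheapest of the three predecessors.
            D[i][j] = dist(T[i - 1], tr[j - 1]) + min(
                D[i - 1][j - 1], D[i - 1][j], D[i][j - 1]
            )
    return D[m][n]
```

For instance, `dtw([(0, 0), (1, 0), (2, 0)], [(0, 0), (2, 0)])` returns 1.0: the middle point (1, 0) is warped onto one of its neighbours at cost 1, and the endpoints match exactly.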

According to (6) and Definition 5, \(DTW(\mathcal {T},tr)\) is never less than Fréchet\((\mathcal {T},tr)\): DTW sums the point distances along its optimal warping path, this sum is at least the largest distance on that path, and the Fréchet distance is the smallest such maximum over all paths. For example, for trajectories \(\mathcal {T}_1\) and \(\mathcal {T}_3\) in Fig. 17, \(DTW(\mathcal {T}_1,\mathcal {T}_3)=6.41 >\) Fréchet\((\mathcal {T}_1,\mathcal {T}_3)=1.41\). Consequently, DFST can support DTW without modifying the threshold ε or accumulating distances when querying the partitions: any lower bound on the Fréchet distance is also a lower bound on DTW, so partition/node distance lower-bound pruning and summary pruning remain valid.
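
The inequality can be checked numerically. The sketch below assumes that (6) is the discrete Fréchet distance and that points are (x, y) tuples with Euclidean distance. Both measures are evaluated by one dynamic program over monotone couplings from (1, 1) to (m, n); they differ only in whether a step adds the new point distance (DTW) or keeps the maximum (Fréchet). All function names are hypothetical.

```python
import math
import random
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def dist(a: Point, b: Point) -> float:
    """Euclidean ground distance (assumed)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def coupling_dp(T: List[Point], Q: List[Point],
                step: Callable[[float, float], float]) -> float:
    """Shared DP over monotone couplings.

    step(d, best_prev) combines the current point distance d with the best
    value among the three predecessor cells.
    """
    m, n = len(T), len(Q)
    D = [[math.inf] * (n + 1) for _ in range(m + 1)]
    D[0][0] = 0.0  # virtual start cell; point distances are non-negative
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            best_prev = min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
            D[i][j] = step(dist(T[i - 1], Q[j - 1]), best_prev)
    return D[m][n]


def dtw(T: List[Point], Q: List[Point]) -> float:
    # DTW accumulates distances along the path.
    return coupling_dp(T, Q, lambda d, prev: d + prev)


def discrete_frechet(T: List[Point], Q: List[Point]) -> float:
    # Discrete Fréchet keeps only the largest distance on the path.
    return coupling_dp(T, Q, lambda d, prev: max(d, prev))


def random_trajectory(rng: random.Random, max_len: int = 8) -> List[Point]:
    """Hypothetical helper: a random trajectory in the unit square."""
    return [(rng.random(), rng.random()) for _ in range(rng.randint(1, max_len))]


if __name__ == "__main__":
    T = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    Q = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
    print(dtw(T, Q), discrete_frechet(T, Q))  # 3.0 1.0
```

Because `dtw` dominates `discrete_frechet` for every pair, a pair or partition pruned because a Fréchet lower bound exceeds ε would also be pruned under DTW with the same ε.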

Fig. 17 Example trajectories

A.2 Edit Distance on Real Sequences (EDR)

Definition 6 (EDR)

Given two trajectories \(\mathcal {T}\) and \(\mathcal {Q}\) with lengths m and n, and a matching threshold δ ≥ 0, \(EDR_{\delta }\) [23] is defined as follows:

$$ EDR_{\delta}(\mathcal{T},\mathcal{Q})= \begin{cases} n&if\ m=0\\ m&if\ n=0\\ \min\Big(EDR_{\delta}(\mathcal{T}^{2,m},\mathcal{Q}^{2,n})+subcost(t_{1},q_{1}),\\ EDR_{\delta}(\mathcal{T}^{2,m},\mathcal{Q})+1,\ EDR_{\delta}(\mathcal{T},\mathcal{Q}^{2,n})+1\Big)&otherwise \end{cases} $$
(17)

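where \(\mathcal {T}^{2,m}\) denotes trajectory \(\mathcal {T}\) with its first point removed (and likewise \(\mathcal {Q}^{2,n}\)), \(t_1\) and \(q_1\) are the first points of \(\mathcal {T}\) and \(\mathcal {Q}\), and subcost(t,q) = 0 if dist(t,q) ≤ δ and 1 otherwise.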
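
Since (17) removes the first point of each trajectory, a natural bottom-up evaluation fills a table indexed by suffixes. The following Python sketch assumes (x, y) tuples and Euclidean dist; `dist`, `subcost` and `edr` are illustrative names, not DFST's implementation.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def dist(a: Point, b: Point) -> float:
    """Euclidean ground distance (assumed)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def subcost(t: Point, q: Point, delta: float) -> int:
    """0 if t and q match within delta, 1 otherwise."""
    return 0 if dist(t, q) <= delta else 1


def edr(T: List[Point], Q: List[Point], delta: float) -> int:
    """Bottom-up evaluation of (17).

    E[i][j] is the EDR between the suffixes T[i:] and Q[j:], mirroring the
    recurrence, which removes the first point of each trajectory.
    """
    m, n = len(T), len(Q)
    E = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        E[i][n] = m - i  # Q suffix empty: each remaining T point costs 1
    for j in range(n + 1):
        E[m][j] = n - j  # T suffix empty: each remaining Q point costs 1
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            E[i][j] = min(
                E[i + 1][j + 1] + subcost(T[i], Q[j], delta),
                E[i + 1][j] + 1,
                E[i][j + 1] + 1,
            )
    return E[0][0]
```

For example, `edr([(0, 0), (1, 0), (2, 0)], [(10, 10), (11, 10)], 1.0)` returns 3 = max(3, 2), because no pair of points lies within δ.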

Given trajectories \(\mathcal {T}_1\) and \(\mathcal {T}_3\) in Fig. 17 and δ = 1, we have \(EDR_{\delta }(\mathcal {T}_1,\mathcal {T}_3)=2\). To support EDR, we compute the minimum distance between the query trajectory tr and the MBR of each partition. If this distance exceeds δ, then dist(t,q) > δ for every point t in the partition and every point q of tr, so subcost(t,q) is always 1 and \(EDR_{\delta }(\mathcal {T},tr)=\max \limits (m,n)\), the largest value EDR can take; hence we can safely prune this partition.
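
The pruning test only needs the minimum distance from each point of tr to a partition's MBR. A minimal sketch, assuming an axis-aligned MBR stored as (min_x, min_y, max_x, max_y) and Euclidean distance; `mindist_point_mbr`, `mindist_traj_mbr` and `can_prune_edr` are hypothetical helpers, not DFST's API.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
MBR = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y), assumed layout


def mindist_point_mbr(p: Point, box: MBR) -> float:
    """Minimum Euclidean distance from p to the rectangle (0 if p is inside)."""
    min_x, min_y, max_x, max_y = box
    dx = max(min_x - p[0], 0.0, p[0] - max_x)
    dy = max(min_y - p[1], 0.0, p[1] - max_y)
    return math.hypot(dx, dy)


def mindist_traj_mbr(tr: List[Point], box: MBR) -> float:
    """Minimum distance between any point of tr and the rectangle."""
    return min(mindist_point_mbr(q, box) for q in tr)


def can_prune_edr(tr: List[Point], box: MBR, delta: float) -> bool:
    """True if every point of tr is farther than delta from the MBR.

    In that case every point stored in the partition is farther than delta
    from every point of tr, every subcost is 1, and EDR_delta(T, tr) equals
    max(m, n) for each trajectory T in the partition.
    """
    return mindist_traj_mbr(tr, box) > delta
```

A partition whose MBR lies 3 units from every point of tr, for instance, is pruned for δ = 1 without computing any EDR value.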

A.3 Longest Common Subsequence Distance (LCSS)

Definition 7 (LCSS)

Given two trajectories \(\mathcal {T}\) and \(\mathcal {Q}\) with lengths m and n, and a matching threshold δ, \(LCSS_{\delta }\) is defined as follows [23]:

$$ LCSS_{\delta}(\mathcal{T},\mathcal{Q})=\begin{cases} 0&if\ m=0\ or\ n=0\\ 1+LCSS_{\delta}(\mathcal{T}^{m-1},\mathcal{Q}^{n-1})&if\ dist(t_{m},q_{n})\leq \delta\\ \max\Big(LCSS_{\delta}(\mathcal{T}^{m-1},\mathcal{Q}),\\ LCSS_{\delta}(\mathcal{T},\mathcal{Q}^{n-1})\Big) &otherwise \end{cases} $$
(18)

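where \(\mathcal {T}^{m-1}\) is the prefix of \(\mathcal {T}\) with its last point removed (and likewise \(\mathcal {Q}^{n-1}\)), and \(t_m\) and \(q_n\) are the last points of \(\mathcal {T}\) and \(\mathcal {Q}\).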
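
A bottom-up Python sketch of the LCSS recurrence, using the standard convention that LCSS is 0 when either trajectory is empty. It assumes (x, y) tuples and Euclidean dist, and the names are illustrative. The result counts matched point pairs, so larger values indicate more similar trajectories.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def dist(a: Point, b: Point) -> float:
    """Euclidean ground distance (assumed)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def lcss(T: List[Point], Q: List[Point], delta: float) -> int:
    """L[i][j] is LCSS_delta of the prefixes of length i and j.

    Row 0 and column 0 stay 0 (an empty prefix matches nothing).
    """
    m, n = len(T), len(Q)
    L = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if dist(T[i - 1], Q[j - 1]) <= delta:
                # Last points match within delta: extend the common subsequence.
                L[i][j] = 1 + L[i - 1][j - 1]
            else:
                # Otherwise drop the last point of one trajectory.
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return L[m][n]
```

For example, `lcss([(0, 0), (1, 0), (2, 0)], [(0, 0.5), (2, 0.2)], 1.0)` returns 2: the first and last points match, while (1, 0) lies farther than δ from both points of the second trajectory.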

Given trajectories \(\mathcal {T}_1\) and \(\mathcal {T}_3\) in Fig. 17 and δ = 1, we have \(LCSS_{\delta }(\mathcal {T}_1,\mathcal {T}_3)=5\). As with EDR, we compute the minimum distance between each partition's MBR and the query trajectory tr. According to Definition 7, if this distance exceeds δ, no point of tr can be matched with any point in the partition, so \(LCSS_{\delta }(\mathcal {T},tr)\) is always 0 and we can also safely prune this partition.
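
The LCSS pruning argument can be checked empirically: when the minimum distance between an MBR and tr exceeds δ, the LCSS between tr and any trajectory sampled inside that MBR should be 0. This sketch assumes (x, y) tuples, Euclidean distance, an axis-aligned MBR stored as (min_x, min_y, max_x, max_y), and the standard LCSS base case of 0 for an empty trajectory; all function names and the example coordinates are hypothetical. Random sampling only illustrates the argument and is not a proof.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]
MBR = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y), assumed layout


def dist(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


def mindist_point_mbr(p: Point, box: MBR) -> float:
    min_x, min_y, max_x, max_y = box
    dx = max(min_x - p[0], 0.0, p[0] - max_x)
    dy = max(min_y - p[1], 0.0, p[1] - max_y)
    return math.hypot(dx, dy)


def lcss(T: List[Point], Q: List[Point], delta: float) -> int:
    m, n = len(T), len(Q)
    L = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if dist(T[i - 1], Q[j - 1]) <= delta:
                L[i][j] = 1 + L[i - 1][j - 1]
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return L[m][n]


def can_prune_lcss(tr: List[Point], box: MBR, delta: float) -> bool:
    """True if every point of tr lies farther than delta from the MBR."""
    return min(mindist_point_mbr(q, box) for q in tr) > delta


def sample_in_box(box: MBR, k: int, rng: random.Random) -> List[Point]:
    """Hypothetical helper: k random points inside the MBR."""
    min_x, min_y, max_x, max_y = box
    return [(rng.uniform(min_x, max_x), rng.uniform(min_y, max_y)) for _ in range(k)]


if __name__ == "__main__":
    tr = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    box = (5.0, 0.0, 6.0, 2.0)  # hypothetical partition MBR, 4 units from tr
    rng = random.Random(42)
    if can_prune_lcss(tr, box, 1.0):
        scores = [lcss(sample_in_box(box, rng.randint(1, 10), rng), tr, 1.0)
                  for _ in range(100)]
        print(max(scores))  # 0: no sampled trajectory shares a match with tr
```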

Appendix B: Comparison with other distance measures

We evaluated DFST's join performance with different distance measures, namely Fréchet, DTW, EDR and LCSS (ε = 0.0001), as shown in Fig. 18. We observed that: (1) Fréchet was slower than DTW under the same threshold, because DTW sums the distances along the whole path from (1, 1) to (m, n) whereas Fréchet takes only the maximum, so DTW values are larger and more candidate pairs can be pruned under the same threshold; (2) LCSS was as fast as EDR, since both LCSS and EDR have O(mn) time complexity.

Fig. 18 Other distance measures

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Tian, R., Li, J., Zhang, W. et al. A distributed framework for large-scale semantic trajectory similarity join. Multimed Tools Appl 83, 16205–16229 (2024). https://doi.org/10.1007/s11042-023-15236-w

  • DOI: https://doi.org/10.1007/s11042-023-15236-w
