Parallel trajectory similarity joins in spatial networks

Abstract

The matching of similar pairs of objects, called similarity join, is a fundamental operation in data management. We consider two kinds of trajectory similarity joins (TS-Joins), a threshold-based join (Tb-TS-Join) and a top-k TS-Join (k-TS-Join), where the objects are trajectories of vehicles moving in road networks. Given two sets of trajectories and a threshold \(\theta \), the Tb-TS-Join returns all pairs of trajectories from the two sets with similarity above \(\theta \). In contrast, the k-TS-Join does not take a threshold as a parameter; it returns the top-k most similar trajectory pairs from the two sets. The TS-Joins target diverse applications such as trajectory near-duplicate detection, data cleaning, ridesharing recommendation, and traffic congestion prediction. With these applications in mind, we provide purposeful definitions of similarity. To enable efficient processing of the TS-Joins on large sets of trajectories, we develop search space pruning techniques and exploit the parallel processing capabilities of modern processors. Specifically, we present a two-phase divide-and-conquer search framework that lays the foundation for the algorithms for the Tb-TS-Join and the k-TS-Join, which rely on different pruning techniques to achieve efficiency. The algorithms first find, for each trajectory, its similar trajectories, and then merge these partial results to obtain the final result. The algorithms for the two joins exploit different upper and lower bounds on the spatiotemporal trajectory similarity and different heuristic scheduling strategies for search space pruning. The per-trajectory searches are independent of each other and can be performed in parallel, and the merging step has constant cost. An empirical study with real data offers insight into the performance of the algorithms and demonstrates that they are capable of outperforming well-designed baseline algorithms by an order of magnitude.
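The two-phase structure described above (independent per-trajectory searches, then a merge) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' algorithm: trajectories are modeled as lists of (road-network vertex id, timestamp) samples, `similarity` is a hypothetical Jaccard stand-in over visited vertices rather than the paper's spatiotemporal measure, and each per-trajectory search is a brute-force scan without the upper/lower-bound pruning or scheduling heuristics the paper develops. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import repeat
import heapq

# Assumption: a trajectory is a list of (vertex id, timestamp) samples.
Trajectory = list[tuple[int, float]]


def similarity(a: Trajectory, b: Trajectory) -> float:
    """Hypothetical stand-in: Jaccard similarity of the visited vertices.
    This is NOT the spatiotemporal similarity defined in the paper."""
    va = {v for v, _ in a}
    vb = {v for v, _ in b}
    if not va and not vb:
        return 0.0
    return len(va & vb) / len(va | vb)


def _threshold_search(i, p, Q, theta):
    # Phase 1 for one trajectory p: brute-force scan of Q (no pruning).
    return [(i, j, s) for j, q in enumerate(Q) if (s := similarity(p, q)) > theta]


def tb_ts_join(P, Q, theta, workers=4):
    """Threshold join: all pairs (i, j, sim) with sim strictly above theta."""
    with ThreadPoolExecutor(max_workers=workers) as ex:
        parts = ex.map(_threshold_search, range(len(P)), P, repeat(Q), repeat(theta))
        # Phase 2: merge the independent per-trajectory results.
        return [pair for part in parts for pair in part]


def _local_top_k(i, p, Q, k):
    # Phase 1 for one trajectory p: its k best partners in Q.
    return heapq.nlargest(k, ((similarity(p, q), i, j) for j, q in enumerate(Q)))


def k_ts_join(P, Q, k, workers=4):
    """Top-k join: the k most similar pairs (i, j, sim), best first."""
    with ThreadPoolExecutor(max_workers=workers) as ex:
        parts = ex.map(_local_top_k, range(len(P)), P, repeat(Q), repeat(k))
        # Phase 2: the global top-k lies within the union of the local top-k lists.
        best = heapq.nlargest(k, (t for part in parts for t in part))
        return [(i, j, s) for s, i, j in best]
```

Each per-trajectory search reads only its own trajectory and the shared set `Q`, so the searches can run concurrently. `ThreadPoolExecutor` keeps the sketch simple; for CPU-bound similarity computation in CPython, `ProcessPoolExecutor` is the natural swap, which is why the worker functions are defined at module level. The k-TS-Join merge is correct because any pair in the global top-k must also rank within the top-k candidates of its own trajectory.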

Figs. 1–16 (figures not reproduced)

Author information

Correspondence to Lisi Chen.

About this article

Cite this article

Shang, S., Chen, L., Wei, Z. et al. Parallel trajectory similarity joins in spatial networks. The VLDB Journal 27, 395–420 (2018). https://doi.org/10.1007/s00778-018-0502-0

Keywords

  • Trajectory similarity join
  • Parallel processing
  • Spatial networks
  • Spatiotemporal databases