Osprey: a heterogeneous search framework for spatial-temporal similarity

  • Regular Paper
  • Computing

Abstract

This paper proposes a heterogeneous spatial-temporal similarity search framework in which the datasets come from multiple asynchronous data sources. Because of measurement error, data loss, and other factors, similarity search based on single points along a trajectory usually cannot meet the accuracy requirements of this heterogeneous setting. To address this issue, we introduce the concept of a spatial-temporal cluster of points, which is identified for each target query in place of single points. Building on this concept, we design a spectral clustering algorithm that constructs the clusters effectively in the pre-processing phase, and we improve search accuracy during query processing by unifying multiple search metrics. To validate the idea, we prototype a clustered online spatial-temporal similarity search system, "Osprey", which computes the similarity of spatial-temporal sequences in parallel for heterogeneous search on a distributed database. Our empirical study uses an open dataset, "T-Drive", and a billion-scale dataset of WiFi positioning records gathered from the urban metro system in Shenzhen, China. The experimental results show that the latency of the proposed system is less than 4 s in most cases, and that the accuracy exceeds 70% when the similarity exceeds 0.5.
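As a rough illustration of why matching on clusters rather than single points helps when sources are asynchronous and noisy, the sketch below compares two observations of the same movement. It is not the paper's method: a fixed space-time grid stands in for the spectral clustering step, Jaccard similarity stands in for the unified search metrics, and the function names, cell sizes, and sample coordinates are all hypothetical.

```python
from math import floor


def to_cell(point, cell_deg=0.01, cell_sec=300):
    """Map a (lat, lon, t) point to a coarse space-time cell.

    Assumption: a fixed grid is used here only as a simple stand-in
    for a cluster id; the paper builds clusters with spectral clustering.
    """
    lat, lon, t = point
    return (floor(lat / cell_deg), floor(lon / cell_deg), floor(t / cell_sec))


def point_jaccard(a, b):
    """Jaccard similarity over exact (lat, lon, t) points."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cluster_jaccard(a, b, cell_deg=0.01, cell_sec=300):
    """Jaccard similarity over the space-time cells each trajectory visits."""
    ca = {to_cell(p, cell_deg, cell_sec) for p in a}
    cb = {to_cell(p, cell_deg, cell_sec) for p in b}
    union = ca | cb
    return len(ca & cb) / len(union) if union else 0.0


# Hypothetical records: two sources with different clocks and positioning noise.
gps = [(22.5431, 114.0579, 10), (22.5532, 114.0681, 320), (22.5633, 114.0782, 640)]
wifi = [(22.5437, 114.0572, 45), (22.5538, 114.0688, 370), (22.5735, 114.0884, 950)]

if __name__ == "__main__":
    print(point_jaccard(gps, wifi))    # no exact point is shared
    print(cluster_jaccard(gps, wifi))  # two of four visited cells are shared
```

With these sample records, exact point matching finds no overlap, while the cell-level comparison finds that the two sources share two of the four visited cells. The cell sizes control how much measurement error and clock skew the comparison tolerates.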


Notes

  1. The paths up and down a viaduct are actually different paths although they have small Euclidean distances.


Acknowledgements

This work is supported in part by the Key-Area Research and Development Program of Guangdong Province (No. 2020B010164002) and the National Natural Science Foundation of China (No. 61672513).

Author information

Corresponding author

Correspondence to Yang Wang.


Cite this article

Dai, H., Wang, Y. & Xu, C. Osprey: a heterogeneous search framework for spatial-temporal similarity. Computing 104, 1949–1975 (2022). https://doi.org/10.1007/s00607-022-01075-4

  • DOI: https://doi.org/10.1007/s00607-022-01075-4
