
Dragoon: a hybrid and efficient big trajectory management system for offline and online analytics

  • Regular Paper
The VLDB Journal

Abstract

With the explosive use of GPS-enabled devices, increasingly massive volumes of trajectory data capturing the movements of people and vehicles are becoming available. Such data are useful in many application areas, including transportation, traffic management, and location-based services. As a result, many trajectory data management and analytics systems have emerged, each targeting either offline or online settings. However, some applications call for both offline and online analyses. For example, in traffic management, offline analyses of historical trajectory data can support traffic planning, while online analyses of streaming trajectories can support congestion monitoring. Existing trajectory systems tend to perform offline and online trajectory analyses separately, which is inefficient. In this paper, we propose Dragoon, a hybrid and efficient framework built on Spark that supports both offline and online big trajectory management and analytics. The framework features a mutable resilient distributed dataset (RDD) model, comprising RDD Share, RDD Update, and RDD Mirror, which enables hybrid storage of historical and streaming trajectories. It also contains a real-time partitioner that efficiently distributes trajectory data and supports both offline and online analyses. Together, these components give Dragoon a hybrid analysis pipeline. Support for several typical trajectory queries and mining tasks demonstrates the flexibility of Dragoon.
An extensive experimental study using both real and synthetic trajectory datasets shows that Dragoon (1) achieves offline trajectory query performance similar to that of the state-of-the-art system UlTraMan; (2) reduces storage overhead during trajectory editing by up to a factor of two compared with UlTraMan; (3) achieves at least 40% better scalability than popular stream processing frameworks (i.e., Flink and Spark Streaming); and (4) offers, on average, a twofold performance improvement for online trajectory data analytics.
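The abstract does not describe how Dragoon's real-time partitioner works internally. As a purely hypothetical illustration of the general idea that a single partitioning function can route both historical and streaming trajectory points to the same partitions, the sketch below assigns points to the cells of a uniform spatial grid. The class `GridPartitioner`, the helper `route`, the record layout `(traj_id, x, y, t)`, and the choice of a uniform grid are all assumptions for illustration; they are not Dragoon's design or a Spark API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GridPartitioner:
    """Hypothetical uniform-grid partitioner (illustration only, not Dragoon's)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float
    cols: int
    rows: int

    def __post_init__(self):
        # Assumption: a non-degenerate bounding box and a positive grid size.
        if self.max_x <= self.min_x or self.max_y <= self.min_y:
            raise ValueError("bounding box must have positive width and height")
        if self.cols <= 0 or self.rows <= 0:
            raise ValueError("grid must have at least one row and one column")

    @property
    def num_partitions(self) -> int:
        return self.cols * self.rows

    def partition_of(self, x: float, y: float) -> int:
        # Map the point to a grid column and row.
        cx = int((x - self.min_x) / (self.max_x - self.min_x) * self.cols)
        cy = int((y - self.min_y) / (self.max_y - self.min_y) * self.rows)
        # Clamp points on or outside the boundary into the edge cells.
        cx = min(max(cx, 0), self.cols - 1)
        cy = min(max(cy, 0), self.rows - 1)
        # The row-major cell id doubles as the partition id.
        return cy * self.cols + cx


def route(records, partitioner):
    """Group (traj_id, x, y, t) records by partition id, preserving input order."""
    buckets = {}
    for rec in records:
        pid = partitioner.partition_of(rec[1], rec[2])
        buckets.setdefault(pid, []).append(rec)
    return buckets


if __name__ == "__main__":
    grid = GridPartitioner(0.0, 0.0, 10.0, 10.0, cols=2, rows=2)
    historical = [("a", 1.0, 1.0, 0), ("b", 9.0, 9.0, 0)]
    streaming = [("c", 1.5, 2.0, 100)]
    # Historical and newly arriving points go through the same function.
    for pid, recs in sorted(route(historical + streaming, grid).items()):
        print(pid, [r[0] for r in recs])
```

Because the mapping depends only on a point's location, a newly arriving point lands in the same partition as nearby historical points, which is the property a hybrid offline/online store would want. A uniform grid is the simplest choice; it can become skewed when data are dense in a few regions, so real systems often use data-adaptive schemes instead.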


Figs. 1–15



Acknowledgements

This work was supported in part by the NSFC under Grant Nos. 62025206 and 61972338, the National Key R&D Program of China under Grant No. 2018YFB1004003, and the NSFC-Zhejiang Joint Fund under Grant No. U1609217. Yunjun Gao is the corresponding author of the work.

Author information

Correspondence to Yunjun Gao.



About this article


Cite this article

Fang, Z., Chen, L., Gao, Y. et al. Dragoon: a hybrid and efficient big trajectory management system for offline and online analytics. The VLDB Journal 30, 287–310 (2021). https://doi.org/10.1007/s00778-021-00652-x


  • DOI: https://doi.org/10.1007/s00778-021-00652-x
