TrajSpark: A Scalable and Efficient In-Memory Management System for Big Trajectory Data

  • Conference paper
Web and Big Data (APWeb-WAIM 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10366)

Abstract

The widespread use of mobile positioning devices has generated big trajectory data. Existing disk-based trajectory management systems can no longer provide scalable, low-latency query services. We therefore present TrajSpark, a distributed in-memory system that consistently offers efficient management of trajectory data. TrajSpark introduces a new abstraction called IndexTRDD to manage trajectory segments, and exploits a global and local indexing mechanism to accelerate trajectory queries. Furthermore, to reduce the inherent partitioning overhead, it adopts a time-decay model to monitor changes in the data distribution and updates the data-partition structure adaptively. This model avoids repartitioning existing data when a new batch of data arrives. Extensive experiments with three types of trajectory queries on both real and synthetic datasets demonstrate that TrajSpark outperforms state-of-the-art systems.
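The abstract does not spell out how the time-decay model works, so the following is only a minimal sketch of one way such monitoring could look, not TrajSpark's implementation. It assumes that space is divided into grid cells, that per-cell counts are decayed exponentially with each new batch, and that drift is measured with the Jensen-Shannon divergence against the distribution in force when the partition structure was last built. The names `DecayedHistogram`, `js_divergence` and `needs_new_partitioner`, the decay factor and the threshold are all hypothetical.

```python
import math
from collections import defaultdict


class DecayedHistogram:
    """Per-cell point counts in which older batches are down-weighted.

    Hypothetical helper: the decay factor and the grid-cell keys are
    illustrative assumptions, not values taken from the paper.
    """

    def __init__(self, decay=0.5):
        self.decay = decay
        self.counts = defaultdict(float)

    def add_batch(self, cell_ids):
        # Age everything seen so far, then count the new batch at full weight.
        for cell in list(self.counts):
            self.counts[cell] *= self.decay
        for cell in cell_ids:
            self.counts[cell] += 1.0

    def distribution(self):
        # Normalise the decayed counts into a probability distribution.
        total = sum(self.counts.values())
        if total == 0:
            return {}
        return {cell: value / total for cell, value in self.counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logarithms, so it lies in [0, 1]."""
    keys = set(p) | set(q)
    pp = {k: p.get(k, 0.0) for k in keys}
    qq = {k: q.get(k, 0.0) for k in keys}
    mid = {k: 0.5 * (pp[k] + qq[k]) for k in keys}

    def kl(a):
        # Kullback-Leibler divergence of a from the midpoint distribution.
        return sum(a[k] * math.log2(a[k] / mid[k]) for k in keys if a[k] > 0)

    return 0.5 * kl(pp) + 0.5 * kl(qq)


def needs_new_partitioner(reference, current, threshold=0.1):
    """True when the decayed distribution has drifted past the threshold."""
    return js_divergence(reference, current) > threshold


hist = DecayedHistogram(decay=0.5)
hist.add_batch(["a", "a", "b", "b"])
reference = hist.distribution()   # snapshot taken when partitions were built
hist.add_batch(["c"] * 8)         # a later batch concentrated in a new cell
rebuild = needs_new_partitioner(reference, hist.distribution())
```

In this sketch a positive result would only change how future batches are partitioned; data that is already partitioned stays where it is, which matches the abstract's point about avoiding repartitioning of existing data.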

Acknowledgement

This work is supported by the National Key Research and Development Program of China (2016YFB1000905), NSFC (61370101, 61532021, U1501252, U1401256 and 61402180), and the Shanghai Knowledge Service Platform Project (No. ZF1213).

Corresponding author

Correspondence to Cheqing Jin.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Zhang, Z., Jin, C., Mao, J., Yang, X., Zhou, A. (2017). TrajSpark: A Scalable and Efficient In-Memory Management System for Big Trajectory Data. In: Chen, L., Jensen, C., Shahabi, C., Yang, X., Lian, X. (eds) Web and Big Data. APWeb-WAIM 2017. Lecture Notes in Computer Science, vol 10366. Springer, Cham. https://doi.org/10.1007/978-3-319-63579-8_2

  • DOI: https://doi.org/10.1007/978-3-319-63579-8_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-63578-1

  • Online ISBN: 978-3-319-63579-8

  • eBook Packages: Computer Science (R0)
