Scalable and Fast Top-k Most Similar Trajectories Search Using MapReduce In-Memory

  • Conference paper
Databases Theory and Applications (ADC 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9877)

Abstract

Top-k most similar trajectories search (k-NN) is frequently used in classification algorithms and recommendation systems over spatial-temporal trajectory databases. However, k-NN trajectories search is a complex operation, and a multi-user application should be able to process multiple k-NN trajectories searches concurrently and efficiently over large-scale data. The k-NN trajectories problem has received plenty of attention. However, state-of-the-art works either do not consider in-memory parallel processing of k-NN trajectories or concurrent queries in distributed environments, or they parallelize k-NN search for simpler spatial objects (e.g., 2D points) using MapReduce but ignore the temporal dimension of spatial-temporal trajectories. In this work, we propose a distributed parallel approach for k-NN trajectories search in a multi-user environment using in-memory MapReduce. We propose a space/time data partitioning based on Voronoi diagrams and time pages, named Voronoi Pages, to provide both spatial-temporal data organization and process decentralization. In addition, we propose a spatial-temporal index over our partitions to efficiently prune the search space and improve system throughput and scalability. We implemented our solution on top of Spark’s RDD data structure, which provides a thread-safe environment for concurrent MapReduce tasks in main memory. We perform extensive experiments to demonstrate the performance and scalability of our approach.
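
The space/time partitioning described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation, which runs on Spark RDDs. The function names `voronoi_page_key`, `partition`, and `candidate_pages`, the fixed list of pivots, and the `page_size` parameter are assumptions introduced here for illustration only. The sketch keys each trajectory sample (x, y, t) by its nearest pivot (its Voronoi cell) and by a fixed-length time window (its time page), then selects the pages that a query restricted to some cells and a time interval would need to read.

```python
from collections import defaultdict
from math import hypot


def voronoi_page_key(point, pivots, page_size):
    """Return (cell_index, page_index) for one (x, y, t) sample.

    Assumption: the Voronoi cell of a sample is the index of its nearest
    pivot under Euclidean distance, and time pages are fixed-length windows.
    """
    x, y, t = point
    cell = min(
        range(len(pivots)),
        key=lambda i: hypot(x - pivots[i][0], y - pivots[i][1]),
    )
    page = int(t // page_size)
    return cell, page


def partition(trajectories, pivots, page_size):
    """Group trajectory samples by (cell, page) key.

    `trajectories` maps a trajectory id to a list of (x, y, t) samples.
    The result maps each key to {trajectory_id: [samples in that page]},
    so a trajectory crossing several cells or windows is split into
    sub-trajectories.
    """
    pages = defaultdict(lambda: defaultdict(list))
    for tid, samples in trajectories.items():
        for sample in samples:
            key = voronoi_page_key(sample, pivots, page_size)
            pages[key][tid].append(sample)
    return pages


def candidate_pages(pages, cells, t_start, t_end, page_size):
    """Return the keys of pages in the given cells that overlap [t_start, t_end]."""
    first = int(t_start // page_size)
    last = int(t_end // page_size)
    return sorted(k for k in pages if k[0] in cells and first <= k[1] <= last)


# Hypothetical toy data: two pivots, two short trajectories, 60-second pages.
pivots = [(0.0, 0.0), (10.0, 0.0)]
trajectories = {
    "a": [(1.0, 1.0, 5.0), (9.0, 1.0, 65.0)],
    "b": [(8.0, 0.0, 10.0)],
}
pages = partition(trajectories, pivots, page_size=60)
```

In this toy setup, a query limited to the cell of the second pivot and the first minute only needs to read one page, which is the kind of pruning a spatial-temporal index over the partitions is meant to enable.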

Acknowledgments

This research is partially supported by the Brazilian National Council for Scientific and Technological Development (CNPq).

Author information

Corresponding author

Correspondence to Douglas Alves Peixoto.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Peixoto, D.A., Hung, N.Q.V. (2016). Scalable and Fast Top-k Most Similar Trajectories Search Using MapReduce In-Memory. In: Cheema, M., Zhang, W., Chang, L. (eds) Databases Theory and Applications. ADC 2016. Lecture Notes in Computer Science, vol 9877. Springer, Cham. https://doi.org/10.1007/978-3-319-46922-5_18

  • DOI: https://doi.org/10.1007/978-3-319-46922-5_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46921-8

  • Online ISBN: 978-3-319-46922-5

  • eBook Packages: Computer Science, Computer Science (R0)
