Abstract
Top-k most similar trajectories search (k-NN) is frequently used in classification algorithms and recommendation systems over spatial-temporal trajectory databases. However, k-NN trajectories search is a complex operation, and a multi-user application should be able to process multiple k-NN trajectories searches concurrently and efficiently over large-scale data. The k-NN trajectories problem has received plenty of attention. However, state-of-the-art works either consider neither in-memory parallel processing of k-NN trajectories nor concurrent queries in distributed environments, or they parallelize k-NN search for simpler spatial objects (i.e. 2D points) using MapReduce but ignore the temporal dimension of spatial-temporal trajectories. In this work we propose a distributed parallel approach for k-NN trajectories search in a multi-user environment using in-memory MapReduce. We propose a space/time data partitioning based on Voronoi diagrams and time pages, named Voronoi Pages, in order to provide both spatial-temporal data organization and process decentralization. In addition, we propose a spatial-temporal index for our partitions to efficiently prune the search space and to improve system throughput and scalability. We implemented our solution on top of Spark’s RDD data structure, which provides a thread-safe environment for concurrent MapReduce tasks in main memory. We perform extensive experiments to demonstrate the performance and scalability of our approach.
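As a rough illustration of the space/time partitioning idea, the following minimal sketch assigns each trajectory sample to a partition keyed by its nearest pivot (its Voronoi cell) and a time window (its time page). It is not the authors' implementation: the abstract does not say how pivots are chosen or how time pages are sized, so the fixed-length windows, the Euclidean distance, and the names `voronoi_page_key`, `partition`, `pivots`, and `page_size` are all assumptions made here for illustration. Plain Python is used instead of Spark to keep the example self-contained.

```python
from collections import defaultdict
from math import hypot


def voronoi_page_key(point, pivots, page_size):
    """Return (pivot_index, time_page_index) for one sample (x, y, t).

    Hypothetical helper: the nearest pivot identifies the Voronoi cell,
    and a fixed-length time window (an assumption) identifies the page.
    """
    x, y, t = point
    # Nearest pivot by Euclidean distance; ties go to the lowest index.
    cell = min(
        range(len(pivots)),
        key=lambda i: hypot(x - pivots[i][0], y - pivots[i][1]),
    )
    # Index of the fixed-length time window that contains timestamp t.
    page = int(t // page_size)
    return cell, page


def partition(trajectories, pivots, page_size):
    """Group trajectory samples by their (cell, page) key."""
    pages = defaultdict(list)
    for tid, samples in trajectories.items():
        for sample in samples:
            key = voronoi_page_key(sample, pivots, page_size)
            pages[key].append((tid, sample))
    return dict(pages)


# Toy data: two pivots and two short trajectories of (x, y, t) samples.
pivots = [(0.0, 0.0), (10.0, 10.0)]
trajectories = {
    "T1": [(1.0, 1.0, 5), (9.0, 9.0, 65)],
    "T2": [(8.0, 9.0, 10)],
}
pages = partition(trajectories, pivots, page_size=60)
```

In this toy run, trajectory `T1` is split across two partitions because it changes both cell and time page, which suggests why an index over the partitions is useful: a query only needs to visit the (cell, page) pairs that can contain candidate trajectories.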
Acknowledgments
This research is partially supported by the Brazilian National Council for Scientific and Technological Development (CNPq).
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Peixoto, D.A., Hung, N.Q.V. (2016). Scalable and Fast Top-k Most Similar Trajectories Search Using MapReduce In-Memory. In: Cheema, M., Zhang, W., Chang, L. (eds) Databases Theory and Applications. ADC 2016. Lecture Notes in Computer Science, vol 9877. Springer, Cham. https://doi.org/10.1007/978-3-319-46922-5_18
DOI: https://doi.org/10.1007/978-3-319-46922-5_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46921-8
Online ISBN: 978-3-319-46922-5
eBook Packages: Computer Science (R0)