A framework for parallel map-matching at scale using Spark

  • Douglas Alves Peixoto (email author)
  • Hung Quoc Viet Nguyen
  • Bolong Zheng
  • Xiaofang Zhou


Map-matching is the problem of matching recorded GPS trajectories to a digital representation of the road network. GPS data may be inaccurate and heterogeneous because of sensor limitations and errors, as well as legal restrictions. Accurately matching trajectories to the road map is an important preprocessing step for many real-world applications, such as trajectory data mining, traffic analysis, and route prediction. However, the high availability of GPS trajectories and map data challenges the scalability of current map-matching algorithms, which are limited to small datasets because they focus on matching accuracy rather than scalability. We therefore propose a distributed parallel framework for efficient and scalable offline map-matching on top of Spark. Spark uses distributed in-memory data storage and the MapReduce paradigm to achieve horizontal scaling and fast computation over large datasets. Spark, however, is still limited for dynamic map-matching, and its memory consumption can be an issue for very large datasets. Our framework enables map-matching on top of Spark while achieving horizontal scalability, using memory efficiently, and maintaining the accuracy of state-of-the-art matching algorithms, through three contributions: (1) we combine sampling-based Quadtree spatial partitioning with batch-based computation to scale map-matching horizontally and to reduce cluster memory usage; (2) we employ a safe spatial-boundary approach to preserve the matching accuracy of boundary objects; and (3) we provide a cost function for the distributed map-matching workload, which is used to tune the framework parameters. Our extensive experiments demonstrate that the framework processes map-matching on large-scale data efficiently and scalably, while preserving matching accuracy and keeping memory usage low.
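
The partitioning idea in contributions (1) and (2) can be illustrated with a small single-machine sketch. It is not the authors' implementation: the abstract does not give the construction details, so the names `Node`, `build_partitions`, and `assign`, the leaf `capacity`, the `sample_fraction`, and the fixed `margin` used to replicate points near a partition boundary are all assumptions made for illustration. The sketch builds a Quadtree from a random sample of the points, then assigns every point to each leaf cell whose margin-expanded rectangle contains it.

```python
import random
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

Point = Tuple[float, float]

@dataclass
class Node:
    """Quadtree cell covering the half-open box [x0, x1) x [y0, y1)."""
    x0: float
    y0: float
    x1: float
    y1: float
    points: List[Point] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)

    def _child_for(self, p: Point) -> "Node":
        # Children are stored as [SW, SE, NW, NE].
        mx, my = (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2
        return self.children[(1 if p[0] >= mx else 0) + (2 if p[1] >= my else 0)]

    def _split(self) -> None:
        mx, my = (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2
        self.children = [
            Node(self.x0, self.y0, mx, my), Node(mx, self.y0, self.x1, my),
            Node(self.x0, my, mx, self.y1), Node(mx, my, self.x1, self.y1),
        ]

    def insert(self, p: Point, capacity: int, max_depth: int = 12, depth: int = 0) -> None:
        if self.children:
            self._child_for(p).insert(p, capacity, max_depth, depth + 1)
            return
        self.points.append(p)
        # Split an overfull leaf; max_depth guards against duplicate points.
        if len(self.points) > capacity and depth < max_depth:
            self._split()
            pending, self.points = self.points, []
            for q in pending:
                self._child_for(q).insert(q, capacity, max_depth, depth + 1)

    def leaves(self) -> Iterator["Node"]:
        if not self.children:
            yield self
        else:
            for child in self.children:
                yield from child.leaves()

def build_partitions(points: List[Point], bounds: Tuple[float, float, float, float],
                     sample_fraction: float, capacity: int, seed: int = 0) -> List[Node]:
    """Build the Quadtree from a random sample only; its leaves are the partitions."""
    k = max(1, int(len(points) * sample_fraction))
    root = Node(*bounds)
    for p in random.Random(seed).sample(points, k):
        root.insert(p, capacity)
    return list(root.leaves())

def assign(p: Point, leaves: List[Node], margin: float) -> List[int]:
    """Indices of all partitions whose margin-expanded box contains p.

    With margin > 0, a point close to a cell border is replicated into the
    neighbouring partitions (an assumed stand-in for a safe-boundary rule).
    """
    return [i for i, c in enumerate(leaves)
            if c.x0 - margin <= p[0] < c.x1 + margin
            and c.y0 - margin <= p[1] < c.y1 + margin]
```

With `margin = 0.0` each point inside the bounds lands in exactly one partition; a positive margin trades extra replication, and therefore memory, for the ability to match boundary points against road segments in adjacent cells. In a Spark setting, the partition indices returned by `assign` could serve as keys for grouping points before matching each partition independently.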


Keywords: Map-matching, Spark, Trajectory, Efficiency, Scalability



This research is partially supported by the Brazilian National Council for Scientific and Technological Development (CNPq).


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Douglas Alves Peixoto (1, email author)
  • Hung Quoc Viet Nguyen (2)
  • Bolong Zheng (1)
  • Xiaofang Zhou (1)
  1. School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, Australia
  2. School of Information and Communication Technology, Griffith University, Gold Coast, Australia