A framework for parallel map-matching at scale using Spark


Map-matching is the problem of matching recorded GPS trajectories to a digital representation of the road network. GPS data may be inaccurate and heterogeneous due to the limitations or errors of electronic sensors, as well as legal restrictions. Accurately matching trajectories to the road map is therefore an important preprocessing step for many real-world applications, such as trajectory data mining, traffic analysis, and route prediction. However, the high availability of GPS trajectories and map data challenges the scalability of current map-matching algorithms, which are limited to small datasets because they focus on matching accuracy rather than scalability. We therefore propose a distributed parallel framework for efficient and scalable offline map-matching on top of Spark. Spark uses distributed in-memory data storage and the MapReduce paradigm to achieve horizontal scaling and fast computation over large datasets. Spark, however, is still limited for dynamic map-matching, and its memory consumption can be an issue for very large datasets. Our framework enables map-matching on top of Spark while achieving horizontal scalability and memory-efficient execution and maintaining the accuracy of state-of-the-art matching algorithms, as follows: (1) we combine a sampling-based Quadtree spatial partitioning construction with batch-based computation to achieve horizontal scalability of map-matching and to reduce cluster memory usage; (2) we employ a safe spatial-boundary approach to preserve the matching accuracy of boundary objects; and (3) we provide a cost function for the distributed map-matching workload in order to tune the framework parameters. Our extensive experiments demonstrate that the framework is efficient and scalable for map-matching on large-scale data, while preserving matching accuracy and keeping memory usage low.
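The partitioning idea in the abstract can be illustrated with a small single-machine sketch. It is not the authors' implementation and does not use Spark: it builds a quadtree from a sample of points, splitting a cell whenever the sample places more than `capacity` points in it, and then assigns each point to every leaf cell whose extent, enlarged by a `margin`, contains it. Replicating near-boundary points in this way is only one plausible reading of a "safe spatial-boundary" approach. The names `Cell`, `build_quadtree`, `leaves`, `assign`, `capacity`, and `margin` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Cell:
    """Axis-aligned cell [x0, x1) x [y0, y1); a leaf if it has no children."""
    x0: float
    y0: float
    x1: float
    y1: float
    children: List["Cell"] = field(default_factory=list)

    def contains(self, p: Point, margin: float = 0.0) -> bool:
        # The margin enlarges the cell on every side (assumed boundary buffer).
        x, y = p
        return (self.x0 - margin <= x < self.x1 + margin
                and self.y0 - margin <= y < self.y1 + margin)


def build_quadtree(cell: Cell, sample: List[Point], capacity: int,
                   max_depth: int = 12) -> Cell:
    """Split a cell into four quadrants while the sample overfills it."""
    inside = [p for p in sample if cell.contains(p)]
    if len(inside) <= capacity or max_depth == 0:
        return cell
    mx, my = (cell.x0 + cell.x1) / 2, (cell.y0 + cell.y1) / 2
    quadrants = [
        Cell(cell.x0, cell.y0, mx, my), Cell(mx, cell.y0, cell.x1, my),
        Cell(cell.x0, my, mx, cell.y1), Cell(mx, my, cell.x1, cell.y1),
    ]
    cell.children = [build_quadtree(q, inside, capacity, max_depth - 1)
                     for q in quadrants]
    return cell


def leaves(cell: Cell) -> List[Cell]:
    """Collect leaf cells in depth-first order; each leaf is one partition."""
    if not cell.children:
        return [cell]
    return [leaf for child in cell.children for leaf in leaves(child)]


def assign(p: Point, leaf_cells: List[Cell], margin: float) -> List[int]:
    """Return the indices of all partitions that should receive point p."""
    return [i for i, c in enumerate(leaf_cells) if c.contains(p, margin)]


sample = [(1.0, 1.0), (1.5, 1.5), (2.0, 2.0), (6.0, 6.0), (7.0, 7.0)]
root = build_quadtree(Cell(0.0, 0.0, 8.0, 8.0), sample, capacity=2)
cells = leaves(root)
interior = assign((1.0, 1.0), cells, margin=0.0)       # one partition
near_boundary = assign((3.9, 1.0), cells, margin=0.5)  # replicated
```

In a distributed setting, the leaf index returned by `assign` could serve as the partition key, so that dense regions are divided into more, smaller partitions than sparse ones.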



This research is partially supported by the Brazilian National Council for Scientific and Technological Development (CNPq).

Author information



Corresponding author

Correspondence to Douglas Alves Peixoto.


Cite this article

Alves Peixoto, D., Quoc Viet Nguyen, H., Zheng, B. et al. A framework for parallel map-matching at scale using Spark. Distrib Parallel Databases 37, 697–720 (2019). https://doi.org/10.1007/s10619-018-7254-0


  • Map-matching
  • Spark
  • Trajectory
  • Efficiency
  • Scalability