Automatic generation of video navigation from Google Street View data with car detection and inpainting


Despite the many navigation tools and systems available, Google Street View, which offers only a single static image at a time, is still sometimes preferred because it provides a realistic scene. For navigation, however, given starting and ending locations, a navigation video composed of images obtained from the Google Street View service is desirable. Several papers have addressed this problem to some extent, but there is still much room for improvement. First, the generated navigation video is not very smooth: the transition from one frame to the next is not properly controlled, which can result in an abrupt change from one scene to another. Second, the generated video often contains many undesired vehicles and people, and removing these distracting objects would greatly enhance the quality of the navigation video. In this paper, we first use HOG and/or Haar features to detect vehicles and people; we have also made preliminary trials of using Faster R-CNN and Caffe to speed up the detection. Results are presented to demonstrate the effectiveness of our approaches and, where applicable, are compared with similar approaches to show our improvement. In addition, a post-processing tool has been developed to interactively refine the results when the automatic object detection is imperfect.
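The detect-then-remove idea described above can be illustrated with a minimal sketch. It assumes a detector (HOG-, Haar-, or CNN-based) has already returned bounding boxes, and it fills the boxed regions with a simple diffusion-style fill. That fill is a stand-in chosen for brevity, not the inpainting method used in the paper. The names `boxes_to_mask`, `diffuse_inpaint`, and `remove_objects`, the `(x, y, width, height)` box format, and the grayscale list-of-lists image are all hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]        # grayscale image as rows of intensities (assumption)
Box = Tuple[int, int, int, int]  # (x, y, width, height), hypothetical detector output


def boxes_to_mask(height: int, width: int, boxes: List[Box]) -> List[List[bool]]:
    """Mark every pixel covered by a detection box as 'to be removed'."""
    mask = [[False] * width for _ in range(height)]
    for x, y, w, h in boxes:
        # Clip each box to the image so partially visible objects are handled.
        for r in range(max(0, y), min(height, y + h)):
            for c in range(max(0, x), min(width, x + w)):
                mask[r][c] = True
    return mask


def diffuse_inpaint(image: Image, mask: List[List[bool]], iterations: int = 200) -> Image:
    """Fill masked pixels by repeatedly averaging their 4-neighbours.

    Known pixels are never modified. Masked pixels start from the mean of the
    known pixels and relax toward a smooth interpolation of the hole boundary.
    """
    height, width = len(image), len(image[0])
    known = [image[r][c] for r in range(height) for c in range(width) if not mask[r][c]]
    start = sum(known) / len(known) if known else 0.0
    result = [[start if mask[r][c] else image[r][c] for c in range(width)]
              for r in range(height)]
    for _ in range(iterations):
        updated = [row[:] for row in result]
        for r in range(height):
            for c in range(width):
                if not mask[r][c]:
                    continue
                neighbours = [result[rr][cc]
                              for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                              if 0 <= rr < height and 0 <= cc < width]
                if neighbours:
                    updated[r][c] = sum(neighbours) / len(neighbours)
        result = updated
    return result


def remove_objects(image: Image, boxes: List[Box]) -> Image:
    """Mask the detected boxes, then fill them from the surrounding pixels."""
    mask = boxes_to_mask(len(image), len(image[0]), boxes)
    return diffuse_inpaint(image, mask)
```

A diffusion fill like this only produces smooth color and blurs texture, so it is useful mainly for showing where detection output feeds into removal; a texture-preserving inpainting method would replace `diffuse_inpaint` in practice.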






We deeply appreciate the valuable comments of the anonymous reviewers. This work was supported in part by the Ministry of Science and Technology of Taiwan under the grants MOST 104-2221-E-011-083-MY2, MOST 105-2218-E-011-005, MOST 105-2218-E-001-001, MOST 106-3114-E-011-003, MOST 106-2221-E-011-148-MY3 and MOST 107-2218-E-011-012.

Author information



Corresponding author

Correspondence to Chuan-Kai Yang.


Cite this article

Cheng, Y., Yang, C., Chang, G. et al. Automatic generation of video navigation from Google Street View data with car detection and inpainting. Multimed Tools Appl 78, 16129–16158 (2019).



  • Google Earth
  • Google Street View
  • HOG and Exemplar-SVMs
  • HAAR and Adaboost
  • Region Growing
  • Image Inpainting
  • Caffe
  • Faster R-CNN