Multimedia Tools and Applications, Volume 78, Issue 12, pp 16129–16158

Automatic generation of video navigation from Google Street View data with car detection and inpainting

  • Yuan-Bang Cheng
  • Chuan-Kai Yang
  • Guan-Chung Chang
  • Teng-Wen Chang


Despite the many navigation tools and systems available, Google Street View, which offers only a single static image at a time, is still sometimes preferred because it provides a realistic scene. For navigation, however, what is desired is a video composed of Google Street View images along the route between given starting and ending locations. Several papers have addressed this issue to some extent, but much room for improvement remains. First, the generated navigation video is not very smooth: the transition from one frame to the next is not properly controlled, which can result in an abrupt change from one scene to another. Second, the generated video often contains many undesired vehicles and people, and removing these distracting objects would greatly enhance the quality of the navigation video. In this paper, we first use HOG and/or Haar features to detect vehicles and people, and we also report preliminary trials using Faster R-CNN and Caffe to speed up this detection. Results are presented to demonstrate the effectiveness of our approaches and, where applicable, are compared with similar approaches to show our improvement. In addition, a post-processing tool is developed to interactively refine the results when automatic object detection is imperfect.
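To make the HOG features mentioned above more concrete, the following is a minimal sketch of the core idea behind a histogram of oriented gradients: accumulating gradient magnitudes into orientation bins for a single image cell. It is not the authors' implementation. The function names `hog_cell_histogram` and `l2_normalize`, the 9-bin unsigned layout, and the border clamping are assumptions chosen for illustration, and a full detector would additionally tile the image into cells and blocks and feed the resulting descriptor to a classifier.

```python
import math


def hog_cell_histogram(cell, bins=9):
    """Unsigned-gradient orientation histogram for one grayscale cell.

    `cell` is a list of equal-length rows of pixel intensities.
    (Hypothetical helper for illustration only.)
    """
    height, width = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(height):
        for x in range(width):
            # Central differences, clamping indices at the cell border.
            gx = cell[y][min(x + 1, width - 1)] - cell[y][max(x - 1, 0)]
            gy = cell[min(y + 1, height - 1)][x] - cell[max(y - 1, 0)][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            # Fold the gradient direction into the unsigned range [0, 180).
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # Each pixel votes for one bin, weighted by gradient magnitude.
            hist[int(angle // bin_width) % bins] += magnitude
    return hist


def l2_normalize(hist, eps=1e-6):
    """Scale the histogram to (approximately) unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in hist) + eps * eps)
    return [v / norm for v in hist]


# A 4x4 cell containing a vertical edge: dark on the left, bright on the right.
vertical_edge = [[0, 0, 255, 255] for _ in range(4)]
vertical_hist = hog_cell_histogram(vertical_edge)

# The same edge rotated by 90 degrees.
horizontal_edge = [[0] * 4, [0] * 4, [255] * 4, [255] * 4]
horizontal_hist = hog_cell_histogram(horizontal_edge)
```

In this sketch the vertical edge puts all of its gradient energy into the bin around 0 degrees, while the horizontal edge puts it into the bin around 90 degrees. That sensitivity to edge direction is what lets such descriptors capture the outlines of objects such as vehicles and people.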


Keywords: Google Earth; Google Street View; HOG and Exemplar-SVMs; Haar and AdaBoost; Region growing; Image inpainting; Caffe; Faster R-CNN



We deeply appreciate the precious comments given by many anonymous reviewers. This work was supported in part by the Ministry of Science and Technology of Taiwan under the grants MOST 104-2221-E-011-083-MY2, MOST 105-2218-E-011-005, MOST 105-2218-E-001-001, MOST 106-3114-E-011-003, MOST 106-2221-E-011-148-MY3 and MOST 107-2218-E-011-012.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Yuan-Bang Cheng (1)
  • Chuan-Kai Yang (1, corresponding author)
  • Guan-Chung Chang (1)
  • Teng-Wen Chang (2)

  1. Department of Information Management, National Taiwan University of Science and Technology, Taipei, Taiwan
  2. Department of Digital Media Design, National Yunlin University of Science and Technology, Douliou City, Taiwan
