Despite the many navigation tools and systems available, Google Street View, which offers only a single static image at a time, is still sometimes preferred because it provides a realistic scene. For navigation, however, given a starting and an ending location, a navigation video composed of images obtained from the Google Street View service is desirable. Several papers have addressed this problem to some extent, but there is still much room for improvement. First, the generated navigation video is not very smooth: the transition from one frame to the next is not properly controlled, which can result in an abrupt change from one scene to another. Second, the generated video often contains many undesired vehicles and people, and removing these distracting objects would greatly enhance the quality of the navigation video. In this paper, we first use HOG and/or Haar features to detect vehicles and people, and we also report preliminary trials using Faster R-CNN and Caffe to speed up this detection. We present results to demonstrate the effectiveness of our approaches and, where applicable, compare them with similar approaches to show the improvement. In addition, we develop a post-processing tool for interactively refining the results when the automatic object detection is imperfect.
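The abstract names HOG features without describing how they are computed. As a rough illustration only, the sketch below builds the core HOG ingredient, a magnitude-weighted histogram of unsigned gradient orientations over one grayscale cell. It is a textbook simplification and not the authors' implementation: the function name `hog_cell_histogram`, the 9-bin default, and the nearest-bin voting (no interpolation, no block normalization) are all assumptions made for this example.

```python
import math

def hog_cell_histogram(cell, n_bins=9):
    """Unsigned-orientation gradient histogram for one grayscale cell.

    `cell` is a list of equal-length rows of pixel intensities.
    Simplification (assumed for illustration): each pixel votes into a
    single bin; there is no interpolation and no block normalization.
    """
    h, w = len(cell), len(cell[0])
    bin_width = 180.0 / n_bins
    hist = [0.0] * n_bins
    for y in range(h):
        for x in range(w):
            # Centered differences, clamped at the cell border.
            gx = cell[y][min(x + 1, w - 1)] - cell[y][max(x - 1, 0)]
            gy = cell[min(y + 1, h - 1)][x] - cell[max(y - 1, 0)][x]
            magnitude = math.hypot(gx, gy)
            # Fold the angle into [0, 180) so opposite directions share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist

# A vertical edge (dark left half, bright right half) has purely
# horizontal gradients, so all of the weight lands in the first bin.
vertical_edge = [[0] * 4 + [255] * 4 for _ in range(8)]
edge_hist = hog_cell_histogram(vertical_edge)
```

In a full detector, histograms like this would be computed over a grid of cells, normalized over blocks, concatenated, and passed to a classifier; those stages are omitted here.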
We deeply appreciate the valuable comments from the anonymous reviewers. This work was supported in part by the Ministry of Science and Technology of Taiwan under grants MOST 104-2221-E-011-083-MY2, MOST 105-2218-E-011-005, MOST 105-2218-E-001-001, MOST 106-3114-E-011-003, MOST 106-2221-E-011-148-MY3 and MOST 107-2218-E-011-012.
Cite this article
Cheng, Y., Yang, C., Chang, G. et al. Automatic generation of video navigation from Google Street View data with car detection and inpainting. Multimed Tools Appl 78, 16129–16158 (2019). https://doi.org/10.1007/s11042-018-6880-x
Keywords
- Google Earth
- Google Street View
- HOG and Exemplar-SVMs
- HAAR and Adaboost
- Region Growing
- Image Inpainting
- Faster R-CNN