Sim4CV: A Photo-Realistic Simulator for Computer Vision Applications

Müller, Matthias; Casser, Vincent; Lahoud, Jean; Smith, Neil; Ghanem, Bernard

doi:10.1007/s11263-018-1073-7

Published: 24 March 2018

Volume 126, pages 902–919 (2018)
International Journal of Computer Vision

Matthias Müller (ORCID: orcid.org/0000-0001-5249-8734)¹, Vincent Casser¹, Jean Lahoud¹, Neil Smith¹ & Bernard Ghanem¹

Abstract

We present Sim4CV (http://www.sim4cv.org), a photo-realistic training and evaluation simulator with extensive applications across many fields of computer vision. Built on top of the Unreal Engine, the simulator integrates full-featured, physics-based cars, unmanned aerial vehicles (UAVs), and animated human actors in diverse urban and suburban 3D environments. We demonstrate its versatility with two case studies: autonomous UAV-based tracking of moving objects, and autonomous driving using supervised learning. The simulator fully integrates several state-of-the-art tracking algorithms together with a benchmark evaluation tool, as well as a deep neural network architecture for training vehicles to drive autonomously. It generates synthetic photo-realistic datasets with automatic ground-truth annotations that can easily extend existing real-world datasets, and it provides extensive variety in synthetic data through an automatic world generation tool that reconfigures synthetic worlds on the fly.
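The abstract mentions a benchmark evaluation tool for trackers without specifying its metrics. A common choice in single-object tracking benchmarks (for example, the OTB-style protocol) is the success plot: the fraction of frames whose predicted box overlaps the ground-truth box with an intersection-over-union (IoU) above a threshold, summarized by the area under that curve over thresholds in [0, 1]. The sketch below assumes that metric and boxes given as (x, y, width, height); the function names `iou`, `success_rate`, and `success_auc` are hypothetical and are not part of Sim4CV's API.

```python
from typing import Sequence, Tuple

# Hypothetical helpers: assume an OTB-style success metric, not Sim4CV's documented tool.
Box = Tuple[float, float, float, float]  # (x, y, width, height)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes in (x, y, w, h) form."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    # Width and height of the overlap region (zero if the boxes do not touch).
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def success_rate(pred: Sequence[Box], gt: Sequence[Box], threshold: float = 0.5) -> float:
    """Fraction of frames whose predicted box has IoU strictly above `threshold`."""
    if len(pred) != len(gt):
        raise ValueError("prediction and ground-truth sequences must have equal length")
    if not gt:
        return 0.0
    hits = sum(1 for p, g in zip(pred, gt) if iou(p, g) > threshold)
    return hits / len(gt)


def success_auc(pred: Sequence[Box], gt: Sequence[Box], steps: int = 21) -> float:
    """Mean success rate over evenly spaced thresholds in [0, 1] (area under the success plot)."""
    thresholds = [i / (steps - 1) for i in range(steps)]
    return sum(success_rate(pred, gt, t) for t in thresholds) / steps
```

For example, with ground truth `(0, 0, 10, 10)` in every frame and predictions that overlap fully, by one third, and not at all, `success_rate(..., 0.5)` counts only the first frame, while `success_auc` averages over all thresholds and so rewards partial overlap as well.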

Acknowledgements

This work was supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research through the VCC funding.

Author information

Authors and Affiliations

Electrical Engineering, Visual Computing Center, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
Matthias Müller, Vincent Casser, Jean Lahoud, Neil Smith & Bernard Ghanem

Corresponding author

Correspondence to Matthias Müller.

Additional information

Communicated by Adrien Gaidon, Florent Perronnin and Antonio Lopez.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (mp4 22407 KB)

About this article

Cite this article

Müller, M., Casser, V., Lahoud, J. et al. Sim4CV: A Photo-Realistic Simulator for Computer Vision Applications. Int J Comput Vis 126, 902–919 (2018). https://doi.org/10.1007/s11263-018-1073-7

Received: 18 July 2017
Accepted: 26 February 2018
Published: 24 March 2018
Issue Date: September 2018
DOI: https://doi.org/10.1007/s11263-018-1073-7
