
UnrealROX: an extremely photorealistic virtual reality environment for robotics simulations and synthetic data generation

Original Article · Published in Virtual Reality

Abstract

Data-driven algorithms have surpassed traditional techniques in almost every aspect of robotic vision. Such algorithms need vast amounts of quality training data to work properly, and gathering and annotating data at that scale in the real world is a time-consuming, error-prone task that limits both the size and the quality of the resulting datasets. Synthetic data generation has therefore become increasingly popular, since it is faster to produce and can be annotated automatically. However, most current datasets and environments lack the realism, interactions, and detail of the real world. UnrealROX is an environment built on Unreal Engine 4 that aims to reduce this reality gap by leveraging hyperrealistic indoor scenes explored by robotic agents, which also interact with objects in a visually realistic manner in the simulated world. Photorealistic scenes and robots are rendered by Unreal Engine into a virtual reality headset that captures gaze, so that a human operator can move the robot and drive the robotic hands with the controllers; scene information is dumped on a per-frame basis so that it can be replayed offline to generate raw data and ground-truth annotations. This virtual reality environment enables robotic vision researchers to generate realistic and visually plausible data with full ground truth for a wide variety of problems, such as class and instance semantic segmentation, object detection, depth estimation, visual grasping, and navigation.
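As a rough illustration of this record-and-replay idea, the Python sketch below iterates over a per-frame scene dump so that each frame can be re-rendered offline into raw data and annotations. The file naming and JSON fields (`camera`, `objects`, `name`, `class`, `location`) are hypothetical placeholders for illustration, not the actual UnrealROX dump format, which is defined in its repository.

```python
import json
from pathlib import Path


def replay_sequence(dump_dir):
    """Yield per-frame scene records dumped during a VR recording session.

    Assumes one JSON file per frame holding the camera pose and the 6D
    pose of every tracked object -- a hypothetical layout, not the exact
    UnrealROX format.
    """
    for frame_file in sorted(Path(dump_dir).glob("frame_*.json")):
        record = json.loads(frame_file.read_text())
        yield record["camera"], record["objects"]


# Offline, each replayed frame can be re-rendered from the recorded
# camera pose to produce aligned RGB, depth, and instance masks.
for camera, objects in replay_sequence("./recorded_sequence"):
    for obj in objects:
        print(obj["name"], obj["class"], obj["location"])
```

Decoupling the interactive VR session from rendering in this way means the expensive ground-truth passes (depth, segmentation, etc.) never have to run at interactive frame rates.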






Acknowledgements

This work has been funded by the Spanish Government TIN2016-76515-R Grant for the COMBAHO project, supported with FEDER funds. This work has also been supported by three Spanish national grants for Ph.D. studies (FPU15/04516, FPU17/00166, and ACIF/2018/197), by the University of Alicante Project GRE16-19, and by the Valencian Government Project GV/2018/022. Experiments were made possible by a generous hardware donation from NVIDIA. We would also like to thank Zuria Bauer for her collaboration in the depth estimation experiments.

Author information


Corresponding author

Correspondence to Alberto Garcia-Garcia.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Martinez-Gonzalez, P., Oprea, S., Garcia-Garcia, A. et al. UnrealROX: an extremely photorealistic virtual reality environment for robotics simulations and synthetic data generation. Virtual Reality 24, 271–288 (2020). https://doi.org/10.1007/s10055-019-00399-5

