Review and analysis of synthetic dataset generation methods and techniques for application in computer vision


Synthetic datasets, for which we propose the term synthsets, are not a novelty but have become a necessity. Although they have been used in computer vision since 1989 to help address the problem of collecting enough annotated data for supervised machine learning, intensive development of methods and techniques for their generation has taken place mainly in the last decade. The question has now shifted from whether to use synthetic datasets to how to create them optimally. Motivated by the goal of discovering best practices for building synthetic datasets that represent dynamic environments (such as traffic, crowds, and sports), this study provides an overview of existing synthsets in the computer vision domain. We analyze the methods and techniques of synthetic dataset generation: from the first low-resolution generators to the latest generative adversarial training methods, and from simple techniques that improve realism by adding global noise to those designed to close domain and distribution gaps. The analysis extracts nine unique but potentially intertwined methods and yields a synthset generation diagram consisting of 17 individual processes that synthset creators should follow and choose from, depending on the specific requirements of their task.



This study was funded by the Croatian Science Foundation (Hrvatska zaklada za znanost), grant IP-2016-06-8345 (Marina Ivasic-Kos).

Author information

Corresponding author

Correspondence to Marina Ivasic‐Kos.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 1023 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Paulin, G., Ivasic‐Kos, M. Review and analysis of synthetic dataset generation methods and techniques for application in computer vision. Artif Intell Rev (2023).

  • Computer vision
  • Synthetic dataset
  • Synthset
  • Generation methods