
A new benchmark for pose estimation with ground truth from virtual reality


Abstract

The development of programming paradigms for industrial assembly is currently gaining fresh impetus from approaches based on human demonstration and programming-by-demonstration. Major low- and mid-level prerequisites for machine vision and learning in these intelligent robotic applications are pose estimation, stereo reconstruction and action recognition. As a basis for the machine vision and learning involved, pose estimation derives object positions and orientations and thus target frames for robot execution. Our contribution introduces and applies a novel benchmark for typical multi-sensor setups and algorithms in the field of demonstration-based automated assembly. The benchmark platform is equipped with a multi-sensor setup consisting of stereo cameras and depth scanning devices (see Fig. 1). The dimensions and capabilities of the platform were chosen to reflect typical manual assembly tasks. Following the eRobotics methodology, a simulatable 3D representation of this platform was modelled in virtual reality. Based on a detailed camera and sensor simulation, we generated a set of benchmark images and point clouds with controlled levels of noise as well as ground truth data such as object positions and time stamps. We demonstrate the application of the benchmark by evaluating our latest developments in pose estimation, stereo reconstruction and action recognition, and we publish the benchmark data for the objective comparison of sensor setups and algorithms in industry.
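Pose estimation results are scored against the object poses recorded as ground truth in the virtual reality simulation. As a minimal sketch of how such a comparison can be set up (not taken from the paper; the function name, the 4x4 homogeneous-matrix convention and the choice of error metrics are illustrative assumptions), the following Python snippet reports the translation and rotation error between an estimated and a ground-truth object pose:

```python
# Minimal sketch (assumed metrics, not the paper's evaluation code):
# compare an estimated 4x4 object pose against the VR ground-truth pose.
import numpy as np

def pose_errors(T_est, T_gt):
    """Return (translation error, rotation error in radians) between two 4x4 poses."""
    # Translation error: Euclidean distance between the two frame origins.
    t_err = np.linalg.norm(T_est[:3, 3] - T_gt[:3, 3])
    # Rotation error: angle of the relative rotation R_gt^T * R_est,
    # recovered from its trace and clipped against numerical drift.
    R_rel = T_gt[:3, :3].T @ T_est[:3, :3]
    cos_angle = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    return t_err, np.arccos(cos_angle)

# Example: an estimate offset by roughly 11 mm from a ground-truth pose at the origin.
T_gt = np.eye(4)
T_est = np.eye(4)
T_est[:3, 3] = [0.01, 0.0, -0.005]
print(pose_errors(T_est, T_gt))  # -> (~0.0112, 0.0)
```

In a benchmark of this kind, such per-object errors would typically be aggregated over all scenes and noise levels to compare algorithms and sensor setups.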


Notes

  1. It is well-known that multiple Kinect sensors sharing a common field of view will cause IR interference, resulting in poor depth reconstructions. A known solution, which our platform also incorporates, is the use of vibrating motors mounted on the Kinect sensors [9]. This method has been shown to effectively blur out the noisy contributions of external sensors, while maintaining a high depth reconstruction quality.

References

  1. Aksoy EE, Abramov A, Wörgötter F, Dellen B (2010) Categorizing object-action relations from semantic scene graphs. In: IEEE international conference on robotics and automation (ICRA), pp 398–405

  2. Aksoy EE, Abramov A, Dörr J, Ning K, Dellen B, Wörgötter F (2011) Learning the semantics of object-action relations by observation. Int J Rob Res 30(10):1229–1249


  3. Aldoma A, Tombari F, Di Stefano L, Vincze M (2012) A global hypotheses verification method for 3d object recognition. In: European conference on computer vision (ECCV), Springer, pp 511–524

  4. Badler N (1975) Temporal scene analysis: conceptual descriptions of object movements. PhD thesis, University of Toronto, Canada

  5. Besl PJ, McKay ND (1992) A method for registration of 3-d shapes. IEEE Trans Pattern Anal Mach Intell 14(2):239–256


  6. Billard A, Calinon S, Guenter F (2006) Discriminative and adaptive imitation in uni-manual and bi-manual tasks. Rob Auton Syst 54(5):370–384


  7. Bradski G, Kaehler A (2008) Learning OpenCV: Computer vision with the OpenCV library. O'Reilly Media

  8. Buch AG, Kraft D, Kamarainen JK, Petersen HG, Krüger N (2013) Pose estimation using local structure-specific shape and appearance context. In: IEEE international conference on robotics and automation (ICRA), pp 2080–2087

  9. Butler DA, Izadi S, Hilliges O, Molyneaux D, Hodges S, Kim D (2012) Shake’n’sense: reducing interference for overlapping structured light depth cameras. In: Proceedings of the 2012 ACM annual conference on human factors in computing systems, ACM, pp 1933–1936

  10. Collins K, Palmer AJ, Rathmill K (1985) The development of a European benchmark for the comparison of assembly robot programming systems. In: Robot technology and applications (Robotics Europe Conference), pp 187–199

  11. Coppelia Robotics (2014) V-REP. http://www.coppeliarobotics.com/

  12. Cos S, Uwaerts D, Hermans L (2006) Evaluation of STAR250 and STAR1000 CMOS image sensors. In: ESA international conference of guidance, navigation and control systems, pp 1–6

  13. El-Laithy RA, Huang J, Yeh M (2012) Study on the use of Microsoft Kinect for robotics applications. In: IEEE symposium on position location and navigation (PLANS), pp 1280–1288

  14. Emde M, Rossmann J (2013) Validating a simulation of a single ray based laser scanner used in mobile robot applications. In: International symposium on robotic and sensors environments (ROSE), pp 55–60

  15. Farrell K, Okincha M, Parmar M (2008) Sensor calibration and simulation. In: SPIE 6817, Digital Photography IV, pp 1–9

  16. Fischler MA, Bolles RC (1981) Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Comm ACM 24(6):381–395


  17. Glover J, Popovic S (2013) Bingham procrustean alignment for object detection in clutter. In: IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 2158–2165

  18. Gupta A, Davis LS (2007) Objects in action: An approach for combining action understanding and object perception. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 1–8

  19. Hartley RI, Zisserman A (2004) Multiple view geometry in computer vision. Cambridge University Press, Cambridge


  20. Herrera C, Kannala J (2012) Joint depth and color camera calibration with distortion correction. IEEE Trans Pattern Anal Mach Intell 34(10):2058–2064


  21. Hue C, Le Cadre JP, Perez P (2002) Tracking multiple objects with particle filtering. IEEE Trans Aerosp Electron Syst 38(3):791–812


  22. Ikeuchi K, Suehiro T (1994) Toward an assembly plan from observation, part I: task recognition with polyhedral objects. IEEE Trans Rob Autom 10(3):368–385


  23. Isard M, Blake A (1998) CONDENSATION: conditional density propagation for visual tracking. Int J Comput Vis 29(1):5–28


  24. Khan Z, Balch T, Dellaert F (2005) MCMC-based particle filtering for tracking a variable number of interacting targets. IEEE Trans Pattern Anal Mach Intell 27(11):1805–1819


  25. Koppula HS, Gupta R, Saxena A (2013) Learning human activities and object affordances from RGB-D videos. Int J Rob Res 32(8):951–970


  26. Lai K, Bo L, Ren X, Fox D (2011) A large-scale hierarchical multi-view RGB-D object dataset. In: IEEE international conference on robotics and automation (ICRA), pp 1817–1824

  27. Martinez D, Alenya G, Jimenez P, Torras C, Rossmann J, Wantia N, Aksoy EE, Haller S, Piater J (2014) Active learning of manipulation sequences. In: IEEE international conference on robotics and automation (ICRA)

  28. Mian AS, Bennamoun M, Owens R (2006) Three-dimensional model-based object recognition and segmentation in cluttered scenes. IEEE Trans Pattern Anal Mach Intell 28(10):1584–1601


  29. Microsoft (2012) Kinect for windows SDK version 1.5. http://www.kinectforwindows.org/

  30. Microsoft (2014) Microsoft robotics developer studio 4. http://www.microsoft.com/en-us/download/details.aspx?id=29081

  31. Open Source Robotics Foundation (2014) GazeboSim. http://gazebosim.org/

  32. Papon J, Kulvicius T, Aksoy EE, Wörgötter F (2013) Point cloud video object segmentation using a persistent supervoxel world-model. In: IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 3712–3718

  33. Pardowitz M, Knoop S, Dillmann R, Zöllner RD (2007) Incremental learning of tasks from user demonstrations, past experiences, and vocal comments. IEEE Trans Syst Man Cybern B Cybern 37(2):322–332


  34. Point Grey (2011) Bumblebee2 stereo camera. http://ww2.ptgrey.com/stereo-vision/bumblebee-2

  35. Roßmann J (2012) eRobotics: the symbiosis of advanced robotics and virtual reality technologies. In: ASME international design in engineering technical conference and computers and information in engineering (IDETC/CIE), vol 2, pp 1395–1402

  36. Roßmann J, Ruf H, Schlette C (2009) Model-based programming ’by demonstration’— fast setup of robot systems. In: Advances in robotics research: Theory, implementation, application. Springer, pp 159–168

  37. Roßmann J, Schlette C, Ruf H (2010) A tool kit of new model-based methods for programming industrial robots. In: IASTED international conference on robotics and applications (RA), pp 379–385

  38. Roßmann J, Hempe N, Emde M, Steil T (2012a) A real-time optical sensor simulation framework for development and testing of industrial and mobile robot applications. In: German conference on robotics (ROBOTIK), pp 1–6

  39. Roßmann J, Schlette C, Wantia N (2012b) Virtual reality providing ground truth for machine learning and programming by demonstration. In: ASME international design in engineering technical conference and computers and information in engineering (IDETC/CIE), pp 1501–1508

  40. Roßmann J, Steil T, Springer M (2012c) Validating the camera and light simulation of a virtual space robotics testbed by means of physical mockup data. In: International symposium on artificial intelligence, robotics and automation in space (i-SAIRAS), pp 1–6

  41. Rusu RB, Cousins S (2011) 3d is here: Point Cloud Library (PCL). In: IEEE international conference on robotics and automation (ICRA)

  42. Schou C, Carøe CF, Hvilshøj M, Damgaard JS, Bøgh S, Madsen O (2012) Human assisted instructing of autonomous industrial mobile manipulator and its qualitative assessment. In: AAU workshop on human-centered robotics, pp 22–28

  43. Schuldt C, Laptev I, Caputo B (2004) Recognizing human actions: a local SVM approach. In: International conference on pattern recognition (ICPR), vol 3, pp 32–36

  44. Sridhar M, Cohn AG, Hogg D (2008) Learning functional object-categories from a relational spatio-temporal representation. In: European conference on artificial intelligence (ECAI), pp 606–610

  45. Summers-Stay D, Teo CL, Yang Y, Fermüller C, Aloimonos Y (2012) Using a minimal action grammar for activity understanding in the real world. In: IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 4104–4111

  46. Thomas U, Wahl FM (2001) A system for automatic planning, evaluation and execution of assembly sequences for industrial robots. In: IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 1458–1464

  47. Tombari F, Salti S, Di Stefano L (2010) Unique signatures of histograms for local surface description. In: European conference on computer vision (ECCV), Springer, pp 356–369

  48. Vermaak J, Godsill S, Perez P (2005) Monte Carlo filtering for multi target tracking and data association. IEEE Trans Aerosp Electron Syst 41(1):309–332


  49. Yang Y, Fermüller C, Aloimonos Y (2013) Detection of manipulation action consequences (MAC). In: IEEE conference on computer vision and pattern recognition (CVPR), pp 2563–2570

  50. Zhang Z (2012) Microsoft Kinect sensor and its effect. IEEE MultiMed 19(2):4–10



Acknowledgments

The research leading to these results has received funding from the European Community's Seventh Framework Programme FP7/2007-2013 (Specific Programme Cooperation, Theme 3, Information and Communication Technologies) under Grant Agreement No. 269959, IntellAct.

Author information


Corresponding author

Correspondence to Christian Schlette.


About this article


Cite this article

Schlette, C., Buch, A.G., Aksoy, E.E. et al. A new benchmark for pose estimation with ground truth from virtual reality. Prod. Eng. Res. Devel. 8, 745–754 (2014). https://doi.org/10.1007/s11740-014-0552-0

