A Framework for Evaluating 6-DOF Object Trackers

  • Mathieu Garon
  • Denis Laurendeau
  • Jean-François Lalonde
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11215)


We present a novel, challenging and realistic dataset for evaluating 6-DOF object tracking algorithms. Existing datasets have serious limitations, notably unrealistic synthetic data or real data with large fiducial markers. These limitations prevent the community from obtaining an accurate picture of the state of the art. We use a data acquisition pipeline based on a commercial motion capture system to acquire accurate ground truth poses of real objects with respect to a Kinect V2 camera. With it, we build a dataset that contains a total of 297 calibrated sequences. The sequences are acquired in three scenarios that evaluate different aspects of tracker performance: stability, robustness to occlusion, and accuracy during challenging interactions between a person and the object. We conduct an extensive study of a deep 6-DOF tracking architecture and determine a set of optimal parameters. We then enhance the architecture and the training methodology to train a 6-DOF tracker that robustly generalizes to objects never seen during training. This tracker performs favorably compared to previous approaches trained specifically on the objects to track.
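The ground truth pipeline above expresses object poses from a motion capture system relative to the Kinect V2 camera. The sketch below illustrates that change of frame and one common way to score a tracker against the resulting ground truth. It is illustrative only and rests on assumptions the abstract does not state. It assumes the motion capture system reports both the camera and the object as 4×4 rigid transforms in a shared world frame, with translations in meters. It also assumes the tracker is scored by translation distance and by the geodesic rotation angle. The helper names `pose_z`, `rigid_inverse`, `camera_from_object` and `pose_errors` are hypothetical.

```python
import math
from typing import List, Tuple

Mat4 = List[List[float]]  # 4x4 homogeneous rigid transform (assumed representation)


def pose_z(angle_deg: float, tx: float, ty: float, tz: float) -> Mat4:
    """Hypothetical helper: rotation about z by angle_deg, then translation (tx, ty, tz)."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0, tx], [s, c, 0.0, ty], [0.0, 0.0, 1.0, tz], [0.0, 0.0, 0.0, 1.0]]


def matmul(a: Mat4, b: Mat4) -> Mat4:
    """Plain 4x4 matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def rigid_inverse(t: Mat4) -> Mat4:
    """Inverse of [R | p; 0 1] is [R^T | -R^T p; 0 1] for a rotation R."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [t[i][3] for i in range(3)]
    p_inv = [-sum(r_t[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [r_t[i] + [p_inv[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def camera_from_object(world_from_camera: Mat4, world_from_object: Mat4) -> Mat4:
    """Express the object's world pose in the camera frame: inv(T_wc) @ T_wo."""
    return matmul(rigid_inverse(world_from_camera), world_from_object)


def pose_errors(estimate: Mat4, ground_truth: Mat4) -> Tuple[float, float]:
    """Return (translation error in the input units, rotation error in degrees)."""
    dt = math.sqrt(sum((estimate[i][3] - ground_truth[i][3]) ** 2 for i in range(3)))
    # trace(R_est^T R_gt) equals the element-wise sum of R_est * R_gt over the 3x3 block.
    trace = sum(estimate[k][i] * ground_truth[k][i] for i in range(3) for k in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp for numerical safety
    return dt, math.degrees(math.acos(cos_angle))


# Camera at (1, 0, 0) rotated 90 degrees about z; object at (1, 2, 0) with the same rotation.
gt = camera_from_object(pose_z(90, 1.0, 0.0, 0.0), pose_z(90, 1.0, 2.0, 0.0))
# A hypothetical tracker estimate that is 1 cm and 5 degrees off.
print(pose_errors(pose_z(5, 2.01, 0.0, 0.0), gt))
```

In this toy setup the object lies 2 m along the camera's x axis with no relative rotation. The printed errors are therefore approximately 0.01 m and 5°, up to floating-point rounding. The rotation error uses the angle of the relative rotation rather than per-axis Euler differences, because this angle does not depend on the choice of axis order.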


Keywords: 3D object tracking · Databases · Deep learning



The authors wish to thank Jonathan Gilbert for his help with data acquisition and Sylvain Comtois for the Vicon setup. This work was supported by the NSERC/Creaform Industrial Research Chair on 3D Scanning: CREATION 3D. We gratefully acknowledge the support of Nvidia with the donation of the Tesla K40 and Titan X GPUs used for this research.

Supplementary material

Supplementary material 1 (pdf 8428 KB)

Supplementary material 2 (mp4 75467 KB)


Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Université Laval, Quebec City, Canada
