Integration of Probabilistic Pose Estimates from Multiple Views

  • Özgür Erkent
  • Dadhichi Shukla
  • Justus Piater
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9911)


We propose an approach to multi-view object detection and pose estimation that considers combinations of single-view estimates. It can be used with most existing single-view pose estimation systems, and can produce improved results even if the individual pose estimates are incoherent. The method is introduced in the context of an existing probabilistic, view-based detection and pose estimation method (PAPE), which we extend here to incorporate diverse attributes of the scene. We tested the multi-view approach with RGB-D cameras in different environments containing several cluttered test scenes and various textured and textureless objects. The results show that the accuracy of both object detection and pose estimation increases significantly over single-view PAPE and over other multiple-view integration methods.


Pose estimation, Object recognition, Multiple cameras



The research leading to this work has received funding from the European Community’s Seventh Framework Programme FP7/2007-2013 (Specific Programme Cooperation, Theme 3, Information and Communication Technologies) under grant agreement no. 610878, 3rd HAND.



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Özgür Erkent (1)
  • Dadhichi Shukla (1)
  • Justus Piater (1)
  1. Institute of Computer Science, University of Innsbruck, Innsbruck, Austria
