
Machine Vision and Applications, Volume 27, Issue 2, pp 193–219

Depth-assisted rectification for real-time object detection and pose estimation

  • João Paulo Silva do Monte Lima
  • Francisco Paulo Magalhães Simões
  • Hideaki Uchiyama
  • Veronica Teichrieb
  • Eric Marchand
Original Paper

Abstract

RGB-D sensors have in recent years become readily accessible to general users. They provide both a color image and a depth image of the scene and, besides being used for object modeling, can also offer important cues for real-time object detection and tracking. In this context, the work presented in this paper investigates the use of consumer RGB-D sensors for object detection and pose estimation from natural features. Two methods based on depth-assisted rectification are proposed; both use depth data to transform features extracted from the color image to a canonical view, yielding a representation that is invariant to rotation, scale, and perspective distortions. One method is suitable for textured objects, either planar or non-planar, while the other focuses on texture-less planar objects. Qualitative and quantitative evaluations show that the proposed methods can outperform some existing methods for object detection and pose estimation, especially when dealing with oblique poses.
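The core idea of depth-assisted rectification can be sketched in a few steps: back-project a depth patch to a 3D point cloud using the camera intrinsics, estimate the local surface normal via a covariance (PCA) technique, and rotate the patch into a fronto-parallel canonical view. The following Python/NumPy sketch illustrates only the back-projection and canonicalization steps; function names and the specific normal-orientation convention are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def backproject(depth, K):
    """Back-project a depth map to a 3D point cloud using pinhole intrinsics K."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

def estimate_normal(points):
    """Estimate the dominant plane normal of a local point cloud via PCA:
    the eigenvector of the covariance matrix with the smallest eigenvalue."""
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered / len(points)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    n = eigvecs[:, 0]                        # smallest eigenvalue -> normal
    return n if n[2] < 0 else -n             # orient toward the camera (-z)

def canonical_rotation(normal):
    """Build a rotation that maps the patch normal onto the optical axis,
    i.e. the fronto-parallel (canonical) view of the patch."""
    z = -normal / np.linalg.norm(normal)     # new z-axis points away from camera
    x = np.cross([0.0, 1.0, 0.0], z)
    x /= np.linalg.norm(x)
    y = np.cross(z, x)
    return np.stack([x, y, z])               # rows form an orthonormal basis
```

In a full pipeline, this rotation would induce a homography used to warp the color patch to the canonical view before descriptor extraction, which is what makes the resulting features robust to perspective distortion.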

Keywords

Object detection · Natural features tracking · Computer vision · RGB-D sensor


Acknowledgments

The authors would like to thank Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)/Institut National de Recherche en Informatique et en Automatique (INRIA)/Comisión Nacional de Investigación Científica y Tecnológica (CONICYT) STIC-AmSud project ARVS and Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) (process 141705/2010-8) for partially funding this research.

Supplementary material

Supplementary material 1 (mpg 62920 KB)

Supplementary material 2 (mpg 42922 KB)


Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  • João Paulo Silva do Monte Lima (1, 2)
  • Francisco Paulo Magalhães Simões (2)
  • Hideaki Uchiyama (3)
  • Veronica Teichrieb (2)
  • Eric Marchand (3)

  1. Departamento de Estatística e Informática (DEINFO), Universidade Federal Rural de Pernambuco (UFRPE), Recife, Brazil
  2. Voxar Labs, Centro de Informática (CIn), Universidade Federal de Pernambuco (UFPE), Recife, Brazil
  3. INRIA Rennes Bretagne-Atlantique, Rennes, France
