A State of the Art Report on Kinect Sensor Setups in Computer Vision

  • Kai Berger
  • Stephan Meister
  • Rahul Nair
  • Daniel Kondermann
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8200)


In the three years since the launch of the Microsoft Kinect® in the end-consumer market, we have witnessed a small revolution in computer vision research towards the use of a standardized consumer-grade RGBD sensor for scene content retrieval. Besides classical localization and motion capturing tasks, the Kinect has been successfully employed for the reconstruction of opaque and transparent objects. This report gives a comprehensive overview of the main publications that use the Microsoft Kinect outside its original context as a decision-forest-based motion-capturing tool.


Keywords: Depth Image, Iterative Close Point, Depth Data, Stereo Match, Kinect Sensor
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Kai Berger (1)
  • Stephan Meister (2)
  • Rahul Nair (2)
  • Daniel Kondermann (2)
  1. OeRC Oxford, University of Oxford, UK
  2. Heidelberg Collaboratory for Image Processing, University of Heidelberg, Germany
