Consumer Depth Cameras for Computer Vision

Part of the series Advances in Computer Vision and Pattern Recognition pp 141-165

A Category-Level 3D Object Dataset: Putting the Kinect to Work

  • Allison Janoch, University of California at Berkeley
  • Sergey Karayev, University of California at Berkeley
  • Yangqing Jia, University of California at Berkeley
  • Jonathan T. Barron, University of California at Berkeley
  • Mario Fritz, Max Planck Institute for Informatics
  • Kate Saenko, University of California at Berkeley
  • Trevor Darrell, University of California at Berkeley

The recent proliferation of the Microsoft Kinect, an inexpensive but high-quality depth sensor, has brought the need for a challenging category-level 3D object detection dataset to the forefront. Such a dataset can be used for object recognition in a spirit usually reserved for the large collections of intensity images typically gathered from the Internet. Here, we review current 3D datasets and find them lacking in variation of scene, category, instance, and viewpoint. We present the Berkeley 3D Object Dataset (B3DO), which contains color and depth image pairs gathered in real domestic and office environments. We establish baseline object recognition performance in a PASCAL VOC-style detection task and suggest two ways that the inferred world size of an object can be used to improve detection. In an effort to make more significant progress in performance, we address the problem of extracting useful features from range images. There has been much success in using the histogram of oriented gradients (HOG) as a global descriptor for object detection in intensity images. There are also many proposed descriptors designed specifically for depth data (spin images, shape context, etc.), but these often follow the local, not the global, descriptor paradigm. We explore the failures of gradient-based descriptors when applied to depth, and propose that the proper global descriptor in the realm of 3D should be based on curvature, not gradients.
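
As a rough illustration of what "inferred world size" could mean in practice, the sketch below estimates the physical width and height of a detection from its bounding box and the depth image, using the standard pinhole camera relation (size = pixel extent × depth / focal length). It is not the chapter's method: the function name `inferred_world_size`, the box layout, the use of the median depth, and the focal length value are all assumptions made for this example.

```python
from statistics import median

# Assumed focal length in pixels. This is a commonly quoted approximate value
# for the Kinect, not a figure taken from the chapter; calibrate your own sensor.
ASSUMED_FOCAL_PX = 525.0


def inferred_world_size(box, depth_m, focal_px=ASSUMED_FOCAL_PX):
    """Estimate (width, height) in metres of the object inside a bounding box.

    box      -- (x_min, y_min, x_max, y_max) in pixels, max values exclusive
    depth_m  -- 2D list of depths in metres, indexed [row][col]; 0 marks missing data
    focal_px -- focal length in pixels (hypothetical default, see above)

    Returns None when the box contains no valid depth readings.
    """
    x0, y0, x1, y1 = box
    # Collect valid depth readings inside the box, skipping missing pixels.
    depths = [
        depth_m[y][x]
        for y in range(y0, y1)
        for x in range(x0, x1)
        if depth_m[y][x] > 0
    ]
    if not depths:
        return None
    # The median is an assumed choice here; it is less sensitive than the mean
    # to background pixels that fall inside the box.
    z = median(depths)
    # Pinhole model: an extent of p pixels at depth z spans p * z / f metres.
    width = (x1 - x0) * z / focal_px
    height = (y1 - y0) * z / focal_px
    return width, height
```

A size estimate like this could then, for example, be compared against the typical size of a category to down-weight implausible detections; how the chapter actually uses the size is described in its full text, not in this abstract.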