Table B-1 is a brief survey of public domain datasets in various categories, listed in no particular order. Many of these datasets are freely available from universities and government agencies.

Table B-1. Public domain datasets

Name: SUN
Description: Annotated scenes and objects
Categories: 908 scene categories, 3,819 object categories, 131,072 objects, and growing
Contributions: Open to contributions
Tools and apps: Image classifier source code + API, iOS app, Android app
Key papers: [70]
Owner: MIT CSAIL
Link: http://groups.csail.mit.edu/vision/SUN/

Name: UC Irvine Machine Learning Repository
Description: Very useful; a huge repository of datasets in many categories, including images
Categories: Too many to list; a very wide range of categories, with many attributes of the data searchable and designed into the ground truth datasets
Contributions: Ongoing
Tools and apps: Online assistant to search for specific ground truth datasets
Key papers: [550]
Link: http://archive.ics.uci.edu/ml/datasets.html

Name: Stanford 3D Scanning Repository
Description: High-resolution 3D scanned images with sub-millimeter accuracy, including XYZ and RGB datasets
Categories: Several scanned 3D objects with 3D point clouds, resolution ranging from 3,400,000 scanned points to 750,000 triangles and upwards
Link: http://graphics.stanford.edu/data/3Dscanrep/

Name: KITTI Benchmark Suite, Karlsruhe Institute of Technology
Description: Stereo datasets for various city driving scenes
Categories: The KITTI benchmark suite covers optical flow, odometry, object detection, and object orientation estimation; the Karlsruhe sequences and Karlsruhe objects datasets both cover grayscale stereo sequences taken from a moving platform driving through a city
Link: http://www.cvlibs.net/datasets/index.html

Name: Caltech Object Recognition Datasets
Description: Old but still useful; objects in hundreds of categories, some annotated with outlines
Categories: Over 256 categories, including animals, plants, people, common objects, common food items, tools, furniture, and more
Key papers: [71]
Link: http://www.vision.caltech.edu/Image_Datasets/Caltech101/
Link: http://www.vision.caltech.edu/Image_Datasets/Caltech256/
Link: http://authors.library.caltech.edu/7694/ (latest versions of Caltech 101 and 256)

Name: ImageNet + WordNet
Description: Labeled, annotated, bounding-boxed, and feature-descriptor-marked images; over 14,197,122 images indexed into 21,841 synsets (sets of similar images), each corresponding to a concept in the companion WordNet lexical database
Categories: Categories include almost anything
Contributions: Images taken from Internet searches
Tools and apps: Online controls (http://www.image-net.org/download-API); source code for the ImageNet Large Scale Visual Recognition Challenge (ILSVRC2010) (http://www.image-net.org/challenges/LSVRC/2010/index)
Key papers: [72]; for several more, see http://www.image-net.org/about-publication
Owner: Images have individual owners; the website is © Stanford and Princeton
Link: http://www.image-net.org/
Link: http://www.image-net.org/challenges/LSVRC/2012/

Name: Middlebury Computer Vision Datasets
Description: Scholarly and comprehensive datasets, with algorithm comparisons over most of the datasets
Categories: Stereo vision (excellent), multi-view stereo (excellent), MRF, optical flow (excellent), color processing
Contributions: Algorithm benchmarks over the datasets can be submitted
Key papers: Several; see website
Owner: Middlebury College
Link: http://vision.middlebury.edu/

Name: ADL Activity Recognition Dataset
Description: Annotated scenes for recognizing activities of daily living (ADL) in common living environments
Categories: Daily life
Tools and apps: Activity recognition code available (see link below)
Key papers: [73]
Link: http://deepthought.ics.uci.edu/ADLdataset/adl.html

Name: MIT Indoor Scenes 67, Scene Classification
Description: Annotated dataset specifically containing diverse indoor scenes
Categories: 15,620 images organized into 67 indoor categories, some annotations in LabelMe format
Key papers: [74]
Link: http://web.mit.edu/torralba/www/indoor.html

Name: RGB-D Object Recognition Dataset, University of Washington
Description: Dataset containing RGB and corresponding depth images
Categories: 300 common household objects in 51 categories, organized using WordNet in the style of ImageNet (reviewed above); each object recorded in RGB and Kinect depth at various rotation angles and viewpoints
Key papers: [75]
Link: http://www.cs.washington.edu/rgbd-dataset/

Name: NYU Depth Datasets
Description: Annotated dataset of indoor scenes with RGB-D and accelerometer data
Categories: Over 500,000 frames, many different indoor scenes and scene types, thousands of classes, accelerometer data, inpainted and raw depth information
Tools and apps: MATLAB toolbox + C++ code
Key papers: [76]
Link: http://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html

Name: Intel Labs Seattle - Egocentric Recognition of Handled Objects
Description: Annotated dataset for egocentric handled objects using a wearable camera
Categories: Over 42 everyday objects under varied lighting, occlusion, and perspectives; over 6 GB of video sequence data in total
Key papers: [77] [78]
Link: http://seattle.intel-research.net/~xren/egovision09/

Name: Georgia Tech GTEA Egocentric Activities - Gaze(+)
Description: Annotated dataset for egocentric handled objects using a wearable camera
Categories: Many everyday objects under varied lighting, occlusion, and perspectives
Tools and apps: Code library of vision functions and mathematical functions
Key papers: [79]
Link: http://www.cc.gatech.edu/~afathi3/GTEA_Gaze_Website/

Name: CUReT: Columbia-Utrecht Reflectance and Texture Database
Description: Extensive texture samples captured under many viewing and illumination directions
Categories: Over 60 different samples with over 200 viewing and illumination combinations, a BRDF measurement database, and more
Key papers: [80]
Link: http://www.cs.columbia.edu/CAVE/software/curet/

Name: MIT Flickr Material Surface Category Dataset
Description: Dataset for identifying material categories: fabric, glass, metal, plastic, water, foliage, leather, paper, stone, and wood
Categories: Images of materials for surface property analysis, as opposed to object or texture analysis; 10 material categories with 100 images each
Key papers: [81]
Link: http://people.csail.mit.edu/celiu/CVPR2010/index.html

Name: Labeled Faces in the Wild (LFW)
Description: Collection of over 13,000 face images, each labeled with the name of the person pictured
Categories: Faces
Key papers: [82]
Link: http://vis-www.cs.umass.edu/lfw/

Name: The CMU Multi-PIE Face Database
Description: Annotated face and emotion database with multiple pose angles
Categories: 750,000 face images of 337 subjects, captured over a period of several months from 15 viewpoints under 19 illumination conditions, with annotated facial expressions
Key papers: [83]
Link: http://www.multipie.org/

Name: Stanford 40 Actions
Description: People actions image database
Categories: People performing 40 actions, bounding-box annotations, 9,532 images, 180-300 images per action class
Key papers: [84]
Link: http://vision.stanford.edu/Datasets/40actions.html

Name: NORB 3D Object Recognition from Shape
Description: NYU object recognition benchmark
Categories: Stereo image pairs; 194,400 total images of 50 toys under 36 azimuths, 9 elevations, and 6 lighting conditions
Tools and apps: EBLEARN C++ learning and vision library, LUSH programming language, VisionGRader object detection tool (http://www.cs.nyu.edu/~yann/software/index.html)
Key papers: [85]
Link: http://www.cs.nyu.edu/~yann/research/norb/

Name: Optical Flow Algorithm Evaluation
Description: Tools and data for evaluating optical flow algorithms
Categories: Many optical flow ground truth sequence datasets
Tools and apps: A tool for generating optical flow data, plus implementations of some optical flow algorithms
Key papers: [86]
Link: http://of-eval.sourceforge.net/

Name: PETS Crowd Sensing Dataset Challenge
Description: Multi-sensor camera views composed into a dataset of crowd activity sequences
Categories: Challenge goals include crowd count and density estimation, tracking of specific people, and crowd flow
Key papers: [94]
Link: http://www.cvg.rdg.ac.uk/PETS2009/a.html

Name: I-LIDS
Description: Security-oriented challenge ground truth dataset for competitive benchmarking, including scenes for detecting parked vehicles and abandoned baggage, monitoring secure perimeters, and doorway surveillance
Categories: Various categories in the security domain
Contributions: No; funded by the UK government
Tools and apps: n.a.
Key papers: n.a.
Link: http://computervision.wikia.com/wiki/I-LIDS

Name: TRECVID, NIST, US Government
Description: NIST-sponsored public project spanning 2001-2013 for research in automatic segmentation, indexing, and content-based video retrieval
Categories: Tasks: (1) semantic indexing (SIN), (2) known-item search (KIS), (3) instance search (INS), (4) multimedia event detection (MED), (5) multimedia event recounting (MER), (6) surveillance event detection (SED); content includes natural scenes, humans, vegetation, pets, office objects, and more
Contributions: Annually, by the US government
Tools and apps: The Framework for Detection Evaluations (F4DE) tool, a story evaluation tool, and others
Key papers: [95]
Link: http://www-nlpir.nist.gov/projects/trecvid/

Name: Microsoft Research Cambridge
Description: Pixel-wise labeled or segmented objects
Categories: Several hundred objects
Link: http://research.microsoft.com/en-us/projects/objectclassrecognition/

Name: Optical Flow Algorithm Evaluation
Description: Volume-rendered video scenes for optical flow algorithm benchmarking
Categories: Various scenes for optical flow, mainly synthetic sequences generated via ray tracing
Contributions: n.a.
Tools and apps: Yes (Tcl/Tk)
Key papers: [96]
Link: http://of-eval.sourceforge.net/

Name: Pascal Object Recognition VOC Challenge Dataset
Description: Standardized ground truth data for a research challenge in object recognition spanning 2005-2013; competitions include classification, detection, segmentation, and actions over each of 20 classes of data
Categories: 20 classes of objects in scenes, including persons, animals, vehicles, and indoor objects
Contributions: Via the Pascal conference
Tools and apps: Includes a developer kit and other useful software for labeling data and database access, and tools for reporting benchmark results
Key papers: [97]
Link: http://pascallin.ecs.soton.ac.uk/challenges/VOC/

Name: CRCV
Description: Very extensive; the University of Central Florida’s Center for Research in Computer Vision hosts a large collection of research data covering several domains
Categories: Comprehensive set of categories (aerial views, ground views) including dynamic textures, multimodal iPhone sensor ground truth data (video, accelerometer, gyro), several categories of human actions, crowd segmentation, parking lots, and much more
Contributions: n.a.
Tools and apps: n.a.
Key papers: [98]
Link: http://vision.eecs.ucf.edu/datasetsActions.html

Name: UCB Contour Detection and Image Segmentation
Description: The U.C. Berkeley Computer Vision group provides a complete set of ground truth data, algorithms, and performance evaluations for contour detection, image segmentation, and some interest point methods
Categories: 500 natural-scene images covering a wide range of subjects, with labeled ground truth data
Contributions: n.a.
Tools and apps: Benchmarking code (globalPB for CPU and GPU)
Key papers: [99]
Link: http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/resources.html#bench

Name: CAVIAR Ground Truth Videos for Context-Aware Vision
Description: Project site containing labeled and annotated ground truth data of people in urban settings, including indoor office scenes and shopping centers; 52 videos with 90K frames in total
Categories: Both scripted and real-life activities in shopping centers and offices, including walking, browsing, meeting, fighting, window shopping, and entering/exiting stores
Contributions: n.a.
Tools and apps: n.a.
Key papers: [100]
Link: http://homepages.inf.ed.ac.uk/rbf/CAVIAR/caviar.htm

Name: Boston University Computer Science Department
Description: Image and video database covering a wide range of subject categories
Categories: Video sequences for head tracking and sign language, some datasets labeled; still images for hand tracking, multi-face tracking, vehicle tracking, and more
Contributions: Anonymous FTP
Tools and apps: n.a.
Key papers: [101]
Link: http://www.cs.bu.edu/groups/ivc/data.php