Efficiently Scaling Up Video Annotation with Crowdsourced Marketplaces

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6314)


Accurately annotating entities in video is labor intensive and expensive. As the quantity of online video grows, traditional solutions to this task are unable to scale to meet the needs of researchers with limited budgets. Current practice provides a temporary solution by paying dedicated workers to label a fraction of the total frames and otherwise settling for linear interpolation. As budgets and scale require sparser key frames, the assumption of linearity fails and labels become inaccurate. To address this problem we have created a public framework for dividing the work of labeling video data into micro-tasks that can be completed by huge labor pools available through crowdsourced marketplaces. By extracting pixel-based features from manually labeled entities, we are able to leverage more sophisticated interpolation between key frames to maximize performance given a budget. Finally, by validating the power of our framework on difficult, real-world data sets we demonstrate an inherent trade-off between the mix of human and cloud computing used vs. the accuracy and cost of the labeling.
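The linear-interpolation baseline mentioned in the abstract can be illustrated with a minimal sketch. This is not the framework's own code: the box format `(xmin, ymin, xmax, ymax)`, the function name `interpolate_boxes`, and the sample key frames are assumptions made for illustration only. The sketch fills in every unlabeled frame by blending the two nearest manually labeled key frames, which is the assumption that breaks down as key frames become sparser.

```python
from typing import Dict, Tuple

# Hypothetical box format: (xmin, ymin, xmax, ymax) in pixels.
Box = Tuple[float, float, float, float]


def interpolate_boxes(key_frames: Dict[int, Box]) -> Dict[int, Box]:
    """Linearly interpolate a box for every frame between labeled key frames."""
    frames = sorted(key_frames)
    result: Dict[int, Box] = {}
    for start, end in zip(frames, frames[1:]):
        b0, b1 = key_frames[start], key_frames[end]
        for t in range(start, end):
            # alpha is 0 at the earlier key frame and approaches 1 at the later one.
            alpha = (t - start) / (end - start)
            result[t] = tuple((1 - alpha) * a + alpha * b for a, b in zip(b0, b1))
    if frames:
        # The final key frame is copied through unchanged.
        result[frames[-1]] = key_frames[frames[-1]]
    return result


# Two manually labeled key frames, four frames apart (illustrative values).
labels = {0: (10.0, 20.0, 50.0, 80.0), 4: (30.0, 20.0, 70.0, 80.0)}
tracks = interpolate_boxes(labels)
```

Here `tracks` holds one box per frame from 0 to 4. An entity that accelerates, turns, or changes scale between the two key frames would be poorly served by this straight-line path, which is the motivation for interpolating with image features instead.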



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  1. Department of Computer Science, University of California, Irvine, USA
