Part and Attribute Discovery from Relative Annotations


Abstract

Part- and attribute-based representations are widely used to support high-level search and retrieval applications. However, learning computer vision models that automatically extract these representations from images requires significant effort in the form of part and attribute labels and annotations. We propose an annotation framework based on comparisons between pairs of instances within a set, which aims to reduce the overhead of manually specifying the set of part and attribute labels. Our comparisons are based on intuitive properties such as correspondences and differences, which are applicable to a wide range of categories. Moreover, they require few category-specific instructions and lead to simpler annotation interfaces than traditional approaches. On a number of visual categories we show that our framework can use noisy annotations collected via “crowdsourcing” to discover semantic parts useful for detection and parsing, as well as attributes suitable for fine-grained recognition.
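To make the core idea concrete, the following is a minimal sketch of how pairwise correspondence annotations could be aggregated into candidate parts. The data format (`(image, landmark)` pairs) and the clustering-by-connected-components step are illustrative assumptions, not the paper's actual pipeline: annotators assert that two landmarks in two images depict the same part, and transitively linking those assertions groups landmarks into shared parts.

```python
# Sketch: aggregating noisy pairwise correspondence annotations into
# candidate parts. Hypothetical format: each annotation links an
# (image, landmark) node to another, asserting they show the same part.
# Connected components of the correspondence graph become candidate parts.

from collections import defaultdict


def discover_parts(correspondences):
    """Group (image, landmark) nodes into candidate parts via union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for a, b in correspondences:
        union(a, b)

    clusters = defaultdict(set)
    for node in parent:
        clusters[find(node)].add(node)
    return list(clusters.values())


# Example: annotators matched one part across three bird images,
# and a second, distinct part across two of them.
pairs = [
    (("img1", "p1"), ("img2", "p3")),
    (("img2", "p3"), ("img3", "p2")),
    (("img1", "p5"), ("img2", "p7")),
]
parts = discover_parts(pairs)
print(sorted(len(p) for p in parts))  # → [2, 3]
```

In practice, noisy annotations would make naive transitive closure merge distinct parts, so a real system would weight edges by annotator agreement before clustering; the sketch omits that step for brevity.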

Keywords

Relative annotations · Crowdsourcing · Semantic parts · Fine-grained attributes


Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. Toyota Technological Institute at Chicago, Chicago, USA
