International Journal of Computer Vision, Volume 114, Issue 1, pp 56–73

Discovering Attribute Shades of Meaning with the Crowd

  • Adriana Kovashka
  • Kristen Grauman

Abstract

To learn semantic attributes, existing methods typically train one discriminative model for each word in a vocabulary of nameable properties. However, this “one model per word” assumption is problematic: while a word might have a precise linguistic definition, it need not have a precise visual definition. We propose to discover shades of attribute meaning. Given an attribute name, we use crowdsourced image labels to discover the latent factors underlying how different annotators perceive the named concept. We show that structure in those latent factors helps reveal shades, that is, interpretations for the attribute shared by some group of annotators. Using these shades, we train classifiers to capture the primary (often subtle) variants of the attribute. The resulting models are both semantic and visually precise. By catering to users’ interpretations, they improve attribute prediction accuracy on novel images. Shades also enable more successful attribute-based image search, by providing robust personalized models for retrieving multi-attribute query results. They are widely applicable to tasks that involve describing visual content, such as zero-shot category learning and organization of photo collections.
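
The discovery step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only states that latent factors are recovered from crowdsourced labels and that their structure reveals shades. Here we assume, purely for illustration, a low-rank factorization of a partially observed annotator-by-image label matrix fit by gradient descent, followed by a simple k-means over the annotator factors. All function names, parameters, and the toy data are hypothetical, and the final step of training one classifier per shade is omitted.

```python
import numpy as np

def make_toy_labels(seed=0):
    """Hypothetical data: 20 annotators, 30 images, two interpretations."""
    rng = np.random.default_rng(seed)
    view_a = rng.choice([-1.0, 1.0], size=30)   # +1 = "attribute present"
    view_b = view_a.copy()
    view_b[:15] *= -1                           # second group disagrees on half the images
    labels = np.vstack([np.tile(view_a, (10, 1)), np.tile(view_b, (10, 1))])
    mask = rng.random(labels.shape) < 0.8       # each annotator labels ~80% of the images
    return labels, mask

def factorize(labels, mask, rank=2, steps=1000, lr=0.01, reg=0.01, seed=0):
    """Fit labels ~ U @ V.T using only the observed entries (mask == True)."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((labels.shape[0], rank))  # annotator factors
    V = 0.1 * rng.standard_normal((labels.shape[1], rank))  # image factors
    observed = np.where(mask, labels, 0.0)
    for _ in range(steps):
        err = mask * (U @ V.T - observed)       # zero wherever a label is missing
        grad_U = err @ V + reg * U
        grad_V = err.T @ U + reg * V
        U -= lr * grad_U
        V -= lr * grad_V
    return U, V

def kmeans(X, k, iters=50):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        for c in range(k):
            if np.any(assign == c):             # keep the old center if a cluster empties
                centers[c] = X[assign == c].mean(axis=0)
    return assign

labels, mask = make_toy_labels()
U, V = factorize(labels, mask)
shades = kmeans(U, k=2)   # one shade index per annotator
print(shades)
```

In this toy setup each cluster of annotator factors plays the role of a shade: a group of annotators who share an interpretation of the attribute. Under the approach the abstract describes, the labels from each group would then be used to train a separate classifier for that shade.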

Keywords

Attribute learning and perception; Vision and language; Attribute discovery

Notes

Acknowledgments

We thank the anonymous reviewers for their helpful feedback and suggestions. This research is supported in part by ONR ATL N00014-11-1-0105.

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. The University of Texas at Austin, Austin, USA