
Fine-Grained Comparisons with Attributes

  • Aron Yu
  • Kristen Grauman
Chapter
Part of the Advances in Computer Vision and Pattern Recognition book series (ACVPR)

Abstract

Given two images, we want to predict which exhibits a particular visual attribute more than the other—even when the two images are quite similar. For example, given two beach scenes, which looks more calm? Given two high-heeled shoes, which is more ornate? Existing relative attribute methods rely on global ranking functions. However, rarely will the visual cues relevant to a comparison be constant for all data, nor will humans’ perception of the attribute necessarily permit a global ordering. At the same time, not every image pair is even orderable for a given attribute. Attempting to map relative attribute ranks to “equality” predictions is nontrivial, particularly since the span of indistinguishable pairs in attribute space may vary in different parts of the feature space. To address these issues, we introduce local learning approaches for fine-grained visual comparisons, where a predictive model is trained on the fly using only the data most relevant to the novel input. In particular, given a novel pair of images, we develop local learning methods to (1) infer their relative attribute ordering with a ranking function trained using only analogous labeled image pairs, (2) infer the optimal “neighborhood,” i.e., the subset of the training instances most relevant for training a given local model, and (3) infer whether the pair is even distinguishable, based on a local model for just noticeable differences in attributes. Our methods outperform state-of-the-art methods for relative attribute prediction on challenging datasets, including a large newly curated shoe dataset for fine-grained comparisons. We find that for fine-grained comparisons, more labeled data is not necessarily preferable to isolating the right data.
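
Step (1) above, training a ranker on the fly from only the labeled pairs most analogous to the novel pair, can be illustrated with a minimal sketch. The specifics here are assumptions made for illustration and are not taken from the chapter: the pair-to-pair distance (`pair_distance`), the hinge-loss linear ranker (`train_linear_ranker`), the neighborhood size `k`, and all function names are hypothetical. Each training pair is stored as `(more, less)`, meaning the first feature vector exhibits the attribute more strongly.

```python
import math

def dist(x, y):
    """Euclidean distance between two feature vectors (lists of floats)."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def pair_distance(query, train_pair):
    """Distance between a query pair and a training pair.
    Assumption: try both alignments of the members and keep the closer one."""
    a, b = query
    i, j = train_pair
    return min(dist(a, i) + dist(b, j), dist(a, j) + dist(b, i))

def train_linear_ranker(pairs, epochs=200, lr=0.1, reg=0.01):
    """Fit w so that w . (more - less) > 0 on the given pairs,
    using subgradient descent on a regularized hinge loss."""
    diffs = [[m - l for m, l in zip(more, less)] for more, less in pairs]
    dim = len(diffs[0])
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [reg * wi for wi in w]
        for d in diffs:
            if sum(wi * di for wi, di in zip(w, d)) < 1.0:  # margin violated
                for t in range(dim):
                    grad[t] -= d[t] / len(diffs)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

def local_compare(a, b, train_pairs, k=20):
    """Return +1 if a is predicted to show more of the attribute than b, else -1.
    Only the k training pairs nearest to (a, b) are used to fit the ranker."""
    nearest = sorted(train_pairs, key=lambda p: pair_distance((a, b), p))[:k]
    w = train_linear_ranker(nearest)
    score = sum(wi * (ai - bi) for wi, ai, bi in zip(w, a, b))
    return 1 if score > 0 else -1
```

Because the ranker is refit for every query, the cues it relies on can differ from one region of feature space to another, which a single global ranking function cannot do. The sketch does not cover learning the neighborhood itself or deciding whether a pair is distinguishable at all.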

Keywords

Relative Attribute; Image Pair; Test Pair; Training Instance; Kernel Density Estimator
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Notes

Acknowledgements

We thank Mark Stephenson for his help creating the UT-Zap50K dataset, Naga Sandeep for providing the part-based features for LFW-10, and Ashish Kapoor for helpful discussions. This research is supported in part by NSF IIS-1065390 and ONR YIP Award N00014-12-1-0754.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. University of Texas at Austin, Austin, USA
