
Deep Relative Attributes

  • Yaser Souri
  • Erfan Noury
  • Ehsan Adeli
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10115)

Abstract

Visual attributes are an effective means of describing images or scenes in a way that both humans and computers understand. To establish a correspondence between images and to compare the strength of each property across images, relative attributes were introduced. Since their introduction, however, hand-crafted and engineered features have been used to learn increasingly complex models for the relative attribute problem, which limits the applicability of these methods to more realistic settings. We introduce a deep neural network architecture for the task of relative attribute prediction. A convolutional neural network (ConvNet) is adopted to learn the features, with an additional layer (the ranking layer) that learns to rank the images based on these features. We use an appropriate ranking loss to train the whole network in an end-to-end fashion. Our proposed method outperforms the baseline and state-of-the-art methods in relative attribute prediction on various coarse and fine-grained datasets. Our qualitative results, along with visualizations of the saliency maps, show that the network learns effective features for each specific attribute. Source code of the proposed method is available at https://github.com/yassersouri/ghiaseddin.
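The architecture described in the abstract amounts to a ConvNet feature extractor followed by a single scalar-output ranking layer, trained on annotated image pairs with a pairwise ranking loss. Below is a minimal, illustrative sketch in PyTorch of this idea; it is not the authors' released implementation, it assumes a recent torchvision with the VGG16 weights API, and the names RelativeAttributeRanker and pairwise_ranking_loss are placeholders chosen here.

import torch
import torch.nn as nn
import torchvision.models as models

class RelativeAttributeRanker(nn.Module):
    """ConvNet backbone plus a scalar ranking layer (illustrative sketch)."""
    def __init__(self):
        super().__init__()
        backbone = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
        self.features = backbone.features                      # convolutional layers
        self.pool = nn.AdaptiveAvgPool2d((7, 7))
        self.fc = nn.Sequential(*list(backbone.classifier.children())[:-1])  # fc6, fc7
        self.ranking_layer = nn.Linear(4096, 1)                 # scalar attribute strength

    def forward(self, x):
        h = self.pool(self.features(x)).flatten(1)
        return self.ranking_layer(self.fc(h)).squeeze(1)        # one score per image

def pairwise_ranking_loss(score_i, score_j, target):
    """RankNet-style loss on the score difference of an annotated image pair.
    target is 1.0 when image i shows the attribute more strongly than image j,
    0.0 for the opposite ordering, and 0.5 for pairs annotated as equal."""
    return nn.functional.binary_cross_entropy_with_logits(score_i - score_j, target)

# Example usage with placeholder tensors:
# model = RelativeAttributeRanker()
# x_i = torch.randn(4, 3, 224, 224)
# x_j = torch.randn(4, 3, 224, 224)
# target = torch.tensor([1.0, 1.0, 0.5, 0.0])
# loss = pairwise_ranking_loss(model(x_i), model(x_j), target)
# loss.backward()

Because the loss depends only on the difference of the two scores, both images of a pair can be passed through the same network (shared weights), and the whole model, including the feature extractor, is trainable end-to-end.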

Keywords

Feature Vector · Relative Attribute · Convolutional Neural Network · Visual Attribute · Feature Learning

Notes

Acknowledgments

We would like to thank the Computer Engineering Department of Sharif University of Technology and the HPC center of IPM for their support with computational resources.


Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Sobhe, Tehran, Iran
  2. Sharif University of Technology, Tehran, Iran
  3. University of North Carolina at Chapel Hill, Chapel Hill, USA
