Large scale image annotation: learning to rank with joint word-image embeddings

Weston, Jason; Bengio, Samy; Usunier, Nicolas

doi:10.1007/s10994-010-5198-3

Published: 27 July 2010

Machine Learning, Volume 81, pages 21–35 (2010)

Abstract

Image annotation datasets are becoming larger and larger, with tens of millions of images and tens of thousands of possible annotations. We propose a strongly performing method that scales to such datasets by simultaneously learning to optimize precision at k of the ranked list of annotations for a given image and learning a low-dimensional joint embedding space for both images and annotations. Our method both outperforms several baseline methods and, in comparison to them, is faster and consumes less memory. We also demonstrate how our method learns an interpretable model, where annotations with alternate spellings or even languages are close in the embedding space. Hence, even when our model does not predict the exact annotation given by a human labeler, it often predicts similar annotations, a fact that we try to quantify by measuring the newly introduced “sibling” precision metric, where our method also obtains excellent results.
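The abstract describes the model only at a high level. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a linear map V that projects d-dimensional image features into a D-dimensional embedding space, a lookup table W that holds one D-dimensional vector per annotation, and a score for each annotation equal to the dot product of the two embeddings. Ranking annotations by that score and measuring precision at k follow directly from the abstract. To push correct annotations toward the top of the list, the sketch uses a sampled pairwise hinge update whose step grows with a rough estimate of how low the correct annotation is ranked; this is one plausible way to target the top of the ranking, chosen for illustration. All function names and hyperparameters (lr, margin) are hypothetical.

```python
import random

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))

def embed_image(V, x):
    """Project image features x (length d) with V (D rows of length d)."""
    return [dot(row, x) for row in V]

def score_annotations(V, W, x):
    """Score every annotation i as W[i] . (V x) in the joint space."""
    z = embed_image(V, x)
    return [dot(w, z) for w in W]

def rank_annotations(V, W, x):
    """Annotation indices sorted from highest to lowest score."""
    s = score_annotations(V, W, x)
    return sorted(range(len(W)), key=lambda i: -s[i])

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked annotations that are in `relevant`."""
    return sum(1 for i in ranked[:k] if i in relevant) / k

def rank_weight(r):
    """Weight 1 + 1/2 + ... + 1/r: larger when the correct label ranks lower."""
    return sum(1.0 / j for j in range(1, r + 1))

def rank_weighted_step(V, W, x, pos, lr=0.1, margin=1.0, rng=random):
    """One SGD step on a sampled pairwise hinge loss (updates V and W in place).

    Samples negative annotations until one violates the margin against the
    correct annotation `pos`; the number of draws gives a rough rank estimate.
    Returns True if an update was made, False otherwise.
    """
    n = len(W)
    z = embed_image(V, x)
    s_pos = dot(W[pos], z)
    negatives = [i for i in range(n) if i != pos]
    for trials in range(1, len(negatives) + 1):
        neg = rng.choice(negatives)
        if margin - s_pos + dot(W[neg], z) > 0:
            # Few draws needed => many violators => correct label ranks low.
            est_rank = len(negatives) // trials
            step = lr * rank_weight(est_rank)
            # Descend rank_weight * (margin + (W[neg] - W[pos]) . V x),
            # using the pre-update W for the gradient with respect to V.
            diff = [a - b for a, b in zip(W[neg], W[pos])]
            for a in range(len(z)):
                W[pos][a] += step * z[a]
                W[neg][a] -= step * z[a]
                for b in range(len(x)):
                    V[a][b] -= step * diff[a] * x[b]
            return True
    return False
```

For example, with V the 2x2 identity, W = [[1, 0], [0, 1], [-1, 0]] and x = [2, 1], the scores are [2, 1, -2], so rank_annotations returns [0, 1, 2] and precision_at_k([0, 1, 2], {0}, 2) is 0.5. A practical version would also need some form of regularization, such as bounding the norms of the rows of V and W; that is omitted here for brevity.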

Author information

Authors and Affiliations

Google, New York, USA
Jason Weston
Google, Mountain View, USA
Samy Bengio
Université Paris 6, LIP6, Paris, France
Nicolas Usunier

Corresponding author

Correspondence to Jason Weston.

Additional information

Editors: José L. Balcázar, Francesco Bonchi, Aristides Gionis, and Michèle Sebag.

About this article

Cite this article

Weston, J., Bengio, S. & Usunier, N. Large scale image annotation: learning to rank with joint word-image embeddings. Mach Learn 81, 21–35 (2010). https://doi.org/10.1007/s10994-010-5198-3

Received: 30 April 2010
Accepted: 20 June 2010
Published: 27 July 2010
Issue Date: October 2010
DOI: https://doi.org/10.1007/s10994-010-5198-3

Similar content being viewed by others

Automatic image annotation: the quirks and what works

A Structured Listwise Approach to Learning to Rank for Image Tagging

Neural ranking for automatic image annotation
