Abstract
We harvest training images for visual object recognition by casting image harvesting as an information retrieval (IR) task. In contrast to previous work, we concentrate on fine-grained object categories, such as the large number of individual animal subspecies, for which manual annotation is expensive. We use ‘visual descriptions’ from nature guides as a novel augmentation to the well-known use of category names. We use these descriptions both in the query process, to find potential category images, and in image reranking, where an image is ranked more highly if the web page text surrounding it is similar to the visual descriptions. We show the potential of this method when harvesting images for 10 butterfly categories: compared to a method that relies on the category name only, using visual descriptions improves precision for many categories.
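The reranking step described in the abstract (rank an image higher when the web page text around it resembles the visual description) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the similarity measure, so plain bag-of-words cosine similarity is assumed here, and the function names, the description text, and the candidate pages are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text shares nothing with the description
    return dot / (norm_a * norm_b)


def rerank(description, candidates):
    """Score (image_url, surrounding_text) pairs against the description.

    Returns (score, image_url) pairs, most similar first.
    """
    desc_vec = Counter(tokenize(description))
    scored = [
        (cosine_similarity(desc_vec, Counter(tokenize(text))), url)
        for url, text in candidates
    ]
    # The sort is stable, so tied candidates keep their input order.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical visual description and candidate pages, for illustration only.
DESCRIPTION = "Wings orange with black veins and white spots along the black border"
CANDIDATES = [
    ("shop.jpg", "Buy monarch tickets online today"),
    ("wing.jpg", "orange wings with black veins and white spots"),
]

ranked = rerank(DESCRIPTION, CANDIDATES)
for score, url in ranked:
    print(f"{score:.3f}  {url}")
```

In this toy input, the page whose text shares wording with the description is ranked above the page that only mentions the category name. A fuller system would likely add stemming, stop-word removal, and term weighting, none of which are shown here.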
M. Everingham—who died in 2012—is included as a posthumous author of this paper for his intellectual contributions during the course of this work.
Acknowledgements
The authors thank Paul Clough and the anonymous reviewers for their feedback on an earlier draft of this paper. This work was supported by the EU CHIST-ERA D2K 2011 Visual Sense project (EPSRC grant EP/K019082/1) and the Overseas Research Students Awards Scheme (ORSAS) for Josiah Wang.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Wang, J., Markert, K., Everingham, M. (2016). Harvesting Training Images for Fine-Grained Object Categories Using Visual Descriptions. In: Ferro, N., et al. Advances in Information Retrieval. ECIR 2016. Lecture Notes in Computer Science, vol 9626. Springer, Cham. https://doi.org/10.1007/978-3-319-30671-1_40
DOI: https://doi.org/10.1007/978-3-319-30671-1_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-30670-4
Online ISBN: 978-3-319-30671-1
eBook Packages: Computer Science, Computer Science (R0)