
Learning Visual Compound Models from Parallel Image-Text Datasets

  • Conference paper
Pattern Recognition (DAGM 2008)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 5096)

Abstract

In this paper, we propose a new approach to learning structured visual compound models from shape-based feature descriptions. We use captioned text to drive the grouping of boundary fragments detected in an image. In the learning framework, we transfer several techniques from computational linguistics to the visual domain and build on previous work in image annotation. A statistical translation model establishes links between caption words and image elements; compounds are then built up iteratively using a mutual information measure. Relations between compound elements are extracted automatically and increase the discriminability of the visual models. We validate the approach with results on several synthetic and realistic datasets.
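
The two core steps summarised above lend themselves to a compact illustration. The sketch below is a minimal, hypothetical rendering rather than the paper's implementation: it pairs an IBM-Model-1-style EM estimate of p(caption word | visual token) over parallel caption/image data with a pointwise mutual information score for judging whether two visual tokens co-occur often enough to be merged into a compound. The token names (arc_1, line_7, ...), the toy corpus, and the helpers train_translation_model and pmi are assumptions made for illustration; in the paper, quantised boundary fragments and their relations play the role these tokens play here.

```python
# Minimal sketch (not the authors' code) of two steps described in the abstract:
#   1) an IBM-Model-1 style translation model estimating p(word | visual token)
#      from parallel caption / image-token pairs, and
#   2) a pointwise mutual information (PMI) score for deciding whether two
#      visual tokens are a candidate compound.
# Visual tokens stand in for quantised boundary-fragment descriptors; all
# token names below are hypothetical.

from collections import defaultdict
from math import log


def train_translation_model(pairs, iterations=10):
    """EM for IBM Model 1; pairs is a list of (caption_words, visual_tokens)."""
    words = {w for ws, _ in pairs for w in ws}
    tokens = {v for _, vs in pairs for v in vs}
    # Uniform initialisation of p(word | token).
    t = {(w, v): 1.0 / len(words) for w in words for v in tokens}
    for _ in range(iterations):
        count = defaultdict(float)   # expected co-occurrence counts (E-step)
        total = defaultdict(float)   # normaliser per visual token
        for ws, vs in pairs:
            for w in ws:
                z = sum(t[(w, v)] for v in vs)   # alignment normaliser
                for v in vs:
                    c = t[(w, v)] / z            # expected alignment weight
                    count[(w, v)] += c
                    total[v] += c
        for (w, v) in t:                         # M-step: renormalise
            t[(w, v)] = count[(w, v)] / total[v] if total[v] > 0 else 0.0
    return t


def pmi(a, b, pairs):
    """PMI between two visual tokens over the image side of the corpus."""
    n = len(pairs)
    pa = sum(1 for _, vs in pairs if a in vs) / n
    pb = sum(1 for _, vs in pairs if b in vs) / n
    pab = sum(1 for _, vs in pairs if a in vs and b in vs) / n
    return log(pab / (pa * pb)) if pab > 0 else float("-inf")


if __name__ == "__main__":
    # Toy parallel corpus: captions paired with detected visual tokens.
    corpus = [
        (["cup", "table"], ["arc_1", "arc_2", "line_7"]),
        (["cup"],          ["arc_1", "arc_2"]),
        (["table"],        ["line_7", "line_9"]),
    ]
    t = train_translation_model(corpus)
    print(round(t[("cup", "arc_1")], 3))        # link strength word -> token
    # A high PMI suggests arc_1 and arc_2 co-occur more than chance,
    # making them a candidate compound to merge.
    print(round(pmi("arc_1", "arc_2", corpus), 3))
```

In the actual framework, a merged compound would replace its parts on the image side of the corpus, the translation model would be re-estimated, and the spatial relations between parts mentioned above would be recorded; none of that bookkeeping is modelled in this toy sketch.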

Editor information

Gerhard Rigoll

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Moringen, J., Wachsmuth, S., Dickinson, S., Stevenson, S. (2008). Learning Visual Compound Models from Parallel Image-Text Datasets. In: Rigoll, G. (ed.) Pattern Recognition. DAGM 2008. Lecture Notes in Computer Science, vol 5096. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69321-5_49

  • DOI: https://doi.org/10.1007/978-3-540-69321-5_49

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69320-8

  • Online ISBN: 978-3-540-69321-5

  • eBook Packages: Computer Science, Computer Science (R0)
