
Learning Visual Compound Models from Parallel Image-Text Datasets

  • Conference paper
Pattern Recognition (DAGM 2008)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 5096)

Abstract

In this paper, we propose a new approach to learning structured visual compound models from shape-based feature descriptions. We use captioned text to drive the grouping of boundary fragments detected in an image. In the learning framework, we transfer several techniques from computational linguistics to the visual domain and build on previous work in image annotation. A statistical translation model establishes links between caption words and image elements; compounds are then built up iteratively using a mutual information measure. Relations between compound elements are extracted automatically and increase the discriminability of the visual models. We validate the approach with results on several synthetic and realistic datasets.
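
The two core steps summarised above lend themselves to a compact illustration. The sketch below is a minimal, hypothetical rendering rather than the paper's implementation: it pairs an IBM-Model-1-style EM estimate of p(caption word | visual token) over parallel caption/image data with a pointwise mutual information score for judging whether two visual tokens co-occur often enough to be merged into a compound. The token names (arc_1, line_7, ...), the toy corpus, and the helpers train_translation_model and pmi are assumptions made for illustration; in the paper, quantised boundary fragments and their relations play the role these tokens play here.

```python
# Minimal sketch (not the authors' code) of two steps described in the abstract:
#   1) an IBM-Model-1 style translation model estimating p(word | visual token)
#      from parallel caption / image-token pairs, and
#   2) a pointwise mutual information (PMI) score for deciding whether two
#      visual tokens are a candidate compound.
# Visual tokens stand in for quantised boundary-fragment descriptors; all
# token names below are hypothetical.

from collections import defaultdict
from math import log


def train_translation_model(pairs, iterations=10):
    """EM for IBM Model 1; pairs is a list of (caption_words, visual_tokens)."""
    words = {w for ws, _ in pairs for w in ws}
    tokens = {v for _, vs in pairs for v in vs}
    # Uniform initialisation of p(word | token).
    t = {(w, v): 1.0 / len(words) for w in words for v in tokens}
    for _ in range(iterations):
        count = defaultdict(float)   # expected co-occurrence counts (E-step)
        total = defaultdict(float)   # normaliser per visual token
        for ws, vs in pairs:
            for w in ws:
                z = sum(t[(w, v)] for v in vs)   # alignment normaliser
                for v in vs:
                    c = t[(w, v)] / z            # expected alignment weight
                    count[(w, v)] += c
                    total[v] += c
        for (w, v) in t:                         # M-step: renormalise
            t[(w, v)] = count[(w, v)] / total[v] if total[v] > 0 else 0.0
    return t


def pmi(a, b, pairs):
    """PMI between two visual tokens over the image side of the corpus."""
    n = len(pairs)
    pa = sum(1 for _, vs in pairs if a in vs) / n
    pb = sum(1 for _, vs in pairs if b in vs) / n
    pab = sum(1 for _, vs in pairs if a in vs and b in vs) / n
    return log(pab / (pa * pb)) if pab > 0 else float("-inf")


if __name__ == "__main__":
    # Toy parallel corpus: captions paired with detected visual tokens.
    corpus = [
        (["cup", "table"], ["arc_1", "arc_2", "line_7"]),
        (["cup"],          ["arc_1", "arc_2"]),
        (["table"],        ["line_7", "line_9"]),
    ]
    t = train_translation_model(corpus)
    print(round(t[("cup", "arc_1")], 3))        # link strength word -> token
    # A high PMI suggests arc_1 and arc_2 co-occur more than chance,
    # making them a candidate compound to merge.
    print(round(pmi("arc_1", "arc_2", corpus), 3))
```

In the actual framework, a merged compound would replace its parts on the image side of the corpus, the translation model would be re-estimated, and the spatial relations between parts mentioned above would be recorded; none of that bookkeeping is modelled in this toy sketch.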

Editor information

Gerhard Rigoll

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Moringen, J., Wachsmuth, S., Dickinson, S., Stevenson, S. (2008). Learning Visual Compound Models from Parallel Image-Text Datasets. In: Rigoll, G. (ed.) Pattern Recognition. DAGM 2008. Lecture Notes in Computer Science, vol 5096. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69321-5_49

  • DOI: https://doi.org/10.1007/978-3-540-69321-5_49

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69320-8

  • Online ISBN: 978-3-540-69321-5

  • eBook Packages: Computer Science, Computer Science (R0)
