
A Scalable Bootstrapping Framework for Auto-Annotation of Large Image Collections

Chapter in Intelligent Multimedia Processing with Soft Computing

Part of the book series: Studies in Fuzziness and Soft Computing (STUDFUZZ, volume 168)


Abstract

Image annotation aims to assign semantic concepts to images based on their visual content. It has received much attention recently as huge, dynamic collections of images and videos become available on the Web. Most recent approaches employ supervised learning techniques, which require a large set of labeled training samples for effective learning; such samples are tedious and time-consuming to obtain. This chapter explores the use of a bootstrapping framework to tackle this problem by employing three complementary strategies. First, we train two “view-independent” classifiers based on probabilistic SVMs using two orthogonal sets of content features, and incorporate the classifiers in a co-training framework to annotate regions. Second, at the image level, we employ two different segmentation methods to partition each image into different sets of possibly overlapping regions, and devise a contextual model to disambiguate the concepts learned from different regions. Third, we incorporate active learning to ensure that the framework scales to large image collections. Our experiments on a mid-sized image collection demonstrate that our bootstrapping-cum-active-learning framework is effective. Compared to the traditional supervised learning approach, it improves annotation accuracy by over 4% in F1 measure without active learning, and by over 18% when active learning is incorporated. Most importantly, the bootstrapping framework requires only a small set of training samples to kick-start the learning process, making it suitable for practical applications.
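
The first and third strategies lend themselves to a compact illustration. Below is a minimal sketch, assuming scikit-learn's Platt-scaled SVC as the probabilistic SVM; the two feature views (X_color, X_texture), the confidence threshold, and the function names are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np
from sklearn.svm import SVC


def co_train(X_color, X_texture, y, labeled, unlabeled,
             rounds=10, add_per_round=10, conf_thresh=0.9):
    """Co-training in the spirit of Blum & Mitchell (1998): two
    Platt-scaled SVMs, one per feature view, iteratively pseudo-label
    the unlabeled regions they are most confident about."""
    y = np.asarray(y).copy()
    labeled, unlabeled = list(labeled), list(unlabeled)
    clf_c = clf_t = None
    for _ in range(rounds):
        if not unlabeled:
            break
        # One probabilistic SVM per "view"; probability=True enables
        # Platt scaling, yielding calibrated class posteriors.
        clf_c = SVC(kernel="rbf", probability=True).fit(X_color[labeled], y[labeled])
        clf_t = SVC(kernel="rbf", probability=True).fit(X_texture[labeled], y[labeled])
        for clf, X in ((clf_c, X_color), (clf_t, X_texture)):
            if not unlabeled:
                break
            proba = clf.predict_proba(X[unlabeled])
            conf = proba.max(axis=1)
            # Promote this view's most confident predictions to the shared
            # labeled pool so the *other* view can learn from them.
            newly = []
            for i in np.argsort(-conf)[:add_per_round]:
                if conf[i] >= conf_thresh:
                    y[unlabeled[i]] = clf.classes_[proba[i].argmax()]
                    newly.append(unlabeled[i])
            promoted = set(newly)
            labeled += newly
            unlabeled = [u for u in unlabeled if u not in promoted]
    return clf_c, clf_t, labeled, unlabeled


def active_queries(clf_c, clf_t, X_color, X_texture, unlabeled, k=20):
    """Active-learning step: return the k regions the two views combined
    are least confident about, to be labeled by a human annotator."""
    avg = (clf_c.predict_proba(X_color[unlabeled]) +
           clf_t.predict_proba(X_texture[unlabeled])) / 2.0
    return [unlabeled[i] for i in np.argsort(avg.max(axis=1))[:k]]
```

In each round, whichever view is confident about an unlabeled region effectively “teaches” the other view by promoting that region to the shared labeled pool; the regions on which neither view is confident are precisely the ones worth routing to a human annotator, which is how active learning keeps the manual labeling budget small.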




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Chua, T.-S., Feng, H. (2005). A Scalable Bootstrapping Framework for Auto-Annotation of Large Image Collections. In: Tan, Y.-P., Yap, K.-H., Wang, L. (eds) Intelligent Multimedia Processing with Soft Computing. Studies in Fuzziness and Soft Computing, vol 168. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-32367-8_4

  • DOI: https://doi.org/10.1007/3-540-32367-8_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23053-3

  • Online ISBN: 978-3-540-32367-9

  • eBook Packages: Engineering (R0)
