
A Scalable Bootstrapping Framework for Auto-Annotation of Large Image Collections

Chapter in Intelligent Multimedia Processing with Soft Computing

Part of the book series: Studies in Fuzziness and Soft Computing (STUDFUZZ, volume 168)


Abstract

Image annotation aims to assign semantic concepts to images based on their visual content. It has received much attention recently as huge, dynamic collections of images and videos become available on the Web. Most recent approaches employ supervised learning techniques, which require a large set of labeled training samples for effective learning; such samples are tedious and time-consuming to obtain. This chapter explores the use of a bootstrapping framework to tackle this problem by employing three complementary strategies. First, we train two “view-independent” classifiers based on probabilistic SVMs using two orthogonal sets of content features, and incorporate the classifiers in a co-training framework to annotate regions. Second, at the image level, we employ two different segmentation methods to partition each image into different sets of possibly overlapping regions, and devise a contextual model to disambiguate the concepts learned from different regions. Third, we incorporate active learning to ensure that the framework scales to large image collections. Our experiments on a mid-sized image collection demonstrate that our bootstrapping-cum-active-learning framework is effective. Compared to the traditional supervised learning approach, it improves annotation accuracy by over 4% in F1 measure without active learning, and by over 18% when active learning is incorporated. Most importantly, the bootstrapping framework requires only a small set of training samples to kick-start the learning process, making it suitable for practical applications.
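
The first and third strategies lend themselves to a compact illustration. Below is a minimal sketch, assuming scikit-learn's Platt-scaled SVC as the probabilistic SVM; the two feature views (X_color, X_texture), the confidence threshold, and the function names are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np
from sklearn.svm import SVC


def co_train(X_color, X_texture, y, labeled, unlabeled,
             rounds=10, add_per_round=10, conf_thresh=0.9):
    """Co-training in the spirit of Blum & Mitchell (1998): two
    Platt-scaled SVMs, one per feature view, iteratively pseudo-label
    the unlabeled regions they are most confident about."""
    y = np.asarray(y).copy()
    labeled, unlabeled = list(labeled), list(unlabeled)
    clf_c = clf_t = None
    for _ in range(rounds):
        if not unlabeled:
            break
        # One probabilistic SVM per "view"; probability=True enables
        # Platt scaling, yielding calibrated class posteriors.
        clf_c = SVC(kernel="rbf", probability=True).fit(X_color[labeled], y[labeled])
        clf_t = SVC(kernel="rbf", probability=True).fit(X_texture[labeled], y[labeled])
        for clf, X in ((clf_c, X_color), (clf_t, X_texture)):
            if not unlabeled:
                break
            proba = clf.predict_proba(X[unlabeled])
            conf = proba.max(axis=1)
            # Promote this view's most confident predictions to the shared
            # labeled pool so the *other* view can learn from them.
            newly = []
            for i in np.argsort(-conf)[:add_per_round]:
                if conf[i] >= conf_thresh:
                    y[unlabeled[i]] = clf.classes_[proba[i].argmax()]
                    newly.append(unlabeled[i])
            promoted = set(newly)
            labeled += newly
            unlabeled = [u for u in unlabeled if u not in promoted]
    return clf_c, clf_t, labeled, unlabeled


def active_queries(clf_c, clf_t, X_color, X_texture, unlabeled, k=20):
    """Active-learning step: return the k regions the two views combined
    are least confident about, to be labeled by a human annotator."""
    avg = (clf_c.predict_proba(X_color[unlabeled]) +
           clf_t.predict_proba(X_texture[unlabeled])) / 2.0
    return [unlabeled[i] for i in np.argsort(avg.max(axis=1))[:k]]
```

In each round, whichever view is confident about an unlabeled region effectively “teaches” the other view by promoting that region to the shared labeled pool; the regions on which neither view is confident are precisely the ones worth routing to a human annotator, which is how active learning keeps the manual labeling budget small.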




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Chua, T.-S., Feng, H. (2005). A Scalable Bootstrapping Framework for Auto-Annotation of Large Image Collections. In: Tan, Y.-P., Yap, K.-H., Wang, L. (eds) Intelligent Multimedia Processing with Soft Computing. Studies in Fuzziness and Soft Computing, vol 168. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-32367-8_4

  • DOI: https://doi.org/10.1007/3-540-32367-8_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23053-3

  • Online ISBN: 978-3-540-32367-9

  • eBook Packages: Engineering (R0)
