Incorporating Prior Knowledge into Multi-label Boosting for Cross-Modal Image Annotation and Retrieval

  • Wei Li
  • Maosong Sun
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4182)


Automatic image annotation (AIA) has proved to be an effective and promising way to deduce high-level semantics automatically from low-level visual features. In this paper, we formulate image annotation as a multi-label, multi-class semantic image classification problem and propose a simple yet effective joint classification framework in which probabilistic multi-label boosting and contextual semantic constraints are integrated seamlessly. We conducted experiments on a medium-sized collection of about 5000 images from Corel Stock Photo CDs. The experimental results demonstrate that the annotation performance of the proposed method is comparable to state-of-the-art approaches, showing the effectiveness and feasibility of the proposed unified framework.


Confidence Score, Latent Semantic Analysis, Image Annotation, Annotation Model, Semantic Label
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Wei Li (1)
  • Maosong Sun (1)
  1. State Key Lab of Intelligent Technology Systems, Department of Computer Science and Technology, Tsinghua University, Beijing, P.R. China
