Semi-supervised Learning for Image Annotation Based on Conditional Random Fields

  • Wei Li
  • Maosong Sun
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4071)


Automatic image annotation (AIA) has proved to be an effective and promising approach to deducing high-level semantics from low-level visual features. Due to the inherent ambiguity of the image-label mapping and the scarcity of training examples, systematically developing robust annotation models with good performance remains a challenge. In this paper, we address the problem with 2D CRFs (Conditional Random Fields) and semi-supervised learning, seamlessly integrated into a unified framework. 2D CRFs effectively capture the spatial dependency between neighboring labels, while semi-supervised learning techniques exploit unlabeled data to improve joint classification performance. We conducted experiments on a medium-sized collection of about 500 images from Corel Stock Photo CDs. The experimental results demonstrate that this method outperforms standard CRFs in annotation performance, showing the effectiveness of the proposed unified framework and confirming that unlabeled data can improve classification accuracy.
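The key idea behind a 2D CRF is that a patch's label depends not only on its own visual features (a unary potential) but also on the labels of its 4-connected neighbors (a pairwise potential), and the joint labeling is normalized over all configurations. The sketch below is not the paper's implementation; it is a minimal, brute-force illustration with assumed potentials: `unary[i][j][k]` is a hypothetical per-patch score for label `k`, and `w_pair` is an assumed smoothing weight rewarding equal neighboring labels. Exact enumeration is feasible only for tiny grids but makes the normalization explicit.

```python
import itertools
import math

def crf_score(labels, unary, w_pair):
    """Unnormalized log-score of a 2D grid labeling: the sum of
    per-patch unary potentials plus a pairwise bonus w_pair for
    each 4-connected neighbor pair carrying the same label."""
    rows, cols = len(labels), len(labels[0])
    score = 0.0
    for i in range(rows):
        for j in range(cols):
            score += unary[i][j][labels[i][j]]
            if i + 1 < rows:  # vertical neighbor below
                score += w_pair * (labels[i][j] == labels[i + 1][j])
            if j + 1 < cols:  # horizontal neighbor to the right
                score += w_pair * (labels[i][j] == labels[i][j + 1])
    return score

def crf_posterior(unary, w_pair, num_labels):
    """Exact posterior P(y | x) over all grid labelings by brute
    force, exp(score) / Z; only for illustration on tiny grids."""
    rows, cols = len(unary), len(unary[0])
    cells = rows * cols
    weights = {}
    for assign in itertools.product(range(num_labels), repeat=cells):
        grid = [list(assign[r * cols:(r + 1) * cols]) for r in range(rows)]
        weights[assign] = math.exp(crf_score(grid, unary, w_pair))
    z = sum(weights.values())  # partition function
    return {a: w / z for a, w in weights.items()}

# A 2x2 grid with 2 labels: three patches weakly prefer label 0,
# one weakly prefers label 1; spatial smoothing pulls it to 0.
unary = [[[1.0, 0.0], [1.0, 0.0]],
         [[1.0, 0.0], [0.0, 0.4]]]
posterior = crf_posterior(unary, w_pair=2.0, num_labels=2)
best = max(posterior, key=posterior.get)
print(best)  # (0, 0, 0, 0): neighbor dependency overrides the weak unary vote
```

With `w_pair = 0` the fourth patch would take label 1 on its unary score alone; the pairwise term is what lets neighboring labels disambiguate each other, which is the spatial dependency the abstract attributes to 2D CRFs.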


Keywords: Image Patch · Unlabeled Data · Latent Semantic Analysis · Conditional Random Field · Image Annotation





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Wei Li (1)
  • Maosong Sun (1)

  1. State Key Lab of Intelligent Technology and Systems, Department of Computer Science and Technology, Tsinghua University, Beijing, China
