Improving Automatic Image Annotation Based on Word Co-occurrence

  • H. Jair Escalante
  • Manuel Montes
  • L. Enrique Sucar
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4918)


The accuracy of current automatic image labeling methods falls short of the requirements of annotation-based image retrieval systems. Most of these labeling methods perform poorly if we consider only the most relevant label for a given region. However, if we look within the set of top-k candidate labels for a region, their accuracy improves considerably. In this paper we take advantage of this fact and propose a method (NBI), based on word co-occurrence and the naïve Bayes formulation, for improving automatic image annotation methods. Our approach uses the co-occurrence of a region's candidate labels with the candidate labels of the surrounding regions in the same image to select the correct label. Co-occurrence statistics are obtained from an external collection of manually annotated images: the IAPR TC-12 benchmark. Experimental results using a k-nearest neighbors method as our annotation system give evidence of significant improvements after applying NBI. NBI is efficient, since the co-occurrence information is computed off-line. Furthermore, the method can be applied to any other annotation system that ranks labels by their relevance.
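The reranking idea described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function and variable names are hypothetical, and the smoothing scheme (Laplace) and vocabulary size are assumptions. For each of a region's top-k candidate labels, a naïve Bayes score combines the label's prior with the probability of observing the surrounding regions' candidate labels alongside it, using co-occurrence counts gathered off-line from a manually annotated collection such as the IAPR TC-12 benchmark.

```python
from collections import defaultdict


def build_cooccurrence(annotated_images):
    """Count, off-line, how often each ordered pair of labels appears
    in the same manually annotated image."""
    counts = defaultdict(int)   # (w, l) -> co-occurrence count
    totals = defaultdict(int)   # l -> number of images containing l
    for labels in annotated_images:
        for a in labels:
            totals[a] += 1
            for b in labels:
                if a != b:
                    counts[(a, b)] += 1
    return counts, totals


def nbi_rerank(candidates, neighbor_labels, counts, totals,
               alpha=1.0, vocab_size=1000):
    """Re-rank a region's top-k candidate labels with a naive Bayes
    score: P(l) * prod_j P(w_j | l), where the w_j are the candidate
    labels of surrounding regions. Laplace smoothing (alpha) avoids
    zero probabilities for unseen pairs."""
    grand_total = sum(totals.values()) or 1
    scored = []
    for l in candidates:
        score = totals[l] / grand_total  # prior P(l)
        for w in neighbor_labels:
            score *= (counts[(w, l)] + alpha) / (totals[l] + alpha * vocab_size)
        scored.append((score, l))
    scored.sort(reverse=True)
    return [label for _, label in scored]
```

For example, if "sky" frequently co-occurs with "plane" in the external collection, a region whose neighbors are labeled "sky" will have "plane" promoted over an unrelated candidate such as "cow".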


Keywords: Image Retrieval · Content-Based Image Retrieval · Annotation System · Image Annotation · Annotation Method




  1. Barnard, K., Forsyth, D.: Learning the semantics of words and pictures. In: Proc. ICCV, vol. 2, pp. 408–415. IEEE, Los Alamitos (2001)
  2. Blei, D.M., Jordan, M.I.: Modeling annotated data. In: Proc. of the 26th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 127–134. ACM Press, New York, NY, USA (2003)
  3. Carbonetto, P.: Unsupervised statistical models for general object recognition. Master's thesis, C.S. Department, University of British Columbia (August 2003)
  4. Carbonetto, P., de Freitas, N., Barnard, K.: A statistical model for general contextual object recognition. In: Proc. of 8th ECCV, pp. 350–362 (2005)
  5. Carbonetto, P., de Freitas, N., Gustafson, P., Thompson, N.: Bayesian feature weighting for unsupervised learning. In: Proc. of the HLT-NAACL Workshop on Learning Word Meaning from Non-Linguistic Data, Morristown, NJ, USA, pp. 54–61 (2003)
  6. Carneiro, G., Chan, A.B., Moreno, P.J., Vasconcelos, N.: Supervised learning of semantic classes for image annotation and retrieval. IEEE Trans. on PAMI 29(3), 394–410 (2007)
  7. Carneiro, G., Vasconcelos, N.: Formulating semantic image annotation as a supervised learning problem. In: Proc. of CVPR, Washington, DC, USA, vol. 2, pp. 163–168. IEEE Computer Society, Los Alamitos (2005)
  8. Chen, S.F., Goodman, J.: An empirical study of smoothing techniques for language modeling. In: Proc. of the 34th Annual Meeting of the Association for Computational Linguistics, Morristown, NJ, USA, pp. 310–318 (1996)
  9. Datta, R., Li, J., Wang, J.Z.: Content-based image retrieval: approaches and trends of the new age. In: Proc. of the ACM International Workshop on Multimedia Information Retrieval, Singapore. ACM Multimedia, New York (2005)
  10. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edn. Wiley-Interscience, Chichester (2000)
  11. Duygulu, P., Barnard, K., de Freitas, N., Forsyth, D.: Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2353, pp. 97–112. Springer, Heidelberg (2002)
  12. Iyengar, G., et al.: Joint visual-text modeling for automatic retrieval of multimedia documents. In: Proc. of the 13th ACM MULTIMEDIA, pp. 21–30. ACM Press, New York, NY, USA (2005)
  13. Ghoshal, A., Ircing, P., Khudanpur, S.: HMMs for automatic annotation and content-based retrieval of images and video. In: Proc. of the 28th International Conference on Research and Development in Information Retrieval, New York, NY, USA, pp. 544–551 (2005)
  14. Grubinger, M., Clough, P., Leung, C.: The IAPR TC-12 benchmark: a new evaluation resource for visual information systems. In: Proc. of the International Workshop OntoImage 2006: Language Resources for CBIR (2006)
  15. Hare, J.S., Lewis, P.H., Enser, P.G.B., Sandom, C.J.: Mind the Gap: Another look at the problem of the semantic gap in image retrieval. In: Hanjalic, A., Chang, E.Y., Sebe, N. (eds.) Proceedings of Multimedia Content Analysis, Management and Retrieval, San Jose, California, USA, SPIE, vol. 6073 (2006)
  16. Lavrenko, V., Manmatha, R., Jeon, J.: A model for learning the semantics of pictures. In: NIPS, vol. 16. MIT Press, Cambridge, MA (2004)
  17. Li, W., Sun, M.: Automatic image annotation based on WordNet and hierarchical ensembles. In: Gelbukh, A. (ed.) CICLing 2006. LNCS, vol. 3878, pp. 417–428. Springer, Heidelberg (2006)
  18. Liu, Y., Zhang, D., Lu, G., Ma, W.Y.: A survey of content-based image retrieval with high-level semantics. Pattern Recognition 40(1), 262–282 (2007)
  19. Mitchell, T.: Machine Learning. McGraw-Hill Education, New York (1997)
  20. Mori, Y., Takahashi, H., Oka, R.: Image-to-word transformation based on dividing and vector quantizing images with words. In: 1st International Workshop on Multimedia Intelligent Storage and Retrieval Management (1999)
  21. Nigam, K., McCallum, A., Thrun, S., Mitchell, T.: Text classification from labeled and unlabeled documents using EM. Machine Learning 39, 103–134 (2000)
  22. Pan, J., Yang, H., Duygulu, P., Faloutsos, C.: Automatic image captioning. In: Proc. of the ICME (2004)
  23. Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. on PAMI 22(8), 888–905 (2000)
  24. Smeulders, A.W.M., Worring, M., Santini, S., Gupta, A., Jain, R.: Content-based image retrieval at the end of the early years. IEEE Trans. on PAMI 22(12), 1349–1380 (2000)

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • H. Jair Escalante (1)
  • Manuel Montes (1)
  • L. Enrique Sucar (1)

  1. Computer Science Department, National Institute of Astrophysics, Optics and Electronics, Puebla, México
