Automatic Image Annotation for Description of Urban and Outdoor Scenes

  • Claudia Cruz-Perez
  • Oleg Starostenko
  • Vicente Alarcon-Aquino
  • Jorge Rodriguez-Asomoza
Conference paper
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 313)


In this paper we present a novel approach for automatic annotation of objects or regions in images based on their color and texture. In the proposed generalized architecture for automatic generation of image content descriptions, detected regions are labeled by a cascade SVM-based classifier and mapped to a structure that reflects their hierarchical and spatial relations, which is then used by a text generation engine. To test the designed system, around 2,000 outdoor-indoor scene images from the standard IAPR TC-12 dataset were processed, yielding an average classification precision of about 75 % with 94 % recall. Extending the classifier with a texture detector based on Gabor filters improved the precision of color-based classification by up to 15 ± 5 %. The proposed approach offers a good compromise between region classification precision and speed, despite a considerable processing time of up to 1 s per image, and may serve as a tool for efficient automatic image understanding and description.
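The texture extension described in the abstract can be illustrated with a short sketch. The code below is not the authors' implementation: it builds a Gabor kernel from scratch, extracts mean/variance features of the filter responses at four orientations, and labels regions with a simple nearest-centroid rule standing in for the paper's cascade SVM classifier. All function names, parameter values, and the synthetic striped "textures" are hypothetical choices for illustration only.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def gabor_kernel(ksize=15, sigma=3.0, theta=0.0, lambd=8.0, gamma=0.5):
    """Real part of a Gabor kernel: Gaussian envelope times an oriented cosine."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)      # rotate coordinates by theta
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
    return envelope * np.cos(2 * np.pi * xr / lambd)

def texture_features(region, thetas=(0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)):
    """Mean/variance of |Gabor response| at four orientations -> 8-D feature vector."""
    feats = []
    for theta in thetas:
        k = gabor_kernel(theta=theta)
        windows = sliding_window_view(region, k.shape)   # valid-mode 2-D correlation
        resp = np.abs(np.einsum("ijkl,kl->ij", windows, k))
        feats += [resp.mean(), resp.var()]
    return np.array(feats)

# --- toy demo on two synthetic texture classes (hypothetical data) ---
rng = np.random.default_rng(0)

def stripes(horizontal, n=32, period=8):
    """Noisy sinusoidal stripes, horizontal (varying with y) or vertical."""
    coord = np.indices((n, n))[0 if horizontal else 1]
    return np.sin(coord * 2 * np.pi / period) + rng.normal(0, 0.1, (n, n))

# Nearest-centroid labeling is a stand-in for the cascade SVM of the paper.
centroids = {"vertical": texture_features(stripes(False)),
             "horizontal": texture_features(stripes(True))}

def classify(region):
    f = texture_features(region)
    return min(centroids, key=lambda label: np.linalg.norm(f - centroids[label]))
```

Because a Gabor filter at orientation theta responds strongly only to intensity variation along that orientation, the two stripe classes produce clearly separable feature vectors, which is the property the texture detector exploits.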


Keywords: Automatic image annotation · Image processing · Color- and texture-based image feature extraction



This research is sponsored by European Grant #247083 (Security, Services, Networking and Performance of Next Generation IP-based Multimedia Wireless Networks) and by the Mexican National Council of Science and Technology (CONACyT), project #154438.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Claudia Cruz-Perez (email author) 1
  • Oleg Starostenko 1
  • Vicente Alarcon-Aquino 1
  • Jorge Rodriguez-Asomoza 1

  1. Department of Computing, Electronics and Mechatronics, Universidad de las Americas-Puebla, Cholula, Mexico
