Multi-scale salient region and relevant visual keywords based model for automatic image annotation

Published in Multimedia Tools and Applications

Abstract

Automatic image annotation is a vital and challenging problem in pattern recognition and image understanding. Existing models extract visual features directly from segmented image regions; because a segmented region may still contain multiple objects, these features may not describe the region effectively. Moreover, existing models do not consider the visual representations of the keywords themselves, so final annotation results often contain many irrelevant annotations that do not correspond to any part of the image in terms of visual content. To overcome these problems, an image annotation model based on multi-scale salient regions and relevant visual keywords is proposed. In this model, each image is partitioned with a multi-scale grid segmentation method, and a global-contrast-based method extracts a saliency map from each image region; visual features are then extracted from each salient region. In addition, keywords are divided into two categories, abstract and non-abstract. For each non-abstract word, visual seeds are established, and a new method extracts its visual keyword collection from those seeds. For abstract words, an algorithm based on subtraction regions is proposed to extract visual seeds and the corresponding visual keyword collections, reflecting the traits of such words. An adaptive parameter method and a fast solution algorithm determine the similarity threshold of each keyword. Finally, multi-scale visual features and combinations of the above methods are used to improve annotation performance. The proposed model improves the object descriptions of images and image regions, and experimental results verify its effectiveness.
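As a rough illustration only (not the authors' implementation), the multi-scale grid segmentation and the per-keyword similarity-threshold matching described above can be sketched as follows. The grid scales, the use of cosine similarity, and all function names are assumptions made for the sake of the example; the saliency-map extraction step from the abstract is omitted.

```python
import numpy as np

def multiscale_grid_regions(image, scales=(1, 2, 4)):
    """Partition an image into non-overlapping grid cells at several scales.

    A scale s yields an s x s grid, so the default scales produce
    1 + 4 + 16 = 21 regions. These scale values are illustrative; the
    paper's actual grid sizes may differ.
    """
    h, w = image.shape[:2]
    regions = []
    for s in scales:
        rh, cw = h // s, w // s
        for row in range(s):
            for col in range(s):
                cell = image[row * rh:(row + 1) * rh, col * cw:(col + 1) * cw]
                regions.append((s, row, col, cell))
    return regions

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def grow_visual_keyword(seed_feats, candidate_feats, threshold):
    """Return indices of candidate region features whose similarity to at
    least one visual seed of a keyword exceeds that keyword's threshold.
    """
    return [i for i, f in enumerate(candidate_feats)
            if any(cosine(f, s) >= threshold for s in seed_feats)]

# Toy example: an 8x8 single-channel "image" and 2-D "features".
img = np.arange(64, dtype=float).reshape(8, 8)
print(len(multiscale_grid_regions(img)))          # 21 regions

seeds = [np.array([1.0, 0.0])]
cands = [np.array([1.0, 0.1]), np.array([0.0, 1.0])]
print(grow_visual_keyword(seeds, cands, 0.9))     # [0]
```

In the full model, per-keyword thresholds would be set by the adaptive parameter method rather than fixed by hand, and features would come from the salient part of each grid cell rather than the raw cell.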



Notes

  1. http://www.flickr.com

Acknowledgments

This work is partially supported by the National Natural Science Foundation of China under Grant No. 61103175, the Natural Science Foundation of Fujian Province under Grant No. 2013J05088, the Key Project of the Chinese Ministry of Education under Grant No. 212086, the Fujian Province High School Science Fund for Distinguished Young Scholars under Grant No. JA12016, the Program for New Century Excellent Talents in Fujian Province University under Grant No. JA13021, and the Fujian Natural Science Funds for Distinguished Young Scholar under Grant No. 2014J06017.

Author information

Corresponding author

Correspondence to Wenzhong Guo.

About this article

Cite this article

Ke, X., Guo, W. Multi-scale salient region and relevant visual keywords based model for automatic image annotation. Multimed Tools Appl 75, 12477–12498 (2016). https://doi.org/10.1007/s11042-014-2318-2
