Multimedia Tools and Applications, Volume 77, Issue 9, pp 10733–10751

Semantic image retrieval for complex queries using a knowledge parser

  • Hua Chen
  • Antoine Trouve
  • Kazuaki J. Murakami
  • Akira Fukuda


To improve the accuracy of image retrieval systems, research focus has shifted from designing sophisticated low-level feature extraction algorithms to combining image retrieval with rich semantics and knowledge-based methods. In this paper, we aim to improve text-based image retrieval for complex natural language queries by using a semantic parser (Knowledge Parser, or K-parser). From text written in natural language, the K-parser extracts a graphical semantic representation of the objects involved, their properties, and their relations. We analyze both the textual image captions and the natural language queries with the K-parser. As a technical solution, we leverage RDF in two ways: first, we store the parsed image captions as RDF triples; second, we translate image queries into SPARQL queries. When applying this approach to the Flickr8k dataset with a set of 16 custom queries, we observe that the K-parser exhibits biases that reduce the accuracy of the queries. We propose two techniques to address these weaknesses: (1) we introduce a set of rules that transform the output of the K-parser and fix basic, recurrent parsing mistakes that occur on the Flickr8k captions; (2) we leverage two popular commonsense knowledge databases, ConceptNet and WordNet, to raise the accuracy of queries on broad concepts. With these two techniques, we fix most of the initial retrieval errors and accurately execute our set of 16 queries on the Flickr8k dataset.
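The idea of storing caption facts as triples and answering a query by pattern matching, with a broad concept expanded through is-a links, can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the image identifiers, the predicate names (`depicts`, `action`), the `IS_A` table (a stand-in for WordNet or ConceptNet lookups), and all function names are hypothetical, and a real system would use an RDF store queried with SPARQL.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical triples, as they might be derived from parsed captions.
TRIPLES: Set[Triple] = {
    ("img1", "depicts", "dog"),
    ("img1", "action", "run"),
    ("img2", "depicts", "cat"),
    ("img3", "depicts", "car"),
}

# Hypothetical is-a links standing in for a commonsense knowledge base.
IS_A: Dict[str, str] = {
    "dog": "animal",
    "cat": "animal",
    "animal": "entity",
    "car": "vehicle",
}


def match(triples: Set[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return triples matching a pattern; None is a wildcard.

    This loosely mirrors a single SPARQL triple pattern such as
    `?img :depicts :dog` (predicate name hypothetical).
    """
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def is_kind_of(term: str, concept: str) -> bool:
    """Walk the is-a chain upward from `term` looking for `concept`."""
    current: Optional[str] = term
    while current is not None:
        if current == concept:
            return True
        current = IS_A.get(current)
    return False


def images_depicting(triples: Set[Triple], concept: str,
                     expand: bool = True) -> List[str]:
    """Find images whose caption mentions `concept`.

    With expand=True, an object also matches when it is a kind of
    `concept` (for example, "dog" matches the query "animal").
    """
    found = set()
    for subject, _, obj in match(triples, p="depicts"):
        if obj == concept or (expand and is_kind_of(obj, concept)):
            found.add(subject)
    return sorted(found)


if __name__ == "__main__":
    print(images_depicting(TRIPLES, "animal"))
    print(images_depicting(TRIPLES, "animal", expand=False))
```

Without expansion, a query on the broad concept "animal" finds nothing in this toy data because no caption uses that exact word; with expansion, the images captioned with "dog" and "cat" are returned. That is the kind of gap the abstract attributes to queries on broad concepts.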


Image retrieval, Object retrieval, Commonsense knowledge, K-parser, RDF



Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  • Hua Chen (1), corresponding author
  • Antoine Trouve (2)
  • Kazuaki J. Murakami (1)
  • Akira Fukuda (1)

  1. Graduate School of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan
  2. Institute of Systems, Information Technologies and Nanotechnologies (ISIT), Fukuoka, Japan
