Visual Question Generation for Class Acquisition of Unknown Objects

  • Kohei Uehara
  • Antonio Tejero-De-Pablos
  • Yoshitaka Ushiku
  • Tatsuya Harada
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11216)


Traditional image recognition methods only consider objects belonging to already learned classes. However, since training a recognition model on every object class in the world is infeasible, a way of obtaining information about unknown objects (i.e., objects whose class has not been learned) is necessary. One way for an image recognition system to learn new classes is to ask a human about the objects it does not know. In this paper, we propose a method for generating questions about unknown objects in an image, as a means of obtaining information about classes that have not been learned. Our method consists of a module for proposing objects, a module for identifying unknown objects, and a module for generating questions about unknown objects. Experimental results based on human evaluation show that our method can successfully obtain information about unknown objects in an image dataset. Our code and dataset are available at
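The abstract describes a three-stage pipeline (object proposal, unknown-object identification, question generation). The following is a minimal sketch of how such stages could be chained, assuming a much simpler setup than the paper's; it is not the authors' implementation. Assumptions: each proposed region arrives with classifier logits over the known classes already computed (standing in for the proposal module); a region counts as "unknown" when its maximum softmax probability falls below a threshold (a common open-set baseline swapped in here, not necessarily the paper's criterion); and a fixed question template stands in for the question-generation module. The names `Region`, `is_unknown`, `make_question`, and `ask_about_unknowns`, as well as the 0.5 threshold, are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Region:
    # Hypothetical output of an object-proposal module: a box plus
    # classifier logits over the already-learned (known) classes.
    box: tuple[int, int, int, int]  # (x, y, width, height)
    logits: list[float]


def softmax(logits: list[float]) -> list[float]:
    # Numerically stable softmax: subtract the max logit before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def is_unknown(region: Region, threshold: float = 0.5) -> bool:
    # Assumed stand-in for unknown-object identification: treat the region as
    # unknown when no known class receives a confident prediction.
    return max(softmax(region.logits)) < threshold


def make_question(region: Region) -> str:
    # Assumed stand-in for question generation: a fixed template that
    # refers to the region by its box coordinates.
    x, y, w, h = region.box
    return f"What is the object at ({x}, {y}) with size {w}x{h}?"


def ask_about_unknowns(regions: list[Region], threshold: float = 0.5) -> list[str]:
    # Chain the stages: keep only regions judged unknown, then ask about each one.
    return [make_question(r) for r in regions if is_unknown(r, threshold)]


if __name__ == "__main__":
    regions = [
        Region((10, 20, 50, 40), [5.0, 0.1, 0.2]),    # confident: a known class
        Region((100, 30, 60, 60), [0.3, 0.2, 0.25]),  # near-uniform: unknown
    ]
    print(ask_about_unknowns(regions))
    # prints ['What is the object at (100, 30) with size 60x60?']
```

Raising `threshold` flags more regions as unknown and therefore produces more questions; any such cutoff would need tuning in practice, which this sketch does not attempt.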


Keywords: Visual question generation · Unknown object recognition · Unknown object class acquisition · Real-world recognition



This work was supported by JST CREST Grant Number JPMJCR1403, Japan.


Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. The University of Tokyo, Tokyo, Japan
  2. RIKEN, Tokyo, Japan
