A Competence-Aware Curriculum for Visual Concepts Learning via Question Answering

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12347)

Abstract

Humans can progressively learn visual concepts, from easy questions to hard ones. To mimic this efficient learning ability, we propose a competence-aware curriculum for visual concept learning in a question-answering manner. Specifically, we design a neural-symbolic concept learner for learning visual concepts and a multi-dimensional Item Response Theory (mIRT) model for guiding the learning process with an adaptive curriculum. The mIRT model estimates the concept difficulty and the model competence at each learning step from accumulated model responses. The estimated concept difficulty and model competence are then used to select the most profitable training samples. Experimental results on CLEVR show that, with a competence-aware curriculum, the proposed method achieves state-of-the-art performance with superior data efficiency and convergence speed: it uses only 40% of the training data and converges three times faster than other state-of-the-art methods.
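The paper's exact mIRT formulation and estimation procedure are given in the full text; the abstract's core idea, however, can be sketched in a few lines. The following is a minimal illustrative sketch (not the authors' implementation), using a simple compensatory multidimensional 2PL-style IRT model with hypothetical parameters: given a competence vector and per-item discrimination/difficulty parameters, predict each item's success probability, then select the items whose predicted probability is closest to a target value — i.e., neither too easy nor too hard.

```python
import numpy as np

def p_correct(theta, a, b):
    """Compensatory multidimensional 2PL IRT: probability that a learner
    with competence vector `theta` answers an item with discrimination
    vector `a` and scalar difficulty `b` correctly."""
    return 1.0 / (1.0 + np.exp(-(a @ theta - b)))

def select_batch(theta, items, k, target=0.5):
    """Pick the k items whose predicted success probability is closest
    to `target` -- a simple proxy for 'most profitable' samples."""
    probs = np.array([p_correct(theta, a, b) for a, b in items])
    order = np.argsort(np.abs(probs - target))
    return order[:k]

# Toy usage with random items (2 concept dimensions).
rng = np.random.default_rng(0)
theta = np.array([0.2, -0.1])  # current model competence estimate
items = [(rng.normal(1.0, 0.3, size=2), rng.normal()) for _ in range(100)]
batch = select_batch(theta, items, k=8)
print(batch)
```

In the actual method, the competence and difficulty parameters are re-estimated from accumulated model responses at each learning step, so the selected curriculum adapts as the concept learner improves.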

Keywords

Visual question answering · Visual concept learning · Curriculum learning · Model competence

Notes

Acknowledgements

We thank Yixin Chen from UCLA for helpful discussions. This work reported herein is supported by ARO W911NF1810296, DARPA XAI N66001-17-2-4029, and ONR MURI N00014-16-1-2007.

Supplementary material

Supplementary material 1 (PDF, 3021 KB): 504434_1_En_9_MOESM1_ESM.pdf

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. UCLA Center for Vision, Cognition, Learning, and Autonomy (VCLA), Los Angeles, USA