Vocabulary Learning Support System Based on Automatic Image Captioning Technology
Learning context has been shown to be an essential part of vocabulary development; however, describing the learning context for each vocabulary item is difficult. For the human brain, pictures make learning contexts easy to grasp because a picture conveys an immense amount of detail at a glance in a way that text annotations cannot. Therefore, in an informal language learning system, pictures can be used to overcome the problems that language learners face in describing learning contexts. The present study aimed to develop a support system that generates and represents learning contexts automatically by analyzing the visual content of pictures captured by language learners. Automatic image captioning, an artificial intelligence technology that connects computer vision and natural language processing, is used to analyze the visual content of the learners’ captured images. A neural image caption generator model called Show and Tell is trained for image-to-word generation and for describing the context of an image. The three-fold objectives of this research are: first, to build an intelligent technology that can understand the contents of a picture and generate learning contexts automatically; second, to enable a learner to learn multiple vocabulary items from a single picture, without relying on a representative picture for each item; and third, to map a learner’s prior vocabulary knowledge to new learning vocabulary so that previously acquired vocabulary can be reviewed and recalled while learning new vocabulary.
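The second objective, learning multiple vocabulary items from one picture, can be illustrated with a minimal sketch. The caption string below stands in for the output of a Show and Tell-style caption generator (the actual system produces it with the trained neural model); the helper function and stopword list are hypothetical, shown only to demonstrate how several candidate vocabulary items can be drawn from a single generated caption.

```python
# A minimal, assumed list of function words to filter out; a real system
# would use a proper stopword list or part-of-speech tagging.
STOPWORDS = {"a", "an", "the", "is", "are", "on", "in", "of", "with", "and"}

def extract_vocabulary(caption: str) -> list[str]:
    """Return the content words of a caption as candidate vocabulary items,
    preserving their order of appearance and removing duplicates."""
    words = caption.lower().strip(".").split()
    seen, vocab = set(), []
    for word in words:
        if word not in STOPWORDS and word not in seen:
            seen.add(word)
            vocab.append(word)
    return vocab

# Hypothetical caption, as a Show and Tell model might generate for a photo:
caption = "a dog is playing with a ball on the grass"
print(extract_vocabulary(caption))  # ['dog', 'playing', 'ball', 'grass']
```

In this way one captured picture yields several learnable words at once, each grounded in the same visual context.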
Keywords: Artificial intelligence in education · Automatic image captioning · Learning context representation · Ubiquitous learning · Visual contents analysis · Vocabulary learning
This work was partly supported by JSPS Grant-in-Aid for Scientific Research (S)16H06304 and 17K12947; NEDO Special Innovation Program on AI and Big Data 18102059-0; and JSPS Start-up Grant-in-Aid Number 18H05745.