Proposal of Japanese Vocabulary Difficulty Level Dictionaries for Automated Essay Scoring Support System Using Rubric

  • Megumi Yamamoto
  • Nobuo Umemura
  • Hiroyuki Kawano


We are developing a Moodle plug-in that serves as an AES (automated essay scoring) support system for the basic education of university students. Our system evaluates essays based on a rubric with five evaluation viewpoints: “Contents, Structure, Evidence, Style, and Skill”. Vocabulary level is one of the scoring items under Skill. It is calculated using the Japanese Language Learners’ Dictionaries constructed by Sunakawa et al. Because these dictionaries do not fully cover the words used in student essays, we found that the accuracy of vocabulary-level scoring is limited. In this paper, we propose constructing comprehensive Japanese vocabulary difficulty level dictionaries using Japanese Wikipedia as the corpus. We apply Latent Dirichlet Allocation (LDA) to the Wikipedia corpus and use each word’s appearance probability as one index of word difficulty. For words that rarely appear, we use the TF-IDF value instead of the LDA value. As a result, we constructed highly comprehensive Japanese vocabulary difficulty level dictionaries, and we confirmed that, with these dictionaries, the vocabulary level can be scored for all words in the test dataset.
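The abstract combines two signals: an LDA-derived word appearance probability as the difficulty index, and TF-IDF for words that appear too rarely for the LDA value to be used. The Python sketch below illustrates that combination. It is not the authors' implementation, and several details are our assumptions: the LDA model is taken as already fitted and given as topic-word distributions φ_k with corpus-level topic weights π_k, so that p(w) = Σ_k π_k φ_k(w); TF-IDF is aggregated as the maximum over documents; and `min_count`, the level cut points, the mean-level essay score, and all function names are hypothetical. Tokens are assumed to be pre-segmented (Japanese text would first need morphological analysis, which is not shown).

```python
import math
from collections import Counter


def lda_word_probability(phi, topic_weights):
    """Marginal word probability p(w) = sum_k pi_k * phi_k(w) under a fitted LDA model.

    phi: list of K dicts, each mapping word -> p(word | topic k)
    topic_weights: list of K corpus-level topic proportions pi_k (summing to 1)
    """
    prob = Counter()
    for weight, topic in zip(topic_weights, phi):
        for word, p in topic.items():
            prob[word] += weight * p
    return dict(prob)


def max_tfidf(docs):
    """Per-word TF-IDF, aggregated as the maximum over all documents (assumed aggregation)."""
    n_docs = len(docs)
    df = Counter(word for doc in docs for word in set(doc))
    scores = {}
    for doc in docs:
        for word, count in Counter(doc).items():
            tfidf = (count / len(doc)) * math.log(n_docs / df[word])
            scores[word] = max(scores.get(word, 0.0), tfidf)
    return scores


def level_from_probability(p, cuts):
    """Frequent words are easy: cuts are descending, level 1 = easiest."""
    for level, cut in enumerate(cuts, start=1):
        if p >= cut:
            return level
    return len(cuts) + 1


def level_from_tfidf(score, cuts):
    """Higher TF-IDF (a more specific word) means harder: cuts are ascending."""
    return 1 + sum(score >= cut for cut in cuts)


def build_difficulty_dictionary(docs, phi, topic_weights, min_count, prob_cuts, tfidf_cuts):
    """Assign a difficulty level to every corpus word: LDA first, TF-IDF as the fallback."""
    freq = Counter(word for doc in docs for word in doc)
    lda_prob = lda_word_probability(phi, topic_weights)
    tfidf = max_tfidf(docs)
    levels = {}
    for word, count in freq.items():
        if count >= min_count and word in lda_prob:
            levels[word] = level_from_probability(lda_prob[word], prob_cuts)
        else:
            # Rare word (or outside the LDA vocabulary): fall back to TF-IDF.
            levels[word] = level_from_tfidf(tfidf[word], tfidf_cuts)
    return levels


def vocabulary_level(tokens, levels):
    """Mean difficulty level of the covered tokens, plus dictionary coverage."""
    known = [levels[t] for t in tokens if t in levels]
    coverage = len(known) / len(tokens) if tokens else 0.0
    mean_level = sum(known) / len(known) if known else 0.0
    return mean_level, coverage


# Toy, pre-segmented "corpus" and a hypothetical two-topic LDA model.
docs = [["大学", "学生", "評価"], ["大学", "学生", "語彙"], ["大学", "評価", "形態素"]]
phi = [{"大学": 0.5, "学生": 0.3, "評価": 0.2}, {"大学": 0.4, "評価": 0.4, "学生": 0.2}]
levels = build_difficulty_dictionary(
    docs, phi, topic_weights=[0.5, 0.5], min_count=2,
    prob_cuts=[0.4, 0.2], tfidf_cuts=[0.1, 0.3],
)
print(levels)
print(vocabulary_level(["大学", "語彙", "評価", "未知語"], levels))  # (2.0, 0.75)
```

In this toy run, words below `min_count` (語彙, 形態素) still receive a level through the TF-IDF branch, which is the kind of fallback that lets a dictionary cover rarely appearing words; only a token absent from the corpus entirely (未知語) stays uncovered and lowers the coverage.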


Keywords: Automated essay scoring · Vocabulary level · Dictionary · Wikipedia · Corpus · LDA · Rubric

In this research, we used Japanese Learners’ Dictionaries ver1.0 (


  1. Corpus Survey: Well-known and influential corpora (2018-04-30)
  2. Kyoto University Text Corpus Version 4.0 (2018-04-30) (in Japanese)
  3. Introduction to the BCCWJ (2018-04-30)
  4. Attali, Y., Burstein, J.: Automated essay scoring with e-rater® V.2. J. Technol. Learn. Assess. 4(3), 3–30 (2006)
  5. Breland, H.M.: Word frequency and word difficulty: a comparison of counts in four corpora. Psychol. Sci. 7(2), 96–99 (1996)
  6. Ishioka, T., Kameda, M.: Automated Japanese essay scoring system based on articles written by experts. In: Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the ACL, pp. 233–240 (2006)
  7. Yamamoto, M., Umemura, N., Kawano, H.: Automated essay scoring system based on Rubric. Stud. Comput. Intell. Appl. Comput. Inf. Technol. 727, 177–190 (2017)
  8. Yamamoto, M., Umemura, N., Kawano, H.: Implementation of automated report scoring system based on Rubric. In: The 79th National Convention of Information Processing Society of Japan, Proceeding DVD (2017) (in Japanese)
  9. Yamamoto, M., Umemura, N., Kawano, H.: Development and evaluation of the plugin for automated scoring of reports. In: Proceedings of Moodle Moot Japan 2017 Annual Conference, pp. 16–21 (2017) (in Japanese)
  10. Sunakawa, Y., Lee, J., Takahara, M.: The construction of a database to support the compilation of Japanese learners’ dictionaries. Acta Linguist. Asiat. 2(2), 97–115 (2012)
  11. Qi, L., Xu, Z., Yang, Q.: Preface. J. Oper. Res. Soc. Chin. 5(1), 1–2 (2017)
  12. Li, Q., He, Y., Wu, L., Wang, R.: Robust PCA for ground moving target indication in wide-area surveillance radar system. J. Oper. Res. Soc. Chin. 1(1), 135–153 (2013)
  13. Kanasugi, T., Kasahara, K., Inago, N., Amano, S.: Selection of a basic vocabulary based on word familiarity ratings. In: IEICE, vol. 19, no. 6, pp. 502–510 (2004) (in Japanese)
  14. Kondo, T., Amano, S.: Lexical properties of Japanese, Nihongo-no Goitokusei: significance and problems. IEICE Technical Report, Thought Lang. 100, 1–8 (2000) (in Japanese)
  15. Kajiwara, T., Komachi, M.: Simple PPDB: Japanese. In: Proceedings of the Twenty-third Annual Meeting of the Association for Natural Language Processing, pp. 529–532 (2017) (in Japanese)
  16. Takigawa, M., Yamana, H.: A proposal of word weighting method in specific field and its adoption for the estimation of users’ expertise appearing in their tweets. In: FIT2016, pp. 1–7 (2016) (in Japanese)
  17. Robertson, S., Zaragoza, H.: The probabilistic relevance framework: BM25 and beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009)
  18. Iwata, T.: Topic Models. Kodansha, Tokyo (2016) (in Japanese)
  19. Matsukawa, H., Oyama, M., Negishi, C., Arai, Y., Iwasaki, C., Hotta, H.: Analysis of free descriptions of course evaluation questionnaires using topic model. Japan Journal of Educational Technology, vol. 41, no. 3, pp. 233–244 (2017) (in Japanese)

Copyright information

© Operations Research Society of China, Periodicals Agency of Shanghai University, Science Press, and Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. School of Contemporary International Studies, Nagoya University of Foreign Studies, Iwasaki-cho, Nisshin, Japan
  2. School of Media and Design, Nagoya University of Arts and Sciences, Nagoya, Japan
  3. Faculty of Science and Engineering, Nanzan University, Nagoya, Japan
