Using machine learning to classify reviewer comments in research article drafts to enable students to focus on global revision


Reviewer comments on research articles, such as journal papers or dissertations, guide students through the revision process and help them improve the quality of their articles. Our goal is to make these comments more meaningful to the students' revision process. Revision involves implicit cognitive processes, and information and communication technology (ICT) has the potential to make such processes explicit. Previous research into the cognitive processes involved in revision has shown that novices focus on local, sentence-level revision, whereas expert writers focus on global revision of ideas or restructuring of arguments. To produce better writing, students should therefore focus more on global revision. Reviewer comments can trigger either more meaningful global revision (content-related comments) or local revision (non content-related comments). In this paper, a machine learning algorithm was applied to classify reviewer comments on academic drafts from our laboratory as either content-related or non content-related. Because reviewer comments on academic drafts are usually short, this research used a Support Vector Machine (SVM), one of the most common machine learning algorithms for short-text classification. Performance was evaluated using accuracy, and precision and recall for the non content-related comments. Using cross-validation, the highest scores achieved were 86% for accuracy, 89% for precision and 89% for recall. The results demonstrate that the automatic classification is effective and can be used to filter out non content-related comments, so that students first revise in response to the content-related comments. In this way, students can become more aware of the importance of global revision.
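The classification pipeline summarized above can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation: the toy comments and labels are invented, the linear SVM is trained with the Pegasos stochastic subgradient method instead of a library solver, and all function names (`train_linear_svm`, `cross_validate`, `content_first`, and so on) are hypothetical. It shows bag-of-words features for short comments, k-fold cross-validation reporting accuracy plus precision and recall for the non content-related class, and the filtering step that keeps content-related comments for students to revise first.

```python
import random
import re
from collections import Counter

# Hypothetical toy data, invented for illustration (not from the study).
# Label 1 = non content-related (local), 0 = content-related (global).
COMMENTS = [
    ("Fix the typo in this sentence.", 1),
    ("Use the past tense here.", 1),
    ("Missing article before the noun.", 1),
    ("Check the spelling of this word.", 1),
    ("Add a comma after this clause.", 1),
    ("Wrong verb tense in this sentence.", 1),
    ("Explain why this method was chosen.", 0),
    ("The argument in this section is unclear.", 0),
    ("Compare your results with previous work.", 0),
    ("Clarify the research question.", 0),
    ("The discussion should address the limitations.", 0),
    ("Justify this claim with evidence.", 0),
]


def featurize(text):
    """Sparse bag-of-words counts plus a constant bias feature."""
    x = Counter(re.findall(r"[a-z]+", text.lower()))
    x["__bias__"] = 1  # bias is learned (and regularized) like any other weight
    return x


def dot(w, x):
    return sum(w.get(t, 0.0) * v for t, v in x.items())


def train_linear_svm(data, lam=0.01, epochs=50, seed=0):
    """Linear SVM via Pegasos: SGD on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    samples = [(featurize(text), 1 if y == 1 else -1) for text, y in data]
    w, step = {}, 0
    for _ in range(epochs):
        rng.shuffle(samples)
        for x, y in samples:
            step += 1
            eta = 1.0 / (lam * step)
            violated = y * dot(w, x) < 1  # margin checked before the update
            for t in w:  # regularization shrinks every weight
                w[t] *= 1.0 - eta * lam
            if violated:  # hinge loss active: push toward the correct side
                for t, v in x.items():
                    w[t] = w.get(t, 0.0) + eta * y * v
    return w


def predict(w, text):
    """Return 1 (non content-related) or 0 (content-related)."""
    return 1 if dot(w, featurize(text)) >= 0 else 0


def scores(y_true, y_pred, positive=1):
    """Accuracy, plus precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


def cross_validate(data, k=3, seed=0):
    """k-fold CV; predictions from all held-out folds are pooled, then scored."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    y_true, y_pred = [], []
    for i, test in enumerate(folds):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        w = train_linear_svm(train)
        y_true += [y for _, y in test]
        y_pred += [predict(w, text) for text, _ in test]
    return scores(y_true, y_pred)


def content_first(w, comments):
    """Keep only comments predicted content-related, for students to revise first."""
    return [c for c in comments if predict(w, c) == 0]


if __name__ == "__main__":
    acc, prec, rec = cross_validate(COMMENTS)
    print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f}")
    model = train_linear_svm(COMMENTS)
    print(content_first(model, ["Fix this typo.", "Explain the main argument."]))
```

Pegasos is used here only to keep the sketch free of third-party dependencies; a real system would more likely rely on an established SVM solver such as LIBSVM, richer features and many more labeled comments.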

Fig. 1
Fig. 2
Fig. 3
Fig. 4


This work was supported by the Japan Society for the Promotion of Science (JSPS) KAKENHI Grant Number 17K00479.

Author information



Corresponding author

Correspondence to Harriet Nyanchama Ocharo.



Table 6 List of grammatical keywords
Table 7 List of content keywords
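Tables 6 and 7 list grammatical and content keywords, but their contents are not reproduced in this preview. As an illustration only, the sketch below assumes such lists could be used to count keyword matches per comment, either as extra features or as a naive baseline classifier. The keyword sets and the functions `keyword_features` and `keyword_vote` are hypothetical placeholders, not the authors' lists or method.

```python
import re

# Hypothetical placeholder lists; the actual keywords are given in Tables 6 and 7.
GRAMMATICAL_KEYWORDS = {"typo", "spelling", "tense", "comma", "plural", "grammar"}
CONTENT_KEYWORDS = {"argument", "method", "results", "evidence", "explain", "discussion"}


def keyword_features(comment):
    """Count how many tokens of a comment fall in each keyword list."""
    tokens = re.findall(r"[a-z]+", comment.lower())
    return {
        "grammatical_hits": sum(t in GRAMMATICAL_KEYWORDS for t in tokens),
        "content_hits": sum(t in CONTENT_KEYWORDS for t in tokens),
    }


def keyword_vote(comment):
    """Naive baseline: label by whichever keyword list matches more often.

    Returns "non-content", "content", or "unknown" on a tie.
    """
    f = keyword_features(comment)
    if f["grammatical_hits"] > f["content_hits"]:
        return "non-content"
    if f["content_hits"] > f["grammatical_hits"]:
        return "content"
    return "unknown"


if __name__ == "__main__":
    for c in ["Fix the spelling and the verb tense.", "Explain the method in more detail."]:
        print(c, keyword_features(c), keyword_vote(c))
```

Keyword counts like these could be appended to a bag-of-words vector before training a classifier; the tie case shows why a keyword rule alone leaves many short comments undecided.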

About this article

Cite this article

Ocharo, H.N., Hasegawa, S. Using machine learning to classify reviewer comments in research article drafts to enable students to focus on global revision. Educ Inf Technol 23, 2093–2110 (2018).

Keywords

  • Reviewer comment classification
  • Short text classification
  • Support vector machine
  • Academic article revision