Iterative Design and Classroom Evaluation of Automated Formative Feedback for Improving Peer Feedback Localization



A peer-review system that automatically evaluates students’ free-text feedback comments and provides formative feedback on them was iteratively designed and evaluated in college and high-school classrooms. Classroom assignments required students to write paper drafts and submit them to the peer-review system. When student peers later submitted feedback comments on the papers, Natural Language Processing was used to automatically evaluate peer feedback quality with respect to localization (i.e., pinpointing the source of the comment in the paper being reviewed). These evaluations in turn triggered immediate formative feedback from the system, designed to increase peer feedback localization, whenever the predicted ratio of localized comments in a feedback submission fell below a threshold. System feedback was dynamically generated based on the results of localization prediction. Reviewers could either revise their feedback comments to address the system’s feedback or ignore it. Our analysis of system log data demonstrates that our peer feedback localization prediction model triggered the formative feedback with high precision, particularly when peer feedback comments were written by college students. Our findings also show that although students often incorrectly disagreed with the system’s feedback, when they did revise their peer feedback comments, the system feedback was successful in increasing peer feedback localization (although the sample size was low). Finally, while most peer comments were revised immediately after the system feedback, the desired revision behavior also occurred at later points after the system feedback.
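The triggering rule described in the abstract can be illustrated with a minimal sketch. It assumes that a localization classifier has already produced one boolean prediction per comment; the function names, the default threshold of 0.5, and the handling of empty submissions are hypothetical choices made for illustration and are not taken from the system itself.

```python
from typing import List, Sequence


def localized_ratio(predictions: Sequence[bool]) -> float:
    """Fraction of comments predicted as localized.

    Assumption: an empty submission is treated as having ratio 0.0.
    """
    if not predictions:
        return 0.0
    return sum(predictions) / len(predictions)


def should_trigger_feedback(predictions: Sequence[bool],
                            threshold: float = 0.5) -> bool:
    """Trigger system feedback when the localized ratio is below the threshold.

    The threshold value is a placeholder, not the value used in the study.
    """
    return localized_ratio(predictions) < threshold


def unlocalized_indices(predictions: Sequence[bool]) -> List[int]:
    """Indices of comments predicted as not localized.

    A system could use these to generate feedback that points at
    specific comments (an illustrative design choice).
    """
    return [i for i, is_localized in enumerate(predictions) if not is_localized]


if __name__ == "__main__":
    # Hypothetical predictions for a submission with three comments.
    submission = [True, False, False]
    if should_trigger_feedback(submission):
        print("Comments to revisit:", unlocalized_indices(submission))
```

Using a strict less-than comparison means a submission whose ratio equals the threshold does not trigger feedback; whether the boundary case should trigger is a design decision the abstract does not specify.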


Keywords: Peer feedback; Feedback localization; Automated formative feedback



Copyright information

© International Artificial Intelligence in Education Society 2017

Authors and Affiliations

  1. Department of Computer Science, University of Pittsburgh, Pittsburgh, USA
  2. IBM Watson, Yorktown Heights, USA
  3. Department of Computer Science and Learning Research and Development Center, University of Pittsburgh, Pittsburgh, USA
