Metrics for Automated Review Classification: What Review Data Show
Peer review is only effective if reviews are of high quality. In a large class, it is unrealistic for the course staff to evaluate all reviews, so a scalable assessment mechanism is needed. In an automated system, several metrics can be calculated for each review. One of these metrics is volume, which is simply the number of distinct words used in the review. Another is tone, which can be positive (e.g., praise), negative (e.g., disapproval), or neutral. A third is content, which we divide into three subtypes: summative, advisory, and problem detection. These metrics can be used to rate reviews, either singly or in combination. This paper compares the automated metrics for hundreds of reviews from the Expertiza system with scores manually assigned by the course staff. Almost all of the automatic metrics are positively correlated with manually assigned scores, but many of the correlations are weak. Another issue is how the review rubric influences review content. A more detailed rubric draws the reviewer’s attention to more characteristics of an author’s work. But ultimately, the author will benefit most from advisory or problem detection review text. And filling out a long rubric may distract the reviewer from providing textual feedback to the author. The data fail to show clear evidence that this effect occurs.
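To make the volume metric and its comparison with staff scores concrete, here is a minimal sketch in Python. It assumes that words are lowercase runs of letters and apostrophes, and that agreement with manual scores is measured by a Pearson correlation; the abstract does not specify the tokenization or the correlation statistic, so both are assumptions. The function names and the sample reviews and scores are hypothetical and are not taken from the Expertiza data.

```python
import math
import re


def volume(review_text: str) -> int:
    """Volume metric: the number of distinct words in a review.

    Assumption: a word is a run of letters or apostrophes, compared
    case-insensitively. The paper's exact tokenization is not given.
    """
    words = re.findall(r"[a-z']+", review_text.lower())
    return len(set(words))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical reviews and hypothetical staff-assigned scores.
reviews = [
    "Good work.",
    "The introduction is clear, but the test section does not explain "
    "how edge cases are handled.",
    "Nice.",
]
staff_scores = [2, 5, 1]

volumes = [volume(r) for r in reviews]
print("volumes:", volumes)
print("correlation with staff scores:", round(pearson(volumes, staff_scores), 3))
```

The same comparison could be repeated for the tone and content metrics once each review has been assigned a numeric value for them. How that assignment would be made is outside what the abstract describes.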
Keywords: Peer review systems, Rubrics, Automated metareviewing
This work has been supported by the U.S. National Science Foundation under grant 1432347.