Reflective Writing About the Utility Value of Science as a Tool for Increasing STEM Motivation and Retention – Can AI Help Scale Up?

  • Beata Beigman Klebanov
  • Jill Burstein
  • Judith M. Harackiewicz
  • Stacy J. Priniski
  • Matthew Mulholland


The integration of subject matter learning with reading and writing skills takes place in multiple ways. Students learn to read, interpret, and write texts in the discipline-relevant genres. However, writing can be used not only for the purposes of practice in professional communication, but also as an opportunity to reflect on the learned material. In this paper, we address a writing intervention – Utility Value (UV) intervention – that has been shown to be effective for promoting interest and retention in STEM subjects in laboratory studies and field experiments. We conduct a detailed investigation into the potential of natural language processing technology to support evaluation of such writing at scale: We devise a set of features that characterize UV writing across different genres, present common themes, and evaluate UV scoring models using essays on known and new biology topics. The automated UV scoring results are, we believe, promising, especially for the personal essay genre.


Intrapersonal factors; Motivation; Automated writing evaluation; Utility value; Natural language processing; Machine learning



Copyright information

© International Artificial Intelligence in Education Society 2017

Authors and Affiliations

  • Beata Beigman Klebanov (1, corresponding author)
  • Jill Burstein (1)
  • Judith M. Harackiewicz (2)
  • Stacy J. Priniski (2)
  • Matthew Mulholland (1)

  1. Educational Testing Service, Princeton, USA
  2. University of Wisconsin, Madison, USA
