Abstract
Natural Language Processing (NLP) plays an increasingly important role in learning and assessment. Typical applications of NLP in education include automated scoring, automated item generation, conversation-based assessment, writing assistants, and text mining for education. In this chapter, we aim to introduce some basics of NLP through two typical applications in educational contexts: text mining and automated scoring. We hope readers will gain an overall picture of NLP and become familiar with some basic tools for handling natural-language data, which may serve as stepping stones for their future work with NLP.
The R and Python code can be found in the GitHub repository for this book: https://github.com/jgbrainstorm/computational_psychometrics
Notes
1. The term 'raw text' refers to the form in which a text appears in a 'real-world' electronic application (e.g., a word-processor file, an electronic file of a student's essay, a text from a webpage). Usually it is presumed that any extra markup (e.g., HTML or XML) has already been stripped off, so 'raw text' often means just the long string of text characters.
2. Text 'cleaning' is a rather subjective notion: which components of a text should be discarded and which should be retained depends largely on the particular application. There are some common trends, though, as illustrated in this section; for example, in many cases punctuation is discarded.
3. 'Stop words' are words that are eliminated (filtered out) from a document before further processing. In most cases these are very common words in a language (like the, an, of, out), which are not considered to contribute much to distinguishing the content of a text.
4. Note that n-gram is often discussed in the context of language models.
5. The number of different words (types) divided by the total number of words (tokens); for example, the previous footnote has 9 tokens but only 6 types.
6. In the sense that analysis of each word may require considering neighboring words and possibly also going beyond sentence boundaries (for example, when computing inter-sentence cohesion of a text).
7. A text document is typically represented as a vector that is the sum or the average of the vectors of its words.
8. The specific methods of aggregation or combination may vary. Some may involve weighted summation of feature values, while others may include complex transformations of feature values. Features that are combined into larger assemblies are called 'micro-features', and the larger assemblies are called 'macro-features'. In e-rater, a feature, or an assembly, whose values are used directly in the scoring formula is called a 'macro-feature'.
9. A basic approach is to compare a new essay with the best-scoring previous essays and to derive a score from its similarity to those best essays. Another approach is to compare the new essay to previously scored essays from different score points, so the new essay receives the score of the group to which it is most similar. The previously scored essays here serve as criterion data, typically scored by expert raters.
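Several of the notions in these notes can be made concrete in a few lines of code. The following is a minimal Python sketch, not the chapter's actual implementation: the tiny stop-word list, the regular-expression tokenizer, and the bag-of-words count vectors (used here as a simplified stand-in for the word-vector averaging described in note 7) are illustrative assumptions only. The final cosine-similarity step hints at the similarity-based scoring idea of note 9.

```python
import re
from collections import Counter
from math import sqrt

# A small illustrative stop-word list (real systems use much longer lists).
STOP_WORDS = {"the", "a", "an", "of", "on", "is", "and", "to"}

def tokenize(raw_text):
    # Lowercase and keep only alphabetic tokens: one simple 'cleaning'
    # choice among many (punctuation is discarded here).
    return re.findall(r"[a-z]+", raw_text.lower())

def remove_stop_words(tokens):
    # Filter out very common function words before further processing.
    return [t for t in tokens if t not in STOP_WORDS]

def ngrams(tokens, n):
    # Contiguous sequences of n tokens.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def type_token_ratio(tokens):
    # Number of distinct words (types) divided by total words (tokens).
    return len(set(tokens)) / len(tokens)

def bow_vector(tokens):
    # Bag-of-words representation: a sparse vector of word counts.
    return Counter(tokens)

def cosine(u, v):
    # Cosine similarity between two sparse count vectors; a simple way
    # to compare a new essay with previously scored essays.
    dot = sum(u[w] * v.get(w, 0) for w in u)
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

essay = "The cat sat on the mat. The cat is happy."
tokens = tokenize(essay)             # 10 tokens, 7 types
content = remove_stop_words(tokens)  # ['cat', 'sat', 'mat', 'cat', 'happy']
print(ngrams(content, 2))
print(type_token_ratio(tokens))      # 0.7
print(round(cosine(bow_vector(content), bow_vector(content)), 6))  # a document is maximally similar to itself
```

Each step here is a deliberate simplification; as note 2 stresses, what counts as 'cleaning' (and which stop words to drop) depends on the particular application.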
Acknowledgement
The authors thank Beata Beigman Klebanov, Aoife Cahill and Isaac Bejar for helpful comments on the manuscript, and Rick Meisner for copyediting.
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this chapter
Flor, M., Hao, J. (2021). Text Mining and Automated Scoring. In: von Davier, A.A., Mislevy, R.J., Hao, J. (eds) Computational Psychometrics: New Methodologies for a New Generation of Digital Learning and Assessment. Methodology of Educational Measurement and Assessment. Springer, Cham. https://doi.org/10.1007/978-3-030-74394-9_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-74393-2
Online ISBN: 978-3-030-74394-9