Abstract
Natural Language Processing (NLP) plays an increasingly important role in learning and assessment. Typical applications of NLP in education include automated scoring, automated item generation, conversation-based assessment, writing assistants, and text mining for education. In this chapter, we aim to introduce some basics of NLP through two typical applications in educational contexts: text mining and automated scoring. We hope readers will gain an overall picture of NLP and become familiar with some basic tools for handling natural-language data, which may serve as stepping stones for their future work with NLP.
The R and Python code can be found in the GitHub repository for this book: https://github.com/jgbrainstorm/computational_psychometrics
Notes
1. The term 'raw text' refers to the form in which a text appears in a 'real-world' electronic application (e.g., a word-processor file, an electronic file of a student's essay, a text from a webpage). Usually it is presumed that any extra markup (e.g., HTML or XML) has already been stripped off, so 'raw text' often means just the long string of text characters.
2. Text 'cleaning' is a rather subjective notion: which components of a text should be discarded and which should be retained depends largely on the particular application. There are some common trends, though, as illustrated in this section; for example, in many cases punctuation is discarded.
3. 'Stop words' are words that are eliminated (filtered out) from a document before further processing. In most cases these are very common words in a language (like the, an, of, out), which are not considered to contribute much to distinguishing the content of a text.
4. Note that n-gram is often discussed in the context of language models.
5. The number of different words (types) divided by the total number of words (tokens); for example, the previous footnote has 9 tokens but only 6 types.
6. In the sense that analysis of each word may require considering neighboring words and possibly also going beyond sentence boundaries (for example, when computing inter-sentence cohesion of a text).
7. A text document is typically represented as a vector that is the sum or the average of the vectors of its words.
8. The specific methods of aggregation or combination may vary. Some may involve weighted summation of feature values, while others may include complex transformations of feature values. Features that are combined into larger assemblies are called 'micro-features', and the larger assemblies are called 'macro-features'. In e-rater, a feature, or an assembly, whose values are used directly in the scoring formula is called a 'macro-feature'.
9. A basic approach is to compare a new essay with the best-scoring previous essays and to derive a score from its similarity to those best essays. Another approach is to compare the new essay to previously scored essays from different score points, so the new essay receives the score of the group to which it is most similar. The previously scored essays here serve as criterion data, typically scored by expert raters.
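Several of the notions in these notes can be made concrete in a few lines of code. The following is a minimal Python sketch, not the chapter's actual implementation: the tiny stop-word list, the regular-expression tokenizer, and the bag-of-words count vectors (used here as a simplified stand-in for the word-vector averaging described in note 7) are illustrative assumptions only. The final cosine-similarity step hints at the similarity-based scoring idea of note 9.

```python
import re
from collections import Counter
from math import sqrt

# A small illustrative stop-word list (real systems use much longer lists).
STOP_WORDS = {"the", "a", "an", "of", "on", "is", "and", "to"}

def tokenize(raw_text):
    # Lowercase and keep only alphabetic tokens: one simple 'cleaning'
    # choice among many (punctuation is discarded here).
    return re.findall(r"[a-z]+", raw_text.lower())

def remove_stop_words(tokens):
    # Filter out very common function words before further processing.
    return [t for t in tokens if t not in STOP_WORDS]

def ngrams(tokens, n):
    # Contiguous sequences of n tokens.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def type_token_ratio(tokens):
    # Number of distinct words (types) divided by total words (tokens).
    return len(set(tokens)) / len(tokens)

def bow_vector(tokens):
    # Bag-of-words representation: a sparse vector of word counts.
    return Counter(tokens)

def cosine(u, v):
    # Cosine similarity between two sparse count vectors; a simple way
    # to compare a new essay with previously scored essays.
    dot = sum(u[w] * v.get(w, 0) for w in u)
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

essay = "The cat sat on the mat. The cat is happy."
tokens = tokenize(essay)             # 10 tokens, 7 types
content = remove_stop_words(tokens)  # ['cat', 'sat', 'mat', 'cat', 'happy']
print(ngrams(content, 2))
print(type_token_ratio(tokens))      # 0.7
print(round(cosine(bow_vector(content), bow_vector(content)), 6))  # a document is maximally similar to itself
```

Each step here is a deliberate simplification; as note 2 stresses, what counts as 'cleaning' (and which stop words to drop) depends on the particular application.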
Acknowledgement
The authors thank Beata Beigman Klebanov, Aoife Cahill and Isaac Bejar for helpful comments on the manuscript, and Rick Meisner for copyediting.
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this chapter
Flor, M., Hao, J. (2021). Text Mining and Automated Scoring. In: von Davier, A.A., Mislevy, R.J., Hao, J. (eds) Computational Psychometrics: New Methodologies for a New Generation of Digital Learning and Assessment. Methodology of Educational Measurement and Assessment. Springer, Cham. https://doi.org/10.1007/978-3-030-74394-9_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-74393-2
Online ISBN: 978-3-030-74394-9