Abstract

Natural Language Processing (NLP) plays an increasingly important role in learning and assessment. Typical applications of NLP in education include automated scoring, automated item generation, conversation-based assessment, writing assistants, and text mining for education. In this chapter, we aim to introduce some basics of NLP through two typical applications in educational contexts: text mining and automated scoring. We hope readers will gain an overall picture of NLP and become familiar with some basic tools for handling natural language data, which may serve as stepping stones for their future work with NLP.

The R and Python code can be found in the GitHub repository for this book: https://github.com/jgbrainstorm/computational_psychometrics

Notes

  1. The term ‘raw text’ refers to the form in which a text appears in a ‘real-world’ electronic application (e.g., a word-processor file, an electronic file of a student’s essay, text from a webpage). Usually, it is presumed that any extra markup (e.g., HTML or XML) has already been stripped off, so ‘raw text’ often means just the long string of text characters (a minimal stripping sketch appears after these notes).

  2. Text ‘cleaning’ is a rather subjective notion: which components of a text should be discarded and which retained depends largely on the particular application. There are some common trends, though, as illustrated in this section; for example, in many cases punctuation is discarded (one common recipe is sketched after these notes).

  3. ‘Stop words’ are words that are eliminated (filtered out) from a document before further processing. In most cases these are very common words in a language (like the, an, of, out) that are not considered to contribute much to distinguishing the content of a text (see the stop-word sketch below).

  4. Note that n-grams are often discussed in the context of language models (an extraction sketch follows these notes).

  5. The number of different words divided by the number of all words (the footnote to the left has 9 tokens but only 6 types); a small computation of this ratio is sketched below.

  6. In the sense that analysis of each word may require considering neighboring words, and possibly also going beyond sentence boundaries (for example, for computing the inter-sentence cohesion of a text).

  7. A text document is typically represented as a vector that is the sum or the average of the vectors of its words (sketched below).

  8. The specific methods of aggregation or combination may vary: some involve weighted summation of feature values, while others include complex transformations of feature values. Features that are combined into larger assemblies are called ‘micro-features’, and the larger assemblies are called ‘macro-features’. In e-rater, a feature or assembly whose values are used directly in the scoring formula is called a ‘macro-feature’ (a toy weighted-sum aggregation is sketched below).

  9. A basic approach is to compare a new essay with the best-scoring previous essays and to derive a score from its similarity to those best essays. Another approach is to compare the new essay with previously scored essays from different score points, so that the new essay receives the score of the group to which it is most similar (see the final sketch below). The previously scored essays here are criterial data, typically scored by expert raters.
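
The following minimal Python sketches illustrate several of the notions above. They use only common libraries and toy data, and are simplifications rather than the chapter's own code (which is available at the GitHub repository linked above). First, for Note 1, stripping markup to obtain raw text; the regex-based approach is a rough illustration, not a robust HTML parser.

    import re

    def to_raw_text(markup):
        """Strip HTML/XML tags and normalize whitespace, leaving only
        the long string of text characters ('raw text')."""
        no_tags = re.sub(r"<[^>]+>", " ", markup)    # drop markup elements
        return re.sub(r"\s+", " ", no_tags).strip()  # collapse whitespace

    print(to_raw_text("<p>An <b>essay</b> from a webpage.</p>"))
    # -> 'An essay from a webpage.'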
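
For Note 2, one common cleaning recipe (lowercasing and removing punctuation); which steps are appropriate depends on the application, and this particular recipe is illustrative only.

    import string

    def clean_text(text):
        """Lowercase the text and remove all punctuation characters."""
        return text.lower().translate(str.maketrans("", "", string.punctuation))

    print(clean_text("Hello, world! It's N.L.P."))
    # -> 'hello world its nlp'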
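
For Note 3, stop-word filtering. The tiny stop list here is a toy; real applications typically use a curated list (e.g., the one distributed with NLTK).

    STOP_WORDS = {"the", "an", "a", "of", "out", "on"}  # toy list for illustration

    def remove_stop_words(tokens):
        """Filter out stop words before further processing."""
        return [t for t in tokens if t.lower() not in STOP_WORDS]

    print(remove_stop_words("the cat sat on an old mat".split()))
    # -> ['cat', 'sat', 'old', 'mat']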
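
For Note 4, extracting the n-grams (contiguous sequences of n tokens) of a token list.

    def ngrams(tokens, n):
        """All contiguous n-token sequences in the token list."""
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    print(ngrams("to be or not to be".split(), 2))
    # -> [('to', 'be'), ('be', 'or'), ('or', 'not'), ('not', 'to'), ('to', 'be')]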
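
For Note 5, the type-token ratio, here with naive whitespace tokenization.

    def type_token_ratio(tokens):
        """Number of distinct word types divided by the total number of tokens."""
        return len(set(tokens)) / len(tokens)

    print(type_token_ratio("to be or not to be".split()))
    # 4 types / 6 tokens -> 0.666...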
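
For Note 7, a document vector as the average of word vectors. The two-dimensional embeddings are hypothetical toys; in practice one would load pretrained vectors such as word2vec or GloVe.

    import numpy as np

    # Hypothetical 2-dimensional word vectors; real embeddings have hundreds
    # of dimensions and are loaded from pretrained models.
    WORD_VECTORS = {
        "good":  np.array([0.8, 0.1]),
        "essay": np.array([0.2, 0.9]),
    }

    def document_vector(tokens, vectors=WORD_VECTORS):
        """Average the vectors of the document's in-vocabulary words."""
        rows = [vectors[t] for t in tokens if t in vectors]
        return np.mean(rows, axis=0) if rows else np.zeros(2)

    print(document_vector(["a", "good", "essay"]))  # -> [0.5 0.5]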
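
For Note 8, one simple aggregation scheme: a weighted sum of micro-feature values yielding a macro-feature value. The feature names and weights are invented for illustration and do not reflect e-rater's actual model.

    # Hypothetical micro-feature values for one essay, with illustrative weights.
    micro_features = {"spelling_errors": 2.0, "agreement_errors": 1.0, "article_errors": 3.0}
    weights        = {"spelling_errors": 0.5, "agreement_errors": 0.3, "article_errors": 0.2}

    # One macro-feature as a weighted sum of its micro-features.
    macro_feature = sum(weights[name] * value for name, value in micro_features.items())
    print(macro_feature)  # 0.5*2.0 + 0.3*1.0 + 0.2*3.0 -> approximately 1.9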
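
Finally, for Note 9, the nearest-neighbor flavor of similarity-based scoring: a new essay inherits the score of the most similar previously scored essay under cosine similarity. The essay vectors and scores are hypothetical; in practice the vectors would come from a representation such as the document vectors above.

    import numpy as np

    def cosine(u, v):
        """Cosine similarity between two vectors."""
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    # Hypothetical criterial data: (essay vector, expert score) pairs.
    SCORED_ESSAYS = [(np.array([0.9, 0.1]), 5),
                     (np.array([0.1, 0.9]), 2)]

    def score_by_similarity(new_vec, scored=SCORED_ESSAYS):
        """Assign the score of the most similar previously scored essay."""
        best_vec, best_score = max(scored, key=lambda pair: cosine(new_vec, pair[0]))
        return best_score

    print(score_by_similarity(np.array([0.8, 0.2])))  # -> 5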

Acknowledgement

The authors thank Beata Beigman Klebanov, Aoife Cahill and Isaac Bejar for helpful comments on the manuscript, and Rick Meisner for copyediting.

Author information

Corresponding author

Correspondence to Michael Flor.

Copyright information

© 2021 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Flor, M., Hao, J. (2021). Text Mining and Automated Scoring. In: von Davier, A.A., Mislevy, R.J., Hao, J. (eds) Computational Psychometrics: New Methodologies for a New Generation of Digital Learning and Assessment. Methodology of Educational Measurement and Assessment. Springer, Cham. https://doi.org/10.1007/978-3-030-74394-9_14

  • DOI: https://doi.org/10.1007/978-3-030-74394-9_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-74393-2

  • Online ISBN: 978-3-030-74394-9

  • eBook Packages: Education, Education (R0)
