Automatic Text Difficulty Estimation Using Embeddings and Neural Networks

Filighera, Anna; Steuer, Tim; Rensing, Christoph

doi:10.1007/978-3-030-29736-7_25

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11722)

Included in the following conference series:

European Conference on Technology Enhanced Learning

Abstract

Text difficulty, also called reading difficulty, refers to the complexity of a text at the language level. For many educational applications, such as learning resource recommendation systems, the difficulty of a text is highly relevant information. However, manual annotation of text difficulty is very expensive and not feasible for large collections of texts. For this reason, many approaches to automatic text difficulty estimation have been proposed in the past. All text difficulty estimation models published thus far have one thing in common: they rely on manually engineered feature sets. This is problematic because such features are tailored to a specific type of text and do not generalize well to other text types and languages. To alleviate this problem, we propose a novel approach to text difficulty classification that uses neural networks and embeddings. Our approach distinguishes between five reading levels, which correspond to non-overlapping age groups ranging from ages 7 to 16. It performs comparably to existing state-of-the-art approaches in terms of accuracy and Pearson correlation coefficient while being easier and cheaper to adapt to new types of text.
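
The two evaluation measures named in the abstract can be illustrated with a short Python sketch. It is not the authors' code. The level encoding (1 = easiest, 5 = hardest), the function names, and the gold and predicted labels are assumptions made up for illustration. One plausible reason to report both measures is that reading levels are ordered: accuracy counts every misclassification equally, whereas the Pearson correlation coefficient also reflects how far a prediction lies from the true level.

```python
from math import sqrt


def accuracy(gold, pred):
    """Fraction of texts whose predicted level equals the gold level."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical labels for eight texts (1 = easiest, 5 = hardest).
# These values are invented for illustration and are not results from the paper.
gold = [1, 2, 3, 4, 5, 3, 2, 4]
pred = [1, 2, 3, 5, 5, 2, 2, 4]

acc = accuracy(gold, pred)
r = pearson(gold, pred)
print(f"accuracy = {acc:.2f}, Pearson r = {r:.3f}")
```

In this toy example both errors are off by a single level, so the correlation stays high even though a quarter of the texts are misclassified.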

Author information

Authors and Affiliations

Multimedia Communications Lab, Technische Universität Darmstadt, Rundeturmstr. 10, 64283, Darmstadt, Germany
Anna Filighera, Tim Steuer & Christoph Rensing

Corresponding author

Correspondence to Anna Filighera.

Editor information

Editors and Affiliations

Open University Netherlands, Heerlen, The Netherlands
Maren Scheffel
Paul Sabatier University, Toulouse, France
Julien Broisin
Know-Center GmbH, Graz, Austria
Viktoria Pammer-Schindler
Cyprus University of Technology, Limassol, Cyprus
Andri Ioannou
DIPF, Frankfurt/Main, Germany
Jan Schneider

About this paper

Cite this paper

Filighera, A., Steuer, T., Rensing, C. (2019). Automatic Text Difficulty Estimation Using Embeddings and Neural Networks. In: Scheffel, M., Broisin, J., Pammer-Schindler, V., Ioannou, A., Schneider, J. (eds) Transforming Learning with Meaningful Technologies. EC-TEL 2019. Lecture Notes in Computer Science, vol 11722. Springer, Cham. https://doi.org/10.1007/978-3-030-29736-7_25

DOI: https://doi.org/10.1007/978-3-030-29736-7_25
Published: 09 September 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29735-0
Online ISBN: 978-3-030-29736-7
eBook Packages: Computer Science, Computer Science (R0)
