A Posteriori Agreement as a Quality Measure for Readability Prediction Systems

van Oosten, Philip; Hoste, Véronique; Tanghe, Dries

doi:10.1007/978-3-642-19437-5_35

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6609)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

All readability research is ultimately concerned with whether a prediction system can automatically determine the level of readability of an unseen text. A significant problem for such a system is that readability might depend in part on the reader. If different readers assess the readability of texts in fundamentally different ways, there is insufficient a priori agreement to justify the correctness of a readability prediction system based on the texts assessed by those readers. We built a data set of readability assessments by expert readers. We clustered the experts into groups with greater a priori agreement and then measured, for each group, whether classifiers trained only on data from that group exhibited a classification bias. As this was found to be the case, the classification mechanism cannot be unproblematically generalized to a different user group.

Author information

Authors and Affiliations

LT3 Language and Translation Technology Team, University College Ghent, Groot-Brittanniëlaan 45, 9000, Ghent, Belgium
Philip van Oosten, Véronique Hoste & Dries Tanghe
Ghent University, Krijgslaan 281, 9000, Ghent, Belgium
Philip van Oosten, Véronique Hoste & Dries Tanghe

Editor information

Editors and Affiliations

Center for Computing Research, National Polytechnic Institute, Mexico
Alexander Gelbukh

About this paper

Cite this paper

van Oosten, P., Hoste, V., Tanghe, D. (2011). A Posteriori Agreement as a Quality Measure for Readability Prediction Systems. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2011. Lecture Notes in Computer Science, vol 6609. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19437-5_35

DOI: https://doi.org/10.1007/978-3-642-19437-5_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19436-8
Online ISBN: 978-3-642-19437-5
eBook Packages: Computer Science, Computer Science (R0)
