Abstract
Text Simplification (TS) has emerged as an important Natural Language Processing (NLP) application with the potential for significant societal impact: it can benefit users with limited language comprehension skills, such as children, foreigners who do not have a good command of a language, and readers struggling with a language disability. With the recent emergence of various TS systems, the question is how to evaluate their performance automatically, given that access to target users might be difficult. This chapter addresses one aspect of this issue by exploring whether existing readability formulae could be applied to assess the level of simplification offered by a TS system. It focuses on three readability indices for Spanish. The indices are first adapted so that they can be computed automatically and then applied to two corpora of original and manually simplified texts. The first corpus was compiled as part of the Simplext project, which targets people with Down syndrome, and the second as part of the FIRST project, whose users are people with autism spectrum disorder. The experiments show a significant correlation between each of the readability indices and eighteen linguistically motivated features which might be seen as reading obstacles for various target populations, indicating that those indices could be used as a measure of the degree of simplification achieved by TS systems. Various ways in which they can be used in TS are further illustrated by comparing their values when applied to four different corpora.
Notes
- 1.
Aphasia is a language disorder usually caused by a stroke or a head injury. The impairments in language processing experienced by people with aphasia are quite diverse, but many aphasic people are very likely to encounter problems in understanding written text at some point (Carroll et al. 1998).
- 13.
In this study, both lists (from the Reference Corpus of Contemporary Spanish (CREA) and Spaulding's list of the 1,500 most common Spanish words) were lemmatised using Connexor's parser in order to retrieve the frequency of the lemma rather than that of the word form (a step carried out manually in the two cited works), and to enable a fully automatic computation of both indices.
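The lemma-based lookup described in note 13 can be sketched as follows. This is a minimal illustration only: the toy frequency counts, the form-to-lemma mapping, and the function names are hypothetical stand-ins (the chapter uses Connexor's parser and the CREA and Spaulding lists), and summing word-form counts into a lemma count is an assumption about how the lemmatised lists might be built.

```python
from collections import Counter

# Hypothetical toy data: word-form counts and a form-to-lemma mapping.
# In the chapter the lemmas come from Connexor's parser; this dict is a stand-in.
FORM_FREQ = {"casa": 120, "casas": 45, "comer": 30, "come": 60, "comieron": 5}
FORM_TO_LEMMA = {
    "casa": "casa",
    "casas": "casa",
    "comer": "comer",
    "come": "comer",
    "comieron": "comer",
}


def lemma_frequencies(form_freq, form_to_lemma):
    """Aggregate word-form counts into per-lemma counts (assumed scheme)."""
    totals = Counter()
    for form, count in form_freq.items():
        # Forms missing from the mapping are treated as their own lemma.
        totals[form_to_lemma.get(form, form)] += count
    return dict(totals)


def lemma_frequency(word, lemma_freq, form_to_lemma):
    """Return the frequency of the word's lemma, or 0 if the lemma is unseen."""
    key = word.lower()
    lemma = form_to_lemma.get(key, key)
    return lemma_freq.get(lemma, 0)


LEMMA_FREQ = lemma_frequencies(FORM_FREQ, FORM_TO_LEMMA)
```

With this table, an inflected form such as "casas" is scored by the frequency of its lemma "casa" rather than by its own, rarer, surface form, which is the effect the note describes.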
References
Aluísio, S. M., Specia, L., Pardo, T. A. S., Maziero, E. G., Caseli, H. M., & Fortes, R. P. M. (2008). A corpus analysis of simple account texts and the proposal of simplification strategies: first steps towards text simplification systems. In Proceedings of the 26th annual ACM international conference on Design of communication, SIGDOC ’08, (pp. 15–22). New York, NY, USA: ACM.
Anula, A. (2007). Tipos de textos, complejidad lingüística y facilitación lectora. In Actas del Sexto Congreso de Hispanistas de Asia, (pp. 45–61).
Aranzabe, M. J., Díaz de Ilarraza, A., & González, I. (2012). First approach to automatic text simplification in Basque. In Proceedings of the first Natural Language Processing for Improving Textual Accessibility Workshop (NLP4ITA).
American Psychiatric Association. (2013). Diagnostic and Statistical Manual of Mental Disorders (5th ed.). Arlington, VA: American Psychiatric Publishing.
Balota, D., Cortese, M. J., Sergent-Marshall, S. D., Spieler, D. H., & Yap, M. J. (2004). Visual word recognition of single-syllable words. Journal of Experimental Psychology: General, 133, 283–316.
Barlacchi, G., & Tonelli, S. (2013). ERNESTA: A sentence simplification tool for children's stories in Italian. In Computational Linguistics and Intelligent Text Processing.
Barzilay, R., & Elhadad, N. (2003). Sentence alignment for monolingual comparable corpora. In Proceedings of the 2003 conference on Empirical methods in natural language processing, EMNLP ’03 (pp. 25–32). Stroudsburg, PA, USA: Association for Computational Linguistics.
Brouwer, R. H. M. (1963). Onderzoek naar de leesmoeilijkheden van nederlands proza. Pedagogische studiën, 40, 454–464.
Carroll, J., Minnen, G., Canning, Y., Devlin, S., & Tait, J. (1998). Practical simplification of English newspaper text to assist aphasic readers. In Proceedings of AAAI-98 Workshop on Integrating Artificial Intelligence and Assistive Technology (pp. 7–10).
Chomsky, N. (1986). Knowledge of language: Its nature, origin, and use. Santa Barbara, CA: Greenwood Publishing Group.
Coleman, M., & Liau, T. L. (1975). A computer readability formula designed for machine scoring. Journal of Applied Psychology, 60(2), 283–284.
Coster, W., & Kauchak, D. (2011). Learning to simplify sentences using Wikipedia. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (pp. 1–9).
Cuetos, F., Domínguez, A., & de Vega, M. (1997). El efecto de la polisemia: ahora lo ves otra vez. Cognitiva, 9(2), 175–194.
Dale, E., & Chall, J. S. (1948). A formula for predicting readability. Educational Research Bulletin, 27, 11–20.
Devlin, S. (1999). Simplifying natural language text for aphasic readers. Ph.D. thesis, University of Sunderland, UK.
Douma, W. H. (1960). De leesbaarheid van landbouwbladen: een onderzoek naar en een toepassing van leesbaarheidsformules. Landbouwhogeschool Wageningen, Afdeling Sociologie en Sociografie, Bulletin nr. 17.
Drndarević, B., Štajner, S., Bott, S., Bautista, S., & Saggion, H. (2013). Automatic text simplification in Spanish: A comparative evaluation of complementing components. In Proceedings of the 12th International Conference on Intelligent Text Processing and Computational Linguistics. Lecture Notes in Computer Science. Samos, Greece, 24–30 March, (pp. 488–500).
DuBay, W. H. (2004). The principles of readability. California: Impact Information.
Feng, L. (2009). Automatic readability assessment for people with intellectual disabilities. In SIGACCESS Accessibility and Computers, number 93, (pp. 84–91). New York, NY, USA: ACM.
Feng, L., Elhadad, N., & Huenerfauth, M. (2009). Cognitively motivated features for readability assessment. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, EACL ’09, (pp. 229–237), Stroudsburg, PA, USA: Association for Computational Linguistics.
Flesch, R. (1948). A new readability yardstick. Journal of Applied Psychology, 32(3), 221–233.
Freyhoff, G., Hess, G., Kerr, L., Tronbacke, B., & Van Der Veken, K. (1998). Make it simple, European guidelines for the production of easy-to-read information for people with learning disability. Brussels: ILSMH European Association.
Glanzer, M., & Bowles, N. (1976). Analysis of the word frequency effect in recognition memory. Journal of Experimental Psychology: Human Learning and Memory, 2, 21–31.
Glavaš, G., & Štajner, S. (2013). Event-centered simplification of news stories. In Proceedings of the Student Workshop held in conjunction with RANLP 2013, Hissar, Bulgaria (pp. 71–78).
Gunning, R. (1952). The technique of clear writing. New York: McGraw-Hill.
Inui, K., Fujita, A., Takahashi, T., Iida, R., & Iwakura, T. (2003). Text simplification for reading assistance: a project note. In Proceedings of the second international workshop on Paraphrasing—Volume 16, PARAPHRASE ’03, (pp. 9–16), Stroudsburg, PA, USA: Association for Computational Linguistics.
Jastrzembski, J. (1981). Multiple meaning, number of related meanings, frequency of occurrence and the lexicon. Cognitive Psychology, 13, 278–305.
Kincaid, J. P., Fishburne, R. P., Rogers, R. L., & Chissom, B. S. (1975). Derivation of new readability formulas for navy enlisted personnel. Research Branch Report 8–75.
Martos, J., Freire, S., González, A., Gil, D., & Sebastian, M. (2012). D2.1: Functional requirements specifications and user preference survey. Technical report, FIRST technical report.
McLaughlin, G. H. (1969). SMOG grading—a new readability formula. Journal of Reading, 22, 639–646.
Norbury, C. F. (2005). Barking up the wrong tree? Lexical ambiguity resolution in children with language impairments and autistic spectrum disorders. Journal of Experimental Child Psychology, 90, 142–171.
Orasan, C., Evans, R., & Dornescu, I. (2013). Towards multilingual Europe 2020: A Romanian perspective, chapter Text simplification for people with autistic spectrum disorders (pp. 287–312). Bucharest: Romanian Academy Publishing House.
Petersen, S., & Ostendorf, M. (2009). A machine learning approach to reading level assessment. Computer Speech and Language, 23(1), 89–106.
Petersen, S. E., & Ostendorf, M. (2007). Text simplification for language learners: A corpus analysis. In Proceedings of Workshop on Speech and Language Technology for Education (SLaTE), (pp. 69–72).
PlainLanguage. (2011). Federal plain language guidelines.
Rello, L. (2012). Dyswebxia: a model to improve accessibility of the textual web for dyslexic users. In SIGACCESS Accessibility and Computers, number 102, (pp. 41–44). New York, NY, USA: ACM.
Rello, L., Baeza-Yates, R., Bott, S., & Saggion, H. (2013b). Simplify or help? Text simplification strategies for people with dyslexia. In Proceedings of W4A conference, Article no. 15.
Rello, L., Baeza-Yates, R., Dempere, L., & Saggion, H. (2013a). Frequent words improve readability and short words improve understandability for people with dyslexia. In Proceedings of the INTERACT 2013: 14th IFIP TC13 Conference on Human-Computer Interaction. Cape Town, South Africa, pp. 203–219.
Ruiter, M. B., Rietveld, T. C. M., Cucchiarini, C., Krahmer, E. J., & Strik, H. (2010). Human language technology and communicative disabilities: requirements and possibilities for the future. In Proceedings of the seventh international conference on Language Resources and Evaluation (LREC).
Rybing, J., Smith, C., & Silvervarg, A. (2010). Towards a rule based system for automatic simplification of texts. In The Third Swedish Language Technology Conference.
Saggion, H., Gómez Martínez, E., Etayo, E., Anula, A., & Bourg, L. (2011). Text simplification in Simplext: Making text more accessible. Revista de la Sociedad Española para el Procesamiento del Lenguaje Natural.
Schwarm, S. E., & Ostendorf, M. (2005). Reading level assessment using support vector machines and statistical language models. In Proceedings of the 43rd annual meeting of the Association of Computational Linguistics (ACL), pp. 523–530.
Siddharthan, A. (2006). Syntactic simplification and text cohesion. Research on Language and Computation, 4(1), 77–109.
Smith, E. A., & Senter, R. J. (1967). Automated readability index. Technical report, Aerospace Medical Research Laboratories, Wright-Patterson Air Force Base, Ohio.
Spaulding, S. (1956). A Spanish readability formula. Modern Language Journal, 40, 433–441.
UN. (2006). Convention on the rights of persons with disabilities.
van Oosten, P., Tanghe, D., & Hoste, V. (2010). Towards an improved methodology for automated readability prediction. In Proceedings of the seventh international conference on language resources and evaluation (LREC10). Valletta, Malta: European Language Resources Association (ELRA), pp. 775–782.
Vossen, P. (Ed.). (1998). EuroWordNet: A multilingual database with lexical semantic networks. Dordrecht: Kluwer Academic Publishers.
Štajner, S., Drndarević, B., & Saggion, H. (2013). Corpus-based sentence deletion and split decisions for Spanish text simplification. Computación y Sistemas, 17(2), 251–262.
Štajner, S., Evans, R., Orasan, C., & Mitkov, R. (2012). What can readability measures really tell us about text complexity? In Proceedings of the LREC’12 Workshop: Natural Language Processing for Improving Textual Accessibility (NLP4ITA), Istanbul, Turkey.
Štajner, S., & Saggion, H. (2013b). Adapting text simplification decisions to different text genres and target users. Procesamiento del Lenguaje Natural, 51, 135–142.
Štajner, S., & Saggion, H. (2013a). Readability indices for automatic evaluation of text simplification systems: A feasibility study for Spanish. In Proceedings of the 6th International Joint Conference on Natural Language Processing (IJCNLP 2013), Nagoya, Japan, October 14–18, 2013. pp. 374–382.
Woodsend, K., & Lapata, M. (2011). Learning to simplify sentences with Quasi-synchronous grammar and integer programming. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP).
Woodsend, K., & Lapata, M. (2011). WikiSimple: automatic simplification of Wikipedia articles. In Proceedings of the 25th AAAI Conference on Artificial Intelligence, pp. 374–382.
Wubben, S., van den Bosch, A., & Krahmer, E. (2012). Sentence simplification by monolingual machine translation. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers—Volume 1, ACL ’12, (pp. 1015–1024) Stroudsburg, PA, USA: Association for Computational Linguistics.
Zhu, Z., Bernhard, D., & Gurevych, I. (2010). A monolingual tree-based translation model for sentence simplification. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), (pp. 1353–1361).
Acknowledgements
This work has been partially supported by TRADICOR (Ref: PIE 13-054), EXPERT (Ref: 317471-FP7-PEOPLE-2012-ITN), LATEST (Ref: 327197-FP7-PEOPLE-2012-IEF) and FIRST (Ref: 287607-FP7-ICT-2011-7). The authors would also like to express their gratitude to Horacio Saggion for his very helpful comments and input.
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Štajner, S., Mitkov, R., Corpas Pastor, G. (2015). Simple or Not Simple? A Readability Question. In: Gala, N., Rapp, R., Bel-Enguix, G. (eds) Language Production, Cognition, and the Lexicon. Text, Speech and Language Technology, vol 48. Springer, Cham. https://doi.org/10.1007/978-3-319-08043-7_22
DOI: https://doi.org/10.1007/978-3-319-08043-7_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08042-0
Online ISBN: 978-3-319-08043-7
eBook Packages: Computer Science, Computer Science (R0)