Simple or Not Simple? A Readability Question

Štajner, Sanja; Mitkov, Ruslan; Corpas Pastor, Gloria

doi:10.1007/978-3-319-08043-7_22

Part of the book series: Text, Speech and Language Technology (TLTB, volume 48)

Abstract

Text Simplification (TS) has emerged as an important Natural Language Processing (NLP) application that promises significant societal impact: it can benefit users with limited language comprehension skills, such as children, foreigners who do not have a good command of a language, and readers struggling with a language disability. With the recent emergence of various TS systems, the question is how to evaluate their performance automatically, given that access to target users might be difficult. This chapter addresses one aspect of this issue by exploring whether existing readability formulae could be applied to assess the level of simplification offered by a TS system. It focuses on three readability indices for Spanish. The indices are first adapted so that they can be computed automatically and then applied to two corpora of original and manually simplified texts. The first corpus was compiled as part of the Simplext project, which targets people with Down syndrome, and the second as part of the FIRST project, whose users are people with autism spectrum disorder. The experiments show a significant correlation between each of the readability indices and eighteen linguistically motivated features that might be seen as reading obstacles for various target populations, indicating that these indices could be used as a measure of the degree of simplification achieved by TS systems. Various ways in which they can be used in TS are further illustrated by comparing their values across four different corpora.
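The correlation analysis mentioned in the abstract can be pictured with a minimal sketch. The abstract does not say which correlation coefficient was used, so Pearson's r is an assumption here; the function name `pearson`, the variable names, and all numeric values are hypothetical illustrations, not data or results from the chapter.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up values, one per text (NOT taken from the chapter).
index_scores = [62.0, 55.5, 71.2, 48.3, 66.9]          # hypothetical readability index
avg_sentence_length = [21.0, 25.0, 15.0, 30.0, 18.0]   # hypothetical linguistic feature

r = pearson(index_scores, avg_sentence_length)
print(f"r = {r:.3f}")
```

In this toy setup a strongly negative r would mean that texts with longer sentences receive lower index scores; whether a real index behaves that way depends on how the index is defined.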

Notes

1. Aphasia is a language disorder usually caused by a stroke or a head injury. The impairments in language processing experienced by people with aphasia are quite diverse, but many aphasic people are very likely to encounter problems in understanding written text at some point (Carroll et al. 1998).
2. Autism spectrum disorder (ASD) is a neurodevelopmental disorder characterised by qualitative impairment in communication and stereotyped repetitive behaviour (American Psychiatric Association 2013). People with ASD have deficits in the comprehension of speech and writing (Štajner et al. 2012).
3. http://www.plainlanguage.gov/.
4. http://inclusion-europe.org/.
5. http://www.weeklyreader.com/.
6. http://literacynet.org/cnnsf/.
7. Available at: http://www.first-asd.eu/?q=system/files/FIRST_D7.2_20130228_annex.pdf.
8. http://www.first-asd.eu/.
9. www.servimedia.es.
10. www.simplext.es.
11. www.noticiasfacil.es.
12. http://corpus.rae.es/lfrecuencias.html.
13. In this study, both lists (from the Reference Corpus of Contemporary Spanish (CREA) and Spaulding's list of the 1500 most common Spanish words) were lemmatised using Connexor's parser in order to retrieve the frequency of the lemma rather than of a word form (a step carried out manually in the two cited works) and to enable a fully automatic computation of both indices.
14. www.connexor.eu.
15. http://openthes-es.berlios.de.
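
The lemmatisation step described in note 13 amounts to pooling word-form frequencies under their lemma. The following is a minimal sketch of that idea only: the lookup table `lemma_of` is a hypothetical stand-in for the output of a parser such as Connexor's (whose API is not shown here), the function name `lemma_frequencies` is invented for illustration, and the counts are made up.

```python
from collections import Counter


def lemma_frequencies(form_freqs, lemma_of):
    """Sum word-form frequencies per lemma.

    form_freqs: mapping from word form to corpus frequency.
    lemma_of:   mapping from word form to lemma (hypothetical stand-in
                for a lemmatiser; NOT Connexor's actual interface).
    Forms with no known lemma are counted under their own surface form.
    """
    totals = Counter()
    for form, freq in form_freqs.items():
        totals[lemma_of.get(form, form)] += freq
    return totals


# Toy, made-up counts for illustration only.
form_freqs = {"casa": 120, "casas": 30, "comer": 40, "comemos": 25, "comió": 10}
lemma_of = {"casas": "casa", "comemos": "comer", "comió": "comer"}

freqs = lemma_frequencies(form_freqs, lemma_of)
print(freqs["casa"], freqs["comer"])
```

Falling back to the surface form when no lemma is known is a design choice of this sketch; a real pipeline might instead discard or flag such forms.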

References

Aluísio, S. M., Specia, L., Pardo, T. A. S., Maziero, E. G., Caseli, H. M., & Fortes, R. P. M. (2008). A corpus analysis of simple account texts and the proposal of simplification strategies: first steps towards text simplification systems. In Proceedings of the 26th annual ACM international conference on Design of communication, SIGDOC ’08 (pp. 15–22). New York, NY, USA: ACM.
Anula, A. (2007). Tipos de textos, complejidad lingüística y facilitación lectora. In Actas del Sexto Congreso de Hispanistas de Asia (pp. 45–61).
Aranzabe, M. J., Díaz de Ilarraza, A., & González, I. (2012). First approach to automatic text simplification in Basque. In Proceedings of the first Natural Language Processing for Improving Textual Accessibility Workshop (NLP4ITA).
American Psychiatric Association. (2013). Diagnostic and Statistical Manual of Mental Disorders (5th ed.). Arlington, VA: American Psychiatric Publishing.
Balota, D., Cortese, M. J., Sergent-Marshall, S. D., Spieler, D. H., & Yap, M. J. (2004). Visual word recognition of single-syllable words. Journal of Experimental Psychology: General, 133, 283–316.
Barlacchi, G., & Tonelli, S. (2013). ERNESTA: A sentence simplification tool for children's stories in Italian. In Computational Linguistics and Intelligent Text Processing.
Barzilay, R., & Elhadad, N. (2003). Sentence alignment for monolingual comparable corpora. In Proceedings of the 2003 conference on Empirical methods in natural language processing, EMNLP ’03 (pp. 25–32). Stroudsburg, PA, USA: Association for Computational Linguistics.
Brouwer, R. H. M. (1963). Onderzoek naar de leesmoeilijkheden van nederlands proza. Pedagogische studiën, 40, 454–464.
Carroll, J., Minnen, G., Canning, Y., Devlin, S., & Tait, J. (1998). Practical simplification of English newspaper text to assist aphasic readers. In Proceedings of AAAI-98 Workshop on Integrating Artificial Intelligence and Assistive Technology (pp. 7–10).
Chomsky, N. (1986). Knowledge of language: Its nature, origin, and use. Santa Barbara, CA: Greenwood Publishing Group.
Coleman, M., & Liau, T. L. (1975). A computer readability formula designed for machine scoring. Journal of Applied Psychology, 60(2), 283–284.
Coster, W., & Kauchak, D. (2011). Learning to simplify sentences using Wikipedia. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (pp. 1–9).
Cuetos, F., Domínguez, A., & de Vega, M. (1997). El efecto de la polisemia: ahora lo ves otra vez. Cognitiva, 9(2), 175–194.
Dale, E., & Chall, J. S. (1948). A formula for predicting readability. Educational Research Bulletin, 27, 11–20.
Devlin, S. (1999). Simplifying natural language text for aphasic readers. Ph.D. thesis, University of Sunderland, UK.
Douma, W. H. (1960). De leesbaarheid van landbouwbladen: een onderzoek naar en een toepassing van leesbaarheidsformules. Landbouwhogeschool Wageningen, Afdeling Sociologie en Sociografie, Bulletin nr. 17.
Drndarević, B., Štajner, S., Bott, S., Bautista, S., & Saggion, H. (2013). Automatic text simplification in Spanish: A comparative evaluation of complementing components. In Proceedings of the 12th International Conference on Intelligent Text Processing and Computational Linguistics. Lecture Notes in Computer Science. Samos, Greece, 24–30 March (pp. 488–500).
DuBay, W. H. (2004). The principles of readability. California: Impact Information.
Feng, L. (2009). Automatic readability assessment for people with intellectual disabilities. In SIGACCESS Accessibility and Computers, number 93 (pp. 84–91). New York, NY, USA: ACM.
Feng, L., Elhadad, N., & Huenerfauth, M. (2009). Cognitively motivated features for readability assessment. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, EACL ’09 (pp. 229–237). Stroudsburg, PA, USA: Association for Computational Linguistics.
Flesch, R. (1948). A new readability yardstick. Journal of Applied Psychology, 32(3), 221–233.
Freyhoff, G., Hess, G., Kerr, L., Tronbacke, B., & Van Der Veken, K. (1998). Make it simple, European guidelines for the production of easy-to-read information for people with learning disability. Brussels: ILSMH European Association.
Glanzer, M., & Bowles, N. (1976). Analysis of the word frequency effect in recognition memory. Journal of Experimental Psychology: Human Learning and Memory, 2, 21–31.
Glavaš, G., & Štajner, S. (2013). Event-centered simplification of news stories. In Proceedings of the Student Workshop held in conjunction with RANLP 2013, Hissar, Bulgaria (pp. 71–78).
Gunning, R. (1952). The technique of clear writing. New York: McGraw-Hill.
Inui, K., Fujita, A., Takahashi, T., Iida, R., & Iwakura, T. (2003). Text simplification for reading assistance: a project note. In Proceedings of the second international workshop on Paraphrasing, Volume 16, PARAPHRASE ’03 (pp. 9–16). Stroudsburg, PA, USA: Association for Computational Linguistics.
Jastrzembski, J. (1981). Multiple meaning, number of related meanings, frequency of occurrence and the lexicon. Cognitive Psychology, 13, 278–305.
Kincaid, J. P., Fishburne, R. P., Rogers, R. L., & Chissom, B. S. (1975). Derivation of new readability formulas for navy enlisted personnel. Research Branch Report 8–75.
Martos, J., Freire, S., González, A., Gil, D., & Sebastian, M. (2012). D2.1: Functional requirements specifications and user preference survey. Technical report, FIRST technical report.
McLaughlin, G. H. (1969). SMOG grading—a new readability formula. Journal of Reading, 22, 639–646.
Norbury, C. F. (2005). Barking up the wrong tree? Lexical ambiguity resolution in children with language impairments and autistic spectrum disorders. Journal of Experimental Child Psychology, 90, 142–171.
Orasan, C., Evans, R., & Dornescu, I. (2013). Text simplification for people with autistic spectrum disorders. In Towards multilingual Europe 2020: A Romanian perspective (pp. 287–312). Bucharest: Romanian Academy Publishing House.
Petersen, S., & Ostendorf, M. (2009). A machine learning approach to reading level assessment. Computer Speech and Language, 23(1), 89–106.
Petersen, S. E., & Ostendorf, M. (2007). Text simplification for language learners: A corpus analysis. In Proceedings of Workshop on Speech and Language Technology for Education (SLaTE) (pp. 69–72).
PlainLanguage. (2011). Federal plain language guidelines.
Rello, L. (2012). Dyswebxia: a model to improve accessibility of the textual web for dyslexic users. In SIGACCESS Accessibility and Computers, number 102 (pp. 41–44). New York, NY, USA: ACM.
Rello, L., Baeza-Yates, R., Bott, S., & Saggion, H. (2013b). Simplify or help? Text simplification strategies for people with dyslexia. In Proceedings of W4A conference, Article no. 15.
Rello, L., Baeza-Yates, R., Dempere, L., & Saggion, H. (2013a). Frequent words improve readability and short words improve understandability for people with dyslexia. In Proceedings of the INTERACT 2013: 14th IFIP TC13 Conference on Human-Computer Interaction. Cape Town, South Africa, pp. 203–219.
Ruiter, M. B., Rietveld, T. C. M., Cucchiarini, C., Krahmer, E. J., & Strik, H. (2010). Human language technology and communicative disabilities: requirements and possibilities for the future. In Proceedings of the seventh international conference on Language Resources and Evaluation (LREC).
Rybing, J., Smith, C., & Silvervarg, A. (2010). Towards a rule based system for automatic simplification of texts. In The Third Swedish Language Technology Conference.
Saggion, H., Gómez Martínez, E., Etayo, E., Anula, A., & Bourg, L. (2011). Text simplification in Simplext: Making text more accessible. Revista de la Sociedad Española para el Procesamiento del Lenguaje Natural.
Schwarm, S. E., & Ostendorf, M. (2005). Reading level assessment using support vector machines and statistical language models. In Proceedings of the 43rd annual meeting of the Association of Computational Linguistics (ACL), pp. 523–530.
Siddharthan, A. (2006). Syntactic simplification and text cohesion. Research on Language and Computation, 4(1), 77–109.
Smith, E. A., & Senter, R. J. (1967). Automated readability index. Technical report, Aerospace Medical Research Laboratories, Wright-Patterson Air Force Base, Ohio.
Spaulding, S. (1956). A Spanish readability formula. Modern Language Journal, 40, 433–441.
UN. (2006). Convention on the rights of persons with disabilities.
van Oosten, P., Tanghe, D., & Hoste, V. (2010). Towards an improved methodology for automated readability prediction. In Proceedings of the seventh international conference on language resources and evaluation (LREC10). Valletta, Malta: European Language Resources Association (ELRA), pp. 775–782.
Vossen, P. (Ed.). (1998). EuroWordNet: A multilingual database with lexical semantic networks. Dordrecht: Kluwer Academic Publishers.
Štajner, S., Drndarević, B., & Saggion, H. (2013). Corpus-based sentence deletion and split decisions for Spanish text simplification. Computación y Sistemas, 17(2), 251–262.
Štajner, S., Evans, R., Orasan, C., & Mitkov, R. (2012). What can readability measures really tell us about text complexity? In Proceedings of the LREC’12 Workshop: Natural Language Processing for Improving Textual Accessibility (NLP4ITA), Istanbul, Turkey.
Štajner, S., & Saggion, H. (2013b). Adapting text simplification decisions to different text genres and target users. Procesamiento del Lenguaje Natural, 51, 135–142.
Štajner, S., & Saggion, H. (2013a). Readability indices for automatic evaluation of text simplification systems: A feasibility study for Spanish. In Proceedings of the 6th International Joint Conference on Natural Language Processing (IJCNLP 2013), Nagoya, Japan, October 14–18, 2013. pp. 374–382.
Woodsend, K., & Lapata, M. (2011). Learning to simplify sentences with Quasi-synchronous grammar and integer programming. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP).
Woodsend, K., & Lapata, M. (2011). WikiSimple: automatic simplification of Wikipedia articles. In Proceedings of the 25th AAAI Conference on Artificial Intelligence, pp. 374–382.
Wubben, S., van den Bosch, A., & Krahmer, E. (2012). Sentence simplification by monolingual machine translation. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers, Volume 1, ACL ’12 (pp. 1015–1024). Stroudsburg, PA, USA: Association for Computational Linguistics.
Zhu, Z., Bernhard, D., & Gurevych, I. (2010). A monolingual tree-based translation model for sentence simplification. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010) (pp. 1353–1361).

Acknowledgements

This work has been partially supported by TRADICOR (Ref: PIE 13-054), EXPERT (Ref: 317471-FP7-PEOPLE-2012-ITN), LATEST (Ref: 327197-FP7-PEOPLE-2012-IEF) and FIRST (Ref: 287607-FP7-ICT-2011-7). The authors would also like to express their gratitude to Horacio Saggion for his very helpful comments and input.

Author information

Authors and Affiliations

Research Group in Computational Linguistics, University of Wolverhampton, Wolverhampton, UK
Sanja Štajner, Ruslan Mitkov & Gloria Corpas Pastor
Research Group in Lexicography and Translation, University of Malaga, Malaga, Spain
Gloria Corpas Pastor

Corresponding author

Correspondence to Sanja Štajner.

Editor information

Editors and Affiliations

CNRS-LIF, UMR 7279, Aix-Marseille University, Marseille, France
Núria Gala
CNRS-LIF, UMR 7279, Aix-Marseille University and University of Mainz, Marseille, France
Reinhard Rapp
CNRS-LIF, UMR 7279, Aix-Marseille University, Marseille, France
Gemma Bel-Enguix

About this chapter

Cite this chapter

Štajner, S., Mitkov, R., Corpas Pastor, G. (2015). Simple or Not Simple? A Readability Question. In: Gala, N., Rapp, R., Bel-Enguix, G. (eds) Language Production, Cognition, and the Lexicon. Text, Speech and Language Technology, vol 48. Springer, Cham. https://doi.org/10.1007/978-3-319-08043-7_22

DOI: https://doi.org/10.1007/978-3-319-08043-7_22
Published: 12 November 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08042-0
Online ISBN: 978-3-319-08043-7
eBook Packages: Computer Science, Computer Science (R0)
