Simple or Not Simple? A Readability Question

  • Sanja Štajner
  • Ruslan Mitkov
  • Gloria Corpas Pastor
Chapter
Part of the Text, Speech and Language Technology book series (TLTB, volume 48)

Abstract

Text Simplification (TS) has emerged as an important Natural Language Processing (NLP) application with the potential for significant societal impact: it can benefit users with limited language comprehension skills, such as children, foreigners who do not have a good command of a language, and readers with a language disability. With the recent emergence of various TS systems, the question is how to evaluate their performance automatically, given that access to target users may be difficult. This chapter addresses one aspect of this issue by exploring whether existing readability formulae can be applied to assess the level of simplification offered by a TS system. It focuses on three readability indices for Spanish. The indices are first adapted so that they can be computed automatically and are then applied to two corpora of original and manually simplified texts. The first corpus was compiled as part of the Simplext project, which targets people with Down syndrome, and the second as part of the FIRST project, whose users are people with autism spectrum disorder. The experiments show a significant correlation between each of the readability indices and eighteen linguistically motivated features that might be seen as reading obstacles for various target populations, indicating that those indices could be used as a measure of the degree of simplification achieved by TS systems. Various ways in which the indices can be used in TS are further illustrated by comparing their values across four different corpora.
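
The evaluation idea can be illustrated with a minimal Python sketch under stated assumptions. The abstract does not name the three Spanish indices, so `avg_sentence_length` below is a hypothetical stand-in proxy, not one of the chapter's indices, and `avg_word_length` is an illustrative feature, not one of the eighteen. The use of Pearson's coefficient and the toy sentences are likewise assumptions made only for this example.

```python
import math
import re


def split_sentences(text):
    # Naive splitter on ., ! and ? (an assumption; real corpora need a proper tokenizer).
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def words(text):
    # \w+ matches Unicode letters in Python 3, so accented Spanish words stay whole.
    return re.findall(r"\w+", text)


def avg_sentence_length(text):
    # Hypothetical readability proxy: mean number of words per sentence.
    return len(words(text)) / len(split_sentences(text))


def avg_word_length(text):
    # Illustrative linguistic feature: mean number of characters per word.
    ws = words(text)
    return sum(len(w) for w in ws) / len(ws)


def pearson(xs, ys):
    # Pearson correlation coefficient between two equally long sequences.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Invented toy texts, not taken from the Simplext or FIRST corpora.
    texts = [
        "El gato duerme. El perro come.",
        "La administración municipal aprobó ayer el presupuesto extraordinario.",
        "Los niños juegan en el parque. Hace sol.",
    ]
    index_values = [avg_sentence_length(t) for t in texts]
    feature_values = [avg_word_length(t) for t in texts]
    print("correlation:", pearson(index_values, feature_values))
```

In practice, one would compute an actual readability index and the chosen features over whole corpora of original and simplified texts, and test the correlation for statistical significance.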

Keywords

  • Text simplification
  • Readability
  • Automatic evaluation
  • Readers with Down syndrome and autism spectrum disorder

Notes

Acknowledgements

This work has been partially supported by TRADICOR (Ref: PIE 13-054), EXPERT (Ref: 317471-FP7-PEOPLE-2012-ITN), LATEST (Ref: 327197-FP7-PEOPLE-2012-IEF) and FIRST (Ref: 287607-FP7-ICT-2011-7). The authors would also like to express their gratitude to Horacio Saggion for his very helpful comments and input.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Sanja Štajner (1)
  • Ruslan Mitkov (1)
  • Gloria Corpas Pastor (1, 2)
  1. Research Group in Computational Linguistics, University of Wolverhampton, Wolverhampton, UK
  2. Research Group in Lexicography and Translation, University of Malaga, Malaga, Spain
