Text simplification resources for Spanish

  • Special issue: Resources for language learning
  • Published in: Language Resources and Evaluation

Abstract

In this paper we present the development of a text simplification system for Spanish. Text simplification is the adaptation of a text to the special needs of certain groups of readers, such as language learners, people with cognitive difficulties, and elderly people, among others. There is a clear need for simplified texts, but producing them manually, or adapting existing texts by hand, is labour-intensive and costly. Automatic simplification is attracting growing attention in Natural Language Processing, but, to the best of our knowledge, no simplification tools yet exist for Spanish. We present a corpus study that aims to identify the operations a text simplification system needs to carry out in order to produce output similar to what human editors produce when they simplify news texts. We also present a first prototype for automatic simplification, which shows that the most important simplification operations can be handled successfully.

Fig. 1
Fig. 2
Fig. 3
Fig. 4

Notes

  1. http://www.un.org/disabilities.

  2. http://www.fundacionprodis.org/.

  3. http://www.lecturafacil.net/.

  4. http://www.lattlast.se/.

  5. http://8sidor.lattlast.se.

  6. http://www.klartale.no.

  7. http://cours.funoc.be/essentiel/.

  8. http://www.wablieft.be.

  9. http://www.dr.dk/Nyheder/Ligetil/Presse/Artikler/om.htm.

  10. http://www.dueparole.it.

  11. http://papunet.net/selko.

  12. http://www.noticiasfacil.es.

  13. http://www.literacyworks.org/learningresources/.

  14. http://www.inclusion-europe.org.

  15. http://simple.wikipedia.org.

  16. Corpus de referencia del español actual, http://www.rae.es.

  17. They report an improvement in accuracy from 61.6 % to 78.2 % when syntactic and morpho-syntactic features are included in addition to basic counts and lexical information.

  18. For example, the sentence "Álex de la Iglesia, the director of the Academy, announced his resignation" contains the information that Álex de la Iglesia is the director of the Academy, which can be expressed in a separate sentence.

  19. Corpus de referencia del español actual, http://www.rae.es.

  20. Left-to-right order in the tree does not necessarily represent the linear order of words in the sentence.

  21. Such errors included cases where a preposition required by the construction was missing (such as a/in in the case of (i-a)) or an article was wrongly inserted before a proper name (as este/this in (i-b)).

    The errors encountered were later corrected by improving the rules. The sentences used for rule improvement are excluded from the test set in future evaluations.

  22. The annotation scheme did not allow us to calculate recall for the two different relative clause splitting operations separately and these values are not listed in Table 4.
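Note 18 describes one of the syntactic operations discussed in the paper: moving the information carried by an appositive into a separate sentence. As a rough illustration only, the following Python sketch applies that idea to a single flat sentence pattern; it is not the authors' system. The regular expression `APPOSITIVE`, the function `split_appositive`, and the default copula `is` are hypothetical choices made for this example. A punctuation-based heuristic like this would misfire on many real sentences (for instance, comma-separated lists), so a practical system would more plausibly operate on a syntactic analysis of the sentence.

```python
import re

# Hypothetical pattern: "<subject>, <appositive>, <rest of sentence>."
# Assumes the subject and the appositive are delimited by the first two commas
# and that the sentence ends with a full stop. This is an illustrative
# heuristic, not a parser-based simplification rule.
APPOSITIVE = re.compile(r"^(?P<subj>[^,]+), (?P<appos>[^,]+), (?P<rest>[^,]+)\.$")


def split_appositive(sentence: str, copula: str = "is") -> list[str]:
    """Split one appositive off a sentence; return the input unchanged if no match."""
    match = APPOSITIVE.match(sentence.strip())
    if match is None:
        return [sentence]
    subj = match.group("subj")
    appos = match.group("appos")
    rest = match.group("rest")
    # Main clause without the appositive, then the appositive as a copular sentence.
    return [f"{subj} {rest}.", f"{subj} {copula} {appos}."]


if __name__ == "__main__":
    example = "Álex de la Iglesia, the director of the Academy, announced his resignation."
    for simplified in split_appositive(example):
        print(simplified)
```

For the sentence in note 18, the sketch yields "Álex de la Iglesia announced his resignation." followed by "Álex de la Iglesia is the director of the Academy."; a sentence without the pattern is returned unchanged.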

References

  • Aluísio, S. M., Specia, L., Pardo, T. A. S., Maziero, E. G., & de Mattos Fortes, R. P. (2008). Towards Brazilian Portuguese automatic text simplification systems. In ACM symposium on document engineering (pp. 240–248).

  • Anula, A. (2007). Tipos de textos, complejidad lingüística y facilitación lectora. In Man-Ki, Jy-Eun, & Macas (Eds.), Actas del Sexto Congreso de Hispanistas de Asia (pp. 45–61). Seoul, Republic of Korea.

  • Anula, A. (2008). Lecturas adaptadas a la enseñanza del español como L2: variables lingüísticas para la determinación del nivel de legibilidad. In Pastor & Roca (Eds.), La evaluación en el aprendizaje y la enseñanza del español como LE/L2 (pp. 162–170). Alicante.

  • Anula, A. (2011). Pautas básicas de simplificación textual y diseño del corpus SIMPLEXT. Technical report, Grupo DILES. Madrid, Spain: Universidad Autónoma de Madrid.

  • Aranzabe, M., de Ilarraza, A., & Gonzalez-Dios, I. (2012). First approach to automatic text simplification in Basque. In Natural language processing for improving textual accessibility (NLP4ITA) workshop programme (pp. 1–8).

  • Barthe, K., Juaneda, C., Leseigneur, D., Loquet, J.-C., Morin, C., Escande, J., et al. (1999). GIFAS rationalized French: A controlled language for aerospace documentation in French. Technical Communication, 46(2), 220–229.

  • Barzilay, R., & Elhadad, N. (2003). Sentence alignment for monolingual comparable corpora. In Proceedings of the 2003 conference on empirical methods in natural language processing (pp. 25–32).

  • Bohnet, B. (2009). Efficient parsing of syntactic and semantic dependency structures. In Proceedings of the conference on natural language learning (CoNLL) (pp. 67–72). Boulder, Colorado: Association for Computational Linguistics.

  • Bohnet, B., Langjahr, A., & Wanner, L. (2000). A development environment for an MTT-based sentence generator. In Proceedings of the first international conference on natural language generation (pp. 260–263). Stroudsburg, PA, USA: Association for Computational Linguistics.

  • Bott, S., Rello, L., Drndarević, B., & Saggion, H. (2012). Can Spanish be simpler? LexSiS: Lexical Simplification for Spanish. In Proceedings of Coling 2012: The 24th International Conference on Computational Linguistics.

  • Bott, S., & Saggion, H. (2011). An unsupervised alignment algorithm for text simplification corpus construction. In Workshop on monolingual text-to-text generation, co-located with ACL 2011. Portland, Oregon.

  • Bouayad-Agha, N., Casamayor, G., Ferraro, G., & Wanner, L. (2009). Simplification of patent claim sentences for their paraphrasing and summarization. In FLAIRS Conference.

  • Brown, K. (1995). Current Issues in Plain English. ARIS Bulletin, 6(4).

  • Canning, Y., Tait, J., Archibald, J., & Crawley, R. (2000). Cohesive generation of syntactically simplified newspaper text. In Proceedings of the third international workshop on text, speech and dialogue (pp. 145–150).

  • Carroll, J., Minnen, G., Canning, Y., Devlin, S., & Tait, J. (1998). Practical simplification of English newspaper text to assist aphasic readers. In Proceedings of AAAI-98 workshop on integrating artificial intelligence and assistive technology (pp. 7–10).

  • Chandrasekar, R., Doran, C., & Srinivas, B. (1996). Motivations and methods for text simplification. In Proceedings of the international conference on computational Linguistics (pp. 1041–1044).

  • Coster, W., & Kauchak, D. (2011). Learning to simplify sentences using Wikipedia. In Proceedings of the workshop on monolingual text-to-text generation (pp. 1–9). Portland, Oregon: Association for Computational Linguistics.

  • Crossley, S. A., & McNamara, D. S. (2008). Assessing L2 reading texts at the intermediate level: An approximate replication of Crossley, Louwerse, McCarthy & McNamara (2007). Language Teaching, 41(3), 409–429.

  • Daelemans, W., Höthker, A., & Tjong Kim Sang, E. (2004). Automatic sentence simplification for subtitling in Dutch and English. In Proceedings of the 4th conference on language resources and evaluation (pp. 1045–1048). ELRA.

  • De Belder, J., Deschacht, K., & Moens, M. (2010). Lexical simplification. In Proceedings of ITEC2010: 1st international conference on interdisciplinary research on technology, education and communication.

  • Dell’Orletta, F., Montemagni, S., & Venturi, G. (2011). READ-IT: Assessing readability of Italian texts with a view to text simplification. In Proceedings of the second workshop on speech and language processing for assistive technologies (pp. 73–83). Association for Computational Linguistics.

  • Devlin, S., & Tait, J. (1998). The use of a psycholinguistic database in the simplification of text for aphasic readers. Linguistic Databases, 161–173.

  • DuBay, W. (2004). The principles of readability. Impact Information, 1–76.

  • Feng, L., Elhadad, N., & Huenerfauth, M. (2009). Cognitively motivated features for readability assessment. In EACL (pp. 229–237).

  • Feng, L., Jansche, M., Huenerfauth, M., & Elhadad, N. (2010). A comparison of features for automatic readability assessment. In Proceedings of the international conference on computational Linguistics (Posters) (pp. 276–284).

  • Flesch, R. (1948). A new readability yardstick. Journal of Applied Psychology, 32(3), 221–233.

  • Gasperin, C., Maziero, E. G., & Aluísio, S. M. (2010). Challenging choices for text simplification. In PROPOR (pp. 40–50).

  • Graesser, A. C., McNamara, D. S., Louwerse, M. M., & Cai, Z. (2004). Coh-Metrix: Analysis of text on cohesion and language. Behavior Research Methods, Instruments, & Computers, 36(2), 193–202.

  • Hyönä, J., & Olson, R. (1995). Eye fixation patterns among dyslexic and normal readers: Effects of word length and word frequency. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21(6), 1430.

  • Inui, K., Fujita, A., Takahashi, T., Iida, R., & Iwakura, T. (2003). Text simplification for reading assistance: A project note. In Proceedings of the second international workshop on Paraphrasing—volume 16, PARAPHRASE ’03 (pp. 9–16). Stroudsburg, PA, USA: Association for Computational Linguistics.

  • Jing, H. (2002). Using Hidden Markov Modeling to decompose human-written summaries. Computational Linguistics, 28, 527–543.

  • Klebanov, B. B., Knight, K., & Marcu, D. (2004). Text simplification for information-seeking applications. In On the move to meaningful internet systems, lecture notes in computer science (pp. 735–747). Berlin: Springer.

  • Krifka, M. (2007). Approximate interpretation of number words: A case for strategic communication. Cognitive Foundations of Interpretation, 111–126.

  • Max, A. (2006). Writing for language-impaired readers. In Proceedings of the conference on intelligent text processing and computational Linguistics (pp. 567–570).

  • Maynard, D., Tablan, V., Cunningham, H., Ursu, C., Saggion, H., Bontcheva, K., et al. (2002). Architectural elements of language engineering robustness. Journal of Natural Language Engineering – Special Issue on Robust Methods in Analysis of Natural Language Data, 8(2/3), 257–274.

  • Mille, S., & Wanner, L. (2008). Making text resources accessible to the reader: The case of patent claims. In Proceedings of the language resources and evaluation conference. Marrakech, Morocco.

  • Ogden, C. K. (1937). Basic English: A general introduction with rules and grammar. London: Paul Treber.

  • Ong, E., Damay, J., Lojico, G., Lu, K., & Tarantan, D. (2008). Simplifying text in medical literature. Journal of Research in Science, Computing and Engineering, 4(1).

  • Padró, L., Collado, M., Reese, S., Lloberes, M., & Castellón, I. (2010). FreeLing 2.1: Five years of open-source language processing tools. In Proceedings of the 7th language resources and evaluation conference. Valletta, Malta.

  • Petersen, S. E., & Ostendorf, M. (2007). Text simplification for language learners: A corpus analysis. In Proceedings of workshop on speech and language technology for education.

  • Petz, A., & Tronbacke, B. (2008). People with specific learning difficulties: Easy to read and HCI. In ICCHP (pp. 690–692).

  • Pitler, E., & Nenkova, A. (2008). Revisiting readability: A unified framework for predicting text quality. In Proceedings of the conference on empirical methods in natural language processing, EMNLP ’08 (pp. 186–195). Stroudsburg, PA, USA: Association for Computational Linguistics.

  • Power, R., & Williams, S. (2011). Generating numerical approximations. Computational Linguistics, 38(1), 113–134.

  • Rodríguez Diéguez, J., Moro Berihuete, P., & Cabero Pérez, M. (1992). La predicción de la lecturabilidad de los textos escritos. In X Congreso Nacional de Pedagogía. Salamanca.

  • Saggion, H. (2008). Automatic summarization: An overview. Revue française de linguistique appliquée, XIII(1).

  • Saggion, H., Gómez-Martínez, E., Etayo, E., Anula, A., & Bourg, L. (2011). Text simplification in Simplext: Making text more accessible. Revista de la Sociedad Española para el Procesamiento del Lenguaje Natural, 47, 341–342.

  • Seretan, V. (2012). Acquisition of syntactic simplification rules for French. In N. Calzolari (Conference Chair), K. Choukri, T. Declerck, M. U. Dogan, B. Maegaard, J. Mariani, J. Odijk, & S. Piperidis (Eds.), Proceedings of the eighth international conference on language resources and evaluation (LREC’12). Istanbul, Turkey: European Language Resources Association (ELRA).

  • Siddharthan, A. (2002). An architecture for a text simplification system. In Proceedings of the language engineering conference (LEC’02) (pp. 64–71).

  • Siddharthan, A. (2011). Text simplification using typed dependencies: A comparison of the robustness of different generation strategies. In Proceedings of the 13th European workshop on natural language generation (ENLG) (pp. 2–11).

  • Specia, L. (2010). Translating from complex to simplified sentences. In PROPOR (pp. 30–39).

  • Specia, L., Jauhar, S. K., & Mihalcea, R. (2012). SemEval-2012 task 1: English lexical simplification. In Proceedings of the first joint conference on lexical and computational semantics—volume 1: Proceedings of the main conference and the shared task, and volume 2: Proceedings of the sixth international workshop on semantic evaluation, SemEval ’12 (pp. 347–355). Stroudsburg, PA, USA: Association for Computational Linguistics.

  • Tanguy, L., & Tulechki, N. (2009). Sentence complexity in French: A corpus-based approach. In Intelligent information systems (IIS) (pp. 1–14).

  • Watanabe, W. M., Junior, A. C., de Uzêda, V. R., de Mattos Fortes, R. P., Pardo, T. A. S., & Aluísio, S. M. (2009). Facilita: Reading assistance for low-literacy readers. In SIGDOC (pp. 29–36).

  • Woodsend, K., & Lapata, M. (2011). Learning to simplify sentences with quasi-synchronous grammar and integer programming. In Proceedings of the conference on empirical methods in natural language processing (pp. 409–420). Association for Computational Linguistics.

  • Zhu, Z., Bernhard, D., & Gurevych, I. (2010). A monolingual tree-based translation model for sentence simplification. In Proceedings of The 23rd international conference on computational linguistics (pp. 1353–1361). Beijing, China.

Acknowledgments

We are grateful to five anonymous reviewers for their very constructive comments and insights, which helped us improve the final version of the paper. We would also like to thank Simon Mille for his substantial help with the MATE grammar framework. The research described in this paper arises from a Spanish research project called Simplext: An automatic system for text simplification (http://www.simplext.es). Simplext is led by Technosite and partially funded by the Ministry of Industry, Tourism and Trade of the Government of Spain, through the National Plan of Scientific Research, Development and Technological Innovation (I+D+i), within the strategic action of Telecommunications and Information Society (Avanza Competitiveness, file number TSI-020302-2010-84). We also acknowledge support from fellowship RYC-2009-04291 (Programa Ramón y Cajal 2009) and from the project SKATER-UPF-TALN (TIN2012-38584-C06-03), Ministerio de Economía y Competitividad, Secretaría de Estado de Investigación, Desarrollo e Innovación, Spain. We are grateful to Biljana Drndarevic for proofreading the paper.

Author information

Correspondence to Horacio Saggion.

About this article

Cite this article

Bott, S., Saggion, H. Text simplification resources for Spanish. Lang Resources & Evaluation 48, 93–120 (2014). https://doi.org/10.1007/s10579-014-9265-4

  • DOI: https://doi.org/10.1007/s10579-014-9265-4