Abstract
Readability is a linguistic feature that indicates how difficult a text is to read. Traditional readability formulas were designed for the English language. This study evaluates their adequacy for the Portuguese language. We applied the traditional formulas to 10 parallel corpora. We found that Portuguese texts received higher grade scores (lower readability) from the formulas that use the number of syllables per word or the number of complex words per sentence. Formulas that use letters per word instead of syllables per word output similar grade scores. In light of this, we evaluated the correlation between complex words and grade level in 65 Portuguese school books covering 12 years of schooling. We found that defining a complex word as a word with 4 or more syllables, instead of the 3 or more syllables originally used in traditional formulas applied to English texts, correlates better with the grade of Portuguese school books. Finally, we adapted each traditional readability formula to the Portuguese language by performing a multiple linear regression on the same dataset of school books.
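As a rough illustration of the two families of formulas contrasted above, the following Python sketch computes the Gunning Fog index (which depends on complex words) and the Automated Readability Index (which depends on letters per word). It is not the authors' implementation: the vowel-group syllable counter is a crude stand-in chosen for brevity, and the names count_syllables, split_words, split_sentences, and the complex_threshold parameter are hypothetical. The coefficients are the standard English ones, not the Portuguese-adapted values obtained in the paper.

```python
import re

# Assumption: vowels (including common Portuguese accented forms) used by a
# naive syllable heuristic. Real syllabification is more involved.
VOWELS = "aeiouyáàâãéêíóôõúü"


def count_syllables(word):
    """Approximate syllables as runs of consecutive vowels (heuristic only)."""
    groups = re.findall(f"[{VOWELS}]+", word.lower())
    return max(1, len(groups))


def split_sentences(text):
    """Split on sentence-ending punctuation and drop empty fragments."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def split_words(text):
    """Extract runs of Unicode letters, so accented words stay whole."""
    return re.findall(r"[^\W\d_]+", text)


def gunning_fog(text, complex_threshold=3):
    """Gunning Fog index; a word is 'complex' at >= complex_threshold syllables.

    The original English definition uses 3; the abstract reports that 4
    correlates better with the grade of Portuguese school books.
    """
    sentences = split_sentences(text)
    words = split_words(text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    n_complex = sum(1 for w in words if count_syllables(w) >= complex_threshold)
    return 0.4 * (len(words) / len(sentences) + 100 * n_complex / len(words))


def automated_readability_index(text):
    """ARI: based on letters per word, so no syllable counting is needed."""
    sentences = split_sentences(text)
    words = split_words(text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    letters = sum(len(w) for w in words)
    return 4.71 * letters / len(words) + 0.5 * len(words) / len(sentences) - 21.43


sample = ("A legibilidade indica a dificuldade de leitura. "
          "Este estudo avalia as fórmulas tradicionais.")
fog_3 = gunning_fog(sample, complex_threshold=3)
fog_4 = gunning_fog(sample, complex_threshold=4)
ari = automated_readability_index(sample)
```

Raising complex_threshold from 3 to 4 can only shrink the set of complex words, so fog_4 is never larger than fog_3 for the same text. This mirrors the direction of the adjustment described in the abstract, although the adapted formulas in the paper also refit the coefficients by regression.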
Acknowledgment
This work is financed by National Funds through the Portuguese funding agency, FCT - Fundação para a Ciência e a Tecnologia, within the project UID/EEA/50014/2019. We would also like to thank the Master in Informatics and Computing Engineering of the Faculty of Engineering of the University of Porto for supporting the registration and travel costs.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Antunes, H., Lopes, C.T. (2019). Analyzing the Adequacy of Readability Indicators to a Non-English Language. In: Crestani, F., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2019. Lecture Notes in Computer Science, vol 11696. Springer, Cham. https://doi.org/10.1007/978-3-030-28577-7_10
DOI: https://doi.org/10.1007/978-3-030-28577-7_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-28576-0
Online ISBN: 978-3-030-28577-7
eBook Packages: Computer Science, Computer Science (R0)