Analyzing the Adequacy of Readability Indicators to a Non-English Language

  • Hélder Antunes
  • Carla Teixeira Lopes
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11696)


Readability is a linguistic feature that indicates how difficult a text is to read. Traditional readability formulas were designed for English; this study evaluates their adequacy for Portuguese. We applied the traditional formulas to 10 parallel corpora and found that Portuguese texts obtained higher grade scores (i.e., lower readability) on the formulas that use the number of syllables per word or the number of complex words per sentence, whereas formulas that use letters per word instead of syllables per word produced similar grade scores in both languages. Considering this, we evaluated how strongly complex words correlate with the school grade of 65 Portuguese school books spanning 12 school years. We found that defining a complex word as one with 4 or more syllables, rather than 3 or more as in the traditional formulas applied to English texts, correlates better with the grade of Portuguese school books. Finally, we adapted each traditional readability formula to Portuguese by performing a multiple linear regression on the same dataset of school books.
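To make the abstract's distinction concrete (formulas driven by syllables or complex words versus formulas driven by letters), the sketch below computes five classic English formulas (Flesch-Kincaid Grade Level, Gunning Fog, SMOG, ARI and Coleman-Liau) with their standard English coefficients. It is an illustration under stated assumptions, not the authors' implementation: the Portuguese syllable counter is a hypothetical vowel-group heuristic, the function names are illustrative, and the `complex_min_syllables` parameter only mirrors the 3-versus-4 syllable threshold discussed above. It does not reproduce the regression-adapted Portuguese coefficients, which are not given here.

```python
import math
import re

# Hypothetical vowel set for a crude Portuguese syllable heuristic (assumption).
VOWELS = "aeiouáàâãéêíóôõúü"
WORD_RE = re.compile(r"[^\W\d_]+")  # runs of letters, accented letters included
VOWEL_GROUP_RE = re.compile(f"[{VOWELS}]+")


def count_syllables(word: str) -> int:
    """Approximate syllables as runs of consecutive vowels (assumption).

    Diphthongs come out right (lei-tu-ra -> 3), but hiatuses are merged
    (sa-ú-de -> 2), so this is only a rough estimate.
    """
    return max(1, len(VOWEL_GROUP_RE.findall(word.lower())))


def text_stats(text: str, complex_min_syllables: int = 3) -> dict:
    """Count words, sentences, letters, syllables and complex words."""
    words = WORD_RE.findall(text)
    # Naive sentence split on ., ! and ?; abbreviations will be miscounted.
    sentences = [s for s in re.split(r"[.!?]+", text) if WORD_RE.search(s)]
    syllables = [count_syllables(w) for w in words]
    return {
        "words": len(words),
        "sentences": len(sentences),
        "letters": sum(len(w) for w in words),
        "syllables": sum(syllables),
        # Complex words: 3+ syllables in the English formulas; the abstract
        # reports that 4+ correlates better with Portuguese school grades.
        "complex": sum(1 for n in syllables if n >= complex_min_syllables),
    }


def readability(text: str, complex_min_syllables: int = 3) -> dict:
    """Apply the traditional English formulas to a text."""
    s = text_stats(text, complex_min_syllables)
    w, st = s["words"], s["sentences"]
    if w == 0 or st == 0:
        raise ValueError("text must contain at least one word and one sentence")
    wps = w / st               # words per sentence
    spw = s["syllables"] / w   # syllables per word
    lpw = s["letters"] / w     # letters per word (ARI normally counts characters)
    return {
        # Syllable-based.
        "flesch_kincaid_grade": 0.39 * wps + 11.8 * spw - 15.59,
        # Complex-word based; SMOG reuses the same threshold here (assumption).
        "gunning_fog": 0.4 * (wps + 100 * s["complex"] / w),
        "smog": 1.0430 * math.sqrt(s["complex"] * 30 / st) + 3.1291,
        # Letter-based.
        "ari": 4.71 * lpw + 0.5 * wps - 21.43,
        "coleman_liau": 0.0588 * (100 * lpw) - 0.296 * (100 * st / w) - 15.8,
    }
```

Calling `readability(text)` and `readability(text, complex_min_syllables=4)` on the same text changes only Gunning Fog and SMOG, since the other three formulas do not use the complex-word count.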


Keywords: Readability · Portuguese language · Text simplification · Natural language processing



This work is financed by National Funds through the Portuguese funding agency, FCT - Fundação para a Ciência e a Tecnologia, within project UID/EEA/50014/2019. We would also like to thank the Master in Informatics and Computing Engineering of the Faculty of Engineering of the University of Porto for supporting the registration and travel costs.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Faculdade de Engenharia da Universidade do Porto, Porto, Portugal
  2. INESC TEC, Porto, Portugal
