Automatic Classification of the Complexity of Nonfiction Texts in Portuguese for Early School Years

  • Conference paper
Computational Processing of the Portuguese Language (PROPOR 2016)

Abstract

Recent research shows that most Brazilian students have serious difficulties with reading. Fully developing this skill is key to the academic and professional future of every citizen. Tools that classify the complexity of reading materials for children aim to improve how reading and text comprehension are taught. For English, Feng’s work [11] is considered the state of the art in grade-level prediction; it achieved 74% accuracy in automatically classifying texts into 4 levels of complexity for close school grades. No classifiers exist for nonfiction texts for close grades in Portuguese. In this article, we propose a scheme for manually annotating texts in 5 grade levels, to be used for customized reading, so that students who are more advanced in reading do not lose interest and those who still need to make progress are not blocked. We obtained 52% accuracy in classifying texts into 5 levels and 74% in 3 levels. The results are promising when compared with the state-of-the-art work.
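As a rough illustration of how accuracy figures at two granularities relate, the sketch below computes accuracy over 5 levels and again after merging them into 3 coarser bands. The labels, the predictions, and the 5-to-3 mapping are all invented for illustration; the abstract does not say how the levels were grouped or which features were used, and the numbers printed here are not the paper's results.

```python
# Toy sketch: accuracy at 5 levels vs. after merging into 3 coarser bands.
# All data and the 5-to-3 mapping are hypothetical, not taken from the paper.

def accuracy(gold, predicted):
    """Return the fraction of items whose predicted label equals the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)

# Hypothetical grouping of 5 grade levels into 3 bands (an assumption).
FIVE_TO_THREE = {1: 1, 2: 1, 3: 2, 4: 3, 5: 3}

def merge_levels(labels, mapping=FIVE_TO_THREE):
    """Map each fine-grained level to its coarser band."""
    return [mapping[label] for label in labels]

# Invented gold annotations and classifier predictions for ten texts.
gold_5 = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
pred_5 = [1, 1, 3, 5, 5, 2, 2, 2, 4, 4]

acc_5 = accuracy(gold_5, pred_5)
acc_3 = accuracy(merge_levels(gold_5), merge_levels(pred_5))

print(f"5-level accuracy: {acc_5:.2f}")
print(f"3-level accuracy: {acc_3:.2f}")
```

A confusion between two levels that fall into the same band stops counting as an error after merging, so on the same predictions the coarser scheme can never score lower than the finer one.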

Notes

  1. Available at oecd.org/education/PISA-2012-results-brazil.pdf.

  2. Available at primeiro-livro.com.

  3. Available at rede.novaescolaclube.org.br.

  4. Available at lexile.com.

  5. Available at tea.cohmetrix.com.

  6. Available at nilc.icmc.usp.br/coh-metrix-port.

  7. Available at www.weeklyreader.com.

  8. Provinha Brasil is a test that evaluates how much children have learned in Portuguese and Mathematics. Available at provinhabrasil.inep.gov.br.

  9. Available at http://www.educacao.sp.gov.br/saresp.

  10. Prova Brasil is a test that evaluates the quality of the Brazilian educational system. Available at http://portal.mec.gov.br/prova-brasil.

  11. Available at nilc.icmc.usp.br/nilc/images/download/corpusNilc.zip.

  12. Available at chc.cienciahoje.uol.com.br.

  13. Available at www.folha.uol.com.br/folhinha.

  14. Available at zh.clicrbs.com.br/rs.

  15. Available at mundoestranho.abril.com.br.

  16. Available at sites.google.com/site/provassaresp.

  17. A libsvm implementation of the SVM classifier was used.

  18. Available at http://143.107.183.175:22680.

  19. Available at http://143.107.183.175:21380/portlex/index.php/en/liwc.

References

  1. Aluisio, S., Specia, L., Gasperin, C., Scarton, C.: Readability assessment for text simplification. In: Proceedings of the NAACL HLT 2010 Fifth Workshop on Innovative Use of NLP for Building Educational Applications, pp. 1–9. Association for Computational Linguistics (2010)

  2. Aluísio, S.M., Pinheiro, G.M., Manfrin, A.M., de Oliveira, L.H., Genoves Jr., L.C., Tagnin, S.E.: The Lácio-Web: corpora and tools to advance Brazilian Portuguese language investigations and computational linguistic tools. In: Proceedings of LREC, pp. 1779–1782 (2004)

  3. Aluísio, S.M., Gasperin, C.: Fostering digital inclusion and accessibility: the PorSimples project for simplification of Portuguese texts. In: Proceedings of the NAACL HLT 2010 Young Investigators Workshop on Computational Approaches to Languages of the Americas, pp. 46–53. Association for Computational Linguistics (2010)

  4. Bakhtin, M.: Estética da criação verbal. Livraria Martins Fontes, São Paulo (2003)

  5. Bick, E.: The Parsing System “Palavras”: Automatic Grammatical Analysis of Portuguese in a Constraint Grammar Framework. Aarhus University Press, Aarhus (2000)

  6. Biderman, M.T.C.: Dicionários do português: da tradição à contemporaneidade. ALFA: Revista de Linguística 47(1) (2003)

  7. Cimadon, É.: Funções executivas em crianças com dificuldade de leitura (2012)

  8. Collins-Thompson, K., Bennett, P.N., White, R.W., de la Chica, S., Sontag, D.: Personalizing web search results by reading level. In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management, pp. 403–412. ACM (2011)

  9. Curto, P.: Classificador de textos para o ensino de português como segunda língua. Master’s thesis, Universidade Técnico Lisboa, Portugal (2014)

  10. Dell’Orletta, F., Venturi, G., Cimino, A., Montemagni, S.: T2k2: system for automatically extracting and organizing knowledge from texts. In: Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC 2014) (2014)

  11. Feng, L., Jansche, M., Huenerfauth, M., Elhadad, N.: A comparison of features for automatic readability assessment. In: Proceedings of the 23rd International Conference on Computational Linguistics: Posters, COLING 2010, pp. 276–284. Association for Computational Linguistics (2010)

  12. Flor, M., Klebanov, B.B.: Associative lexical cohesion as a factor in text complexity. Int. J. Appl. Linguist. 165(2), 223–258 (2014)

  13. Forsyth, J.N.: Automatic Readability Detection for Modern Standard Arabic. Master’s thesis, Brigham Young University, United States

  14. François, T.: An analysis of a French as a foreign language corpus for readability assessment. NEALT Proc. Ser. 22, 13–32 (2014)

  15. Fulcher, K.Y., White, P.D.: Randomised controlled trial of graded exercise in patients with the chronic fatigue syndrome. BMJ 314(7095), 1647–1652 (1997)

  16. Giangiacomo, M.C.P.B., Navas, A.L.G.P.: A influência da memória operacional nas habilidades de compreensão de leitura em escolares de 4ª série [Influence of working memory in reading comprehension in 4th grade students]. Sociedade Brasileira de Fonoaudiologia 13(1), 69–74 (2008)

  17. Graesser, A.C., McNamara, D.S., Kulikowich, J.M.: Coh-Metrix: providing multilevel analyses of text characteristics. Edu. Res. 40(5), 223–234 (2011)

  18. Graesser, A.C., McNamara, D.S., Louwerse, M.M., Cai, Z.: Coh-Metrix: analysis of text on cohesion and language. Behav. Res. Methods, Instrum. Comput. 36(2), 193–202 (2004)

  19. Hancke, J., Vajjala, S., Meurers, D.: Readability classification for German using lexical, syntactic, and morphological features. In: Proceedings of COLING, pp. 1063–1080 (2012)

  20. Hovy, E., Lavid, J.: Towards a ‘science’ of corpus annotation: a new methodological challenge for corpus linguistics. Int. J. Transl. 22(1), 13–36 (2010)

  21. Kato, M.: O aprendizado da leitura. Martins Fontes, São Paulo (1985)

  22. Kato, M.A.: No mundo da escrita: uma perspectiva psicolingüística, vol. 9. Editora Ática (1986)

  23. da Graça Krieger, M.: Dicionários para o ensino de língua materna: princípios e critérios de escolha. Revista Língua & Literatura 7(10-11), 101–112 (2012)

  24. Landis, J.R., Koch, G.G.: The measurement of observer agreement for categorical data. Biometrics 33(1), 159–174 (1977)

  25. Lennon, C., Burdick, H.: The lexile framework as an approach for reading measurement and success. Electronic publication on www.lexile.com (2004)

  26. LoPucki, L.M.: System and method for enhancing comprehension and readability of legal text. US Patent 8,794,972 (2014)

  27. Maia, M.: Gramática e parser. Boletim da ABRALIN 1(26), 288–291 (2001)

  28. Maia, M.: Efeitos do status argumental e de segmentação no processamento de sintagmas preposicionais em português brasileiro. Cadernos de Estudos Lingüísticos 50(1) (2011)

  29. Maia, M., Finger, I.: Processamento da linguagem. Educat, Pelotas (2005)

  30. Martins, T.B., Ghiraldelo, C.M., Nunes, M.d.G.V., de Oliveira Jr., O.N.: Readability formulas applied to textbooks in Brazilian Portuguese. ICMSC-USP (1996)

  31. Maziero, E.G., Pardo, T.A.S., Aluísio, S.M.: Ferramenta de análise automática de inteligibilidade de córpus (aic). Technical report (2008)

  32. Navas, A.L.G.P., Pinto, J.C.B.R., Dellisa, P.R.R.: Avanços no conhecimento do processamento da fluência em leitura: da palavra ao texto [Improvements in the knowledge of the reading fluency processing: from word to text]. Sociedade Brasileira de Fonoaudiologia 14(3), 553–559 (2009)

  33. O’Reilly, T., Sinclair, G., McNamara, D.S.: iSTART: a web-based reading strategy intervention that improves students’ science comprehension. In: CELDA, pp. 173–180 (2004)

  34. Petersen, S.E., Ostendorf, M.: A machine learning approach to reading level assessment. Comput. Speech Lang. 23(1), 89–106 (2009)

  35. de Salles, J.S.F., Parente, M.A.d.M.P.: Heterogeneidade nas estratégias de leitura/ escrita em crianças com dificuldades de leitura e escrita. Psico 37(1), 83–90

  36. San Norberto, E.M., Gómez-Alonso, D., Trigueros, J.M., Quiroga, J., Gualis, J., Vaquero, C.: Readability of surgical informed consent in Spain. Cirugía Española 92(3), 201–207 (2014)

  37. Scarton, C., Aluísio, S.: Análise da Inteligibilidade de textos via ferramentas de Processamento de Língua Natural: adaptando as métricas do Coh-Metrix para o Português. Linguamática 2(1), 45–62 (2010)

  38. Sheehan, K.M., Flor, M., Napolitano, D.: A two-stage approach for generating unbiased estimates of text complexity. In: Proceedings of the Workshop on Natural Language Processing for Improving Textual Accessibility, pp. 49–58 (2013)

  39. Stenner, A.J.: Measuring reading comprehension with the lexile framework (1996)

  40. Tong, S., Koller, D.: Support vector machine active learning with applications to text classification. J. Mach. Learn. Res. 2, 45–66 (2002)

  41. Vajjala, S., Meurers, D.: Readability assessment for text simplification: from analysing documents to identifying sentential simplifications. Int. J. Appl. Linguist. 165(2), 194–222 (2014)

Author information

Corresponding author

Correspondence to Nathan Hartmann.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Hartmann, N., Cucatto, L., Brants, D., Aluísio, S. (2016). Automatic Classification of the Complexity of Nonfiction Texts in Portuguese for Early School Years. In: Silva, J., Ribeiro, R., Quaresma, P., Adami, A., Branco, A. (eds) Computational Processing of the Portuguese Language. PROPOR 2016. Lecture Notes in Computer Science, vol 9727. Springer, Cham. https://doi.org/10.1007/978-3-319-41552-9_2

  • DOI: https://doi.org/10.1007/978-3-319-41552-9_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-41551-2

  • Online ISBN: 978-3-319-41552-9

  • eBook Packages: Computer Science (R0)
