Abstract
Recent research shows that most Brazilian students have serious problems with reading. Fully developing this skill is key to the academic and professional future of every citizen. Tools that classify the complexity of reading materials for children aim to improve how reading and text comprehension are taught. For English, Feng's work [11] is considered the state of the art in grade-level prediction: it achieved 74% accuracy when automatically classifying texts into 4 levels of textual complexity corresponding to adjacent school grades. For Portuguese, there are no classifiers of nonfiction texts for adjacent grades. In this article, we propose a scheme for manually annotating texts into 5 grade levels. The annotated texts will support customized reading, so that students who are more advanced in reading do not lose interest and those who still need to progress are not blocked. We obtained 52% accuracy when classifying texts into 5 levels and 74% when classifying them into 3 levels. These results are promising when compared with the state of the art.
Notes
1. Available at oecd.org/education/PISA-2012-results-brazil.pdf.
2. Available at primeiro-livro.com.
3. Available at rede.novaescolaclube.org.br.
4. Available at lexile.com.
5. Available at tea.cohmetrix.com.
6. Available at nilc.icmc.usp.br/coh-metrix-port.
7. Available at www.weeklyreader.com.
8. Provinha Brasil is a test that evaluates how much children have learned in Portuguese and Mathematics. Available at provinhabrasil.inep.gov.br.
9. Available at http://www.educacao.sp.gov.br/saresp.
10. Prova Brasil is a test that evaluates the quality of the Brazilian educational system. Available at http://portal.mec.gov.br/prova-brasil.
11. Available at nilc.icmc.usp.br/nilc/images/download/corpusNilc.zip.
12. Available at chc.cienciahoje.uol.com.br.
13. Available at www.folha.uol.com.br/folhinha.
14. Available at zh.clicrbs.com.br/rs.
15. Available at mundoestranho.abril.com.br.
16. Available at sites.google.com/site/provassaresp.
17. We used the libsvm implementation of the SVM classifier.
18. Available at http://143.107.183.175:22680.
19. Available at http://143.107.183.175:21380/portlex/index.php/en/liwc.
References
Aluísio, S., Specia, L., Gasperin, C., Scarton, C.: Readability assessment for text simplification. In: Proceedings of the NAACL HLT 2010 Fifth Workshop on Innovative Use of NLP for Building Educational Applications, pp. 1–9. Association for Computational Linguistics (2010)
Aluísio, S.M., Pinheiro, G.M., Manfrin, A.M., de Oliveira, L.H., Genoves Jr., L.C., Tagnin, S.E.: The Lácio-Web: corpora and tools to advance Brazilian Portuguese language investigations and computational linguistic tools. In: Proceedings of LREC, pp. 1779–1782 (2004)
Aluísio, S.M., Gasperin, C.: Fostering digital inclusion and accessibility: the PorSimples project for simplification of Portuguese texts. In: Proceedings of the NAACL HLT 2010 Young Investigators Workshop on Computational Approaches to Languages of the Americas, pp. 46–53. Association for Computational Linguistics (2010)
Bakhtin, M.: Estética da criação verbal. Livraria Martins Fontes, São Paulo (2003)
Bick, E.: The Parsing System “Palavras”: Automatic Grammatical Analysis of Portuguese in a Constraint Grammar Framework. Aarhus University Press, Aarhus (2000)
Biderman, M.T.C.: Dicionários do português: da tradição à contemporaneidade. ALFA: Revista de Linguística 47(1) (2003)
Cimadon, É.: Funções executivas em crianças com dificuldade de leitura (2012)
Collins-Thompson, K., Bennett, P.N., White, R.W., de la Chica, S., Sontag, D.: Personalizing web search results by reading level. In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management, pp. 403–412. ACM (2011)
Curto, P.: Classificador de textos para o ensino de português como segunda língua. Master’s thesis, Universidade Técnico Lisboa, Portugal (2014)
Dell’Orletta, F., Venturi, G., Cimino, A., Montemagni, S.: T2k2: system for automatically extracting and organizing knowledge from texts. In: Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC 2014) (2014)
Feng, L., Jansche, M., Huenerfauth, M., Elhadad, N.: A comparison of features for automatic readability assessment. In: Proceedings of the 23rd International Conference on Computational Linguistics: Posters, COLING 2010, pp. 276–284. Association for Computational Linguistics (2010)
Flor, M., Klebanov, B.B.: Associative lexical cohesion as a factor in text complexity. Int. J. Appl. Linguist. 165(2), 223–258 (2014)
Forsyth, J.N.: Automatic Readability Detection for Modern Standard Arabic. Master’s thesis, Brigham Young University, United States
François, T.: An analysis of a French as a foreign language corpus for readability assessment. NEALT Proc. Ser. 22, 13–32 (2014)
Fulcher, K.Y., White, P.D.: Randomised controlled trial of graded exercise in patients with the chronic fatigue syndrome. BMJ 314(7095), 1647–1652 (1997)
Giangiacomo, M.C.P.B., Navas, A.L.G.P.: A influência da memória operacional nas habilidades de compreensão de leitura em escolares de 4ª série [Influence of working memory in reading comprehension in 4th grade students]. Sociedade Brasileira de Fonoaudiologia 13(1), 69–74 (2008)
Graesser, A.C., McNamara, D.S., Kulikowich, J.M.: Coh-Metrix: providing multilevel analyses of text characteristics. Educ. Res. 40(5), 223–234 (2011)
Graesser, A.C., McNamara, D.S., Louwerse, M.M., Cai, Z.: Coh-Metrix: analysis of text on cohesion and language. Behav. Res. Methods, Instrum. Comput. 36(2), 193–202 (2004)
Hancke, J., Vajjala, S., Meurers, D.: Readability classification for German using lexical, syntactic, and morphological features. In: Proceedings of COLING, pp. 1063–1080 (2012)
Hovy, E., Lavid, J.: Towards a ‘science’ of corpus annotation: a new methodological challenge for corpus linguistics. Int. J. Transl. 22(1), 13–36 (2010)
Kato, M.: O aprendizado da leitura. Martins Fontes, São Paulo (1985)
Kato, M.A.: No mundo da escrita: uma perspectiva psicolingüística, vol. 9. Editora Ática (1986)
da Graça Krieger, M.: Dicionários para o ensino de língua materna: princípios e critérios de escolha. Revista Língua & Literatura 7(10-11), 101–112 (2012)
Landis, J.R., Koch, G.G.: The measurement of observer agreement for categorical data. Biometrics 33(1), 159–174 (1977)
Lennon, C., Burdick, H.: The lexile framework as an approach for reading measurement and success. Electronic publication on www.lexile.com (2004)
LoPucki, L.M.: System and method for enhancing comprehension and readability of legal text (2014). US Patent 8,794,972
Maia, M.: Gramática e parser. Boletim da ABRALIN 1(26), 288–291 (2001)
Maia, M.: Efeitos do status argumental e de segmentação no processamento de sintagmas preposicionais em português brasileiro. Cadernos de Estudos Lingüísticos 50(1) (2011)
Maia, M., Finger, I.: Processamento da linguagem. Educat, Pelotas (2005)
Martins, T.B., Ghiraldelo, C.M., Nunes, M.d.G.V., de Oliveira Jr., O.N.: Readability formulas applied to textbooks in Brazilian Portuguese. ICMSC-USP (1996)
Maziero, E.G., Pardo, T.A.S., Aluísio, S.M.: Ferramenta de análise automática de inteligibilidade de córpus (AIC). Technical report (2008)
Navas, A.L.G.P., Pinto, J.C.B.R., Dellisa, P.R.R.: Avanços no conhecimento do processamento da fluência em leitura: da palavra ao texto [Improvements in the knowledge of the reading fluency processing: from word to text]. Sociedade Brasileira de Fonoaudiologia 14(3), 553–559 (2009)
O’Reilly, T., Sinclair, G., McNamara, D.S.: iSTART: a web-based reading strategy intervention that improves students’ science comprehension. In: CELDA, pp. 173–180 (2004)
Petersen, S.E., Ostendorf, M.: A machine learning approach to reading level assessment. Comput. Speech Lang. 23(1), 89–106 (2009)
de Salles, J.S.F., Parente, M.A.d.M.P.: Heterogeneidade nas estratégias de leitura/escrita em crianças com dificuldades de leitura e escrita. Psico 37(1), 83–90
San Norberto, E.M., Gómez-Alonso, D., Trigueros, J.M., Quiroga, J., Gualis, J., Vaquero, C.: Readability of surgical informed consent in Spain. Cirugía Española 92(3), 201–207 (2014)
Scarton, C., Aluísio, S.: Análise da Inteligibilidade de textos via ferramentas de Processamento de Língua Natural: adaptando as métricas do Coh-Metrix para o Português. Linguamática 2(1), 45–62 (2010)
Sheehan, K.M., Flor, M., Napolitano, D.: A two-stage approach for generating unbiased estimates of text complexity. In: Proceedings of the Workshop on Natural Language Processing for Improving Textual Accessibility, pp. 49–58 (2013)
Stenner, A.J.: Measuring reading comprehension with the lexile framework (1996)
Tong, S., Koller, D.: Support vector machine active learning with applications to text classification. J. Mach. Learn. Res. 2, 45–66 (2002)
Vajjala, S., Meurers, D.: Readability assessment for text simplification: from analysing documents to identifying sentential simplifications. Int. J. Appl. Linguist. 165(2), 194–222 (2014)
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Hartmann, N., Cucatto, L., Brants, D., Aluísio, S. (2016). Automatic Classification of the Complexity of Nonfiction Texts in Portuguese for Early School Years. In: Silva, J., Ribeiro, R., Quaresma, P., Adami, A., Branco, A. (eds) Computational Processing of the Portuguese Language. PROPOR 2016. Lecture Notes in Computer Science, vol 9727. Springer, Cham. https://doi.org/10.1007/978-3-319-41552-9_2
DOI: https://doi.org/10.1007/978-3-319-41552-9_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41551-2
Online ISBN: 978-3-319-41552-9
eBook Packages: Computer Science, Computer Science (R0)