Abstract
Question Generation (QG) is an important and challenging problem that has attracted attention from the natural language processing (NLP) community in recent years. QG aims to automatically generate questions from a given input. Recent studies in this field typically use widely available question-answering (QA) datasets (in English) and neural models to train and build QG systems. As lower-resourced languages (e.g., Portuguese) lack large-scale, high-quality QA data, experimenting with recent neural techniques becomes a significant challenge. This study uses a Portuguese machine-translated version of the SQuAD v1.1 dataset to perform a preliminary analysis of a neural approach to the QG task for Portuguese. We frame our approach as a sequence-to-sequence problem by fine-tuning a pre-trained language model, T5, to generate factoid (wh-) questions. Despite the evident issues that a machine-translated dataset may introduce when used for training neural models, the automatic evaluation of our Portuguese neural QG models yields results in line with those obtained for English. To the best of our knowledge, this is the first study addressing neural QG for Portuguese. The code and models are publicly available at https://github.com/bernardoleite/question-generation-t5-pytorch-lightning.
Notes
- 5. It seems that Du et al. (2017) [16] use only 70,484 examples after a post-processing step. We use all available instances.
- 11. The answer and paragraph are separated by the eos_token \({<}/\textrm{s}{>}\).
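The input formatting described in note 11 (answer and source paragraph joined by T5's eos_token) can be sketched as follows; the function name and the example strings are illustrative assumptions, not taken from the paper's released code.

```python
# Minimal sketch of the seq2seq input formatting described in note 11:
# the target answer and its source paragraph are concatenated into a
# single T5 input sequence, separated by the eos_token "</s>".
EOS_TOKEN = "</s>"

def build_qg_input(answer: str, paragraph: str) -> str:
    """Join answer and paragraph with T5's eos_token separator."""
    return f"{answer} {EOS_TOKEN} {paragraph}"

example = build_qg_input(
    "1998",
    "A empresa foi fundada em 1998 por dois estudantes.",
)
print(example)
# -> 1998 </s> A empresa foi fundada em 1998 por dois estudantes.
```

The formatted string would then be tokenized and fed to the fine-tuned T5 encoder, with the reference question as the decoder target.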
References
Amidei, J., Piwek, P., Willis, A.: Evaluation methodologies in automatic question generation 2013–2018. In: Proceedings of the 11th International Conference on Natural Language Generation, pp. 307–317. ACL, Tilburg University, The Netherlands, November 2018. https://doi.org/10.18653/v1/W18-6537. https://aclanthology.org/W18-6537
Azevedo, P., Leite, B., Cardoso, H.L., Silva, D.C., Reis, L.P.: Exploring NLP and information extraction to jointly address question generation and answering. In: Maglogiannis, I., Iliadis, L., Pimenidis, E. (eds.) AIAI 2020. IAICT, vol. 584, pp. 396–407. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-49186-4_33
Banerjee, S., Lavie, A.: METEOR: an automatic metric for MT evaluation with improved correlation with human judgments. In: Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, Ann Arbor, Michigan, pp. 65–72. ACL, June 2005. https://www.aclweb.org/anthology/W05-0909
Bao, H., et al.: UniLMv2: pseudo-masked language models for unified language model pre-training. In: International Conference on Machine Learning, pp. 642–652. PMLR (2020)
Carmo, D., Piau, M., Campiotti, I., Nogueira, R., Lotufo, R.: PTT5: pretraining and validating the T5 model on Brazilian Portuguese data. arXiv preprint arXiv:2008.09144 (2020)
Carrino, C.P., Costa-jussà, M.R., Fonollosa, J.A.R.: Automatic Spanish translation of SQuAD dataset for multi-lingual question answering. In: Proceedings of the 12th Language Resources and Evaluation Conference, Marseille, France, pp. 5515–5523. European Language Resources Association, May 2020. https://aclanthology.org/2020.lrec-1.677
Carvalho, N.R.: squad-v1.1-pt (2020). https://github.com/nunorc/squad-v1.1-pt
Carvalho, N.R., Simões, A., Almeida, J.J.: Bootstrapping a data-set and model for question-answering in Portuguese (short paper). In: 10th Symposium on Languages, Applications and Technologies (SLATE 2021). Schloss Dagstuhl-Leibniz-Zentrum für Informatik (2021)
Chan, Y.H., Fan, Y.C.: A recurrent BERT-based model for question generation. In: Proceedings of the 2nd Workshop on Machine Reading for Question Answering, Hong Kong, China, pp. 154–162. ACL, November 2019. https://doi.org/10.18653/v1/D19-5821. https://aclanthology.org/D19-5821
Correia, R., Baptista, J., Eskenazi, M., Mamede, N.: Automatic generation of Cloze question stems. In: Caseli, H., Villavicencio, A., Teixeira, A., Perdigão, F. (eds.) PROPOR 2012. LNCS (LNAI), vol. 7243, pp. 168–178. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-28885-2_19
Curto, S.L.: Automatic generation of multiple-choice tests. Master’s thesis, Instituto Superior Técnico (2010). https://fenix.tecnico.ulisboa.pt/departamentos/dei/dissertacao/2353642299631. Dissertation for obtaining the Master Degree in Information Systems and Computer Engineering
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol. 1 (Long and Short Papers), Minneapolis, Minnesota, pp. 4171–4186. ACL, June 2019. https://doi.org/10.18653/v1/N19-1423. https://aclanthology.org/N19-1423
d’Hoffschmidt, M., Belblidia, W., Heinrich, Q., Brendlé, T., Vidal, M.: FQuAD: French question answering dataset. In: Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 1193–1208. ACL, November 2020. https://doi.org/10.18653/v1/2020.findings-emnlp.107. https://aclanthology.org/2020.findings-emnlp.107
Diéguez, D., Rodrigues, R., Gomes, P.: Using CBR for Portuguese question generation. In: Proceedings of the 15th Portuguese Conference on Artificial Intelligence, pp. 328–341 (2011)
Dong, L., et al.: Unified language model pre-training for natural language understanding and generation. In: Wallach, H., Larochelle, H., Beygelzimer, A., Alché-Buc, F.d., Fox, E., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 32. Curran Associates, Inc. (2019). https://proceedings.neurips.cc/paper/2019/file/c20bb2d9a50d5ac1f713f8b34d9aac5a-Paper.pdf
Du, X., Shao, J., Cardie, C.: Learning to ask: neural question generation for reading comprehension. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, Canada, pp. 1342–1352. ACL, July 2017. https://doi.org/10.18653/v1/P17-1123. https://aclanthology.org/P17-1123
Ferreira, J., Rodrigues, R., Gonçalo Oliveira, H.: Assessing factoid question-answer generation for Portuguese (short paper). In: 9th Symposium on Languages, Applications and Technologies (SLATE 2020). Schloss Dagstuhl-Leibniz-Zentrum für Informatik (2020)
Gates, D.: Generating look-back strategy questions from expository texts. In: The Workshop on the Question Generation Shared Task and Evaluation Challenge, NSF, Arlington, VA (2008). http://www.cs.memphis.edu/~vrus/questiongeneration//1-Gates-QG08.pdf
Gonçalo Oliveira, H.: Answering fill-in-the-blank questions in Portuguese with transformer language models. In: Marreiros, G., Melo, F.S., Lau, N., Lopes Cardoso, H., Reis, L.P. (eds.) EPIA 2021. LNCS (LNAI), vol. 12981, pp. 739–751. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-86230-5_58
Heilman, M., Smith, N.A.: Good question! Statistical ranking for question generation. In: Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Los Angeles, California, pp. 609–617. ACL, June 2010. https://aclanthology.org/N10-1086
Leite, B.: Automatic question generation for the Portuguese language. Master’s thesis, Faculdade de Engenharia da Universidade do Porto (2020). https://repositorio-aberto.up.pt/handle/10216/128541. Dissertation for obtaining the Integrated Master Degree in Informatics and Computer Engineering
Leite, B., Lopes Cardoso, H., Reis, L.P., Soares, C.: Factual question generation for the Portuguese language. In: 2020 International Conference on INnovations in Intelligent SysTems and Applications (INISTA), pp. 1–7. IEEE (2020)
Li, J., Gao, Y., Bing, L., King, I., Lyu, M.R.: Improving question generation with to the point context. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, pp. 3216–3226. ACL, November 2019. https://doi.org/10.18653/v1/D19-1317. https://aclanthology.org/D19-1317
Lin, C.Y.: ROUGE: a package for automatic evaluation of summaries. In: Text Summarization Branches Out, Barcelona, Spain, pp. 74–81. ACL, July 2004. https://www.aclweb.org/anthology/W04-1013
Lindberg, D., Popowich, F., Nesbit, J., Winne, P.: Generating natural language questions to support learning on-line. In: Proceedings of the 14th European Workshop on Natural Language Generation, Sofia, Bulgaria, pp. 105–114. ACL, August 2013. https://aclanthology.org/W13-2114
Liu, M., Calvo, R., Rus, V.: G-Asks: an intelligent automatic question generation system for academic writing support. Dialogue Discourse 3, 101–124 (2012). https://doi.org/10.5087/dad.2012.205
Mazidi, K., Nielsen, R.D.: Linguistic considerations in automatic question generation. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Baltimore, Maryland, pp. 321–326. ACL (2014). https://doi.org/10.3115/v1/P14-2053. https://www.aclweb.org/anthology/P14-2053
Pan, L., Lei, W., Chua, T.S., Kan, M.Y.: Recent advances in neural question generation. CoRR abs/1905.08949 (2019). http://arxiv.org/abs/1905.08949
Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: Bleu: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia, Pennsylvania, USA, pp. 311–318. ACL, July 2002. https://doi.org/10.3115/1073083.1073135. https://www.aclweb.org/anthology/P02-1040
Pirovani, J., Spalenza, M., Oliveira, E.: Geração Automática de Questões a Partir do Reconhecimento de Entidades Nomeadas em Textos Didáticos. Brazilian Symposium on Computers in Education (Simpósio Brasileiro de Informática na Educação - SBIE) 28(1), 1147 (2017). https://doi.org/10.5753/cbie.sbie.2017.1147. https://www.br-ie.org/pub/index.php/sbie/article/view/7643
Qi, W., et al.: ProphetNet: predicting future n-gram for sequence-to-sequence pre-training. In: Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 2401–2410. ACL, November 2020. https://doi.org/10.18653/v1/2020.findings-emnlp.217. https://aclanthology.org/2020.findings-emnlp.217
Raffel, C., et al.: Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 21(140), 1–67 (2020). http://jmlr.org/papers/v21/20-074.html
Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100,000+ questions for machine comprehension of text. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, Texas, pp. 2383–2392. ACL, November 2016. https://doi.org/10.18653/v1/D16-1264. https://aclanthology.org/D16-1264
Rus, V., Cai, Z., Graesser, A.: Question generation: example of a multi-year evaluation campaign. In: Proceedings of the WS on the Question Generation Shared Task and Evaluation Challenge (2008)
Wagner Filho, J.A., Wilkens, R., Idiart, M., Villavicencio, A.: The brWaC corpus: a new open resource for Brazilian Portuguese. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018) (2018)
Wang, W., Wei, F., Dong, L., Bao, H., Yang, N., Zhou, M.: MiniLM: deep self-attention distillation for task-agnostic compression of pre-trained transformers. arXiv preprint arXiv:2002.10957 (2020)
Xiao, D., et al.: ERNIE-GEN: an enhanced multi-flow pre-training and fine-tuning framework for natural language generation. In: Bessiere, C. (ed.) Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-2020, pp. 3997–4003. International Joint Conferences on Artificial Intelligence Organization (2020). https://doi.org/10.24963/ijcai.2020/553
Xie, Z.: Neural text generation: a practical guide. CoRR abs/1711.09534 (2017). http://arxiv.org/abs/1711.09534
Xue, L., et al.: mT5: a massively multilingual pre-trained text-to-text transformer. arXiv preprint arXiv:2010.11934 (2020)
Zhao, Y., Ni, X., Ding, Y., Ke, Q.: Paragraph-level neural question generation with maxout pointer and gated self-attention networks. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 3901–3910 (2018)
Acknowledgments
This work was financially supported by Base Funding - UIDB/00027/2020 of the Artificial Intelligence and Computer Science Laboratory - LIACC - funded by national funds through the FCT/MCTES (PIDDAC). Bernardo Leite is supported by a PhD studentship (with reference 2021.05432.BD), funded by Fundação para a Ciência e a Tecnologia (FCT).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Leite, B., Lopes Cardoso, H. (2022). Neural Question Generation for the Portuguese Language: A Preliminary Study. In: Marreiros, G., Martins, B., Paiva, A., Ribeiro, B., Sardinha, A. (eds) Progress in Artificial Intelligence. EPIA 2022. Lecture Notes in Computer Science(), vol 13566. Springer, Cham. https://doi.org/10.1007/978-3-031-16474-3_63
DOI: https://doi.org/10.1007/978-3-031-16474-3_63
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16473-6
Online ISBN: 978-3-031-16474-3