
Neural Question Generation for the Portuguese Language: A Preliminary Study

  • Conference paper
Progress in Artificial Intelligence (EPIA 2022)

Abstract

Question Generation (QG) is an important and challenging problem that has attracted attention from the natural language processing (NLP) community in recent years. QG aims to automatically generate questions from a given input. Recent studies in this field typically train and build neural QG systems on widely available question-answering (QA) datasets, most of them in English. As lower-resourced languages (e.g., Portuguese) lack large-scale, quality QA data, experimenting with recent neural techniques becomes a significant challenge. This study uses a Portuguese machine-translated version of the SQuAD v1.1 dataset to perform a preliminary analysis of a neural approach to QG for Portuguese. We frame the task as a sequence-to-sequence problem, fine-tuning a pre-trained language model (T5) to generate factoid (wh-) questions. Despite the evident issues that a machine-translated dataset may bring to training neural models, the automatic evaluation of our Portuguese neural QG models yields results in line with those obtained for English. To the best of our knowledge, this is the first study addressing neural QG for Portuguese. The code and models are publicly available at https://github.com/bernardoleite/question-generation-t5-pytorch-lightning.
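As an illustration of the sequence-to-sequence framing described above, the sketch below shows how a single training pair might be assembled, assuming the input format mentioned in the paper's notes (answer and paragraph separated by the `</s>` end-of-sequence token). The function name and example text are illustrative, not taken from the paper's code:

```python
def build_qg_pair(paragraph: str, answer: str, question: str):
    """Assemble one (source, target) pair for T5-style question generation.

    Hypothetical helper: the answer and the paragraph are concatenated into
    a single source string, separated by the </s> end-of-sequence token
    (the format described in the paper's notes); the target is the
    reference question the model is fine-tuned to generate.
    """
    source = f"{answer} </s> {paragraph}"
    target = question
    return source, target


# Illustrative Portuguese example (not from the SQuAD v1.1-pt dataset).
src, tgt = build_qg_pair(
    paragraph="O Porto é uma cidade do norte de Portugal.",
    answer="Porto",
    question="Qual é a cidade do norte de Portugal?",
)
```

The source/target strings produced this way can then be tokenized and fed to any T5-family checkpoint (t5-base, ptt5, mT5) for fine-tuning.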



Notes

  1. https://github.com/bernardoleite/question-generation-portuguese.

  2. https://www.tensorflow.org/datasets/catalog/c4.

  3. https://github.com/rajpurkar/SQuAD-explorer/tree/master/dataset.

  4. https://github.com/xinyadu/nqg/tree/master/data/raw.

  5. It seems that Du et al. (2017) [16] use only 70,484 examples after a post-processing step. We use all available instances.

  6. https://www.pytorchlightning.ai/.

  7. https://huggingface.co/.

  8. https://huggingface.co/t5-base.

  9. https://huggingface.co/unicamp-dl/ptt5-base-portuguese-vocab.

  10. https://huggingface.co/google/mt5-base.

  11. Answer and paragraph are separated with the eos_token </s>.

  12. https://github.com/xinyadu/nqg/tree/master/qgevalcap.
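The automatic evaluation package referenced in note 12 computes n-gram overlap metrics such as BLEU. Below is a simplified, self-contained sketch of sentence-level BLEU (clipped n-gram precision with a brevity penalty); it is an illustration of the metric only, not the official qgevalcap implementation:

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bleu(candidate: str, references: list, max_n: int = 4) -> float:
    """Simplified sentence-level BLEU: geometric mean of clipped
    n-gram precisions (n = 1..max_n) times a brevity penalty."""
    cand = candidate.split()
    refs = [r.split() for r in references]

    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        if not cand_counts:          # candidate too short for this n
            return 0.0
        # Clip each candidate n-gram count by its max count in any reference.
        max_ref = Counter()
        for ref in refs:
            for gram, count in Counter(ngrams(ref, n)).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        precisions.append(clipped / sum(cand_counts.values()))

    if min(precisions) == 0:         # avoid log(0); score collapses to 0
        return 0.0

    # Brevity penalty against the reference closest in length.
    closest_ref = min(refs, key=lambda r: abs(len(r) - len(cand)))
    bp = 1.0 if len(cand) >= len(closest_ref) else \
        math.exp(1 - len(closest_ref) / len(cand))

    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

A perfect match scores 1.0 (e.g. `bleu("what is question generation", ["what is question generation"])`), while a candidate shorter than four tokens scores 0.0 under the default BLEU-4 setting.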

References

  1. Amidei, J., Piwek, P., Willis, A.: Evaluation methodologies in automatic question generation 2013–2018. In: Proceedings of the 11th International Conference on Natural Language Generation, pp. 307–317. ACL, Tilburg University, The Netherlands, November 2018. https://doi.org/10.18653/v1/W18-6537. https://aclanthology.org/W18-6537

  2. Azevedo, P., Leite, B., Cardoso, H.L., Silva, D.C., Reis, L.P.: Exploring NLP and information extraction to jointly address question generation and answering. In: Maglogiannis, I., Iliadis, L., Pimenidis, E. (eds.) AIAI 2020. IAICT, vol. 584, pp. 396–407. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-49186-4_33


  3. Banerjee, S., Lavie, A.: METEOR: an automatic metric for MT evaluation with improved correlation with human judgments. In: Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, Ann Arbor, Michigan, pp. 65–72. ACL, June 2005. https://www.aclweb.org/anthology/W05-0909

  4. Bao, H., et al.: UniLMv2: pseudo-masked language models for unified language model pre-training. In: International Conference on Machine Learning, pp. 642–652. PMLR (2020)


  5. Carmo, D., Piau, M., Campiotti, I., Nogueira, R., Lotufo, R.: PTT5: pretraining and validating the T5 model on Brazilian Portuguese data. arXiv preprint arXiv:2008.09144 (2020)

  6. Carrino, C.P., Costa-jussà, M.R., Fonollosa, J.A.R.: Automatic Spanish translation of SQuAD dataset for multi-lingual question answering. In: Proceedings of the 12th Language Resources and Evaluation Conference, Marseille, France, pp. 5515–5523. European Language Resources Association, May 2020. https://aclanthology.org/2020.lrec-1.677

  7. Carvalho, N.R.: squad-v1.1-pt (2020). https://github.com/nunorc/squad-v1.1-pt

  8. Carvalho, N.R., Simões, A., Almeida, J.J.: Bootstrapping a data-set and model for question-answering in Portuguese (short paper). In: 10th Symposium on Languages, Applications and Technologies (SLATE 2021). Schloss Dagstuhl-Leibniz-Zentrum für Informatik (2021)


  9. Chan, Y.H., Fan, Y.C.: A recurrent BERT-based model for question generation. In: Proceedings of the 2nd Workshop on Machine Reading for Question Answering, Hong Kong, China, pp. 154–162. ACL, November 2019. https://doi.org/10.18653/v1/D19-5821. https://aclanthology.org/D19-5821

  10. Correia, R., Baptista, J., Eskenazi, M., Mamede, N.: Automatic generation of Cloze question stems. In: Caseli, H., Villavicencio, A., Teixeira, A., Perdigão, F. (eds.) PROPOR 2012. LNCS (LNAI), vol. 7243, pp. 168–178. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-28885-2_19


  11. Curto, S.L.: Automatic generation of multiple-choice tests. Master’s thesis, Instituto Superior Técnico (2010). https://fenix.tecnico.ulisboa.pt/departamentos/dei/dissertacao/2353642299631. Publication Title: Dissertation for obtaining the Master Degree in Information Systems and Computer Engineering

  12. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol. 1 (Long and Short Papers), Minneapolis, Minnesota, pp. 4171–4186. ACL, June 2019. https://doi.org/10.18653/v1/N19-1423. https://aclanthology.org/N19-1423

  13. d’Hoffschmidt, M., Belblidia, W., Heinrich, Q., Brendlé, T., Vidal, M.: FQuAD: French question answering dataset. In: Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 1193–1208. ACL, November 2020. https://doi.org/10.18653/v1/2020.findings-emnlp.107. https://aclanthology.org/2020.findings-emnlp.107

  14. Diéguez, D., Rodrigues, R., Gomes, P.: Using CBR for Portuguese question generation. In: Proceedings of the 15th Portuguese Conference on Artificial Intelligence, pp. 328–341 (2011)


  15. Dong, L., et al.: Unified language model pre-training for natural language understanding and generation. In: Wallach, H., Larochelle, H., Beygelzimer, A., Alché-Buc, F.d., Fox, E., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 32. Curran Associates, Inc. (2019). https://proceedings.neurips.cc/paper/2019/file/c20bb2d9a50d5ac1f713f8b34d9aac5a-Paper.pdf

  16. Du, X., Shao, J., Cardie, C.: Learning to ask: neural question generation for reading comprehension. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, Canada, pp. 1342–1352. ACL, July 2017. https://doi.org/10.18653/v1/P17-1123. https://aclanthology.org/P17-1123

  17. Ferreira, J., Rodrigues, R., Gonçalo Oliveira, H.: Assessing factoid question-answer generation for Portuguese (short paper). In: 9th Symposium on Languages, Applications and Technologies (SLATE 2020). Schloss Dagstuhl-Leibniz-Zentrum für Informatik (2020)


  18. Gates, D.: Generating look-back strategy questions from expository texts. In: The Workshop on the Question Generation Shared Task and Evaluation Challenge, NSF, Arlington, VA (2008). http://www.cs.memphis.edu/~vrus/questiongeneration//1-Gates-QG08.pdf

  19. Gonçalo Oliveira, H.: Answering fill-in-the-blank questions in Portuguese with transformer language models. In: Marreiros, G., Melo, F.S., Lau, N., Lopes Cardoso, H., Reis, L.P. (eds.) EPIA 2021. LNCS (LNAI), vol. 12981, pp. 739–751. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-86230-5_58


  20. Heilman, M., Smith, N.A.: Good question! Statistical ranking for question generation. In: Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Los Angeles, California, pp. 609–617. ACL, June 2010. https://aclanthology.org/N10-1086

  21. Leite, B.: Automatic question generation for the Portuguese language. Master’s thesis, Faculdade de Engenharia da Universidade do Porto (2020). https://repositorio-aberto.up.pt/handle/10216/128541. Dissertation for obtaining the Integrated Master Degree in Informatics and Computer Engineering

  22. Leite, B., Lopes Cardoso, H., Reis, L.P., Soares, C.: Factual question generation for the Portuguese language. In: 2020 International Conference on INnovations in Intelligent SysTems and Applications (INISTA), pp. 1–7. IEEE (2020)


  23. Li, J., Gao, Y., Bing, L., King, I., Lyu, M.R.: Improving question generation with to the point context. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, pp. 3216–3226. ACL, November 2019. https://doi.org/10.18653/v1/D19-1317. https://aclanthology.org/D19-1317

  24. Lin, C.Y.: ROUGE: a package for automatic evaluation of summaries. In: Text Summarization Branches Out, Barcelona, Spain, pp. 74–81. ACL, July 2004. https://www.aclweb.org/anthology/W04-1013

  25. Lindberg, D., Popowich, F., Nesbit, J., Winne, P.: Generating natural language questions to support learning on-line. In: Proceedings of the 14th European Workshop on Natural Language Generation, Sofia, Bulgaria, pp. 105–114. ACL, August 2013. https://aclanthology.org/W13-2114

  26. Liu, M., Calvo, R., Rus, V.: G-Asks: an intelligent automatic question generation system for academic writing support. Dialogue Discourse 3, 101–124 (2012). https://doi.org/10.5087/dad.2012.205


  27. Mazidi, K., Nielsen, R.D.: Linguistic considerations in automatic question generation. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Baltimore, Maryland, pp. 321–326. ACL (2014). https://doi.org/10.3115/v1/P14-2053. https://www.aclweb.org/anthology/P14-2053

  28. Pan, L., Lei, W., Chua, T.S., Kan, M.Y.: Recent advances in neural question generation. CoRR abs/1905.08949 (2019). http://arxiv.org/abs/1905.08949

  29. Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: Bleu: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia, Pennsylvania, USA, pp. 311–318. ACL, July 2002. https://doi.org/10.3115/1073083.1073135. https://www.aclweb.org/anthology/P02-1040

  30. Pirovani, J., Spalenza, M., Oliveira, E.: Geração Automática de Questões a Partir do Reconhecimento de Entidades Nomeadas em Textos Didáticos. Brazilian Symposium on Computers in Education (Simpósio Brasileiro de Informática na Educação - SBIE) 28(1), 1147 (2017). https://doi.org/10.5753/cbie.sbie.2017.1147. https://www.br-ie.org/pub/index.php/sbie/article/view/7643

  31. Qi, W.: ProphetNet: predicting future n-gram for sequence-to-sequence pre-training. In: Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 2401–2410. ACL, November 2020. https://doi.org/10.18653/v1/2020.findings-emnlp.217. https://aclanthology.org/2020.findings-emnlp.217

  32. Raffel, C., et al.: Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 21(140), 1–67 (2020). http://jmlr.org/papers/v21/20-074.html

  33. Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100,000+ questions for machine comprehension of text. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, Texas, pp. 2383–2392. ACL, November 2016. https://doi.org/10.18653/v1/D16-1264. https://aclanthology.org/D16-1264

  34. Rus, V., Cai, Z., Graesser, A.: Question generation: example of a multi-year evaluation campaign. In: Proceedings of the WS on the Question Generation Shared Task and Evaluation Challenge (2008)


  35. Wagner Filho, J.A., Wilkens, R., Idiart, M., Villavicencio, A.: The brWaC corpus: a new open resource for Brazilian Portuguese. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018) (2018)


  36. Wang, W., Wei, F., Dong, L., Bao, H., Yang, N., Zhou, M.: MiniLM: deep self-attention distillation for task-agnostic compression of pre-trained transformers. arXiv preprint arXiv:2002.10957 (2020)

  37. Xiao, D., et al.: ERNIE-GEN: an enhanced multi-flow pre-training and fine-tuning framework for natural language generation. In: Bessiere, C. (ed.) Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-2020, pp. 3997–4003. International Joint Conferences on Artificial Intelligence Organization (2020). https://doi.org/10.24963/ijcai.2020/553

  38. Xie, Z.: Neural text generation: a practical guide. CoRR abs/1711.09534 (2017). http://arxiv.org/abs/1711.09534

  39. Xue, L., et al.: mT5: a massively multilingual pre-trained text-to-text transformer. arXiv preprint arXiv:2010.11934 (2020)

  40. Zhao, Y., Ni, X., Ding, Y., Ke, Q.: Paragraph-level neural question generation with maxout pointer and gated self-attention networks. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 3901–3910 (2018)



Acknowledgments

This work was financially supported by Base Funding - UIDB/00027/2020 of the Artificial Intelligence and Computer Science Laboratory - LIACC - funded by national funds through the FCT/MCTES (PIDDAC). Bernardo Leite is supported by a PhD studentship (with reference 2021.05432.BD), funded by Fundação para a Ciência e a Tecnologia (FCT).

Author information


Correspondence to Bernardo Leite.



Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Leite, B., Lopes Cardoso, H. (2022). Neural Question Generation for the Portuguese Language: A Preliminary Study. In: Marreiros, G., Martins, B., Paiva, A., Ribeiro, B., Sardinha, A. (eds) Progress in Artificial Intelligence. EPIA 2022. Lecture Notes in Computer Science, vol. 13566. Springer, Cham. https://doi.org/10.1007/978-3-031-16474-3_63


  • DOI: https://doi.org/10.1007/978-3-031-16474-3_63


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-16473-6

  • Online ISBN: 978-3-031-16474-3

  • eBook Packages: Computer Science, Computer Science (R0)
