Portuguese Neural Text Simplification Using Machine Translation

  • Conference paper
Intelligent Systems (BRACIS 2021)

Abstract

Automatic Text Simplification (ATS) plays a significant role in Natural Language Processing (NLP). ATS is a sequence-to-sequence problem that aims to produce a new version of the original text with complex and domain-specific words removed. It can improve communication and the understanding of documents from specific domains, and it can support second-language learning. This paper presents an empirical study of state-of-the-art ATS methods applied to texts in Portuguese. The literature reports that analyzing Portuguese texts is challenging because fewer resources are available than for other languages (e.g., English). More specifically, this work evaluated different Neural Machine Translation (NMT) techniques for ATS in Portuguese. The experiments showed that NMT achieved promising results on Portuguese texts, obtaining a BLEU score of 40.89 using multiple parallel corpora and raising the overall readability score by more than 5 points.

Notes

  1. versions are available at: http://altamiro.comunidades.net/biblias.

  2. SARI and BLEU score implementation: https://github.com/feralvam/easse.

  3. BLEU score scale: https://cloud.google.com/translate/automl/docs/evaluate.
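
The abstract reports its main result as a BLEU score, and note 2 points to the EASSE package for the implementation actually used. As a rough illustration of what that number measures, the sketch below computes corpus-level BLEU from its textbook definition (clipped n-gram precision up to 4-grams, geometric mean, brevity penalty) on a 0 to 100 scale. It is not the EASSE implementation: the function names `ngrams` and `corpus_bleu`, the whitespace tokenisation, the single reference per sentence, the absence of smoothing, and the example sentences are all assumptions made for illustration, so its scores should not be expected to match the paper's figures.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale, one reference per hypothesis.

    Assumptions (not taken from the paper): whitespace tokenisation,
    uniform n-gram weights, and no smoothing.
    """
    matches = [0] * max_n  # clipped n-gram matches, summed over the corpus
    totals = [0] * max_n   # n-grams produced by the system, summed over the corpus
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_tok, ref_tok = hyp.split(), ref.split()
        hyp_len += len(hyp_tok)
        ref_len += len(ref_tok)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tok, n)
            ref_counts = ngrams(ref_tok, n)
            # Counter intersection keeps the minimum count per n-gram, so an
            # n-gram is credited at most as often as it occurs in the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())

    # Without smoothing, a single order with zero matches drives BLEU to zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0

    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # The brevity penalty punishes output shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    # Hypothetical system output and reference simplification.
    system_output = ["o menino comprou um livro novo na loja ontem"]
    reference = ["o menino comprou um livro novo na livraria ontem"]
    print(f"BLEU: {corpus_bleu(system_output, reference):.2f}")
```

Because BLEU only rewards n-gram overlap with the reference, a high score indicates closeness to the reference simplification and says nothing by itself about whether the output is easier to read, which is presumably why the abstract reports a readability score alongside it.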

Author information

Corresponding authors

Correspondence to Tiago B. de Lima, André C. A. Nascimento or Rafael Ferreira Mello.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

de Lima, T.B., Nascimento, A.C.A., Valença, G., Miranda, P., Mello, R.F., Si, T. (2021). Portuguese Neural Text Simplification Using Machine Translation. In: Britto, A., Valdivia Delgado, K. (eds) Intelligent Systems. BRACIS 2021. Lecture Notes in Computer Science, vol 13074. Springer, Cham. https://doi.org/10.1007/978-3-030-91699-2_37

  • DOI: https://doi.org/10.1007/978-3-030-91699-2_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-91698-5

  • Online ISBN: 978-3-030-91699-2

  • eBook Packages: Computer Science, Computer Science (R0)
