Cross-Lingual Annotation Projection for Argument Mining in Portuguese

Sousa, Afonso; Leite, Bernardo; Rocha, Gil; Lopes Cardoso, Henrique

doi:10.1007/978-3-030-86230-5_59

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12981)

Included in the following conference series:

EPIA Conference on Artificial Intelligence

Abstract

While Argument Mining has seen increasing success in monolingual settings, especially for the English language, other less-resourced languages are still lagging behind. In this paper, we build a Portuguese projected version of the Persuasive Essays corpus and evaluate it both intrinsically (through back-projection) and extrinsically (in a sequence tagging task). To build the corpus, we project the token-level annotations into a new Portuguese version using translations and respective alignments. Intrinsic evaluation entails rebuilding the English corpus using back alignment and back projection from the Portuguese version, comparing against the original English annotations. For extrinsic evaluation, we assess and compare the performance of machine learning models on several language variants of the corpus (including the Portuguese one), following both in-language/projection training and direct transfer. Our evaluation highlights the quality of the generated corpus. Experimental results show the effectiveness of the projection approach, while providing competitive baselines for the Portuguese version of the corpus. The corpus and code are available (https://github.com/AfonsoSalgadoSousa/argumentation_mining_pt).
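The projection and back-projection steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (their code is in the repository linked above). It assumes BIO-style token labels such as "B-Claim", word alignments given as (source index, target index) pairs, and a simple majority vote for target tokens with several aligned source tokens. The function names `project_labels` and `token_accuracy`, the example sentence, and its alignment are hypothetical and chosen only for illustration.

```python
from collections import Counter


def project_labels(src_labels, alignment, tgt_len, default="O"):
    """Project BIO token labels from a source sentence onto a target sentence.

    src_labels: BIO labels of the source tokens, e.g. ["O", "B-Claim", "I-Claim"].
    alignment:  iterable of (source_index, target_index) word-alignment pairs.
    tgt_len:    number of tokens in the target sentence.
    Assumption: every non-default label has the form "B-<type>" or "I-<type>".
    """
    # Each target token collects votes for the component type of its aligned source tokens.
    votes = [Counter() for _ in range(tgt_len)]
    for s, t in alignment:
        label = src_labels[s]
        if label != default:
            votes[t][label.split("-", 1)[1]] += 1

    # Majority type per target token; unaligned or O-aligned tokens get None.
    types = [v.most_common(1)[0][0] if v else None for v in votes]

    # Rebuild BIO prefixes on the target side.
    # Limitation: two adjacent components of the same type are merged into one.
    projected, prev = [], None
    for typ in types:
        if typ is None:
            projected.append(default)
        elif typ == prev:
            projected.append("I-" + typ)
        else:
            projected.append("B-" + typ)
        prev = typ
    return projected


def token_accuracy(gold, predicted):
    """Fraction of tokens whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


# Hypothetical example: "I think school uniforms are useful"
#                    -> "Acho que os uniformes escolares são úteis"
en_labels = ["O", "O", "B-Claim", "I-Claim", "I-Claim", "I-Claim"]
en_to_pt = [(0, 0), (1, 0), (2, 4), (3, 3), (4, 5), (5, 6)]  # assumed alignment

# Forward projection: English annotations onto the 7 Portuguese tokens.
pt_labels = project_labels(en_labels, en_to_pt, tgt_len=7)

# Back-projection: reverse the alignment and rebuild the English labels.
pt_to_en = [(t, s) for s, t in en_to_pt]
en_rebuilt = project_labels(pt_labels, pt_to_en, tgt_len=len(en_labels))

print(pt_labels)
print(token_accuracy(en_labels, en_rebuilt))
```

In this sketch, comparing `en_rebuilt` with `en_labels` plays the role of the intrinsic evaluation: the closer the rebuilt English labels are to the original annotations, the less information the alignment and projection steps have lost. Unaligned target tokens (here "que" and "os") default to "O", which is one place where a real pipeline would likely need a more careful heuristic.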

Notes

1. https://github.com/UKPLab/acl2017-neural_end2end_am/tree/master/data/conll/Paragraph_Level
2. https://huggingface.co/Helsinki-NLP/opus-mt-en-ROMANCE
3. https://opus.nlpl.eu/
4. https://github.com/UKPLab/coling2018-xling_argument_mining/tree/master/data/AllData/MT/PE
5. https://huggingface.co/transformers/pretrained_models.html, model id “bert-base-multilingual-cased”
6. https://github.com/XuezheMax/NeuroNLP2
7. https://github.com/facebookresearch/MUSE

Acknowledgment

This research is supported by LIACC (FCT/UID/CEC/0027/2020) and by project DARGMINTS (POCI/01/0145/FEDER/031460), funded by Fundação para a Ciência e a Tecnologia (FCT). Gil Rocha is supported by a PhD studentship (with reference SFRH/BD/140125/2018) from FCT.

Author information

Authors and Affiliations

Faculdade de Engenharia, Universidade do Porto, Porto, Portugal
Afonso Sousa, Bernardo Leite, Gil Rocha & Henrique Lopes Cardoso
Laboratório de Inteligência Artificial e Ciência de Computadores (LIACC), Porto, Portugal
Bernardo Leite, Gil Rocha & Henrique Lopes Cardoso

Corresponding author

Correspondence to Henrique Lopes Cardoso.

Editor information

Editors and Affiliations

ISEP/GECAD, Polytechnic Institute of Porto, Porto, Portugal
Goreti Marreiros
IST/INESC-ID, University of Lisbon, Porto Salvo, Portugal
Francisco S. Melo
DETI/IEETA, University of Aveiro, Aveiro, Portugal
Nuno Lau
FEUP/LIACC, University of Porto, Porto, Portugal
Henrique Lopes Cardoso
FEUP/LIACC, University of Porto, Porto, Portugal
Luís Paulo Reis

About this paper

Cite this paper

Sousa, A., Leite, B., Rocha, G., Lopes Cardoso, H. (2021). Cross-Lingual Annotation Projection for Argument Mining in Portuguese. In: Marreiros, G., Melo, F.S., Lau, N., Lopes Cardoso, H., Reis, L.P. (eds) Progress in Artificial Intelligence. EPIA 2021. Lecture Notes in Computer Science, vol 12981. Springer, Cham. https://doi.org/10.1007/978-3-030-86230-5_59

DOI: https://doi.org/10.1007/978-3-030-86230-5_59
Published: 03 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86229-9
Online ISBN: 978-3-030-86230-5
eBook Packages: Computer Science, Computer Science (R0)
