
Overview of Touché 2023: Argument and Causal Retrieval

Conference paper in: Experimental IR Meets Multilinguality, Multimodality, and Interaction (CLEF 2023)

Abstract

This paper is a condensed overview of Touché: the fourth edition of the lab on argument and causal retrieval, held at CLEF 2023. With the goal of creating a collaborative platform for research on computational argumentation and causality, we organized four shared tasks: (a) argument retrieval for controversial topics, where participants retrieve web documents that contain high-quality argumentation and detect the argument stance, (b) causal retrieval, where participants retrieve documents that contain causal statements from a generic web crawl and detect the causal stance, (c) image retrieval for arguments, where participants retrieve, from a focused web crawl, images that show support for or opposition to a given stance, and (d) multilingual multi-target stance classification, where participants detect the stance of comments on proposals from an online multilingual participatory democracy platform.

L. Hemamou—Independent view, not influenced by Sanofi R&D France.


Notes

  1. The term ‘touché’ is commonly “used to acknowledge a hit in fencing or the success or appropriateness of an argument, an accusation, or a witty point.” [https://merriam-webster.com/dictionary/touche]

  2. https://futureu.europa.eu

  3. https://trec.nist.gov/

  4. https://touche.webis.de/

  5. https://tira.io

  6. https://github.com/chatnoir-eu/chatnoir-api

  7. https://github.com/chatnoir-eu/chatnoir-pyterrier (a retrieval sketch using this package follows these notes)

  8. Pre-trained model: https://huggingface.co/facebook/bart-large-cnn; minimum length: 64; maximum length: 256 (a summarization sketch with this configuration follows these notes).

  9. Pre-trained model: https://huggingface.co/google/flan-t5-base; maximum generated tokens: 3; the prompt is given in Appendix A (a prompting sketch follows these notes).

  10. Pre-trained model: https://huggingface.co/facebook/bart-large-cnn; minimum length: 64; maximum length: 256.

  11. Pre-trained model: https://huggingface.co/google/flan-t5-base; maximum generated tokens: 3; the prompt is given in Appendix A.

  12. https://webis.de/data.html#touche-corpora

  13. As one of our suggested use cases for image retrieval for arguments is getting a quick overview, we excluded overly large images.

  14. https://github.com/tesseract-ocr/tesseract (an OCR sketch follows these notes)

  15. To sharpen our focus on images, this year we tried to exclude images that are actually screenshots of text documents.

  16. https://www.phash.org/ (a deduplication sketch follows these notes)

  17. Archived using https://github.com/webis-de/scriptor.

  18. https://cloud.google.com/vision

  19. Since no stance model convincingly outperformed naive baselines in their evaluation, we use the simple both-sides baseline that assigns each image to both stances (a sketch follows these notes).

  20. https://huggingface.co/facebook/bart-large-mnli (a zero-shot classification sketch follows these notes)

  21. https://chat.openai.com/chat

  22. German, English, Greek, French, Italian, and Hungarian.

  23. https://futureu.europa.eu

  24. From https://futureu.europa.eu/en/processes/GreenDeal/f/1/proposals/83

  25. roberta-base.

  26. xlm-roberta-large.

  27. bert-base-uncased.

  28. https://pypi.org/project/deep-translator/#google-translate-1 (a translation sketch follows these notes)
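The following sketches illustrate some of the tools and configurations referenced in the notes above; all are minimal, hedged examples, not the organizers' or participants' actual code. First, querying ChatNoir through the chatnoir-pyterrier package from note 7. The ChatNoirRetrieve class and its parameters are assumptions based on PyTerrier conventions; consult the linked repository for the actual interface.

    # Hypothetical sketch of searching ChatNoir via chatnoir-pyterrier (note 7).
    # ChatNoirRetrieve and its constructor arguments are assumptions; check the
    # repository for the real interface and any additional setup it requires.
    from chatnoir_pyterrier import ChatNoirRetrieve  # assumed entry point

    chatnoir = ChatNoirRetrieve(api_key="<your-api-key>")  # assumed constructor
    results = chatnoir.search("should teachers get tenure")  # PyTerrier-style search
    print(results[["docno", "score"]].head(10))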
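Next, the summarization configuration from notes 8 and 10 maps directly onto the Hugging Face transformers pipeline; the input document here is a placeholder.

    # Sketch of the summarization configuration from notes 8 and 10:
    # facebook/bart-large-cnn with min_length=64 and max_length=256.
    from transformers import pipeline

    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
    document = "..."  # placeholder for a retrieved web document's text
    summary = summarizer(document, min_length=64, max_length=256, truncation=True)
    print(summary[0]["summary_text"])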
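The stance prediction configuration from notes 9 and 11 can be sketched the same way; the real prompt is given in Appendix A and is not reproduced in this snippet.

    # Sketch of the stance prediction configuration from notes 9 and 11:
    # google/flan-t5-base with at most 3 generated tokens.
    from transformers import pipeline

    stance_model = pipeline("text2text-generation", model="google/flan-t5-base")
    prompt = "..."  # placeholder; the actual zero-shot prompt is in Appendix A
    answer = stance_model(prompt, max_new_tokens=3)
    print(answer[0]["generated_text"])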
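Note 14 names the Tesseract OCR engine; a minimal sketch of extracting text from an image using the pytesseract wrapper (the wrapper choice is ours, not the organizers'):

    # Sketch of OCR over a crawled image with Tesseract (note 14),
    # accessed here through the pytesseract wrapper.
    from PIL import Image
    import pytesseract

    text = pytesseract.image_to_string(Image.open("image.png"))
    print(text)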
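Note 16 links pHash for detecting near-duplicate images; a sketch using the imagehash package as a stand-in pHash implementation, with an assumed distance threshold:

    # Sketch of perceptual-hash deduplication (note 16). imagehash is a
    # stand-in implementation of pHash; the lab only links phash.org.
    from PIL import Image
    import imagehash

    hash_a = imagehash.phash(Image.open("image_a.png"))
    hash_b = imagehash.phash(Image.open("image_b.png"))
    # Hamming distance between hashes; small distances suggest near-duplicates.
    if hash_a - hash_b <= 8:  # threshold is an assumption, not from the paper
        print("near-duplicate images")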
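The both-sides baseline from note 19 needs no model at all; a minimal sketch:

    # Minimal sketch of the both-sides stance baseline (note 19): every
    # retrieved image is assigned to both the PRO and the CON stance.
    def both_sides(retrieved_images):
        """Duplicate each retrieved image into both stance classes."""
        return {"PRO": list(retrieved_images), "CON": list(retrieved_images)}

    print(both_sides(["img-001", "img-002"]))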
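Note 20's facebook/bart-large-mnli is the standard NLI-based zero-shot classifier in transformers; the candidate labels below are placeholders, not the participants' label set.

    # Sketch of zero-shot classification with facebook/bart-large-mnli (note 20).
    from transformers import pipeline

    classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
    result = classifier(
        "We should invest more in renewable energy.",
        candidate_labels=["in favor", "against"],  # hypothetical labels
    )
    print(result["labels"][0], result["scores"][0])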
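Finally, note 28's deep-translator package wraps Google Translate; a sketch that maps a (hypothetical) German comment to English, e.g. to bring all Task 4 languages into one language:

    # Sketch of translating multilingual comments with deep-translator's
    # Google Translate backend (note 28).
    from deep_translator import GoogleTranslator

    translated = GoogleTranslator(source="auto", target="en").translate(
        "Wir sollten mehr in erneuerbare Energien investieren."
    )
    print(translated)  # "We should invest more in renewable energy."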


Acknowledgment

This work has been partially supported by the Deutsche Forschungsgemeinschaft (DFG) in the project “ACQuA 2.0: Answering Comparative Questions with Arguments” (project 376430233) as part of the priority program “RATIO: Robust Argumentation Machines” (SPP 1999), and by the OpenWebSearch.eu project (funded by the EU; GA 101070014). V. Barriere’s work was funded by the National Center for Artificial Intelligence CENIA FB210017, Basal ANID.

Author information


Correspondence to Jan Heinrich Reimer.


Appendices

A Zero-Shot Prompts

The zero-shot prompts used for the stance prediction baselines are given in Listing 1 (for Task 1, see Sect. 3) and in Listing 2 (for Task 2, see Sect. 4).

Listing 1: Zero-shot stance prompt for Task 1 (prompt text omitted here).
Listing 2: Zero-shot stance prompt for Task 2 (prompt text omitted here).

B Full Evaluation Results of Touché 2023: Argument and Causal Retrieval

Table 9. Relevance results of all runs submitted to Task 1: Argument Retrieval for Controversial Questions. Reported are the mean nDCG@10 and the 95% confidence intervals. The baseline Puss in Boots is shown in bold.
Table 10. Quality results of all runs submitted to Task 1: Argument Retrieval for Controversial Questions. Reported are the mean nDCG@10 and the 95% confidence intervals. The baseline Puss in Boots is shown in bold.
Table 11. Relevance results of all runs submitted to Task 2: Evidence Retrieval for Causal Questions. Reported are the mean nDCG@5 and the 95% confidence intervals. The baseline Puss in Boots is shown in bold.
Table 12. On-topic relevance results of all runs submitted to Task 3: Image Retrieval for Argumentation. Reported are the mean precision@10 and the 95% confidence intervals. The baseline Minsc is shown in bold.
Table 13. Argumentativeness results of all runs submitted to Task 3: Image Retrieval for Argumentation. Reported are the mean precision@10 and the 95% confidence intervals. The baseline Minsc is shown in bold.
Table 14. Stance relevance results of all runs submitted to Task 3: Image Retrieval for Argumentation. Reported are the mean precision@10 and the 95% confidence intervals. The baseline Minsc is shown in bold.
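The tables report per-task mean scores with 95% confidence intervals. A minimal sketch of how such numbers can be derived from TREC-style qrels and run files, assuming the ir_measures toolkit and a percentile bootstrap over topics (the organizers' actual evaluation scripts are not reproduced here):

    # Sketch: per-topic nDCG@10 plus a bootstrap 95% confidence interval,
    # computed with ir_measures from TREC-style qrels and run files.
    import random
    import ir_measures

    qrels = ir_measures.read_trec_qrels("qrels.txt")
    run = ir_measures.read_trec_run("run.txt")
    measure = ir_measures.parse_measure("nDCG@10")

    scores = [m.value for m in ir_measures.iter_calc([measure], qrels, run)]
    mean = sum(scores) / len(scores)

    # Percentile bootstrap over topics (1000 resamples).
    means = sorted(
        sum(random.choices(scores, k=len(scores))) / len(scores)
        for _ in range(1000)
    )
    low, high = means[24], means[974]  # 2.5th and 97.5th percentiles
    print(f"nDCG@10 = {mean:.3f} [{low:.3f}, {high:.3f}]")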


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Bondarenko, A. et al. (2023). Overview of Touché 2023: Argument and Causal Retrieval. In: Arampatzis, A., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2023. Lecture Notes in Computer Science, vol 14163. Springer, Cham. https://doi.org/10.1007/978-3-031-42448-9_31


  • DOI: https://doi.org/10.1007/978-3-031-42448-9_31


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-42447-2

  • Online ISBN: 978-3-031-42448-9

  • eBook Packages: Computer Science; Computer Science (R0)
