
Overview of Touché 2023: Argument and Causal Retrieval

Conference paper in: Experimental IR Meets Multilinguality, Multimodality, and Interaction (CLEF 2023)

Abstract

This paper is a condensed overview of Touché: the fourth edition of the lab on argument and causal retrieval, held at CLEF 2023. With the goal of creating a collaborative platform for research on computational argumentation and causality, we organized four shared tasks: (a) argument retrieval for controversial topics, where participants retrieve web documents that contain high-quality argumentation and detect the argument stance, (b) causal retrieval, where participants retrieve documents that contain causal statements from a generic web crawl and detect the causal stance, (c) image retrieval for arguments, where participants retrieve, from a focused web crawl, images that show support for or opposition to a given stance, and (d) multilingual multi-target stance classification, where participants detect the stance of comments on proposals from an online multilingual participatory democracy platform.

L. Hemamou—Independent view, not influenced by Sanofi R&D France.


Notes

  1. The term ‘touché’ is commonly “used to acknowledge a hit in fencing or the success or appropriateness of an argument, an accusation, or a witty point.” [https://merriam-webster.com/dictionary/touche]

  2. https://futureu.europa.eu

  3. https://trec.nist.gov/

  4. https://touche.webis.de/

  5. https://tira.io

  6. https://github.com/chatnoir-eu/chatnoir-api

  7. https://github.com/chatnoir-eu/chatnoir-pyterrier (a retrieval sketch using this package follows these notes)

  8. Pre-trained model: https://huggingface.co/facebook/bart-large-cnn; minimum length: 64; maximum length: 256 (a summarization sketch with this configuration follows these notes).

  9. Pre-trained model: https://huggingface.co/google/flan-t5-base; maximum generated tokens: 3; the prompt is given in Appendix A (a prompting sketch follows these notes).

  10. Pre-trained model: https://huggingface.co/facebook/bart-large-cnn; minimum length: 64; maximum length: 256.

  11. Pre-trained model: https://huggingface.co/google/flan-t5-base; maximum generated tokens: 3; the prompt is given in Appendix A.

  12. https://webis.de/data.html#touche-corpora

  13. As one of our suggested use cases for image retrieval for arguments is getting a quick overview, we excluded overly large images.

  14. https://github.com/tesseract-ocr/tesseract (an OCR sketch follows these notes)

  15. To sharpen our focus on images, this year we tried to exclude images that are actually screenshots of text documents.

  16. https://www.phash.org/ (a deduplication sketch follows these notes)

  17. Archived using https://github.com/webis-de/scriptor.

  18. https://cloud.google.com/vision

  19. Since no stance model convincingly outperformed naive baselines in their evaluation, we use the simple both-sides baseline that assigns each image to both stances (a sketch follows these notes).

  20. https://huggingface.co/facebook/bart-large-mnli (a zero-shot classification sketch follows these notes)

  21. https://chat.openai.com/chat

  22. German, English, Greek, French, Italian, and Hungarian.

  23. https://futureu.europa.eu

  24. From https://futureu.europa.eu/en/processes/GreenDeal/f/1/proposals/83

  25. roberta-base.

  26. xlm-roberta-large.

  27. bert-base-uncased.

  28. https://pypi.org/project/deep-translator/#google-translate-1 (a translation sketch follows these notes)
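The following sketches illustrate some of the tools and configurations referenced in the notes above; all are minimal, hedged examples, not the organizers' or participants' actual code. First, querying ChatNoir through the chatnoir-pyterrier package from note 7. The ChatNoirRetrieve class and its parameters are assumptions based on PyTerrier conventions; consult the linked repository for the actual interface.

    # Hypothetical sketch of searching ChatNoir via chatnoir-pyterrier (note 7).
    # ChatNoirRetrieve and its constructor arguments are assumptions; check the
    # repository for the real interface and any additional setup it requires.
    from chatnoir_pyterrier import ChatNoirRetrieve  # assumed entry point

    chatnoir = ChatNoirRetrieve(api_key="<your-api-key>")  # assumed constructor
    results = chatnoir.search("should teachers get tenure")  # PyTerrier-style search
    print(results[["docno", "score"]].head(10))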
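Next, the summarization configuration from notes 8 and 10 maps directly onto the Hugging Face transformers pipeline; the input document here is a placeholder.

    # Sketch of the summarization configuration from notes 8 and 10:
    # facebook/bart-large-cnn with min_length=64 and max_length=256.
    from transformers import pipeline

    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
    document = "..."  # placeholder for a retrieved web document's text
    summary = summarizer(document, min_length=64, max_length=256, truncation=True)
    print(summary[0]["summary_text"])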
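The stance prediction configuration from notes 9 and 11 can be sketched the same way; the real prompt is given in Appendix A and is not reproduced in this snippet.

    # Sketch of the stance prediction configuration from notes 9 and 11:
    # google/flan-t5-base with at most 3 generated tokens.
    from transformers import pipeline

    stance_model = pipeline("text2text-generation", model="google/flan-t5-base")
    prompt = "..."  # placeholder; the actual zero-shot prompt is in Appendix A
    answer = stance_model(prompt, max_new_tokens=3)
    print(answer[0]["generated_text"])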
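Note 14 names the Tesseract OCR engine; a minimal sketch of extracting text from an image using the pytesseract wrapper (the wrapper choice is ours, not the organizers'):

    # Sketch of OCR over a crawled image with Tesseract (note 14),
    # accessed here through the pytesseract wrapper.
    from PIL import Image
    import pytesseract

    text = pytesseract.image_to_string(Image.open("image.png"))
    print(text)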
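Note 16 links pHash for detecting near-duplicate images; a sketch using the imagehash package as a stand-in pHash implementation, with an assumed distance threshold:

    # Sketch of perceptual-hash deduplication (note 16). imagehash is a
    # stand-in implementation of pHash; the lab only links phash.org.
    from PIL import Image
    import imagehash

    hash_a = imagehash.phash(Image.open("image_a.png"))
    hash_b = imagehash.phash(Image.open("image_b.png"))
    # Hamming distance between hashes; small distances suggest near-duplicates.
    if hash_a - hash_b <= 8:  # threshold is an assumption, not from the paper
        print("near-duplicate images")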
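The both-sides baseline from note 19 needs no model at all; a minimal sketch:

    # Minimal sketch of the both-sides stance baseline (note 19): every
    # retrieved image is assigned to both the PRO and the CON stance.
    def both_sides(retrieved_images):
        """Duplicate each retrieved image into both stance classes."""
        return {"PRO": list(retrieved_images), "CON": list(retrieved_images)}

    print(both_sides(["img-001", "img-002"]))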
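Note 20's facebook/bart-large-mnli is the standard NLI-based zero-shot classifier in transformers; the candidate labels below are placeholders, not the participants' label set.

    # Sketch of zero-shot classification with facebook/bart-large-mnli (note 20).
    from transformers import pipeline

    classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
    result = classifier(
        "We should invest more in renewable energy.",
        candidate_labels=["in favor", "against"],  # hypothetical labels
    )
    print(result["labels"][0], result["scores"][0])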
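Finally, note 28's deep-translator package wraps Google Translate; a sketch that maps a (hypothetical) German comment to English, e.g. to bring all Task 4 languages into one language:

    # Sketch of translating multilingual comments with deep-translator's
    # Google Translate backend (note 28).
    from deep_translator import GoogleTranslator

    translated = GoogleTranslator(source="auto", target="en").translate(
        "Wir sollten mehr in erneuerbare Energien investieren."
    )
    print(translated)  # "We should invest more in renewable energy."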


Acknowledgment

This work has been partially supported by the Deutsche Forschungsgemeinschaft (DFG) in the project “ACQuA 2.0: Answering Comparative Questions with Arguments” (project 376430233) as part of the priority program “RATIO: Robust Argumentation Machines” (SPP 1999), and by the OpenWebSearch.eu project (funded by the EU; GA 101070014). V. Barriere’s work was funded by the National Center for Artificial Intelligence CENIA FB210017, Basal ANID.

Author information


Correspondence to Jan Heinrich Reimer.


Appendices

A Zero-Shot Prompts

The zero-shot prompts used for the stance prediction baselines are given in Listing 1 (for Task 1, see Sect. 3) and in Listing 2 (for Task 2, see Sect. 4).

Listing 1: Zero-shot stance prompt for Task 1 (prompt text omitted here).
Listing 2: Zero-shot stance prompt for Task 2 (prompt text omitted here).

B Full Evaluation Results of Touché 2023: Argument and Causal Retrieval

Table 9. Relevance results of all runs submitted to Task 1: Argument Retrieval for Controversial Questions. Reported are the mean nDCG@10 and the 95% confidence intervals. The baseline Puss in Boots is shown in bold.
Table 10. Quality results of all runs submitted to Task 1: Argument Retrieval for Controversial Questions. Reported are the mean nDCG@10 and the 95% confidence intervals. The baseline Puss in Boots is shown in bold.
Table 11. Relevance results of all runs submitted to Task 2: Evidence Retrieval for Causal Questions. Reported are the mean nDCG@5 and the 95% confidence intervals. The baseline Puss in Boots is shown in bold.
Table 12. On-topic relevance results of all runs submitted to Task 3: Image Retrieval for Argumentation. Reported are the mean precision@10 and the 95% confidence intervals. The baseline Minsc is shown in bold.
Table 13. Argumentativeness results of all runs submitted to Task 3: Image Retrieval for Argumentation. Reported are the mean precision@10 and the 95% confidence intervals. The baseline Minsc is shown in bold.
Table 14. Stance relevance results of all runs submitted to Task 3: Image Retrieval for Argumentation. Reported are the mean precision@10 and the 95% confidence intervals. The baseline Minsc is shown in bold.
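The tables report per-task mean scores with 95% confidence intervals. A minimal sketch of how such numbers can be derived from TREC-style qrels and run files, assuming the ir_measures toolkit and a percentile bootstrap over topics (the organizers' actual evaluation scripts are not reproduced here):

    # Sketch: per-topic nDCG@10 plus a bootstrap 95% confidence interval,
    # computed with ir_measures from TREC-style qrels and run files.
    import random
    import ir_measures

    qrels = ir_measures.read_trec_qrels("qrels.txt")
    run = ir_measures.read_trec_run("run.txt")
    measure = ir_measures.parse_measure("nDCG@10")

    scores = [m.value for m in ir_measures.iter_calc([measure], qrels, run)]
    mean = sum(scores) / len(scores)

    # Percentile bootstrap over topics (1000 resamples).
    means = sorted(
        sum(random.choices(scores, k=len(scores))) / len(scores)
        for _ in range(1000)
    )
    low, high = means[24], means[974]  # 2.5th and 97.5th percentiles
    print(f"nDCG@10 = {mean:.3f} [{low:.3f}, {high:.3f}]")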


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Bondarenko, A. et al. (2023). Overview of Touché 2023: Argument and Causal Retrieval. In: Arampatzis, A., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2023. Lecture Notes in Computer Science, vol 14163. Springer, Cham. https://doi.org/10.1007/978-3-031-42448-9_31


  • DOI: https://doi.org/10.1007/978-3-031-42448-9_31


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-42447-2

  • Online ISBN: 978-3-031-42448-9

  • eBook Packages: Computer Science; Computer Science (R0)
