Abstract
While there are high-quality software frameworks for information retrieval experimentation, they do not explicitly support cross-language information retrieval (CLIR). To fill this gap, we have created Patapsco, a Python CLIR framework. This framework specifically addresses the complexity that comes with running experiments in multiple languages. Patapsco is designed to be extensible to many language pairs, to be scalable to large document collections, and to support reproducible experiments driven by a configuration file. We include Patapsco results on standard CLIR collections using multiple settings.
Notes
- 1. Patapsco is available at https://github.com/hltcoe/patapsco. A video demonstration is at https://www.youtube.com/watch?v=jYj1GAbABBc.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Costello, C., Yang, E., Lawrie, D., Mayfield, J. (2022). Patapsco: A Python Framework for Cross-Language Information Retrieval Experiments. In: Hagen, M., et al. Advances in Information Retrieval. ECIR 2022. Lecture Notes in Computer Science, vol 13186. Springer, Cham. https://doi.org/10.1007/978-3-030-99739-7_33
DOI: https://doi.org/10.1007/978-3-030-99739-7_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-99738-0
Online ISBN: 978-3-030-99739-7
eBook Packages: Computer Science, Computer Science (R0)