
Overview of the CLEF 2022 SimpleText Lab: Automatic Simplification of Scientific Texts

  • Conference paper
Experimental IR Meets Multilinguality, Multimodality, and Interaction (CLEF 2022)

Abstract

Although citizens agree on the importance of objective scientific information, they tend to avoid scientific literature because of access restrictions, its complex language, or their lack of prior background knowledge. Instead, they rely on shallow information on the web or social media, often published for commercial or political incentives rather than for correctness and informational value. This paper presents an overview of the CLEF 2022 SimpleText track, which addresses the challenges of text simplification approaches in the context of promoting scientific information access by providing appropriate data and benchmarks and by creating a community of IR and NLP researchers working together to resolve one of the greatest challenges of today. The track provides a corpus of scientific literature abstracts and popular science requests, and features three tasks. First, content selection (what is in, or out?) challenges systems to select passages to include in a simplified summary in response to a query. Second, complexity spotting (what is unclear?) asks systems, given a passage and a query, to rank the terms or concepts that need to be explained (definitions, context, applications) in order to understand that passage. Third, text simplification (rewrite this!) asks systems, given a query, to simplify passages from scientific abstracts while preserving the main content.
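To make the complexity spotting task more concrete, the following is a minimal illustrative sketch in Python of a naive term-difficulty ranker that scores the terms of a passage by their rarity in a small background corpus. It is not an official SimpleText baseline or evaluation tool; the tokenizer, the background documents, and the IDF-based heuristic are assumptions introduced here purely for illustration.

```python
# Illustrative sketch only (not an official SimpleText baseline): rank candidate
# terms in a passage by rarity across a small background corpus, as a crude proxy
# for which concepts may need explanation (Task 2, "complexity spotting").
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Lowercase word tokenizer; a real system would use proper NLP tooling."""
    return re.findall(r"[a-z]+", text.lower())

def idf_scores(background_docs: list[str]) -> dict[str, float]:
    """Inverse document frequency over a background corpus: rarer terms score higher."""
    n_docs = len(background_docs)
    doc_freq = Counter()
    for doc in background_docs:
        doc_freq.update(set(tokenize(doc)))  # count each term once per document
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}

def rank_difficult_terms(passage: str, idf: dict[str, float], top_k: int = 5) -> list[tuple[str, float]]:
    """Rank passage terms by IDF; terms unseen in the background get the maximum observed score."""
    max_idf = max(idf.values(), default=0.0)
    terms = set(tokenize(passage))
    scored = [(term, idf.get(term, max_idf)) for term in terms]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

if __name__ == "__main__":
    # Hypothetical background documents standing in for popular-science text.
    background = [
        "Scientists study how plants grow in different climates.",
        "A new phone app helps people track their daily exercise.",
        "Researchers found that sleep improves memory in students.",
    ]
    passage = "We apply a transformer encoder with self-attention to rank abstracts."
    print(rank_difficult_terms(passage, idf_scores(background)))
```

A participant system would normally replace this frequency heuristic with, for example, a learned readability or term-familiarity model, and would draw its statistics from the track's corpus of scientific abstracts rather than from a toy background collection.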


Notes

  1. https://simpletext-project.com.

  2. https://ornlcda.github.io/SDProc/sharedtasks.html.

  3. https://sdproc.org/2022/sharedtasks.html.

  4. https://stellargraph.readthedocs.io/.

  5. https://www.theguardian.com/science.

  6. https://techxplore.com/.

  7. https://www.aminer.org/citation.


Acknowledgment

We would like to acknowledge the support of the Lab Chairs of CLEF 2022, Allan Hanbury and Martin Potthast, for their help and patience. Special thanks to the University Translation Office of the Université de Bretagne Occidentale, to Nicolas Poinsu and Ludivine Grégoire for their major contribution to the construction of the training data, and to Léa Talec-Bernard and Julien Boccou for their help in evaluating participants' runs. We thank Josiane Mothe for reviewing papers. We also thank Alain Kerhervé and the MaDICS (https://www.madics.fr/ateliers/simpletext/) research group.

Author information

Corresponding author

Correspondence to Liana Ermakova.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Ermakova, L. et al. (2022). Overview of the CLEF 2022 SimpleText Lab: Automatic Simplification of Scientific Texts. In: Barrón-Cedeño, A., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2022. Lecture Notes in Computer Science, vol 13390. Springer, Cham. https://doi.org/10.1007/978-3-031-13643-6_28


  • DOI: https://doi.org/10.1007/978-3-031-13643-6_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-13642-9

  • Online ISBN: 978-3-031-13643-6

  • eBook Packages: Computer Science, Computer Science (R0)
