Automatic Simplification of Scientific Texts: SimpleText Lab at CLEF-2022

Ermakova, Liana; Bellot, Patrice; Kamps, Jaap; Nurbakova, Diana; Ovchinnikova, Irina; SanJuan, Eric; Mathurin, Elise; Araújo, Sílvia; Hannachi, Radia; Huet, Stéphane; Poinsu, Nicolas

doi:10.1007/978-3-030-99739-7_46

Liana Ermakova (ORCID: orcid.org/0000-0002-7598-7474)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13186)

Included in the following conference series:

European Conference on Information Retrieval

Abstract

The Web and social media have become the main source of information for citizens, with the risk that users rely on shallow information from sources that prioritize commercial or political incentives over correctness and informational value. Non-experts tend to avoid scientific literature because of its complex language or their lack of background knowledge. Text simplification promises to remove some of these barriers. The CLEF 2022 SimpleText track addresses the challenges of text simplification in the context of promoting access to scientific information, by providing appropriate data and benchmarks and by creating a community of NLP and IR researchers working together to resolve one of the greatest challenges of today. The track will use a corpus of scientific literature abstracts and popular science requests. It features three tasks. First, content selection (what is in, or out?) challenges systems to select passages to include in a simplified summary in response to a query. Second, complexity spotting (what is unclear?) asks systems, given a passage and a query, to rank the terms and concepts that must be explained for the passage to be understood (definitions, context, applications). Third, text simplification (rewrite this!) asks systems, given a query, to simplify passages from scientific abstracts while preserving the main content.
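To make the second task (complexity spotting) more concrete, the following is a minimal sketch of one possible baseline, not the track's own method. It assumes that inverse document frequency over a small general-language background corpus can serve as a crude proxy for term difficulty. The function names (`tokenize`, `idf_table`, `rank_difficult_terms`) and the toy data are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def idf_table(background_docs):
    """Return a function mapping a term to its smoothed IDF.

    Assumption: terms that are rare in the background corpus are more
    likely to need an explanation. This is only a rough proxy.
    """
    n_docs = len(background_docs)
    doc_freq = Counter()
    for doc in background_docs:
        # Count each term once per document.
        doc_freq.update(set(tokenize(doc)))
    # Add-one smoothing: unseen terms receive the highest score.
    return lambda term: math.log((n_docs + 1) / (doc_freq[term] + 1)) + 1.0


def rank_difficult_terms(passage, background_docs, top_k=3):
    """Rank the distinct terms of a passage by decreasing IDF.

    Ties are broken alphabetically so the output is deterministic.
    """
    idf = idf_table(background_docs)
    terms = set(tokenize(passage))
    return sorted(terms, key=lambda t: (-idf(t), t))[:top_k]


if __name__ == "__main__":
    background = [
        "the cat sat on the mat",
        "the dog ate the food",
        "a model of the data",
    ]
    passage = "The model uses quantum annealing"
    print(rank_difficult_terms(passage, background))
    # prints ['annealing', 'quantum', 'uses']
```

The ordinary word "uses" is ranked alongside "quantum" and "annealing" only because it is absent from the tiny background corpus. This illustrates why a frequency proxy alone is unlikely to suffice and why the task also takes the query and the passage context into account.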



Acknowledgments

We thank Alain Kerhervé, University Translation Office, master's students in Translation from the Université de Bretagne Occidentale, and the MaDICS research group.

Author information

Authors and Affiliations

Université de Bretagne Occidentale, HCTI, EA 4249, Bretagne, France
Liana Ermakova, Elise Mathurin & Nicolas Poinsu
Aix Marseille Univ, Université de Toulon, CNRS, LIS, Marseille, France
Patrice Bellot
University of Amsterdam, Amsterdam, The Netherlands
Jaap Kamps
Institut National des Sciences Appliquées de Lyon, LIRIS UMR 5205 CNRS, Lyon, France
Diana Nurbakova
Sechenov University, Moscow, Russia
Irina Ovchinnikova
Avignon Université, LIA, Avignon, France
Eric SanJuan & Stéphane Huet
University of Minho, Braga, Portugal
Sílvia Araújo
Université de Bretagne Sud, HCTI, EA 4249, Bretagne, France
Radia Hannachi

Corresponding author

Correspondence to Liana Ermakova.

Editor information

Editors and Affiliations

Martin Luther University Halle-Wittenberg, Halle, Germany
Matthias Hagen
Leiden University, Leiden, The Netherlands
Suzan Verberne
University of Glasgow, Glasgow, UK
Craig Macdonald
University of Duisburg-Essen, Essen, Germany
Christin Seifert
University of Stavanger, Stavanger, Norway
Krisztian Balog
Norwegian University of Science and Technology, Trondheim, Norway
Kjetil Nørvåg
University of Stavanger, Stavanger, Norway
Vinay Setty

About this paper

Cite this paper

Ermakova, L. et al. (2022). Automatic Simplification of Scientific Texts: SimpleText Lab at CLEF-2022. In: Hagen, M., et al. Advances in Information Retrieval. ECIR 2022. Lecture Notes in Computer Science, vol 13186. Springer, Cham. https://doi.org/10.1007/978-3-030-99739-7_46

DOI: https://doi.org/10.1007/978-3-030-99739-7_46
Published: 05 April 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-99738-0
Online ISBN: 978-3-030-99739-7
eBook Packages: Computer Science, Computer Science (R0)
