Overview of SimpleText 2021 - CLEF Workshop on Text Simplification for Scientific Information Access

  • Conference paper
Experimental IR Meets Multilinguality, Multimodality, and Interaction (CLEF 2021)

Abstract

Information retrieval has moved from traditional document retrieval, in which search is an isolated activity, to modern information access, where search and the use of the information are fully integrated. Yet non-experts tend to avoid authoritative primary sources such as scientific literature because of their complex language and internal vernacular, or because they lack the necessary background knowledge. Text simplification approaches can remove some of these barriers, so that users need not rely on shallow information from sources that prioritize commercial or political incentives over correctness and informational value. The CLEF 2021 SimpleText track addresses head-on the opportunities and challenges of text simplification approaches for improving scientific information access. We aim to provide appropriate data and benchmarks, starting with pilot tasks in 2021, and to create a community of NLP and IR researchers working together to resolve one of the greatest challenges of today.

Everything should be made as simple as possible, but no simpler

Albert Einstein

Author information

Corresponding author

Correspondence to Liana Ermakova.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Ermakova, L. et al. (2021). Overview of SimpleText 2021 - CLEF Workshop on Text Simplification for Scientific Information Access. In: Candan, K.S., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2021. Lecture Notes in Computer Science, vol 12880. Springer, Cham. https://doi.org/10.1007/978-3-030-85251-1_27

  • DOI: https://doi.org/10.1007/978-3-030-85251-1_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-85250-4

  • Online ISBN: 978-3-030-85251-1

  • eBook Packages: Computer Science, Computer Science (R0)
