Abstract
Information retrieval has moved from traditional document retrieval, in which search is an isolated activity, to modern information access, where search and the use of information are fully integrated. But non-experts tend to avoid authoritative primary sources such as scientific literature because of their complex language, internal vernacular, or the reader's lack of prior background knowledge. Text simplification approaches can remove some of these barriers, so that users need not rely on shallow information from sources that prioritize commercial or political incentives over correctness and informational value. The CLEF 2021 SimpleText track addresses head-on the opportunities and challenges of text simplification approaches for improving scientific information access. We aim to provide appropriate data and benchmarks, starting with pilot tasks in 2021, and to create a community of NLP and IR researchers working together to resolve one of the greatest challenges of today.
Everything should be made as simple as possible, but no simpler
Albert Einstein
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Ermakova, L. et al. (2021). Overview of SimpleText 2021 - CLEF Workshop on Text Simplification for Scientific Information Access. In: Candan, K.S., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2021. Lecture Notes in Computer Science, vol 12880. Springer, Cham. https://doi.org/10.1007/978-3-030-85251-1_27
DOI: https://doi.org/10.1007/978-3-030-85251-1_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85250-4
Online ISBN: 978-3-030-85251-1
eBook Packages: Computer Science, Computer Science (R0)