Insights from CL-SciSumm 2016: the faceted scientific document summarization Shared Task

  • Kokil Jaidka
  • Muthu Kumar Chandrasekaran
  • Sajal Rustagi
  • Min-Yen Kan

Abstract

We describe the participation and the official results of the 2nd Computational Linguistics Scientific Summarization Shared Task (CL-SciSumm), held as part of the BIRNDL workshop at the Joint Conference on Digital Libraries 2016 in Newark, New Jersey. CL-SciSumm is the first medium-scale Shared Task on scientific document summarization in the computational linguistics (CL) domain. Participants were provided a training corpus of 30 topics, each comprising a reference paper (RP) and 10 or more citing papers, all of which cite the RP. For each citation, the citing text spans (i.e., citances) that pertain to the RP have been identified. Participants addressed three sub-tasks in automatic research paper summarization using this corpus. Fifteen teams from six countries registered for the Shared Task, of which ten ultimately submitted and presented their results. The annotated corpus comprises 30 target papers, making it currently the largest available corpus of its kind. The corpus is available for free download and use at https://github.com/WING-NUS/scisumm-corpus.

Keywords

Summarization · Automated literature review · Scientific document summarization · Computational linguistics


Copyright information

© Springer-Verlag GmbH Germany 2017

Authors and Affiliations

  • Kokil Jaidka (1)
  • Muthu Kumar Chandrasekaran (2)
  • Sajal Rustagi (3)
  • Min-Yen Kan (2, 4)
  1. School of Arts and Sciences, University of Pennsylvania, Pennsylvania, USA
  2. School of Computing, National University of Singapore, Singapore, Singapore
  3. Department of Computer Science and Engineering, Indian Institute of Technology, Roorkee, India
  4. Smart Systems Institute, National University of Singapore, Singapore, Singapore
