Extracting Core Claims from Scientific Articles

  • Tom Jansen
  • Tobias Kuhn
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 765)


The number of scientific articles has grown rapidly over the years, and there are no signs that this growth will slow down in the near future. As a result, it is becoming increasingly difficult to keep up with the latest developments in a scientific field. To address this problem, we present an approach that helps researchers learn about the latest developments and findings by extracting core claims from scientific articles in a normalized form. This normalized representation is AIDA, a controlled natural language of English sentences that has been proposed in previous work as a method to formally structure and organize scientific findings and discourse. We show how such AIDA sentences can be automatically extracted in three steps: detecting the core claim of an article, checking it for AIDA compliance, and, if necessary, transforming it into a compliant sentence. While our algorithm is still far from perfect, our results indicate that the individual steps are feasible, and they support the claim that AIDA sentences might be a promising approach to improving scientific communication in the future.
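
As an illustration only, the three steps named in the abstract (detect the core claim, check compliance, transform if necessary) could be wired together roughly as follows. This is a minimal sketch under stated assumptions, not the algorithm of the paper: the title-overlap scoring, the stopword list, the self-reference check, and the prefix-stripping rewrite are all hypothetical stand-ins, and the compliance check does not implement the actual AIDA rules.

```python
import re

# Hypothetical stopword list, kept tiny for illustration.
STOPWORDS = {"a", "an", "the", "of", "in", "is", "are", "that", "we", "to", "and"}

# Hypothetical marker of self-reference; a stand-in, not the actual AIDA rules.
SELF_REFERENCE = re.compile(r"\b(we|our|this (study|paper|article))\b", re.IGNORECASE)

# Hypothetical reporting prefix such as "We show that ...".
REPORTING_PREFIX = re.compile(
    r"^(we|our results|this (study|paper|article))\s+"
    r"(show|shows|demonstrate|demonstrates|find|finds|report|reports)\s+that\s+",
    re.IGNORECASE,
)


def content_words(text):
    """Return the set of lowercase words in text, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def detect_core_claim(title, abstract):
    """Step 1 (assumed heuristic): pick the sentence sharing most words with the title."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", abstract) if s.strip()]
    title_words = content_words(title)
    return max(sentences, key=lambda s: len(title_words & content_words(s)))


def is_compliant(sentence):
    """Step 2 (placeholder check): a full-stop-terminated statement without self-reference."""
    return sentence.endswith(".") and not SELF_REFERENCE.search(sentence)


def rewrite(sentence):
    """Step 3 (placeholder rewrite): drop a reporting prefix and re-capitalize."""
    stripped = REPORTING_PREFIX.sub("", sentence, count=1)
    return stripped[:1].upper() + stripped[1:]


def extract_claim(title, abstract):
    """Run the three steps; return the candidate sentence and whether it passed the check."""
    claim = detect_core_claim(title, abstract)
    if not is_compliant(claim):
        claim = rewrite(claim)
    return claim, is_compliant(claim)
```

Calling `extract_claim(title, abstract)` on an article's title and abstract returns the candidate sentence together with a flag indicating whether it passed the placeholder check. A sentence that still fails after the rewrite would need a more capable transformation step than this sketch provides.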


Keywords: Core claims, Core sentences, AIDA, Text mining, Information extraction, Scientific findings



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Faculty of Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
