A joint framework for identifying the type and arguments of scientific contribution

Scientometrics

Abstract

A scientific contribution typically embodies the value of a scientific publication: it reflects how the publication inspires, promotes, or improves on existing theories or guiding practices. To analyze scientific contributions efficiently, we introduce the task of automatically identifying the contribution type and its corresponding arguments. For this novel task, we first construct a new dataset, SciContri, by manually annotating the contribution type and argument information of 783 scientific articles. We then propose a joint framework, SciContriExt, for the scientific contribution extraction task, i.e., classifying the contribution type and extracting the corresponding fine-grained arguments. The framework combines a deep learning classification model with an extraction model that extracts contribution arguments at both the token and span levels. We train the classification and extraction models jointly by taking a weighted sum of their loss functions. Experiments show that the proposed model outperforms state-of-the-art approaches on both contribution type classification and argument extraction. The SciContri dataset will be released for future research.
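The joint training objective mentioned in the abstract can be illustrated with a minimal sketch. The abstract states only that the two losses are combined by weighted summation, so everything else here is an assumption made for illustration: the single weight `alpha` with complement `1 - alpha`, the use of cross-entropy for both tasks, averaging the extraction loss over tokens, and all function names and toy probabilities.

```python
import math


def cross_entropy(probs, gold):
    """Negative log-likelihood of the gold label under a probability list."""
    return -math.log(probs[gold])


def mean_token_loss(token_probs, gold_tags):
    """Average per-token cross-entropy (assumed form of the extraction loss)."""
    losses = [cross_entropy(p, g) for p, g in zip(token_probs, gold_tags)]
    return sum(losses) / len(losses)


def joint_loss(cls_loss, ext_loss, alpha=0.5):
    """Weighted sum of the classification and extraction losses.

    alpha is a hypothetical weight in [0, 1]; the source does not give
    the weighting scheme or its value.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * cls_loss + (1.0 - alpha) * ext_loss


# Toy sentence: three contribution types, three tokens with two tags each.
type_probs = [0.7, 0.2, 0.1]                        # made-up classifier output
token_probs = [[0.9, 0.1], [0.4, 0.6], [0.8, 0.2]]  # made-up tagger output
gold_tags = [0, 1, 0]

cls_loss = cross_entropy(type_probs, gold=0)
ext_loss = mean_token_loss(token_probs, gold_tags)
total = joint_loss(cls_loss, ext_loss, alpha=0.5)
print(f"classification={cls_loss:.4f} extraction={ext_loss:.4f} joint={total:.4f}")
```

With a single scalar objective of this kind, one backward pass updates both models together, so a shared encoder would receive gradient signal from both tasks; whether the paper's models share an encoder is not stated in the abstract.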

Funding

This work was supported by the National Natural Science Foundation of China (No. 61976221) and the Open Project Foundation of the Key Laboratory of Intelligent Information Processing of Shanxi Province (Grant CICIP2020002).

Author information

Corresponding authors

Correspondence to Xian Zhou or Zhunchen Luo.

Ethics declarations

Conflict of interest

The authors declared that they have no relevant financial or non-financial interests to disclose.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chao, W., Chen, M., Zhou, X. et al. A joint framework for identifying the type and arguments of scientific contribution. Scientometrics 128, 3347–3376 (2023). https://doi.org/10.1007/s11192-023-04694-6

  • DOI: https://doi.org/10.1007/s11192-023-04694-6
