Enhancing identification of structure function of academic articles using contextual information

Ma, Bowen; Zhang, Chengzhi; Wang, Yuzhuo; Deng, Sanhong

doi:10.1007/s11192-021-04225-1

Published: 20 January 2022

Volume 127, pages 885–925 (2022)

Scientometrics

Bowen Ma¹,
Chengzhi Zhang ORCID: orcid.org/0000-0001-9522-2914²,
Yuzhuo Wang² &
…
Sanhong Deng¹

Abstract

As literature resources grow, researchers face a worsening problem of information explosion and knowledge overload. To help scholars retrieve literature and acquire knowledge effectively, clarifying the semantic structure of the content of academic literature has become an essential research question. In research on identifying the structure function of chapters in academic articles, only a few studies have used deep learning models or explored how to optimize the feature input. This limits both the application of deep learning models to the task and their potential for optimization. This paper takes articles from the ACL conference as its corpus. We use traditional machine learning models and deep learning models to build classifiers on various feature inputs. Experimental results show that (1) compared with the chapter content, the chapter title is more conducive to identifying the structure function of academic articles; (2) relative position is a valuable feature for building traditional models; and (3) inspired by (2), introducing contextual information into the deep learning models achieves significant results. Our models also transfer well in an open test on 200 samples drawn from outside the training data. In addition, we annotated the ACL main conference papers of the most recent five years with the best-performing models and carried out a time series analysis of the overall corpus. Through multiple comparative experiments, this work explores and summarizes the practical features and models for this task and provides a reference for related text classification tasks. Finally, we point out the limitations and shortcomings of the current model and directions for further optimization.
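The abstract names two kinds of input beyond the chapter text itself: the relative position of a chapter and its surrounding context. It does not say how either is computed, so the following Python sketch is only one plausible reading, not the authors' method. The function names, the linear 0-to-1 position scale, the one-neighbour window, and the `<none>` padding token are all assumptions made for illustration.

```python
from typing import Dict, List


def relative_positions(n_sections: int) -> List[float]:
    """Map each section index to a value in [0, 1] (assumed scale: first = 0.0, last = 1.0)."""
    if n_sections <= 1:
        return [0.0] * n_sections
    return [i / (n_sections - 1) for i in range(n_sections)]


def with_context(titles: List[str], window: int = 1, pad: str = "<none>") -> List[Dict[str, object]]:
    """Attach a relative position and the neighbouring titles to every section title.

    `window` neighbours are taken on each side; positions that fall outside
    the article are filled with the hypothetical padding token `pad`.
    """
    positions = relative_positions(len(titles))
    rows = []
    for i, title in enumerate(titles):
        before = [titles[j] if j >= 0 else pad for j in range(i - window, i)]
        after = [titles[j] if j < len(titles) else pad
                 for j in range(i + 1, i + 1 + window)]
        rows.append({
            "title": title,
            "relative_position": positions[i],
            "context": before + after,
        })
    return rows


if __name__ == "__main__":
    example = ["Introduction", "Related Work", "Method", "Experiments", "Conclusion"]
    for row in with_context(example):
        print(row)
```

Each resulting row could then be turned into classifier input, for instance by appending the position to a feature vector for a traditional model or by concatenating the context titles with the target title for a neural model; which of these, if any, matches the paper is not stated in the abstract.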

Notes

https://acl-arc.comp.nus.edu.sg/ Collection date: April, 2018.
https://www.aclweb.org/anthology/ Collection date: April, 2018.

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Grant No. 72074113) and the Open Fund Project of the Fujian Provincial Key Laboratory of Information Processing and Intelligent Control (Minjiang University) (No. MJUKF-IPIC201903).

Author information

Authors and Affiliations

School of Information Management, Nanjing University, Nanjing, 210023, China
Bowen Ma & Sanhong Deng
Department of Information Management, Nanjing University of Science and Technology, Nanjing, 210094, China
Chengzhi Zhang & Yuzhuo Wang

Corresponding author

Correspondence to Chengzhi Zhang.

About this article

Cite this article

Ma, B., Zhang, C., Wang, Y. et al. Enhancing identification of structure function of academic articles using contextual information. Scientometrics 127, 885–925 (2022). https://doi.org/10.1007/s11192-021-04225-1

Received: 19 March 2021
Accepted: 25 November 2021
Published: 20 January 2022
Issue Date: February 2022
DOI: https://doi.org/10.1007/s11192-021-04225-1
