Abstract
A high-quality reference list is important to the overall quality of a research paper. However, compiling a reference list with good coverage, representativeness, and timeliness requires domain knowledge and is time-consuming, given the large and rapidly growing volume of publications. In this paper, we address the specific problem of enhancing the reference lists of research manuscripts with machine learning. A predictive model is trained on a large academic dataset containing paper-related and venue-related information. It is then used to discover additional references for a scientific draft, given related information that includes the draft's initial reference list. We propose a supervised approach, RefCom, built on the learning-to-rank framework, to predict the probability that a given paper cites a reference candidate. Forty features in total describe pairs of papers with respect to author influence, venue influence, and paper influence, as well as content similarity and reference similarity. Unlike heuristic rule-based approaches, RefCom integrates multiple features with learned weights. Experiments on the AMiner dataset, which contains 2 million papers and 1.7 million authors, show the effectiveness of RefCom in citation prediction and suggest its potential as an assistant tool for finalizing reference lists.
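The abstract describes RefCom as scoring (draft, candidate) paper pairs with weighted features and ranking the candidates. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation. The four features (term-count cosine for content similarity, Jaccard overlap for reference similarity, log citation count, and a venue score) are hypothetical stand-ins for a few of the forty features in Table 2. The RankNet-style pairwise logistic training is a swapped-in learning-to-rank method and not necessarily the one RefCom uses. The sigmoid of the learned score serves here only as a monotone ranking score, not a calibrated citation probability.

```python
import math
from collections import Counter

# Hypothetical stand-ins for a few of RefCom's forty pair features;
# the paper's actual feature definitions are listed in its Table 2.

def reference_similarity(refs_a, refs_b):
    """Jaccard overlap between two reference lists."""
    a, b = set(refs_a), set(refs_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def content_similarity(text_a, text_b):
    """Cosine similarity of raw term-count vectors."""
    ca, cb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def pair_features(draft, cand):
    """Feature vector describing one (draft, candidate) pair."""
    return [
        content_similarity(draft["text"], cand["text"]),
        reference_similarity(draft["refs"], cand["refs"]),
        math.log1p(cand["citations"]),  # crude paper-influence proxy (assumption)
        cand["venue_score"],            # crude venue-influence proxy (assumption)
    ]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_pairwise(pairs, dim, lr=0.1, epochs=200):
    """Learn weights w so that w.x_pos > w.x_neg for each (cited, non-cited)
    pair, by gradient ascent on log sigmoid(w.(x_pos - x_neg))."""
    w = [0.0] * dim
    for _ in range(epochs):
        for pos, neg in pairs:
            diff = [p - n for p, n in zip(pos, neg)]
            s = sum(wi * di for wi, di in zip(w, diff))
            step = lr * (1.0 - sigmoid(s))  # d/ds log sigmoid(s) = 1 - sigmoid(s)
            w = [wi + step * di for wi, di in zip(w, diff)]
    return w

def rank_candidates(w, draft, candidates):
    """Score each candidate with sigmoid(w.x) and return names best-first."""
    scored = {
        name: sigmoid(sum(wi * xi for wi, xi in zip(w, pair_features(draft, c))))
        for name, c in candidates.items()
    }
    return sorted(scored, key=scored.get, reverse=True), scored

# Toy training pairs: (features of a cited paper, features of a non-cited paper).
train = [
    ([0.9, 0.5, 1.0, 0.5], [0.2, 0.0, 1.0, 0.5]),
    ([0.7, 0.6, 3.0, 0.9], [0.1, 0.1, 3.0, 0.9]),
]
w = train_pairwise(train, dim=4)

draft = {"text": "learning to rank citation recommendation", "refs": ["r1", "r2", "r3"]}
candidates = {
    "A": {"text": "citation recommendation with learning to rank",
          "refs": ["r1", "r2"], "citations": 50, "venue_score": 0.8},
    "B": {"text": "image classification with sparse representation",
          "refs": ["r9"], "citations": 50, "venue_score": 0.8},
}
order, scores = rank_candidates(w, draft, candidates)
print(order)  # ['A', 'B']
```

Because both toy candidates share the same citation count and venue score, the ranking here is driven entirely by the learned weights on content and reference similarity. Learning the weights from cited/non-cited pairs, rather than fixing them by hand, is what distinguishes this kind of model from the heuristic rule-based approaches the abstract contrasts it with.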
Acknowledgements
This project was supported by the National Natural Science Foundation of China (Grant no. 61502420), and the Natural Science Foundation of Zhejiang Province (Grant no. LY16F020032).
Appendix
Table 2 gives the complete list of features used in RefCom.
About this article
Cite this article
Mei, JP., Chen, D., Fan, J. et al. Finalizing your reference list with machine learning. J Ambient Intell Human Comput 14, 14883–14892 (2023). https://doi.org/10.1007/s12652-018-0976-z
DOI: https://doi.org/10.1007/s12652-018-0976-z