Discovering related scientific literature beyond semantic similarity: a new co-citation approach
We propose a new approach to recommending scientific literature, a domain in which the efficient organization and search of information is crucial. The proposed system relies on the hypothesis that two scientific articles are semantically related if they are co-cited more frequently than they would be by pure chance. This relationship can be quantified by the probability of co-citation, obtained from a null model that statistically defines what we consider pure chance. By looking for article pairs that minimize this probability, the system returns a ranking of recommended articles in response to a given article. The system belongs to the co-occurrence paradigm of the field. More specifically, because it is based on co-citations, it can produce recommendations focused more on relatedness than on similarity. Evaluation has been performed on the ACL Anthology collection and on the DBLP dataset, and a new corpus has been compiled to evaluate the capacity of the proposal to find relationships beyond similarity. Results show that the system is able to provide not only articles similar to the submitted one but also articles presenting other kinds of relations, thus providing diversity, i.e. connections to new topics.
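The ranking step described above can be illustrated with a short Python sketch. It does not reproduce the authors' implementation, and several details are our own assumptions. The abstract does not say which null model is used, so we adopt a hypergeometric one, a natural choice for co-occurrence counts. Under it, the n_b papers citing article B are treated as a uniform random subset of all N citing papers, chosen independently of which of those papers cite article A. The function names `cocitation_pvalue` and `rank_related` are hypothetical, and so is the input format, which is one set of cited-article IDs per citing paper.

```python
from collections import Counter
from math import comb


def cocitation_pvalue(n_total, n_a, n_b, k):
    """Probability of observing at least k co-citations of A and B by chance.

    Assumed (hypergeometric) null model: of n_total citing papers, n_a cite A;
    the n_b papers citing B are drawn uniformly at random from all n_total.
    """
    upper = min(n_a, n_b)
    # math.comb(n, r) returns 0 when r > n, so impossible outcomes contribute 0.
    hits = sum(comb(n_a, x) * comb(n_total - n_a, n_b - x)
               for x in range(k, upper + 1))
    return hits / comb(n_total, n_b)


def rank_related(query, citing_papers):
    """Rank articles co-cited with `query` by ascending chance probability.

    citing_papers: iterable of sets, each holding the IDs one paper cites.
    """
    papers = [set(refs) for refs in citing_papers]
    n_total = len(papers)
    cited = Counter()      # how many papers cite each article
    cocited = Counter()    # how many papers cite both `query` and the article
    for refs in papers:
        cited.update(refs)
        if query in refs:
            cocited.update(r for r in refs if r != query)
    scores = {b: cocitation_pvalue(n_total, cited[query], cited[b], k)
              for b, k in cocited.items()}
    # Lowest probability first; ties broken by ID for a deterministic order.
    return sorted(scores, key=lambda b: (scores[b], b))


if __name__ == "__main__":
    papers = [{"A", "B"}, {"A", "B"}, {"A", "C"}, {"C", "D"}, {"C", "E"}]
    print(rank_related("A", papers))  # ['B', 'C']
```

In the toy example, B is co-cited with A twice. The chance probability of that is 3/10. C is co-cited with A only once, and with three citations each among five papers, at least one shared citing paper is unavoidable, so its probability is 1. Sorting by ascending probability therefore puts first the candidates whose co-citation with the query is least explainable by chance.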
Keywords: Scientific related literature; Recommendations; Co-citation; Statistical model; Semantic similarity
This work has been partially supported by the Spanish Ministry of Science and Innovation within the projects PROSA-MED (TIN2016-77820-C3-2-R) and EXTRAE (IMIENS 2017).
Compliance with ethical standards
Conflict of Interest
The authors declare that they have no conflict of interest.