Discovering related scientific literature beyond semantic similarity: a new co-citation approach

  • Oscar Rodriguez-Prieto
  • Lourdes Araujo
  • Juan Martinez-Romo


We propose a new approach to recommending scientific literature, a domain in which efficient organization and search of information are crucial. The proposed system relies on the hypothesis that two scientific articles are semantically related if they are co-cited more frequently than they would be by pure chance. This relationship can be quantified by the probability of co-citation, obtained from a null model that statistically defines what we consider pure chance. By looking for article pairs that minimize this probability, the system produces a ranked list of recommended articles in response to a given article. The system belongs to the co-occurrence paradigm of the field. More specifically, because it is based on co-citations, it can produce recommendations focused more on relatedness than on similarity. Evaluation was performed on the ACL Anthology collection and on the DBLP dataset, and a new corpus was compiled to evaluate the capacity of the proposal to find relationships beyond similarity. Results show that the system provides not only articles similar to the submitted one but also articles presenting other kinds of relations, thus providing diversity, i.e., connections to new topics.
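The ranking idea described above can be illustrated with a minimal sketch. The abstract does not specify the null model, so the sketch assumes a hypergeometric one: if article A is cited by `cites_a` of `n_papers` papers and article B by `cites_b`, the chance of seeing at least `cocites` papers citing both is a hypergeometric tail probability. The function names `cocitation_p_value` and `recommend`, the toy citation data, and the choice of null model are all illustrative assumptions, not the authors' implementation.

```python
from math import comb


def cocitation_p_value(n_papers, cites_a, cites_b, cocites):
    """Probability of at least `cocites` co-citations under an ASSUMED
    hypergeometric null model: the `cites_b` papers citing B are drawn at
    random from `n_papers`, of which `cites_a` cite A."""
    upper = min(cites_a, cites_b)
    total = comb(n_papers, cites_b)
    tail = sum(
        comb(cites_a, k) * comb(n_papers - cites_a, cites_b - k)
        for k in range(cocites, upper + 1)
    )
    return tail / total


def recommend(query, citing_sets, n_papers, top_k=5):
    """Rank articles by increasing co-citation probability with `query`.

    `citing_sets` (hypothetical input format) maps each article to the set
    of papers that cite it. Articles never co-cited with the query are skipped.
    """
    query_citers = citing_sets[query]
    scored = []
    for article, citers in citing_sets.items():
        if article == query:
            continue
        cocites = len(query_citers & citers)
        if cocites == 0:
            continue
        p = cocitation_p_value(n_papers, len(query_citers), len(citers), cocites)
        scored.append((p, article))
    scored.sort()  # smallest probability (least likely by chance) first
    return [article for _, article in scored[:top_k]]


# Toy data (invented for illustration): 10 citing papers, 4 cited articles.
toy_citing_sets = {
    "A": {1, 2, 3},
    "B": {1, 2, 3},           # always co-cited with A
    "C": {1, 4, 5, 6, 7, 8},  # heavily cited, rarely together with A
    "D": {9},                 # never co-cited with A
}
ranking = recommend("A", toy_citing_sets, n_papers=10)
```

In this toy case B ranks ahead of C: three co-citations out of three citations each are very unlikely by chance, whereas a single co-citation with a heavily cited article is almost expected. This is the sense in which minimizing the probability favors relatedness over raw co-citation counts.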


Scientific related literature, Recommendations, Co-citation, Statistical model, Semantic similarity



This work has been partially supported by the Spanish Ministry of Science and Innovation within the projects PROSA-MED (TIN2016-77820-C3-2-R) and EXTRAE (IMIENS 2017).

Compliance with ethical standards

Conflict of Interest

The authors declare that they have no conflict of interest.



Copyright information

© Akadémiai Kiadó, Budapest, Hungary 2019

Authors and Affiliations

  1. Computational Reflection Research-Group, Universidad de Oviedo, Oviedo, Spain
  2. Natural Language Processing and Information Retrieval Group, Universidad Nacional de Educación a Distancia (UNED), Madrid, Spain
  3. IMIENS: Instituto Mixto de Investigación, Escuela Nacional de Sanidad, Madrid, Spain
