Discovering related scientific literature beyond semantic similarity: a new co-citation approach

Scientometrics

Abstract

We propose a new approach to recommending scientific literature, a domain in which the efficient organization and search of information is crucial. The proposed system relies on the hypothesis that two scientific articles are semantically related if they are co-cited more frequently than would be expected by pure chance. This relationship can be quantified by the probability of co-citation, obtained from a null model that statistically defines what we consider pure chance. By looking for the article pairs that minimize this probability, the system recommends a ranked list of articles in response to a given article. The system belongs to the co-occurrence paradigm of the field. More specifically, it is based on co-citations, so it can produce recommendations focused more on relatedness than on similarity. Evaluation was performed on the ACL Anthology collection and on the DBLP dataset, and a new corpus was compiled to evaluate the capacity of the proposal to find relationships beyond similarity. Results show that the system provides not only articles similar to the submitted one but also articles exhibiting other kinds of relations, thus providing diversity, i.e., connections to new topics.
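The ranking idea in the abstract can be illustrated with a small sketch. The abstract does not specify the null model, so the code below assumes a hypergeometric one: if each article's citing papers were drawn at random from all citing papers, how likely is it to see at least the observed number of co-citations? The function names, the data layout, and the toy citation data are hypothetical and are not taken from the article.

```python
from math import comb


def cocitation_p_value(n_citing, n_a, n_b, k):
    """Probability of at least k co-citations by chance.

    Assumed null model (hypergeometric): article A is cited by n_a of the
    n_citing papers, and the n_b papers citing article B are drawn at random.
    """
    upper = min(n_a, n_b)
    total = comb(n_citing, n_b)
    tail = sum(
        comb(n_a, j) * comb(n_citing - n_a, n_b - j)
        for j in range(k, upper + 1)
    )
    return tail / total


def recommend(query, citations, top_n=5):
    """Rank articles co-cited with `query` by ascending chance probability.

    `citations` maps each citing paper to the set of articles it cites.
    """
    n_citing = len(citations)

    # Invert the mapping: cited article -> set of papers that cite it.
    cited_by = {}
    for citing, refs in citations.items():
        for ref in refs:
            cited_by.setdefault(ref, set()).add(citing)

    query_citers = cited_by.get(query, set())
    scored = []
    for other, citers in cited_by.items():
        if other == query:
            continue
        k = len(query_citers & citers)
        if k == 0:
            continue  # never co-cited, so not a candidate
        p = cocitation_p_value(n_citing, len(query_citers), len(citers), k)
        scored.append((p, other))

    scored.sort()  # smallest probability (least likely by chance) first
    return scored[:top_n]


# Toy data: six citing papers and the articles they reference.
toy_citations = {
    "p1": {"A", "B"},
    "p2": {"A", "B"},
    "p3": {"A", "B", "C"},
    "p4": {"C"},
    "p5": {"C"},
    "p6": {"C", "D"},
}
ranked = recommend("A", toy_citations)
```

In this toy example, B is always cited together with A, so it receives the lowest chance probability and is ranked first, while C, which shares only one of its four citing papers with A, is ranked below it. Any other null model could be swapped in by replacing `cocitation_p_value`.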

Notes

  1. These proposals use the relationships to construct a graph and then apply graph algorithms, such as clustering or PageRank.

  2. The corpus was annotated by the authors, who have long experience working with research papers.

Acknowledgements

This work has been partially supported by the Spanish Ministry of Science and Innovation within the projects PROSA-MED (TIN2016-77820-C3-2-R) and EXTRAE (IMIENS 2017).

Author information

Corresponding author

Correspondence to Lourdes Araujo.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Rodriguez-Prieto, O., Araujo, L. & Martinez-Romo, J. Discovering related scientific literature beyond semantic similarity: a new co-citation approach. Scientometrics 120, 105–127 (2019). https://doi.org/10.1007/s11192-019-03125-9

  • DOI: https://doi.org/10.1007/s11192-019-03125-9
