Related records retrieval and pennant retrieval: an exploratory case study

Abstract

The Related Records feature in the Web of Science (WoS) retrieves records that share at least one item in their reference lists with the references of a seed record. This search method, known as bibliographic coupling, does not always yield topically relevant results. Our exploratory case study asks: How do retrievals of the type used in pennant diagrams compare with retrievals through Related Records? Pennants are two-dimensional visualizations of documents co-cited with a seed paper, in which the well-known tf*idf (term frequency * inverse document frequency) formula is used to weight the co-citation counts. The weights have psychological interpretations from relevance theory: given the seed, tf predicts a co-cited document’s cognitive effects on the user, and idf predicts the user’s relative ease in relating its title to the seed’s title. We chose two seed papers from information science, one with only two references and the other with 20, and used each to retrieve 50 documents per method in WoS. We illustrate the results with pennant diagrams. Pennant retrieval indeed produced more relevant documents, especially for the paper with only two references, and it produced mostly different ones. Related Records performed almost as well on the paper with the longer reference list, improving markedly as the number of coupling units between the seed and other papers increased. We argue that relevance rankings based on co-citation, with pennant-style weighting as an option, would be a desirable addition to WoS and similar databases.
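The pennant weighting sketched in the abstract can be illustrated in a few lines. This is a hedged toy example, not the authors' implementation: the function name, the choice of log base 2, and all input counts are assumptions for illustration only. For each document co-cited with the seed, tf is its co-citation count with the seed (predicting cognitive effects) and idf discounts documents that are heavily cited everywhere (predicting ease of processing); a pennant plots each document at (log tf, log idf).

```python
import math

def pennant_coords(cocites_with_seed, total_cites, collection_size):
    """Return {doc: (x, y)} with x = log2(tf) and y = log2(idf).

    cocites_with_seed: co-citation count of each doc with the seed (tf).
    total_cites: total citation count of each doc in the collection.
    collection_size: number of citing papers (for the idf ratio).
    """
    coords = {}
    for doc, tf in cocites_with_seed.items():
        idf = collection_size / total_cites[doc]
        coords[doc] = (math.log2(tf), math.log2(idf))
    return coords

# Made-up counts: doc "A" is co-cited 8 times with the seed and cited
# 16 times overall in a collection of 1024 citing papers.
coords = pennant_coords({"A": 8}, {"A": 16}, 1024)
# "A" lands at x = log2(8) = 3.0 and y = log2(1024/16) = 6.0
```

Documents far to the right (high tf) promise strong cognitive effects given the seed; documents high on the y-axis (high idf) are cited selectively rather than ubiquitously, and so are easier to relate to the seed specifically.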


Figures 1–6 (available in the full text)

Notes

  1.

    Macros are available at: http://www.mugeakbulut.com/YL_Tez/veriler_makrolar/makrolar/.

  2.

    We needed only PY and CR from the 60 field tags in the WoS standard file to calculate the frequencies. For all tags and their definitions, see Clarivate Analytics (2018a).

  3.

    For those unfamiliar with the convention, nonsensical characters are an old-fashioned way of comically indicating an unprintable curse-word.
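As a concrete illustration of note 2, the two needed fields can be pulled from a WoS plain-text export with a short script. The sketch below is an assumption-laden stand-in, not the authors' macros (those are linked in note 1): it assumes the standard tagged layout in which a two-letter field tag (e.g. PY, CR) starts each field line, indented lines continue the previous field (CR lists one cited reference per line), and ER terminates a record.

```python
def parse_wos(text):
    """Collect field values per record from a WoS plain-text export.

    Assumed layout: a two-letter tag starts each field line, indented
    lines continue the previous field, and "ER" ends a record.
    """
    records, current, tag = [], {}, None
    for line in text.splitlines():
        if line.startswith("ER"):
            records.append(current)
            current, tag = {}, None
        elif line[:1].strip():                 # a new two-letter field tag
            tag = line[:2]
            current.setdefault(tag, []).append(line[3:].strip())
        elif tag and line.strip():             # indented continuation line
            current.setdefault(tag, []).append(line.strip())
    return records

# Minimal fabricated record in the assumed tagged format.
sample = (
    "PY 1960\n"
    "CR COOPER WS, 1978, J ACM, V25, P67\n"
    "   MARON ME, 1977, J AM SOC INFORM SCI, V28, P38\n"
    "ER\n"
)
rec = parse_wos(sample)[0]
# rec["PY"] -> ["1960"]; rec["CR"] -> the two cited references
```

From `rec["CR"]` the cited-reference frequencies mentioned in note 2 could then be tallied, e.g. with `collections.Counter` across all records.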

References

  1. Ahmad, S., & Afzal, M. T. (2017). Combining co-citation and metadata for recommending more related papers. In 15th International Conference on Frontiers of Information Technology (FIT) (pp. 218–222). Islamabad, Pakistan: IEEE.

  2. Akbulut, M. (2016a). Atıf klasiklerinin etkisinin ve ilgililik sıralamalarının pennant diyagramları ile analizi [The analysis of the impact of citation classics and relevance rankings using pennant diagrams]. Yayımlanmamış yüksek lisans tezi, Hacettepe Üniversitesi, Ankara [Unpublished master’s thesis, Hacettepe University, Ankara]. Retrieved June 22, 2019, from https://www.mugeakbulut.com/yayinlar/Muge_Akbulut_YL_Tez.pdf

  3. Akbulut, M. (2016b). Extended abstract: The analysis of the impact of citation classics and relevance rankings using pennant diagrams. Retrieved June 22, 2019, from https://www.mugeakbulut.com/yayinlar/tez_extended_abstract.pdf.

  4. Åström, F. (2007). Changes in the LIS research front: Time-sliced cocitation analyses of LIS journal articles, 1990–2004. Journal of the American Society for Information Science and Technology,58(7), 947–957.


  5. Beel, J., Gipp, B., Langer, S., & Breitinger, C. (2016). Research-paper recommender systems: A literature survey. International Journal on Digital Libraries,17(4), 305–338.


  6. Belter, C. W. (2017). A relevance ranking method for citation-based search results. Scientometrics,112(2), 731–746.


  7. Bensman, S. J. (2013). Eugene Garfield, Francis Narin, and PageRank: The theoretical bases of the Google Search Engine. https://arxiv.org/pdf/1312.3872.pdf.

  8. Bichteler, J., & Eaton, E. A., III. (1980). The combined use of bibliographic coupling and cocitation for document retrieval. Journal of the American Society for Information Science,31(4), 278–282.


  9. Bitirim, Y., Tonta, Y., & Sever, H. (2002). Information retrieval effectiveness of Turkish search engines. Lecture Notes in Computer Science,2457, 93–103.


  10. Bollmann, P. (1983). The normalized recall and related measures. In Proceedings of the 6th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ‘83) (pp. 122–128). New York: ACM Press. https://doi.org/10.1145/511793.511811.

  11. Borlund, P., & Ingwersen, P. (1997). The development of a method for the evaluation of interactive information retrieval systems. Journal of Documentation,53(3), 225–250.


  12. Carevic, Z., & Mayr, P. (2014). Recommender systems using pennant diagrams in digital libraries. In NKOS Workshop 2014, London (arXiv:1407.7276). https://arxiv.org/pdf/1407.7276.pdf.

  13. Carevic, Z., & Schaer, P. (2014). On the connection between citation-based and topical relevance ranking: Results of a pretest using iSearch. In Proceedings of the First Workshop on Bibliometric-enhanced Information Retrieval (pp. 37–44). Amsterdam, The Netherlands. Retrieved June 5, 2019, from https://ceur-ws.org/Vol-1143/paper5.pdf.

  14. Clarivate Analytics (2017a). Related Records. Retrieved June 5, 2019, from https://images.webofknowledge.com/images/help/WOS/hp_related_records.html.

  15. Clarivate Analytics (2017b). Research Areas (Categories/Classification). Retrieved June 5, 2019, from https://images.webofknowledge.com/images/help/WOS/hp_research_areas_easca.html.

  16. Clarivate Analytics (2018a). Advanced Search Examples. Retrieved June 5, 2019, from https://images.webofknowledge.com/images/help/WOS/hp_advanced_examples.html.

  17. Clarivate Analytics (2018b). Research Area Schemes. Retrieved June 5, 2019, from https://help.incites.clarivate.com/inCites2Live/filterValuesGroup/researchAreaSchema.html.

  18. Clarivate Analytics (2019). Web of Science platform: Web of science: Summary of coverage. Retrieved June 5, 2019, from https://clarivate.libguides.com/webofscienceplatform/coverage.

  19. Clough, P., & Sanderson, M. (2013). Evaluating the performance of information retrieval systems using test collections. Information Research, 18(2) paper 582. Retrieved May 2, 2019, from https://InformationR.net/ir/18-2/paper582.html.

  20. Colavizza, G., Boyack, K. W., Van Eck, N. J., & Waltman, L. (2018). The closer the better: Similarity of publication pairs at different cocitation levels. Journal of the Association for Information Science & Technology,69(4), 600–609.


  21. Cooper, W. S. (1988). Getting beyond Boole. Information Processing and Management,24(3), 243–248.


  22. Cooper, W. S., & Maron, M. E. (1978). Foundations of probabilistic and utility-theoretic indexing. Journal of the ACM,25(1), 67–80.


  23. Croft, W. B., & Harper, D. J. (1979). Using probabilistic models of document retrieval without relevance information. Journal of Documentation,35(4), 285–295.


  24. Eto, M. (2013). Evaluations of context-based co-citation searching. Scientometrics,94, 651–673.


  25. Fuhr, N. (1989). Models for retrieval with probabilistic indexing. Information Processing and Management,25(1), 55–72.


  26. Garfield, E. (2001). From bibliographic coupling to co-citation analysis via algorithmic historio-bibliography. A citationist’s tribute to Belver C. Griffith. Paper presented at Drexel University, Philadelphia, PA. Retrieved June 5, 2019, from https://garfield.library.upenn.edu/papers/drexelbelvergriffith92001.pdf.

  27. Giles, C. L., Bollacker, K. D., & Lawrence, S. (1998). CiteSeer: An automatic citation indexing system. In I. Witten et al. (Eds.), Digital Libraries: Third ACM Conference on Digital Libraries (pp. 89–98). New York: ACM Press. Retrieved May 2, 2019, from https://clgiles.ist.psu.edu/papers/DL-1998-citeseer.pdf.

  28. Haruna, K., Ismail, M. A., Bichi, A. B., Chang, V., Wibawa, S., & Herawan, T. (2018). A citation-based recommender system for scholarly paper recommendation. In O. Gervasi et al. (Eds.), International Conference on Computational Science and Its Applications, ICCSA 2018, LNCS 10960 (pp. 514–525). Cham: Springer.

  29. He, Q., Pei, J., Kifer, D., Mitra, P., & Giles, L. (2010). Context-aware citation recommendation. In Proceedings of the 19th International Conference on World Wide Web (WWW ’10), Raleigh, North Carolina, USA, April 26–30, 2010 (pp. 421–430). New York: ACM. Retrieved June 5, 2019, from https://www.cse.psu.edu/~duk17/papers/citationrecommendation.pdf.

  30. Hiemstra, D. (2000). A probabilistic justification for using tf × idf term weighting in information retrieval. International Journal on Digital Libraries,3(2), 131–139.


  31. Horsley, T., Dingwall, O., & Sampson, M. (2011). Checking reference lists to find additional studies for systematic reviews. Cochrane Database Systems Review. https://doi.org/10.1002/14651858.MR000026.pub2.


  32. Huang, S., Xue, G-R., Zhang, B-Y., Chen, Z., Yu, Y., & Ma, W-Y. (2004). TSSP: A reinforcement algorithm to find related papers. In: WI 2004, Washington, DC, USA (pp. 117–123). IEEE Computer Society, Los Alamitos. Retrieved June 5, 2019, from https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1410792&tag=1.

  33. Kessler, M. M. (1963). Bibliographic coupling between scientific papers. American Documentation,14(1), 10–25.


  34. Kraft, D. H. (1985). Advances in information retrieval: Where is that /#*&@¢ record? In M. C. Yovits (Ed.), Advances in Computers (pp. 277–318). New York: Academic Press.


  35. Liang, Y., Li, Q., & Qian, T. (2011). Finding relevant papers based on citation relations. In Proceedings of the International Conference on Web-Age Information Management (pp. 403–414). Berlin: Springer.

  36. Lin, J., & Wilbur, W. J. (2007). PubMed related articles: A probabilistic topic-based model for content similarity. BMC Bioinformatics,8, 423. https://doi.org/10.1186/1471-2105-8-423.


  37. Manning, C., & Schütze, H. (2000). Foundations of statistical natural language processing (2nd edn). Cambridge: MIT Press. Retrieved June 22, 2019, from https://ics.upjs.sk/~pero/web/documents/pillar/Manning_Schuetze_StatisticalNLP.pdf.

  38. Maron, M. E. (1977). On indexing, retrieval and the meaning of about. Journal of the American Society for Information Science,28(1), 38–43.


  39. Maron, M. E. (1988). Probabilistic design principles for conventional and full-text retrieval systems. Information Processing and Management,24(3), 249–255.


  40. Maron, M. E. (2008). An historical note on the origins of probabilistic indexing. Information Processing and Management,44(2), 971–972.


  41. Maron, M. E., & Kuhns, J. L. (1960). On relevance, probabilistic indexing and information retrieval. Journal of the ACM,7(3), 216–244.


  42. Peterson, G., & Graves, R. S. (2009). How similar is similar? An evaluation of “related records” applications among health literature portals. Proceedings of the Association for Information Science & Technology,46(1), 1–3.


  43. Prevedelli, D., Simonini, R., & Ansaloni, I. (2001). Relationship of non-specific commensalism in the colonization of the deep layers of sediment. Journal of the Marine Biological Association of the United Kingdom,81(6), 897–901. https://doi.org/10.1017/S0025315401004817.


  44. Robertson, S. E. (1977). The probability ranking principle in IR. Journal of Documentation, 33(4), 294–304. Retrieved June 22, 2019, from https://parnec.nuaa.edu.cn/xtan/IIR/readings/jdRobertson1977.pdf.


  45. Robertson, S. E., Maron, M. E., & Cooper, W. S. (1982). Probability of relevance: A unification of two competing models for document retrieval. Information Technology: Research and Development,1(1), 1–21.


  46. Robertson, S. E., & Sparck Jones, K. (1976). Relevance weighting of search terms. Journal of the American Society for Information Science,27(3), 129–146.


  47. Salton, G., & Yang, C. S. (1973). On the specification of term values in automatic indexing. Journal of Documentation,29(4), 351–372.


  48. Scopus. (2018). References and related documents. Retrieved April 25, 2019. https://service.elsevier.com/app/answers/detail/a_id/14190/supporthub/scopus/.

  49. Shannon, C. E., & Weaver, W. (1949). The mathematical theory of communication. Urbana, Ill: University of Illinois Press.


  50. Shen, S., Zhu, D., Rousseau, R., Su, X., & Wang, D. (2019). A refined method for computing bibliographic coupling strengths. Journal of Informetrics,13, 605–615.


  51. Small, H. (1973). Co-citation in the scientific literature: A new measure of the relationship between two documents. Journal of the American Society for Information Science,24(4), 265–269.


  52. Smith, C. H., Georges, P., & Nguyen, N. (2015). Statistical tests for ‘related records’ search results. Scientometrics,105(3), 1665–1677.


  53. Soll, J. B., Milkman, K. L., & Payne, J. W. (2015). Outsmart your own biases. Harvard Business Review,93(5), 64–71.


  54. Sparck Jones, K. (1972). A statistical interpretation of term specificity and its application to retrieval. Journal of Documentation,28(1), 11–21.


  55. Sperber, D., & Wilson, D. (1995). Relevance: Communication and cognition (2nd ed.). Oxford: Blackwell.


  56. Sugimoto, C. R., & Larivière, V. (2018). Measuring research: What everyone needs to know. New York: Oxford University Press.


  57. Swanson, D. R. (1986). Subjective versus objective relevance in bibliographic retrieval systems. The Library Quarterly,56(4), 389–398.


  58. Thompson, P. (1990a). A combination of expert opinion approach to probabilistic information retrieval, Part 1: The conceptual model. Information Processing and Management,26(3), 371–382.


  59. Thompson, P. (1990b). A combination of expert opinion approach to probabilistic information retrieval, Part 2: Mathematical treatment of CEO model 3. Information Processing and Management,26(3), 383–394.


  60. Thompson, P. (2008). Looking back: On relevance, probabilistic indexing and information retrieval. Information Processing and Management,44(2), 963–970.


  61. Tonta, Y., & Özkan Çelik, A. E. (2013). Cahit Arf: Exploring his scientific influence using social network analysis, author co-citation maps and single publication h index. Journal of Scientometric Research,2(1), 37–51.


  62. Wesley-Smith, I., Bergstrom, C. T., & West, J. D. (2016). Static ranking of scholarly papers using article-level eigenfactor (ALEF). In The 9th ACM International Conference on Web Search and Data Mining (WSDM). February 22–25, 2016, San Francisco, CA, USA. Retrieved May 2, 2019, from https://octavia.zoology.washington.edu/publications/WesleySmithEtAl16.pdf.

  63. White, H. D. (2007a). Combining bibliometrics, information retrieval, and relevance theory. Part 1: First examples of a synthesis. Journal of the American Society for Information Science and Technology,58(4), 536–559.


  64. White, H. D. (2007b). Combining bibliometrics, information retrieval, and relevance theory. Part 2: Some implications for information science. Journal of the American Society for Information Science and Technology,58(4), 583–605.


  65. White, H. D. (2009). Pennants for Strindberg and Persson. In Celebrating scholarly communication studies: A festschrift for Olle Persson at his 60th birthday (ISSI Newsletter, Vol. 5-S, pp. 71–83). Retrieved May 2, 2019, from https://portal.research.lu.se/portal/files/5902071/1458992.pdf.

  66. White, H. D. (2010). Some new tests of relevance theory in information science. Scientometrics,83(3), 653–667.


  67. White, H. D. (2015). Co-cited author retrieval and relevance theory: Examples from the humanities. Scientometrics,102(3), 2275–2299.


  68. White, H. D. (2018a). Pennants for Garfield: Bibliometrics and document retrieval. Scientometrics,114(2), 757–778.


  69. White, H. D. (2018b). Bag of works retrieval: TF*IDF weighting of works co-cited with a seed. International Journal on Digital Libraries,19(2–3), 139–149.


  70. White, H. D., & Mayr, P. (2013). Pennants for descriptors. In NKOS Workshop 2013. Valletta, Malta. https://arxiv.org/abs/1310.3808.

  71. Wilson, D., & Sperber, D. (2002). Relevance theory. In G. Ward & L. Horn (Eds.), Handbook of pragmatics. Oxford: Blackwell. Retrieved May 2, 2019, from https://www.dan.sperber.fr/?p=93.

  72. Yao, Y. (1995). Measuring retrieval effectiveness based on user preference of documents. Journal of the American Society for Information Science,46(2), 133–145.


  73. Yule, G. U. (1912). On the methods of measuring association between two attributes. Journal of the Royal Statistical Society,75(6), 579–652.


  74. Zarrinkalam, F., & Kahani, M. (2012). A new metric for measuring relatedness of scientific papers based on non-textual features. Intelligent Information Management, 4(4), 99–107. Retrieved May 2, 2019, from https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.459.6934&rep=rep1&type=pdf.


Author information


Contributions

Akbulut and Tonta: Conceptualization, methodology, software, formal analysis, and visualizations; original and revised drafts. White: Review, editing and revisions, final draft.

Corresponding author

Correspondence to Müge Akbulut.

Appendices

Appendix 1

See Table 5.

Table 5 Top 50 (a) Related Records and (b) pennant retrievals for Maron and Kuhns*

Appendix 2

See Table 6.

Table 6 Top 50 (a) Related Records and (b) pennant retrievals for Cooper*


About this article


Cite this article

Akbulut, M., Tonta, Y. & White, H.D. Related records retrieval and pennant retrieval: an exploratory case study. Scientometrics 122, 957–987 (2020). https://doi.org/10.1007/s11192-019-03303-9


Keywords

  • Bibliographic coupling
  • Co-citation analysis
  • Relevance theory
  • tf * idf
  • Web of Science