The Related Records feature in the Web of Science retrieves records that share at least one item in their reference lists with the references of a seed record. This search method, known as bibliographic coupling, does not always yield topically relevant results. Our exploratory case study asks: How do retrievals of the type used in pennant diagrams compare with retrievals through Related Records? Pennants are two-dimensional visualizations of documents co-cited with a seed paper. In them, the well-known tf*idf (term frequency*inverse document frequency) formula is used to weight the co-citation counts. The weights have psychological interpretations from relevance theory; given the seed, tf predicts a co-cited document’s cognitive effects on the user, and idf predicts the user’s relative ease in relating its title to the seed’s title. We chose two seed papers from information science, one with only two references and the other with 20, and used each to retrieve 50 documents per method in WoS. We illustrate with pennant diagrams. Pennant retrieval indeed produced more relevant documents, especially for the paper with only two references, and it produced mostly different ones. Related Records performed almost as well on the paper with the longer reference list, improving markedly as the number of coupling units between the seed and the other papers increased. We argue that relevance rankings based on co-citation, with pennant-style weighting as an option, would be a desirable addition to WoS and similar databases.
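The pennant weighting described above can be sketched in a few lines of code. This is a minimal illustration, not the authors' implementation: the function name, inputs (a mapping of co-citation counts with the seed, each document's total citation count, and an overall collection size), and the particular logged variants of tf and idf are assumptions made here for clarity.

```python
import math

def pennant_weights(cocitations, total_citations, collection_size):
    """Sketch of pennant-style tf*idf weighting of co-citation counts.

    cocitations: {doc_id: times co-cited with the seed}  (tf raw count)
    total_citations: {doc_id: total citations to doc}    (df analogue)
    collection_size: total documents in the collection   (N analogue)
    Returns {doc_id: (tf, idf, tf*idf)} with logged components,
    so tf can serve as a pennant's x-axis and idf as its y-axis.
    """
    weights = {}
    for doc, cocit in cocitations.items():
        tf = math.log(cocit) + 1  # "cognitive effects" axis
        idf = math.log(collection_size / total_citations[doc])  # "ease" axis
        weights[doc] = (tf, idf, tf * idf)
    return weights
```

Documents frequently co-cited with the seed get high tf; documents cited rarely overall get high idf, so a highly weighted document is both strongly tied to the seed and specific rather than ubiquitous.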
Macros are available at: http://www.mugeakbulut.com/YL_Tez/veriler_makrolar/makrolar/.
Of the 60 field tags in the WoS standard export file, we needed only PY (publication year) and CR (cited references) to calculate the frequencies. For all tags and their definitions, see Clarivate Analytics (2018a).
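A frequency count of this kind can be sketched as follows. This is an illustrative sketch, not the macros linked above; it assumes the standard WoS tagged plain-text export, in which each field starts with a two-character tag followed by a space, and indented lines continue a multi-value field such as CR (one cited reference per line).

```python
from collections import Counter

def count_py_and_cr(path):
    """Tally publication years (PY) and cited references (CR) from a
    Web of Science plain-text export file (sketch; assumes the
    standard tagged format with 3-space continuation lines)."""
    py_counts, cr_counts = Counter(), Counter()
    current_tag = None
    with open(path, encoding="utf-8-sig") as fh:
        for raw in fh:
            line = raw.rstrip("\n")
            if line[:2].strip() and not line.startswith("   "):
                current_tag, value = line[:2], line[3:].strip()
            else:  # continuation line of the previous tag
                value = line.strip()
            if not value:
                continue
            if current_tag == "PY":
                py_counts[value] += 1
            elif current_tag == "CR":
                cr_counts[value] += 1
    return py_counts, cr_counts
```

The CR tallies give, for each cited reference, how many retrieved records cite it; the PY tallies give the year distribution of the retrieved set.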
For those unfamiliar with the convention, nonsensical characters are an old-fashioned way of comically indicating an unprintable curse-word.
Ahmad, S., & Afzal, M. T. (2017). Combining co-citation and metadata for recommending more related papers. In 15th International Conference on Frontiers of Information Technology (FIT) (pp. 218–222). Islamabad, Pakistan: IEEE.
Akbulut, M. (2016a). Atıf klasiklerinin etkisinin ve ilgililik sıralamalarının pennant diyagramları ile analizi [The analysis of the impact of citation classics and relevance rankings using pennant diagrams]. Yayımlanmamış yüksek lisans tezi, Hacettepe Üniversitesi, Ankara [Unpublished master’s thesis, Hacettepe University, Ankara]. Retrieved June 22, 2019, from https://www.mugeakbulut.com/yayinlar/Muge_Akbulut_YL_Tez.pdf.
Akbulut, M. (2016b). Extended abstract: The analysis of the impact of citation classics and relevance rankings using pennant diagrams. Retrieved June 22, 2019, from https://www.mugeakbulut.com/yayinlar/tez_extended_abstract.pdf.
Åström, F. (2007). Changes in the LIS research front: Time-sliced cocitation analyses of LIS journal articles, 1990–2004. Journal of the American Society for Information Science and Technology, 58(7), 947–957.
Beel, J., Gipp, B., Langer, S., & Breitinger, C. (2016). Research-paper recommender systems: A literature survey. International Journal on Digital Libraries, 17(4), 305–338.
Belter, C. W. (2017). A relevance ranking method for citation-based search results. Scientometrics, 112(2), 731–746.
Bensman, S. J. (2013). Eugene Garfield, Francis Narin, and PageRank: The theoretical bases of the Google Search Engine. https://arxiv.org/pdf/1312.3872.pdf.
Bichteler, J., & Eaton, E. A., III. (1980). The combined use of bibliographic coupling and cocitation for document retrieval. Journal of the American Society for Information Science, 31(4), 278–282.
Bitirim, Y., Tonta, Y., & Sever, H. (2002). Information retrieval effectiveness of Turkish search engines. Lecture Notes in Computer Science, 2457, 93–103.
Bollmann, P. (1983). The normalized recall and related measures. In Proceedings of the 6th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '83) (pp. 122–128). New York: ACM Press. https://doi.org/10.1145/511793.511811.
Borlund, P., & Ingwersen, P. (1997). The development of a method for the evaluation of interactive information retrieval systems. Journal of Documentation, 53(3), 225–250.
Carevic, Z., & Schaer, P. (2014). On the connection between citation-based and topical relevance ranking: Results of a pretest using iSearch. In Proceedings of the First Workshop on Bibliometric-enhanced Information Retrieval (pp. 37–44). Amsterdam, The Netherlands. Retrieved June 5, 2019, from https://ceur-ws.org/Vol-1143/paper5.pdf.
Clarivate Analytics (2017a). Related Records. Retrieved June 5, 2019, from https://images.webofknowledge.com/images/help/WOS/hp_related_records.html.
Clarivate Analytics (2017b). Research Areas (Categories/Classification). Retrieved June 5, 2019, from https://images.webofknowledge.com/images/help/WOS/hp_research_areas_easca.html.
Clarivate Analytics (2018a). Advanced Search Examples. Retrieved June 5, 2019, from https://images.webofknowledge.com/images/help/WOS/hp_advanced_examples.html.
Clarivate Analytics (2018b). Research Area Schemes. Retrieved June 5, 2019, from https://help.incites.clarivate.com/inCites2Live/filterValuesGroup/researchAreaSchema.html.
Clarivate Analytics (2019). Web of Science platform: Web of science: Summary of coverage. Retrieved June 5, 2019, from https://clarivate.libguides.com/webofscienceplatform/coverage.
Clough, P., & Sanderson, M. (2013). Evaluating the performance of information retrieval systems using test collections. Information Research, 18(2), paper 582. Retrieved May 2, 2019, from https://InformationR.net/ir/18-2/paper582.html.
Colavizza, G., Boyack, K. W., Van Eck, N. J., & Waltman, L. (2018). The closer the better: Similarity of publication pairs at different cocitation levels. Journal of the Association for Information Science & Technology, 69(4), 600–609.
Cooper, W. S. (1988). Getting beyond Boole. Information Processing and Management, 24(3), 243–248.
Cooper, W. S., & Maron, M. E. (1978). Foundations of probabilistic and utility-theoretic indexing. Journal of the ACM, 25(1), 67–80.
Croft, W. B., & Harper, D. J. (1979). Using probabilistic models of document retrieval without relevance information. Journal of Documentation, 35(4), 285–295.
Eto, M. (2013). Evaluations of context-based co-citation searching. Scientometrics, 94, 651–673.
Fuhr, N. (1989). Models for retrieval with probabilistic indexing. Information Processing and Management, 22(1), 55–72.
Garfield, E. (2001). From bibliographic coupling to co-citation analysis via algorithmic historio-bibliography. A citationist’s tribute to Belver C. Griffith. Paper presented at Drexel University, Philadelphia, PA. Retrieved June 5, 2019, from https://garfield.library.upenn.edu/papers/drexelbelvergriffith92001.pdf.
Giles, C. L., Bollacker, K. D., & Lawrence, S. (1998). CiteSeer: An automatic citation indexing system. In I. Witten et al. (Eds.), Digital Libraries – Third ACM Conference on Digital Libraries (pp. 89–98). New York: ACM Press. Retrieved May 2, 2019, from https://clgiles.ist.psu.edu/papers/DL-1998-citeseer.pdf.
Haruna, K., Ismail, M. A., Bichi, A. B., Chang, V., Wibawa, S., & Herawan, T. (2018). A citation-based recommender system for scholarly paper recommendation. In O. Gervasi et al. (Eds.), Computational Science and Its Applications – ICCSA 2018 (LNCS 10960, pp. 514–525). Cham: Springer.
He, Q., Pei, J., Kifer, D., Mitra, P., & Giles, L. (2010). Context-aware citation recommendation. In Proceedings of the 19th International Conference on World Wide Web (WWW '10), Raleigh, North Carolina, USA, April 26–30, 2010 (pp. 421–430). New York: ACM. Retrieved June 5, 2019, from https://www.cse.psu.edu/~duk17/papers/citationrecommendation.pdf.
Hiemstra, D. (2000). A probabilistic justification for using tf × idf term weighting in information retrieval. International Journal on Digital Libraries, 3(2), 131–139.
Horsley, T., Dingwall, O., & Sampson, M. (2011). Checking reference lists to find additional studies for systematic reviews. Cochrane Database of Systematic Reviews. https://doi.org/10.1002/14651858.MR000026.pub2.
Huang, S., Xue, G.-R., Zhang, B.-Y., Chen, Z., Yu, Y., & Ma, W.-Y. (2004). TSSP: A reinforcement algorithm to find related papers. In WI 2004, Washington, DC, USA (pp. 117–123). Los Alamitos: IEEE Computer Society. Retrieved June 5, 2019, from https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1410792&tag=1.
Kessler, M. M. (1963). Bibliographic coupling between scientific papers. American Documentation, 14(1), 10–25.
Kraft, D. H. (1985). Advances in information retrieval: Where is that /#*&@¢ record? In M. C. Yovits (Ed.), Advances in Computers (pp. 277–318). New York: Academic Press.
Liang, Y., Li, Q., & Qian, T. (2011). Finding relevant papers based on citation relations. In Proceedings of the International Conference on Web-Age Information Management (pp. 403–414). Berlin: Springer.
Lin, J., & Wilbur, W. J. (2007). PubMed related articles: A probabilistic topic-based model for content similarity. BMC Bioinformatics, 8, 423. https://doi.org/10.1186/1471-2105-8-423.
Manning, C., & Schütze, H. (2000). Foundations of statistical natural language processing (2nd edn). Cambridge: MIT Press. Retrieved June 22, 2019, from https://ics.upjs.sk/~pero/web/documents/pillar/Manning_Schuetze_StatisticalNLP.pdf.
Maron, M. E. (1977). On indexing, retrieval and the meaning of about. Journal of the American Society for Information Science, 28(1), 38–43.
Maron, M. E. (1988). Probabilistic design principles for conventional and full-text retrieval systems. Information Processing and Management, 24(3), 249–255.
Maron, M. E. (2008). An historical note on the origins of probabilistic indexing. Information Processing and Management, 44(2), 971–972.
Maron, M. E., & Kuhns, J. L. (1960). On relevance, probabilistic indexing and information retrieval. Journal of the ACM, 7(3), 216–244.
Peterson, G., & Graves, R. S. (2009). How similar is similar? An evaluation of “related records” applications among health literature portals. Proceedings of the Association for Information Science & Technology, 46(1), 1–3.
Prevedelli, D., Simonini, R., & Ansaloni, I. (2001). Relationship of non-specific commensalism in the colonization of the deep layers of sediment. Journal of the Marine Biological Association of the United Kingdom, 81(6), 897–901. https://doi.org/10.1017/S0025315401004817.
Robertson, S. E. (1977). The probability ranking principle in IR. Journal of Documentation, 33(4), 294–304. Retrieved June 22, 2019, from https://parnec.nuaa.edu.cn/xtan/IIR/readings/jdRobertson1977.pdf.
Robertson, S. E., Maron, M. E., & Cooper, W. S. (1982). Probability of relevance: A unification of two competing models for document retrieval. Information Technology: Research and Development, 1(1), 1–21.
Robertson, S. E., & Sparck Jones, K. (1976). Relevance weighting of search terms. Journal of the American Society for Information Science, 27(3), 129–146.
Salton, G., & Yang, C. S. (1973). On the specification of term values in automatic indexing. Journal of Documentation, 29(4), 351–372.
Scopus. (2018). References and related documents. Retrieved April 25, 2019. https://service.elsevier.com/app/answers/detail/a_id/14190/supporthub/scopus/.
Shannon, C. E., & Weaver, W. (1949). The mathematical theory of communication. Urbana, Ill: University of Illinois Press.
Shen, S., Zhu, D., Rousseau, R., Su, X., & Wang, D. (2019). A refined method for computing bibliographic coupling strengths. Journal of Informetrics, 13, 605–615.
Small, H. (1973). Co-citation in the scientific literature: A new measure of the relationship between two documents. Journal of the American Society for Information Science, 24(4), 265–269.
Smith, C. H., Georges, P., & Nguyen, N. (2015). Statistical tests for ‘related records’ search results. Scientometrics, 105(3), 1665–1677.
Soll, J. B., Milkman, K. L., & Payne, J. W. (2015). Outsmart your own biases. Harvard Business Review, 93(5), 64–71.
Sparck Jones, K. (1972). A statistical interpretation of term specificity and its application to retrieval. Journal of Documentation, 28(1), 11–21.
Sperber, D., & Wilson, D. (1995). Relevance: Communication and cognition (2nd ed.). Oxford: Blackwell.
Sugimoto, C. R., & Larivière, V. (2018). Measuring research: What everyone needs to know. New York: Oxford University Press.
Swanson, D. R. (1986). Subjective versus objective relevance in bibliographic retrieval systems. The Library Quarterly, 56(4), 389–398.
Thompson, P. (1990a). A combination of expert opinion approach to probabilistic information retrieval, Part 1: The conceptual model. Information Processing and Management, 26(3), 371–382.
Thompson, P. (1990b). A combination of expert opinion approach to probabilistic information retrieval, Part 2: Mathematical treatment of CEO model 3. Information Processing and Management, 26(3), 383–394.
Thompson, P. (2008). Looking back: On relevance, probabilistic indexing and information retrieval. Information Processing and Management, 44(2), 963–970.
Tonta, Y., & Özkan Çelik, A. E. (2013). Cahit Arf: Exploring his scientific influence using social network analysis, author co-citation maps and single publication h index. Journal of Scientometric Research, 2(1), 37–51.
Wesley-Smith, I., Bergstrom, C. T., & West, J. D. (2016). Static ranking of scholarly papers using article-level eigenfactor (ALEF). In The 9th ACM International Conference on Web Search and Data Mining (WSDM). February 22–25, 2016, San Francisco, CA, USA. Retrieved May 2, 2019, from https://octavia.zoology.washington.edu/publications/WesleySmithEtAl16.pdf.
White, H. D. (2007a). Combining bibliometrics, information retrieval, and relevance theory. Part 1: First examples of a synthesis. Journal of the American Society for Information Science and Technology, 58(4), 536–559.
White, H. D. (2007b). Combining bibliometrics, information retrieval, and relevance theory. Part 2: Some implications for information science. Journal of the American Society for Information Science and Technology, 58(4), 583–605.
White, H. D. (2009). Pennants for Strindberg and Persson. In Celebrating scholarly communication studies: A festschrift for Olle Persson at his 60th birthday (ISSI Newsletter, Vol. 5-S, pp. 71–83). Retrieved May 2, 2019, from https://portal.research.lu.se/portal/files/5902071/1458992.pdf.
White, H. D. (2010). Some new tests of relevance theory in information science. Scientometrics, 83(3), 653–667.
White, H. D. (2015). Co-cited author retrieval and relevance theory: Examples from the humanities. Scientometrics, 102(3), 2275–2299.
White, H. D. (2018a). Pennants for Garfield: Bibliometrics and document retrieval. Scientometrics, 114(2), 757–778.
White, H. D. (2018b). Bag of works retrieval: TF*IDF weighting of works co-cited with a seed. International Journal on Digital Libraries, 19(2–3), 139–149.
White, H. D., & Mayr, P. (2013). Pennants for descriptors. In NKOS Workshop 2013. Valletta, Malta. https://arxiv.org/abs/1310.3808.
Wilson, D., & Sperber, D. (2002). Relevance theory. In G. Ward & L. Horn (Eds.), Handbook of pragmatics. Oxford: Blackwell. Retrieved May 2, 2019, from https://www.dan.sperber.fr/?p=93.
Yao, Y. (1995). Measuring retrieval effectiveness based on user preference of documents. Journal of the American Society for Information Science, 46(2), 133–145.
Yule, G. U. (1912). On the methods of measuring association between two attributes. Journal of the Royal Statistical Society, 75(6), 579–652.
Zarrinkalam, F., & Kahani, M. (2012). A new metric for measuring relatedness of scientific papers based on non-textual features. Intelligent Information Management, 4(4), 99–107. Retrieved May 2, 2019, from https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.459.6934&rep=rep1&type=pdf.
Cite this article
Akbulut, M., Tonta, Y. & White, H.D. Related records retrieval and pennant retrieval: an exploratory case study. Scientometrics 122, 957–987 (2020). https://doi.org/10.1007/s11192-019-03303-9
Keywords
- Bibliographic coupling
- Co-citation analysis
- Relevance theory
- tf * idf
- Web of Science