
Bag of works retrieval: TF*IDF weighting of works co-cited with a seed

International Journal on Digital Libraries

Abstract

Although not presently possible in any system, the style of retrieval described here combines familiar components (co-citation linkages of documents and TF*IDF weighting of terms) in a way that could be implemented in future databases. Rather than entering keywords, the user enters a string identifying a work, called a seed, to retrieve the strings identifying other works that are co-cited with it. Each of the latter is part of a "bag of works," and each presumably has both a co-citation count with the seed and an overall citation count in the database. These two counts can be plugged into a standard formula for TF*IDF weighting so that all the co-cited items can be ranked for relevance to the seed, given that the entire retrieval is already relevant to it by the evidence of multiple co-citing authors. The result is analogous to, but different from, traditional "bag of words" retrieval, which it supplements. Some properties of the ranking are illustrated by works co-cited with three seeds: an article on search behavior, an information retrieval textbook, and an article on centrality in networks. While these are case studies, their properties apply to bag of works retrievals in general and have implications for users (e.g., humanities scholars, domain analysts) that go beyond any one example.
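A minimal sketch of the ranking described above, assuming a toy in-memory list of citing records (each a list of the works it cites) in place of a real citation database. The abstract does not say which TF*IDF variant is used, so the plain tf × log(N/df) form here is an assumption: tf is a work's co-citation count with the seed, df is its overall citation count, and N is taken to be the number of citing records. The function names `bag_of_works` and `rank_by_tfidf` are hypothetical.

```python
import math
from collections import Counter


def bag_of_works(citing_records, seed):
    """Collect the bag of works co-cited with a seed.

    citing_records: hypothetical stand-in for a citation database,
    one list of cited works per citing document.
    Returns (co-citation counts with the seed, overall citation counts).
    """
    cocited = Counter()
    total_cites = Counter()
    for refs in citing_records:
        refs = set(refs)  # count each cited work once per citing document
        total_cites.update(refs)
        if seed in refs:
            # every other work cited alongside the seed is co-cited with it
            cocited.update(refs - {seed})
    return cocited, total_cites


def rank_by_tfidf(cocited, total_cites, n_docs):
    """Rank co-cited works by tf * log(n_docs / df).

    tf = co-citation count with the seed (assumed TF analogue),
    df = overall citation count (assumed DF analogue),
    n_docs = number of citing records (assumed N).
    """
    scores = {
        work: tf * math.log(n_docs / total_cites[work])
        for work, tf in cocited.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    records = [
        ["S", "A", "B"],
        ["S", "A", "C"],
        ["S", "B"],
        ["A", "C"],
        ["A", "D"],
        ["C", "D"],
    ]
    cocited, total = bag_of_works(records, seed="S")
    ranked = rank_by_tfidf(cocited, total, n_docs=len(records))
    print([work for work, _ in ranked])  # → ['B', 'A', 'C']
```

In the toy data, A and B are each co-cited with the seed twice, but B ranks higher because A is cited more widely overall; the IDF factor discounts works that are cited with almost everything and are therefore less specific to the seed.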



Author information


Correspondence to Howard D. White.

Additional information

This paper revises and considerably expands on one that appeared in the Proceedings of the 3rd Workshop on Bibliometric-enhanced Information Retrieval (BIR2016), pp. 63–72. http://ceur-ws.org/Vol-1567/paper7.pdf


About this article


Cite this article

White, H.D. Bag of works retrieval: TF*IDF weighting of works co-cited with a seed. Int J Digit Libr 19, 139–149 (2018). https://doi.org/10.1007/s00799-017-0217-7


  • DOI: https://doi.org/10.1007/s00799-017-0217-7
