Using Terms from Citations for IR: Some First Results

  • Anna Ritchie
  • Simone Teufel
  • Stephen Robertson
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4956)

Abstract

We present the results of experiments using terms from citations for scientific literature search. To index a given document, we combine the terms that citing documents use to describe it with terms from the document itself. We find that this combination gives better retrieval performance than standard indexing of the document terms alone, and we present a brief analysis of our results. These are the first experimental results from a new test collection of scientific papers, which we created in order to study citation-based methods for IR.
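
The indexing idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `citation_weight` parameter, the sample texts, and the toy count-based score are all assumptions made for illustration, standing in for a real retrieval model.

```python
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (tok.strip(".,;:()[]\"'") for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def index_terms(document_text, citation_contexts, citation_weight=1):
    """Combine a document's own terms with terms from text that cites it.

    citation_weight is a hypothetical knob (not taken from the paper):
    each citation-context term is counted this many times.
    """
    terms = Counter(tokenize(document_text))
    for context in citation_contexts:
        for tok in tokenize(context):
            terms[tok] += citation_weight
    return terms


def score(query, terms):
    """Toy score: sum of index-term counts over the query terms."""
    return sum(terms[tok] for tok in tokenize(query))


# Invented example texts, for illustration only.
doc = "We describe a statistical parser trained on treebank data."
contexts = ["Smith (1999) introduced a fast dependency parser."]

doc_only = index_terms(doc, [])        # standard indexing: document terms alone
combined = index_terms(doc, contexts)  # document terms plus citation terms
```

In this sketch the query "dependency parser" matches the combined index on both terms, whereas the document-only index matches on "parser" alone, because "dependency" appears only in the citing text.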

Keywords

Relevant Document; Retrieval Model; Retrieval Performance; Query Term; Test Collection
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Anna Ritchie (1)
  • Simone Teufel (1)
  • Stephen Robertson (2)
  1. Computer Laboratory, University of Cambridge, Cambridge, U.K.
  2. Microsoft Research Ltd, Cambridge, U.K.
