How Effective Is Query Expansion for Finding Novel Information?

  • Min Zhang
  • Chuan Lin
  • Shaoping Ma
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3248)


Finding novel information is a recently proposed task in information retrieval (IR) that has been attracting growing attention. Compared with the techniques used in traditional document-level retrieval, query expansion (QE) plays a dominant role in this new task. This paper presents an empirical study of the effectiveness of different QE techniques for finding novel information. Conclusions are drawn from experiments on two standard test collections, those of the TREC 2002 and TREC 2003 novelty tracks. The local co-occurrence-based QE approach performs best, yielding a consistent improvement of more than 15% and, in some cases, enhancing both precision and recall. Proximity-based and dependency-based QE are also effective, each yielding an improvement of about 10%. Pseudo-relevance feedback works better than semantics-based QE, which is not helpful for finding novel information.
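To make the idea of local co-occurrence-based QE concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' method: the abstract does not give the scoring formula, so the function `expand_query`, the helper `tokenize`, the overlap-count weighting, and the parameter `k` are all hypothetical. The sketch assumes the "local" set is a small list of top-ranked sentences and scores each candidate term by how many query terms it shares a sentence with.

```python
from collections import Counter

PUNCT = ".,;:!?()\"'"

def tokenize(text):
    """Lowercase, split on whitespace, and strip simple punctuation."""
    tokens = (w.strip(PUNCT).lower() for w in text.split())
    return [t for t in tokens if t]

def expand_query(query, local_sentences, k=3, stopwords=frozenset()):
    """Append the k terms that co-occur most with query terms.

    Assumed scoring (hypothetical): each sentence that contains at least
    one query term adds its number of matched query terms to the score
    of every other non-stopword term in that sentence.
    """
    query_terms = set(tokenize(query))
    scores = Counter()
    for sentence in local_sentences:
        terms = set(tokenize(sentence))
        overlap = len(terms & query_terms)
        if overlap == 0:
            continue  # sentence shares nothing with the query
        for term in terms - query_terms - set(stopwords):
            scores[term] += overlap
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return sorted(query_terms) + [term for term, _ in ranked[:k]]

sentences = [
    "The oil spill damaged the coast.",
    "Cleanup of the spill near the coast continues.",
    "Stock markets fell today.",
]
expanded = expand_query("oil spill", sentences, k=2,
                        stopwords={"the", "of", "near"})
print(expanded)
```

In this toy run, "coast" appears alongside query terms in two sentences and "damaged" in one sentence that matches both query terms, so those two terms are added to the query; the unrelated third sentence contributes nothing.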





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Min Zhang (1)
  • Chuan Lin (1)
  • Shaoping Ma (1)
  1. State Key Lab of Intelligent Tech. and Sys., Tsinghua University, Beijing, China
