A New Measure for Query Disambiguation Using Term Co-occurrences

  • Hiromi Wakaki
  • Tomonari Masada
  • Atsuhiro Takasu
  • Jun Adachi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4224)


This paper explores techniques for discovering, from a selected subset of documents, terms that can replace given query terms. The Internet allows access to large numbers of documents archived in digital format. However, no user can be an expert in every field, and users have trouble finding the documents that suit their purposes when they cannot formulate queries that narrow the search to the context they have in mind. Accordingly, we propose a method for extracting terms from searched documents to replace user-provided query terms. Our results show that our method is successful in discovering terms that can be used to narrow the search.


Average Precision, Query Term, Query Expansion, Inverse Document Frequency, Term Extraction
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Hiromi Wakaki (1)
  • Tomonari Masada (2)
  • Atsuhiro Takasu (2)
  • Jun Adachi (2)
  1. Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan
  2. The National Institute of Informatics, Tokyo, Japan
