Boosting Web Retrieval Through Query Operations

  • Gilad Mishne
  • Maarten de Rijke
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3408)

Abstract

We explore the use of phrase and proximity terms in web retrieval, which differs from traditional ad-hoc retrieval in both document structure and query characteristics. We show that for this type of task, using both phrase and proximity terms is highly beneficial for early precision as well as for overall retrieval effectiveness. We also analyze why phrase and proximity terms are far more effective for web retrieval than for ad-hoc retrieval.
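The abstract does not say how phrase and proximity terms are matched against documents. As a rough illustration only, and not the authors' implementation, a phrase term can be read as requiring the query words to appear adjacent and in order. A proximity term only requires them to co-occur, in any order, within a window of a given number of tokens. The tokenizer, the function names, and the window semantics below are hypothetical choices made for this sketch.

```python
import re
from typing import Optional


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase, keep runs of ASCII letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_matches(tokens: list[str], phrase: list[str]) -> int:
    # Phrase term: count places where the query words occur adjacent and in order.
    n = len(phrase)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase)


def shortest_span(tokens: list[str], terms: list[str]) -> Optional[int]:
    # Length (in tokens) of the shortest stretch of text containing every
    # distinct query term at least once, in any order; None if some term is absent.
    need = set(terms)
    if not need:
        return 0
    counts: dict[str, int] = {}
    have = 0
    best: Optional[int] = None
    left = 0
    for right, tok in enumerate(tokens):
        if tok in need:
            counts[tok] = counts.get(tok, 0) + 1
            if counts[tok] == 1:
                have += 1
        # Shrink the window from the left while it still covers all terms.
        while have == len(need):
            span = right - left + 1
            if best is None or span < best:
                best = span
            ltok = tokens[left]
            if ltok in need:
                counts[ltok] -= 1
                if counts[ltok] == 0:
                    have -= 1
            left += 1
    return best


def proximity_match(tokens: list[str], terms: list[str], window: int) -> bool:
    # Proximity term (hypothetical semantics): all terms fall within `window` tokens.
    span = shortest_span(tokens, terms)
    return span is not None and span <= window


if __name__ == "__main__":
    doc = tokenize("Search the web")
    query = ["web", "search"]
    print(phrase_matches(doc, query))             # 0: not adjacent in this order
    print(proximity_match(doc, query, window=3))  # True: both within 3 tokens
```

With `window=3`, the text "Search the web" matches the query words web and search as a proximity term but not as a phrase. This is the kind of looser co-occurrence evidence a proximity operator admits and a phrase operator rejects.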

Keywords

Average Precision, Mean Average Precision, Query Operator, Proximity Operator, Anchor Text (assigned automatically, not by the authors)

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Gilad Mishne (1)
  • Maarten de Rijke (1)
  1. Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands
