Abstract
This paper reports on a large-scale experiment evaluating a formal, query-biased mechanism for the combination of evidence. We use the Dempster-Shafer theory of evidence to optimally combine the results obtained by content and link analyses on the Web. The query-biased mechanism is based on the query scope, a measure of query specificity. The query scope is defined using a probabilistic propagation mechanism on top of the hierarchical structure of concepts provided by WordNet. We use two standard Web test collections and two different link analysis approaches. The results show that the proposed approach could improve retrieval effectiveness.
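As a rough illustration of the kind of combination the abstract refers to, the following sketch applies Dempster's rule of combination to two sources of evidence about a single document. It is a minimal example under stated assumptions, not the formulation used in the paper: the two-element frame of discernment, the function name `combine`, and all mass values are hypothetical, and the way the query scope would adjust the masses is not shown.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps a frozenset (a subset of the frame of
    discernment) to a mass, and the masses of each function sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence supports the intersection of the two sets.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Disjoint sets contribute to the conflict mass.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence cannot be combined")
    # Renormalise by the non-conflicting mass.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


# Hypothetical frame of discernment for one document.
REL = frozenset({"relevant"})
NON = frozenset({"non-relevant"})
FRAME = REL | NON  # mass on the whole frame expresses uncertainty

# Illustrative masses only; not taken from the paper.
content = {REL: 0.6, FRAME: 0.4}
link = {REL: 0.3, NON: 0.2, FRAME: 0.5}

fused = combine(content, link)
for subset, mass in fused.items():
    print(sorted(subset), round(mass, 4))
```

In this sketch, mass assigned to the whole frame represents how uncertain a source is. A query-biased mechanism could plausibly vary that uncertainty per query, but the exact mapping from query scope to masses is an assumption left open here.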
Cite this article
Plachouras, V., Ounis, I. Dempster-Shafer Theory for a Query-Biased Combination of Evidence on the Web. Inf Retrieval 8, 197–218 (2005). https://doi.org/10.1007/s10791-005-5659-7
DOI: https://doi.org/10.1007/s10791-005-5659-7