Web Page Retrieval by Combining Evidence

  • Carlos G. Figuerola
  • José L. Alonso Berrocal
  • Angel F. Zazo
  • Emilio Rodríguez Vázquez de Aldana
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4022)

Abstract

The participation of the REINA Research Group in WebCLEF 2005 focused on the monolingual mixed task. Queries or topics are of two types: named pages and home pages. For both, we first perform a search on thematic content; for the same query, we also search several information elements of every page (title, some meta tags, anchor text) and then combine the results. For home page queries, we try to detect them using a method based on certain keywords and their patterns of use. Afterwards, the results of the thematic content retrieval are re-ranked based on PageRank and centrality coefficients.
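
The abstract does not state how the per-element results are combined or how the link-based re-ranking is weighted. As a minimal sketch only, assuming a weighted sum of min-max normalized scores for the combination and a linear interpolation with a link-based score for the re-ranking, the pipeline could look like the following. The function names, the `weights` parameter, and the `alpha` parameter are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Scale one result list's scores into [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as equally strong.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_sum(result_lists, weights=None):
    """Fuse several {doc_id: score} result lists by weighted score summation.

    Assumption: one list per evidence source (body text, title, meta tags,
    anchor text). The weights are illustrative, not values from the paper.
    """
    weights = weights or [1.0] * len(result_lists)
    fused = defaultdict(float)
    for scores, weight in zip(result_lists, weights):
        for doc, s in min_max_normalize(scores).items():
            fused[doc] += weight * s
    return dict(fused)


def rerank_with_link_prior(fused, link_score, alpha=0.8):
    """Re-rank fused content scores using a link-based score.

    Assumption: link_score holds a precomputed PageRank or centrality value
    per document; alpha is a hypothetical mixing weight for content evidence.
    """
    content = min_max_normalize(fused)
    prior = min_max_normalize(link_score)
    final = {
        doc: alpha * s + (1 - alpha) * prior.get(doc, 0.0)
        for doc, s in content.items()
    }
    return sorted(final, key=final.get, reverse=True)
```

Calling `comb_sum` with one score dictionary per page element and passing the result to `rerank_with_link_prior` yields a ranked list of document identifiers. Documents missing from a given element's result list simply contribute nothing from that source.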

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Carlos G. Figuerola (1)
  • José L. Alonso Berrocal (1)
  • Angel F. Zazo (1)
  • Emilio Rodríguez Vázquez de Aldana (1)

  1. REINA Research Group, University of Salamanca
