Algorithmic Challenges in Web Search Engines

  • Ricardo Baeza-Yates
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4007)


We present the main algorithmic challenges that large Web search engines face today. These challenges arise in every module of a Web retrieval system, from gathering the data to be indexed (crawling) to selecting and ordering the answers to a query (searching and ranking). Most of them ultimately concern either the quality of the answers or the efficiency of obtaining them, although some bear on the very existence of current search engines, as in the case of context-based advertising.
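As a toy illustration of the modules named above, this sketch indexes a tiny, already-crawled collection, selects the answers to a conjunctive query by intersecting sorted posting lists, and orders them by a score. Everything here is an assumption for illustration. The in-memory dict stands in for crawled pages. The function names `build_index`, `intersect`, and `search` are hypothetical. The raw term-frequency score is a placeholder, not a ranking method from this paper. The linear merge is a simple baseline, not the faster intersection algorithm of [6].

```python
from collections import defaultdict


def build_index(docs):
    """Build an inverted index from {doc_id: text}.

    Returns (postings, tf): postings[term] is a sorted list of doc ids,
    tf[term][doc_id] is how often the term occurs in that document.
    Tokenization is a naive lowercase whitespace split (an assumption).
    """
    postings = defaultdict(list)
    tf = defaultdict(dict)
    for doc_id in sorted(docs):  # ascending ids keep every posting list sorted
        counts = defaultdict(int)
        for term in docs[doc_id].lower().split():
            counts[term] += 1
        for term, c in counts.items():
            postings[term].append(doc_id)
            tf[term][doc_id] = c
    return postings, tf


def intersect(a, b):
    """Linear merge of two sorted posting lists (the selection step)."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def search(query, postings, tf):
    """Answer a conjunctive (AND) query and rank the matching doc ids."""
    terms = query.lower().split()
    if not terms:
        return []
    # Intersect shortest lists first so intermediate results stay small.
    lists = sorted((postings.get(t, []) for t in terms), key=len)
    result = lists[0]
    for lst in lists[1:]:
        result = intersect(result, lst)
    # Ordering step: placeholder score = total term frequency; ties by doc id.
    return sorted(result, key=lambda d: (-sum(tf[t][d] for t in terms), d))
```

For example, with `docs = {1: "web search engines crawl the web", 2: "ranking answers to a query", 3: "web query logs help ranking"}`, calling `search("web", *build_index(docs))` returns `[1, 3]`, because document 1 contains "web" twice and document 3 only once.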

As the Web grows and changes at a fast pace, the algorithms that address these challenges must rely on large-scale experimentation, in both data volume and computation time, to understand the main issues that affect them. We illustrate this with examples from our own research and from the state of the art. The full version of this paper appears in [1].




References

  1. Baeza-Yates, R.: Algorithmic Challenges in Web Search Engines. In: Correa, J.R., Hevia, A., Kiwi, M. (eds.) LATIN 2006. LNCS, vol. 3887, pp. 1–7. Springer, Heidelberg (2006)
  2. Baeza-Yates, R., Ribeiro-Neto, B.: Modern Information Retrieval, p. 513. Addison-Wesley, England (1999)
  3. Baeza-Yates, R.: Information Retrieval in the Web: Beyond Current Search Engines. Int. Journal of Approximate Reasoning 34(2-3), 97–104 (2003)
  4. Baeza-Yates, R., Castillo, C., Marin, M., Rodriguez, A.: Crawling a Country: Better Strategies than Breadth-First for Page Ordering. In: WWW 2005, Industrial Track. ACM Press, Chiba, Japan (2005)
  5. Baeza-Yates, R.A., Hurtado, C.A., Mendoza, M.: Query Recommendation Using Query Logs in Search Engines. In: Lindner, W., Mesiti, M., Türker, C., Tzitzikas, Y., Vakali, A.I. (eds.) EDBT 2004. LNCS, vol. 3268, pp. 588–596. Springer, Heidelberg (2004)
  6. Baeza-Yates, R.: A Fast Set Intersection Algorithm for Sorted Sequences. In: 15th Symposium on Combinatorial Pattern Matching (CPM 2004), Istanbul, Turkey. LNCS. Springer (2004)
  7. Baeza-Yates, R.: Applications of Web Query Mining. In: Losada, D.E., Fernández-Luna, J.M. (eds.) ECIR 2005. LNCS, vol. 3408. Springer, Heidelberg (2005)
  8. Baeza-Yates, R., Poblete, B.: A Website Mining Model Centered on User Queries. In: Berendt, B., et al. (eds.) European Web Mining Forum, Oporto, Portugal, October 2005, pp. 3–15 (2005)
  9. Baeza-Yates, R., Pereira, A., Ziviani, N.: WIM: A Web Information Mining Model for the Web. In: LA-WEB 2005, pp. 233–241. IEEE CS Press, Los Alamitos (2005)
  10. Bhargava, H.K., Feng, J.: Paid Placement Strategies for Internet Search Engines. In: Proceedings of the Eleventh International Conference on World Wide Web, pp. 117–123. ACM Press, New York (2002)
  11. Chakrabarti, S.: Mining the Web: Discovering Knowledge from Hypertext Data. Morgan Kaufmann, San Francisco (2003)
  12. Davison, B.: Workshop on Adversarial Information Retrieval on the Web, Chiba, Japan (May 2005)
  13. Kleinberg, J.: Authoritative Sources in a Hyperlinked Environment. Journal of the ACM 46(5), 604–632 (1999); preliminary version presented at SODA 1998
  14. Kleinberg, J., Raghavan, P.: Query Incentive Networks. In: Proc. 46th IEEE Symposium on Foundations of Computer Science (2005)
  15. Koster, M.: A Standard for Robot Exclusion (1996)
  16. Mäkinen, V., Navarro, G.: Compressed Full Text Indexes. Technical Report TR/DCC-, -7, Dept. of Computer Science, University of Chile (June 2005)
  17. Nicholson, S., Sierra, T., Eseryel, U.Y., Park, J.H., Barkow, P., Pozo, E.J., Ward, J.: How Much of It is Real? Analysis of Paid Placement in Web Search Engine Results. In: JASIST (2005)
  18. Page, L., Brin, S., Motwani, R., Winograd, T.: The PageRank Citation Ranking: Bringing Order to the Web. Technical report, Stanford Digital Library Technologies Project (1998)
  19. Ribeiro-Neto, B., Cristo, M., Golgher, P., Silva de Moura, E.: Impedance Coupling in Content-Targeted Advertising. In: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2005, Salvador, Brazil, August 15–19, 2005, pp. 496–503. ACM Press, New York (2005)
  20. Wellman, B.: Computer Networks As Social Networks. Science 293(5537), 2031–2034 (2001)
  21. Yao, A.C.-C. (ed.): WINE 2005. LNCS, vol. 3828. Springer, Heidelberg (2005)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Ricardo Baeza-Yates, Yahoo! Research, Barcelona, Spain & Santiago, Chile
