Applications of Web Query Mining

  • Ricardo Baeza-Yates
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3408)


Server logs of search engines store traces of the queries submitted by users, including the queries themselves along with the Web pages selected from their answers. The same is true of Web site logs, where queries and subsequent actions are recorded, whether the queries arrive from search engine referrers or from an internal search box. In this paper we present two applications based on analyzing and clustering queries. The first suggests changes to improve the text and structure of a Web site, and the second performs relevance ranking boosting and query recommendation in search engines.
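The abstract does not specify which query representation or clustering algorithm is used, so the following is only a minimal sketch under stated assumptions, not the paper's method. It assumes a hypothetical log of `(query, clicked_url)` pairs, represents each query by the set of pages users selected in its answers, and swaps in a simple technique: single-link grouping of queries whose clicked-URL sets have a Jaccard overlap of at least a hypothetical `threshold`. The function names `cluster_queries` and `recommend` are illustrative, and recommendations are simply the other queries in the same cluster ranked by how often they appear in the log.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical log format (an assumption): one (query, clicked_url) pair per click.
LogEntry = tuple[str, str]


def clicked_urls(log: list[LogEntry]) -> dict[str, set[str]]:
    """Map each query to the set of URLs users selected in its answers."""
    urls: dict[str, set[str]] = defaultdict(set)
    for query, url in log:
        urls[query].add(url)
    return dict(urls)


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two clicked-URL sets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_queries(log: list[LogEntry], threshold: float = 0.3) -> list[set[str]]:
    """Single-link clustering (a swapped-in technique): two queries join the
    same cluster if a chain of pairs with Jaccard >= threshold connects them."""
    urls = clicked_urls(log)
    parent = {q: q for q in urls}  # union-find forest over queries

    def find(q: str) -> str:
        while parent[q] != q:
            parent[q] = parent[parent[q]]  # path halving
            q = parent[q]
        return q

    for q1, q2 in combinations(urls, 2):
        if jaccard(urls[q1], urls[q2]) >= threshold:
            parent[find(q1)] = find(q2)

    groups: dict[str, set[str]] = defaultdict(set)
    for q in urls:
        groups[find(q)].add(q)
    return list(groups.values())


def recommend(query: str, log: list[LogEntry], threshold: float = 0.3, k: int = 3) -> list[str]:
    """Suggest up to k other queries from the same cluster, most frequent first."""
    popularity = Counter(q for q, _ in log)
    for cluster in cluster_queries(log, threshold):
        if query in cluster:
            others = [q for q in cluster if q != query]
            return sorted(others, key=lambda q: (-popularity[q], q))[:k]
    return []  # query never seen in the log


if __name__ == "__main__":
    example_log = [
        ("chile hotels", "hotels.example/santiago"),
        ("santiago hotel", "hotels.example/santiago"),
        ("santiago hotel", "booking.example/cl"),
        ("chile hotels", "booking.example/cl"),
        ("cheap flights", "flights.example"),
    ]
    print(recommend("chile hotels", example_log))  # ['santiago hotel']
```

Comparing all query pairs is quadratic, which is fine for a small site log but would need blocking or an inverted index from URLs to queries at search-engine scale; the threshold and the click-overlap similarity are illustrative choices only.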

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Ricardo Baeza-Yates (1, 2)
  1. Center for Web Research, Dept. of Computer Science, Universidad de Chile, Santiago
  2. ICREA Research Professor, Technology Department, Universitat Pompeu Fabra, Spain
