Information Retrieval, Volume 14, Issue 3, pp 315–336

The sum of its parts: reducing sparsity in click estimation with query segments

  • Dustin Hillard
  • Eren Manavoglu
  • Hema Raghavan
  • Chris Leggetter
  • Erick Cantú-Paz
  • Rukmini Iyer
Web Mining for Search


The critical task of predicting clicks on search advertisements is typically addressed by learning from historical click data. When enough history is observed for a given query-ad pair, future clicks can be accurately modeled. However, based on the empirical distribution of queries, sufficient historical information is unavailable for many query-ad pairs. The sparsity of data for new and rare queries makes it difficult to accurately estimate clicks for a significant portion of typical search engine traffic. In this paper we provide analysis to motivate modeling approaches that can reduce the sparsity of the large space of user search queries. We then propose methods to improve click and relevance models for sponsored search by mining click behavior for partial user queries. We aggregate click history for individual query words, as well as for phrases extracted with a CRF model. The new models show significant improvement in clicks and revenue compared to state-of-the-art baselines trained on several months of query logs. Results are reported on live traffic of a commercial search engine, in addition to results from offline evaluation.
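The core idea in the abstract is that a query too rare to have its own click history can borrow statistics from its parts. The sketch below illustrates this under several assumptions that are not taken from the paper. The log is a list of `(query, ad_id, impressions, clicks)` rows. CTRs are shrunk toward a global prior with additive smoothing (`prior_ctr`, `prior_weight` pseudo-impressions). Segments are plain whitespace words rather than CRF-extracted phrases. The summary features (mean, max, count of segments with history) are also choices made for this example. All names (`Counts`, `build_tables`, `click_features`, `word_segments`) are hypothetical, and this is not the authors' model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Counts:
    impressions: int = 0
    clicks: int = 0

    def add(self, impressions: int, clicks: int) -> None:
        self.impressions += impressions
        self.clicks += clicks


def smoothed_ctr(c: Counts, prior_ctr: float, prior_weight: float) -> float:
    # Shrink the observed CTR toward prior_ctr; prior_weight acts as pseudo-impressions,
    # so keys with little history stay close to the prior (assumed smoothing scheme).
    return (c.clicks + prior_weight * prior_ctr) / (c.impressions + prior_weight)


def word_segments(query: str) -> list[str]:
    # Hypothetical word-level segmenter: each distinct lowercased token is a segment.
    # A phrase segmenter (for example a CRF tagger) could be swapped in here.
    return list(dict.fromkeys(query.lower().split()))


def build_tables(log, segment=word_segments):
    # log: iterable of (query, ad_id, impressions, clicks) rows (assumed format).
    by_query = defaultdict(Counts)    # keyed by (full query, ad)
    by_segment = defaultdict(Counts)  # keyed by (query segment, ad)
    for query, ad, imps, clicks in log:
        by_query[(query.lower(), ad)].add(imps, clicks)
        for seg in segment(query):
            by_segment[(seg, ad)].add(imps, clicks)
    return by_query, by_segment


def click_features(query, ad, by_query, by_segment, segment=word_segments,
                   prior_ctr=0.02, prior_weight=100.0):
    # Full-query estimate: equals the prior when the (query, ad) pair is unseen.
    q = by_query.get((query.lower(), ad), Counts())
    # Segment-level estimates: a segment may have history even if the whole query does not.
    seg_counts = [by_segment.get((s, ad), Counts()) for s in segment(query)]
    seg_ctrs = [smoothed_ctr(c, prior_ctr, prior_weight) for c in seg_counts]
    return {
        "query_ctr": smoothed_ctr(q, prior_ctr, prior_weight),
        "query_impressions": q.impressions,
        "segment_ctr_mean": sum(seg_ctrs) / len(seg_ctrs) if seg_ctrs else prior_ctr,
        "segment_ctr_max": max(seg_ctrs, default=prior_ctr),
        "segments_seen": sum(1 for c in seg_counts if c.impressions > 0),
    }


# Toy log with made-up numbers: "cheap flights to rome" never occurs in it.
log = [
    ("cheap flights", "ad1", 1000, 30),
    ("flights to paris", "ad1", 500, 20),
    ("cheap hotels", "ad1", 800, 8),
]
by_query, by_segment = build_tables(log)
features = click_features("cheap flights to rome", "ad1", by_query, by_segment)
print(features)
```

Here the full-query estimate falls back to the prior because the pair has no history, while three of the four word segments ("cheap", "flights", "to") do carry clicks for the ad. In a production system such aggregates would more plausibly serve as input features to a trained click model than as a prediction on their own.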


Keywords: Query log mining · Clicks · Relevance · Advertising

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Dustin Hillard (1, corresponding author)
  • Eren Manavoglu (1)
  • Hema Raghavan (1)
  • Chris Leggetter (1)
  • Erick Cantú-Paz (1)
  • Rukmini Iyer (1)
  1. Yahoo! Inc., Sunnyvale, USA