Information Retrieval, Volume 14, Issue 3, pp 315–336

The sum of its parts: reducing sparsity in click estimation with query segments

  • Dustin Hillard
    • Yahoo! Inc
  • Eren Manavoglu
    • Yahoo! Inc
  • Hema Raghavan
    • Yahoo! Inc
  • Chris Leggetter
    • Yahoo! Inc
  • Erick Cantú-Paz
    • Yahoo! Inc
  • Rukmini Iyer
    • Yahoo! Inc
Web Mining for Search

DOI: 10.1007/s10791-010-9152-6

Cite this article as:
Hillard, D., Manavoglu, E., Raghavan, H. et al. Inf Retrieval (2011) 14: 315. doi:10.1007/s10791-010-9152-6


The critical task of predicting clicks on search advertisements is typically addressed by learning from historical click data. When enough history is observed for a given query-ad pair, future clicks can be accurately modeled. However, based on the empirical distribution of queries, sufficient historical information is unavailable for many query-ad pairs. The sparsity of data for new and rare queries makes it difficult to accurately estimate clicks for a significant portion of typical search engine traffic. In this paper we provide analysis to motivate modeling approaches that can reduce the sparsity of the large space of user search queries. We then propose methods to improve click and relevance models for sponsored search by mining click behavior for partial user queries. We aggregate click history for individual query words, as well as for phrases extracted with a CRF model. The new models show significant improvement in clicks and revenue compared to state-of-the-art baselines trained on several months of query logs. Results are reported on live traffic of a commercial search engine, in addition to results from offline evaluation.
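The abstract describes the idea only at a high level: aggregate click history for parts of a query (single words, or phrases produced by a CRF segmenter) so that a rare or unseen query can borrow statistics from its components. The Python sketch below illustrates that idea under assumptions the paper does not state here. It assumes a log of `(query, ad_id, clicked)` impression records, additive smoothing toward a prior CTR (the `prior_ctr` and `strength` parameters are hypothetical), and simple mean/min/max summaries as features. The `make_segment_units` helper is a greedy longest-match over a known phrase list, used as a plain stand-in for the CRF segmenter. It is not the authors' implementation.

```python
from collections import defaultdict
from statistics import mean


def word_units(query):
    """Split a query into lowercase words (the word-level partial queries)."""
    return query.lower().split()


def make_segment_units(phrases, max_len=3):
    """Hypothetical stand-in for a CRF segmenter: greedy longest match
    against a fixed set of known phrases; unmatched words become single units."""
    def segment(query):
        words = query.lower().split()
        out, i = [], 0
        while i < len(words):
            # Try the longest candidate first; a single word always matches.
            for n in range(min(max_len, len(words) - i), 0, -1):
                cand = " ".join(words[i:i + n])
                if n == 1 or cand in phrases:
                    out.append(cand)
                    i += n
                    break
        return out
    return segment


def aggregate(logs, units_fn):
    """Count impressions and clicks per (unit, ad) pair.
    Each unit is counted at most once per impression."""
    clicks, views = defaultdict(int), defaultdict(int)
    for query, ad, clicked in logs:
        for unit in set(units_fn(query)):
            views[(unit, ad)] += 1
            clicks[(unit, ad)] += int(clicked)
    return dict(clicks), dict(views)


def smoothed_ctr(c, v, prior_ctr=0.05, strength=20.0):
    """Shrink a sparse CTR estimate toward a prior (assumed smoothing scheme)."""
    return (c + prior_ctr * strength) / (v + strength)


def partial_query_features(query, ad, stats, units_fn, prior_ctr=0.05, strength=20.0):
    """Summarize smoothed CTRs of a query's units for one ad."""
    clicks, views = stats
    units = list(dict.fromkeys(units_fn(query)))  # dedupe, keep order
    keys = [(u, ad) for u in units]
    if not keys:
        # No units at all: fall back to the prior.
        return {"mean_ctr": prior_ctr, "min_ctr": prior_ctr,
                "max_ctr": prior_ctr, "total_views": 0}
    ctrs = [smoothed_ctr(clicks.get(k, 0), views.get(k, 0), prior_ctr, strength)
            for k in keys]
    return {
        "mean_ctr": mean(ctrs),
        "min_ctr": min(ctrs),
        "max_ctr": max(ctrs),
        "total_views": sum(views.get(k, 0) for k in keys),
    }


# Toy usage: "cheap paris flights" never appears in the log,
# but each of its words does, so it still gets informative features.
logs = [
    ("cheap flights", "ad1", True),
    ("cheap flights to paris", "ad1", False),
    ("paris hotels", "ad1", False),
    ("flights", "ad1", True),
]
word_stats = aggregate(logs, word_units)
feats = partial_query_features("cheap paris flights", "ad1", word_stats, word_units)

segmenter = make_segment_units({"cheap flights"})
seg_stats = aggregate(logs, segmenter)
seg_feats = partial_query_features("cheap flights in paris", "ad1", seg_stats, segmenter)
```

In this sketch, word-level units maximize coverage, while phrase units keep multi-word concepts such as "cheap flights" together; in a real system either summary would be fed as additional features to the click model rather than used as a CTR estimate on its own.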


Keywords: Query log mining · Clicks · Relevance · Advertising

Copyright information

© Springer Science+Business Media, LLC 2011