Web Mining for Search

Information Retrieval, Volume 14, Issue 3, pp 315-336

The sum of its parts: reducing sparsity in click estimation with query segments

  • Dustin Hillard, Yahoo! Inc
  • Eren Manavoglu, Yahoo! Inc
  • Hema Raghavan, Yahoo! Inc
  • Chris Leggetter, Yahoo! Inc
  • Erick Cantú-Paz, Yahoo! Inc
  • Rukmini Iyer, Yahoo! Inc


The critical task of predicting clicks on search advertisements is typically addressed by learning from historical click data. When enough history is observed for a given query-ad pair, future clicks can be accurately modeled. However, based on the empirical distribution of queries, sufficient historical information is unavailable for many query-ad pairs. The sparsity of data for new and rare queries makes it difficult to accurately estimate clicks for a significant portion of typical search engine traffic. In this paper we provide analysis to motivate modeling approaches that can reduce the sparsity of the large space of user search queries. We then propose methods to improve click and relevance models for sponsored search by mining click behavior for partial user queries. We aggregate click history for individual query words, as well as for phrases extracted with a CRF model. The new models show significant improvement in clicks and revenue compared to state-of-the-art baselines trained on several months of query logs. Results are reported on live traffic of a commercial search engine, in addition to results from offline evaluation.
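The idea of aggregating click history over parts of a query can be illustrated with a minimal sketch. It is not the authors' implementation: whitespace tokenization stands in for the CRF phrase segmentation, and the function names, the log format, and the additive smoothing with `prior_ctr` and `prior_weight` are all assumptions made for illustration.

```python
from collections import defaultdict


def aggregate_segment_clicks(log):
    """Sum impressions and clicks per (query word, ad) pair.

    `log` is an iterable of (query, ad_id, impressions, clicks) tuples.
    This record layout is hypothetical, not the paper's log format.
    """
    stats = defaultdict(lambda: [0, 0])  # (word, ad_id) -> [impressions, clicks]
    for query, ad_id, impressions, clicks in log:
        # Count each distinct word once per query record.
        for word in set(query.lower().split()):
            entry = stats[(word, ad_id)]
            entry[0] += impressions
            entry[1] += clicks
    return stats


def smoothed_ctr(impressions, clicks, prior_ctr=0.02, prior_weight=20.0):
    """Click-through rate shrunk toward an assumed prior when data is sparse."""
    return (clicks + prior_ctr * prior_weight) / (impressions + prior_weight)


def segment_ctr_features(query, ad_id, stats):
    """Per-word CTR estimates for a query-ad pair, usable even if the
    full query was never observed with this ad."""
    words = set(query.lower().split())
    return {w: smoothed_ctr(*stats.get((w, ad_id), (0, 0))) for w in words}
```

With such aggregates, a query that has never been seen in full can still borrow click evidence from the words it shares with earlier queries, while words with no history fall back to the prior.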


Keywords: Query log mining, Clicks, Relevance, Advertising