In sponsored search, it is critical both to match ads that are relevant to a query and to accurately predict their likelihood of being clicked. Commercial search engines typically use machine learning models for both query-ad relevance matching and click-through rate (CTR) prediction. However, matching models rely on the similarity between a query and an ad and ignore the fact that a retrieved ad may not attract clicks, while click models rely on click history, which limits their use for new queries and ads. We propose a deeply supervised architecture that jointly learns the semantic embeddings of a query and an ad as well as their corresponding CTR. We also propose a novel cohort negative sampling technique for learning implicit negative signals. We trained the proposed architecture on one billion query-ad pairs from a major commercial web search engine. The architecture improves on the best-performing baseline deep neural architectures by 2% AUC for CTR prediction and by a statistically significant 0.5% NDCG for query-ad matching.
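The abstract describes the model only at a high level. The following is a minimal sketch of one way "deeply supervised" joint training could look. It combines a CTR log-loss at the output layer with an auxiliary matching loss applied directly to the intermediate query and ad embeddings. The function names, the cosine-similarity matching head, the rescaling of cosine similarity to [0, 1], and the weight `lam` are all illustrative assumptions, not the authors' formulation.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def bce(p: float, y: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy (log-loss) for a single prediction, clipped for stability."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def joint_loss(q_emb: Sequence[float], a_emb: Sequence[float], ctr_logit: float,
               clicked: float, relevant: float, lam: float = 0.5) -> float:
    """Hypothetical deeply supervised objective for one query-ad pair.

    - CTR head: log-loss of the predicted click probability at the output layer.
    - Matching head: log-loss on the cosine similarity of the intermediate
      query/ad embeddings (rescaled from [-1, 1] to [0, 1]), so the embedding
      layer receives its own supervision signal.
    `lam` weights the auxiliary matching loss (assumed hyperparameter).
    """
    ctr_loss = bce(sigmoid(ctr_logit), clicked)
    match_prob = (cosine(q_emb, a_emb) + 1.0) / 2.0
    match_loss = bce(match_prob, relevant)
    return ctr_loss + lam * match_loss
```

In this sketch, the matching loss is attached to the embeddings themselves rather than to the final output. That is what makes the supervision "deep": the embedding layer is trained by a loss of its own as well as by the gradient flowing back from the CTR head.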
We use the word "cohort" to distinguish our sampling strategy from traditional i.i.d. mini-batch sampling.
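The footnote does not spell out how negatives are drawn from a cohort. The following is a minimal sketch that assumes in-batch negatives: each query in a cohort of clicked query-ad pairs is paired with ads taken from other pairs in the same cohort, and these pairings are labeled as implicit negatives. The function name `cohort_negatives`, the parameter `k` (negatives per positive), and the rule that excludes any ad the same query was already paired with are hypothetical choices made for illustration.

```python
import random
from typing import Dict, List, Set, Tuple


def cohort_negatives(cohort: List[Tuple[str, str]], k: int,
                     rng: random.Random) -> List[Tuple[str, str, int]]:
    """Build labeled examples from one cohort of clicked (query, ad) pairs.

    Each pair is emitted as a positive (label 1). For its query, up to k
    negatives (label 0) are formed with distinct ads drawn from other pairs
    in the same cohort, skipping ads that this query was itself paired with.
    """
    # Ads each query was paired with in this cohort; these are never negatives.
    paired: Dict[str, Set[str]] = {}
    for query, ad in cohort:
        paired.setdefault(query, set()).add(ad)

    all_ads = [ad for _, ad in cohort]
    examples: List[Tuple[str, str, int]] = []
    for query, ad in cohort:
        examples.append((query, ad, 1))
        # Deduplicate and sort so sampling is reproducible for a seeded rng.
        candidates = sorted({a for a in all_ads if a not in paired[query]})
        for neg_ad in rng.sample(candidates, min(k, len(candidates))):
            examples.append((query, neg_ad, 0))
    return examples


cohort = [("shoes", "ad_nike"), ("flights", "ad_delta"), ("hotels", "ad_hilton")]
examples = cohort_negatives(cohort, k=2, rng=random.Random(0))
```

Negatives drawn this way come from the same cohort as the positives, so they are sampled in proportion to how often ads appear among clicked pairs, rather than uniformly from the full ad inventory.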
The authors gratefully thank Lee Yang for his invaluable help in deploying our models on distributed GPU clusters, and Aleksandar Obradovic and Stefan Obradovic for proofreading and editing the manuscript. The authors also thank the anonymous referees for their valuable comments and suggestions.
Amit Goyal: The work was done when the author was with Yahoo Research.
Responsible editor: Po-ling Loh, Evimaria Terzi, Antti Ukkonen, Karsten Borgwardt.
Cite this article
Gligorijevic, J., Gligorijevic, D., Stojkovic, I. et al. Deeply supervised model for click-through rate prediction in sponsored search. Data Min Knowl Disc 33, 1446–1467 (2019). https://doi.org/10.1007/s10618-019-00625-3
- Deep learning
- Click prediction
- Query to ad matching