Background

Paid search (also known as sponsored search, search engine advertising, or search advertising) is an important form of online advertising that serves ads matched to a user’s query on the search engine result page. It has grown exponentially since its inception in 1998 and has become the central business model of the major search engine companies [1, 2]. In paid search, a set of ads, clearly labeled as sponsored, is displayed along with the organic search results for a query. Figure 1 shows an example of a paid search result on Bing, where all ads are marked by the red box. Although they are displayed simultaneously and in similar forms, paid search results are generated by a mechanism quite different from that of organic search. Organic search results are produced according to the relevance of each web page to the query, whereas paid search results are generated by an auction involving three players: advertisers, search users, and publishers (i.e., the search engine) [3, 4]. This auction operates at a very large scale, with billions of keywords, tens of millions of ads, billions of users, and millions of advertisers, and events such as clicks and actions can be extremely rare.

Fig. 1 One example of a paid search result on the Bing search engine for the user query “auto insurance”

Figure 2 shows a typical paid search ecosystem [5]. When a search user submits a query, the search engine first retrieves candidate ads whose bid keywords, provided by the advertisers in advance, match the query. The search engine then runs an auction on these candidate ads, considering both the quality of each ad and its bid price [6]. The ads are ranked by the product of bid price and quality score, and the top-ranked ads win the auction and are displayed at positions on the search result page according to their rank (Endnote 1). If an ad is clicked by the user, its advertiser is charged by the ad platform, e.g., Microsoft Advertising adCenter (Endnote 2) or Google AdWords (Endnote 3). Mainstream ad platforms adopt the generalized second price (GSP) auction mechanism [7], in which the advertiser’s cost per click depends on the bid price and the relevance score of the next ad in the ranking list of the auction.
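The ranking and pricing rule described above can be sketched in a few lines. This is a minimal illustration, not any platform's actual implementation: it assumes a per-click GSP in which the ad at each position pays the smallest bid that keeps its rank score at or above that of the next ad in the ranking, and it models the reserve score with a single hypothetical `reserve_score` parameter. The names `Ad` and `run_gsp` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    name: str
    bid: float      # advertiser's maximum cost per click
    quality: float  # quality score, e.g. a predicted click probability


def run_gsp(ads, num_slots, reserve_score=0.0):
    """Rank ads by bid * quality and price each click from the next ad's rank score."""
    eligible = [a for a in ads if a.bid * a.quality > reserve_score]
    ranked = sorted(eligible, key=lambda a: a.bid * a.quality, reverse=True)
    results = []
    for pos, ad in enumerate(ranked[:num_slots]):
        if pos + 1 < len(ranked):
            next_score = ranked[pos + 1].bid * ranked[pos + 1].quality
        else:
            next_score = reserve_score
        # Smallest bid that still keeps this ad's rank score >= next_score.
        cpc = next_score / ad.quality
        results.append((pos + 1, ad.name, round(cpc, 4)))
    return results


ads = [Ad("A", 2.0, 0.10), Ad("B", 3.0, 0.05), Ad("C", 1.0, 0.12)]
for position, name, cpc in run_gsp(ads, num_slots=2):
    print(position, name, cpc)
```

In this toy auction, ad B has the highest bid but ranks second because of its lower quality score, and each winner pays no more than its own bid.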

Fig. 2 Paid search ecosystem

In a paid search system, each advertiser sets up its account offline using a common organizational structure like the one shown in Fig. 3. It is a tree with three layers (Account, Campaign, and Ad Group). Each account is associated with a unique email address and billing information, each campaign has its own budget and targeting settings that determine where and when its ads appear, and each ad group contains a set of similar ads, keywords, matchtypes, and bid prices. The matchtype tells the platform how closely a search query must match the keyword; the definitions are given in Table 1. Generally, the more precise the matchtype (EM being the most precise), the higher the conversion rate and the lower the volume tend to be (Endnote 4). A triple <keyword, matchtype, bid> is usually referred to as an orderitem, and it is the orderitem that actually participates in the auction.
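The three-layer tree and the orderitem triple can be modeled as plain data classes. The sketch below is only an illustration: the class and field names are hypothetical, the matchtype is kept as a free-form string (only EM and broad match are named in the text), and real platforms store many more attributes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class OrderItem:
    keyword: str
    matchtype: str  # e.g. "EM" (exact match) or "broad"
    bid: float


@dataclass
class AdGroup:
    name: str
    ads: List[str] = field(default_factory=list)
    orderitems: List[OrderItem] = field(default_factory=list)


@dataclass
class Campaign:
    name: str
    budget: float  # the budget is set at the campaign level
    ad_groups: List[AdGroup] = field(default_factory=list)


@dataclass
class Account:
    email: str
    campaigns: List[Campaign] = field(default_factory=list)

    def all_orderitems(self):
        """Flatten the tree into the orderitems that enter auctions."""
        return [oi for c in self.campaigns for g in c.ad_groups for oi in g.orderitems]


group = AdGroup(
    "insurance ads",
    ads=["Cheap auto insurance quotes"],
    orderitems=[OrderItem("auto insurance", "EM", 1.2),
                OrderItem("car insurance", "broad", 0.8)],
)
account = Account("advertiser@example.com", [Campaign("spring", 50.0, [group])])
print(account.all_orderitems())
```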

Fig. 3 An illustrative example of the advertiser account structure

Table 1 Match type definition

To win the auction, an advertiser should carefully consider which keywords to bid on and what price to set for each of them. However, few advertisers handle this well, because paid search is a complex system [8]. It is highly dynamic: user information needs change rapidly, advertisers’ bids are non-stationary, and ad campaigns are modified frequently. Moreover, the three players cooperate with one another but have conflicting interests when it comes to risk and revenue objectives. Although advertisers understand their own business logic very well, they are usually unfamiliar with the other parts of the paid search system, so they need the ad platform to suggest how to optimize their settings to achieve the performance they expect.

Adaptive models help advertisers optimize their bids, keywords, and budgets. The models are first estimated from the historical auctions across the market place and then adapted using the historical performance of the advertiser in question. They simulate each auction to predict the results of an orderitem at different bid prices. The results include the probability that the orderitem wins the auction and the probabilities that the ad is shown at each position on the search result page and clicked by the search user. By summing the click probabilities over all auctions in which a keyword participated, advertisers can select the keywords and bids that yield the highest expected number of clicks (Endnote 5).

Numerous methods have been proposed to optimize advertisers’ revenue in paid search [1, 2], yet we are unaware of a comprehensive review of the subject covering the recent decade; providing one is the aim of this paper. In the rest of the paper, we first introduce some prominent existing approaches to improving advertisers’ revenue, followed by recent progress in adaptive models for large-scale advertiser optimization in paid search, together with several practical tools. Finally, we conclude and discuss future work.

Related works

A wide range of models has been proposed to optimize paid search, covering the auction mechanism, relevance modeling, and advertiser settings. In this section, we survey some prominent approaches.

Auction mechanism optimization

The auction mechanism plays the core role in the paid search platform: it selects the ads shown to search users and determines the prices charged to advertisers. Several works have investigated how to design an auction mechanism that optimizes the revenue of the advertisers and the search engine [9, 10]. The generalized first price (GFP) mechanism was the first to be applied in paid search (Endnote 6); under GFP, advertisers pay their bids if their ads are shown and clicked. Because GFP proved unstable, Google adopted the generalized second price (GSP) mechanism [7] in 2002, under which advertisers pay the minimum amount necessary to maintain their position. Since then, GSP has become the industry standard in paid search and has attracted much research attention [11-14]. However, GSP is not truthful in the multi-slot setting, and it assumes that all advertisers have well-defined utilities as well as the information and computational power needed to optimize them. Alternatives such as the Vickrey-Clarke-Groves (VCG) auction (Endnote 7) and the weighted GSP auction [15] have been proposed. Although they may be generic truthful mechanisms [3, 16], they bring complications of their own, because they fundamentally change the way prices are computed or are not as easy to understand as GSP. In recent years, many works have begun to treat auction mechanism design as a revenue maximization problem from the viewpoint of game-theoretic machine learning [17-21].

Relevance modeling optimization

Relevance models of the user query and the ad matter in at least two aspects of paid search: ad selection and ad ranking.

First, the search engine selects the candidate ads according to a relevance score. Some approaches rely on the text relevance among several text streams, such as the query, keyword, ad copy, and landing page [22, 23]; some employ graph information from query logs or ad click logs [24, 25]; others, such as Hillard et al. [26], incorporate both text relevance and graph information as features in a learning model. More recently, Hui et al. added keyword monetization ability in order to maximize both relevance and revenue [27].

Second, the selected ads are ranked by their expected revenue, computed as the product of the relevance score and the bid price [28]. The rank position on the search engine result page directly affects the cognitive attention of search users and influences their subsequent clicking and purchasing behavior [29]. In this model, the relevance score is usually represented as the predicted probability of being clicked (click-ability). A large body of work on paid search is devoted to models and techniques that predict as precisely as possible the click-ability of an ad returned to a user [30-37]. Some works focus on advanced machine learning models: Graepel et al. [30] investigated Bayesian networks for this problem, while Ling et al. [37] employed ensemble models. Most works investigate features that improve click prediction accuracy: Hillard et al. [31] leveraged query segment features, and Cetin et al. [32] used a mixture-of-Gaussians model to characterize historical click features. Following the pervasive success of deep learning, recent works have detailed how deep models can benefit click prediction: Zhai et al. [35] exploited a recurrent neural network, and Jiang et al. [36] proposed a deep neural network. The deep model was used to automatically extract abstract and sophisticated features from advertisement content, user profiles, and clicks, and these features were then used to train a logistic regression model.

Advertisers optimization

Advertiser optimization is a straightforward way to improve advertisers’ revenue. It generally includes recommending appropriate keywords and matchtypes with reasonable bid prices, and suggesting an appropriate budget. For keyword recommendation, most approaches fall into one of the following types: mining query-click logs or advertiser logs [25], mining semantic relationships between terms [38, 39], and generating bid keywords from given ad landing pages [40, 41]. All of these works generate potential candidates and capture semantic similarity between terms based on only a few types of data logs. More recently, Yang et al. [42] utilized rich relevance feedback from search users and advertisers, together with different types of relationships between keywords. In addition, Kiritchenko et al. [43] obtained appropriate keywords via a feature selection method, and Budhiraja et al. [44] addressed long-tail keywords by working with concepts rather than the keywords themselves. Besides generating new keywords for advertisers directly, several works studied how to improve the broad match of existing keywords for a given query [22, 24, 45-47].

In addition, a number of studies [4, 48-53] have investigated how advertisers determine their bid prices and how their bidding strategies influence the equilibrium of the paid search system. For example, [4] assumed that, to maximize their utility, advertisers bid the amount at which their value per click equals the incremental cost per click. The work in [51] studied how to estimate value per click, assuming that advertisers are at the locally envy-free equilibrium and that all advertisers’ bids are independent and identically distributed. Xu et al. [53] investigated advertiser behavior via game-theoretic machine learning, considering different levels of advertiser rationality. Since advertisers are budget constrained, several works have been dedicated to optimizing bid prices under a budget constraint [54-56]. Further, to obtain an appropriate budget, Zhang et al. [57] studied the hierarchical structure of paid search advertisers and proposed a joint optimization of bid price and campaign budget allocation within a multi-campaign sponsored search account, and Nikolay et al. [58] optimized the budget using a Markov model of the user carryover effect.

Main text

As mentioned above, keywords (together with matchtypes) and bids play a critical role in optimizing advertisers’ revenue. In addition, advertisers typically prefer to spend their budget smoothly over time, in order to reach a wider range of search users throughout a month (or a day) and to have a sustained impact, so an appropriate budget is required. In this section, we introduce methods for bid suggestion, keyword recommendation, and budget optimization using adaptive models. Adaptive models are powerful because their parameters are optimized according to both the overall market place performance and the advertiser’s own historical performance.

Bid suggestion

In the auction process, the candidate ads are ranked by their rankscores, which are usually calculated by

$$\begin{array}{@{}rcl@{}} Rankscore= Bid \times QualityScore, \end{array} $$
(1)

where QualityScore is an estimation of the quality of the ad, keyword, and landing page. Higher quality ads can lead to lower prices and better ad positions. From this equation, we can see that the Bid is another factor to influence the rank. Unfortunately, advertisers usually do not truthfully bid their ads because they do not know the competitors’ bids and the market place performance over the search engine.

Bid suggestion gives advertisers competitive bids, such as the minimum bid needed to show the ad at a side-bar position (the first-page bid, abbreviated FP bid), at a mainline position (ML bid), or at the best position (the mainline one bid, abbreviated ML 1 bid). Clearly, FP bid ≤ ML bid ≤ ML 1 bid, and a higher position gives the ad higher click-ability [29].

The challenge in bid suggestion is that the auction runs in real time, so we cannot optimize bids online during the auction. The general solution is to simulate the historical auctions offline with different bids to obtain a bid landscape, assuming that the market place does not fluctuate. The bid landscape helps advertisers see how different bids might change the performance of their ads, including impressions, clicks, and the charged cost. In detail, we first simulate the bids in one auction. We extract from the historical auction logs the rankscores and ad positions of all orderitems that participated in the auction, and calculate the bids required to show at each position according to Eq. (1), since the QualityScore of the ad in this auction does not change. For each such bid, the estimated impression is one, the click is estimated by the click prediction adjusted by the ad position [59, 60], and the cost is estimated by bid × click (Endnote 8). In summary, we obtain a series of points, each a quintuple <bid, impression, click, cost, adPosition>, for this auction, and we merge the quintuples from all historical auctions to obtain the final bid landscape, an example of which is shown in Fig. 4. When the points of two auctions are combined, the higher bid overrides the lower bid at the same ad position, and the impressions, clicks, and costs are summed, so the combined cost-per-click is much lower than the bid, as can be seen in Fig. 4.
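The offline simulation can be sketched as follows. This is a simplified reading of the procedure, not the platform's implementation: it assumes each historical auction is summarized by the competitors' rank scores (best first), that the ad's own quality score is fixed, that the position-adjusted click probability comes from a hypothetical `position_ctr` list, and that a given bid earns, in each auction, the best position whose required bid it covers. All function names and numbers are illustrative.

```python
def auction_points(competitor_scores, quality, position_ctr):
    """Points (bid, impression, click, cost, position) for one historical auction."""
    points = []
    for pos, (score, ctr) in enumerate(zip(competitor_scores, position_ctr)):
        # Eq. (1) inverted: minimum bid whose rank score matches the ad at this position.
        bid = score / quality
        points.append((bid, 1, ctr, bid * ctr, pos + 1))
    return points


def bid_landscape(auctions, candidate_bids):
    """Sum impressions, clicks and cost over all auctions for each candidate bid."""
    landscape = []
    for b in candidate_bids:
        imps = clicks = cost = 0.0
        for points in auctions:
            affordable = [p for p in points if p[0] <= b]
            if affordable:
                # Best (lowest-numbered) position reachable with bid b in this auction.
                best = min(affordable, key=lambda p: p[4])
                imps += best[1]
                clicks += best[2]
                cost += best[3]
        landscape.append((b, imps, clicks, cost))
    return landscape


position_ctr = [0.5, 0.25]  # assumed click probability at positions 1 and 2
auctions = [
    auction_points([1.5, 0.5], quality=0.5, position_ctr=position_ctr),
    auction_points([1.0, 0.25], quality=0.5, position_ctr=position_ctr),
]
landscape = bid_landscape(auctions, candidate_bids=[0.5, 1.0, 2.0, 3.0])
for bid, imps, clicks, cost in landscape:
    print(bid, imps, clicks, cost)
```

At the highest candidate bid of 3.0, the summed cost is 2.5 for 1.0 expected clicks, so the combined cost-per-click (2.5) is below the bid, which matches the behavior described for the merged landscape.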

Fig. 4 An illustrative bid landscape example

In addition, we can derive bid suggestions from the bid landscape described above by applying different ad position thresholds. For example, we obtain the ML 1 bid by choosing the minimum bid whose ad position meets the mainline 1 position threshold. By comparing the performance at the new bid with that at the current bid, we can show advertisers how much their performance would improve by setting the new bid.
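Reading bid suggestions off a landscape amounts to a thresholded minimum. The sketch below assumes the landscape has been reduced to (bid, estimated position) pairs, where a smaller position number is better; the threshold values for FP, ML, and ML 1 are hypothetical and depend on the page layout.

```python
def suggest_bid(landscape, max_position):
    """Minimum bid whose estimated position is at least as good as max_position."""
    qualifying = [bid for bid, position in landscape if position <= max_position]
    return min(qualifying) if qualifying else None


# (bid, estimated ad position) pairs; illustrative numbers only.
landscape = [(0.5, 9.0), (1.0, 6.0), (2.0, 3.0), (3.5, 1.0)]

# Hypothetical thresholds: first page = top 8, mainline = top 4, mainline 1 = position 1.
thresholds = {"FP": 8, "ML": 4, "ML1": 1}
suggestions = {name: suggest_bid(landscape, t) for name, t in thresholds.items()}
print(suggestions)
```

With these numbers the suggested bids satisfy FP bid ≤ ML bid ≤ ML 1 bid, as expected from the ordering of the positions.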

Keyword recommendation

Advertisers may know which keywords are related to their ads. However, keyword relevance only helps the ads to be chosen as candidates in the auction; it cannot guarantee that the ads win the auction, because the same keywords are generally also set in competitors’ campaigns. A straightforward way to address this problem is to suggest related keywords and bids with high performance to the advertisers [8]. This involves two tasks: (1) obtaining related keywords, and (2) setting appropriate bids for these candidates and estimating their performance.

For the first task, we suggest related keywords that help the advertiser’s ads participate in more auctions. We use the keywords of similar ads in the historical auction logs, because these keywords are known to be related to such ads: the search engine selected them into the auction. In addition, we can modify the matchtypes of the existing keywords to match more queries, for example by using broad match (Table 1).

Once the candidate keywords and matchtypes are selected, the second task is to estimate their bids and performance. This is challenging because these keywords and matchtypes are new to the campaign and have no historical auctions. To overcome this, we first use all historical auctions of the same keywords and matchtypes in other campaigns to obtain a bid suggestion result via the method introduced in the previous section. We then apply an advertiser adjustment factor to adapt the market place result to the campaign in question, which yields the final bid suggestion performance for the newly added keywords or matchtypes. This adjustment factor is required because each campaign has its own characteristic performance; it can be estimated by comparing the performance of the other keywords in the campaign with that of the same keywords in the market place.
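One simple way to realize the adjustment factor is as a ratio of totals over the keywords the campaign shares with the market place. The text does not specify the exact estimator, so the ratio form, the function name `adjustment_factor`, and the click counts below are all assumptions made for illustration.

```python
def adjustment_factor(campaign_perf, market_perf):
    """Ratio of campaign to market place performance over their shared keywords."""
    shared = [k for k in campaign_perf if market_perf.get(k, 0) > 0]
    if not shared:
        return 1.0  # no evidence: fall back to the unadjusted market place estimate
    campaign_total = sum(campaign_perf[k] for k in shared)
    market_total = sum(market_perf[k] for k in shared)
    return campaign_total / market_total


# Estimated clicks per keyword; illustrative numbers only.
campaign_clicks = {"car insurance": 30.0, "auto quote": 10.0}
market_clicks = {"car insurance": 50.0, "auto quote": 30.0, "cheap auto insurance": 40.0}

factor = adjustment_factor(campaign_clicks, market_clicks)
# Adapt the market place estimate for a keyword that is new to the campaign.
adapted_clicks = factor * market_clicks["cheap auto insurance"]
print(factor, adapted_clicks)
```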

Finally, the keywords or matchtypes with higher performance (i.e., impressions, clicks, and cost) are recommended to the advertisers, and we can show them the performance they would achieve by adopting the recommendations.

Budget optimization

Advertisers can set budgets on the ad platform to control their spending on search ads over a period (e.g., a day or a month). The problem is that most advertisers have no idea what budget is appropriate. On the one hand, advertisers do not want to spend too much on paid search ads; on the other hand, they expect the budget to be sufficient to show ads throughout the period, because their performance is constrained when the budget runs out. Budget optimization helps advertisers solve this problem. Its objective is an optimized budget that is acceptable to the advertiser and sufficient to last throughout the period on the ad platform.

As shown in Fig. 3, the budget is set at the campaign level. When a campaign runs out of budget in a time slot (e.g., one hour), all auctions the campaign would have participated in during that time slot are marked in the auction logs as lost due to budget. If we can estimate the performance in that time slot, then we can obtain the performance of the campaign over the whole period, which can be used to estimate its appropriate budget.

Formally, we split the period into a sequence of time slots (slot 1, ..., slot n, where n is the number of slots) and obtain each campaign’s performance in each slot from the auction logs. Table 2 shows this layout, where \(x_{i}^{j}\) represents the performance (i.e., impressions, clicks, or cost) of the i-th campaign in the j-th time slot, and the symbol ? represents a missing value, which occurs when a campaign is out of budget in a time slot. The last row gives the average performance of the market place, calculated only from the existing values in each time slot. If all campaigns are out of budget in a time slot, the average performance for that slot is also a missing value. The problem is thus transformed into estimating the missing values ? in Table 2.

Table 2 Campaign performance in a sequence of time slots

To estimate missing value ?, we use a very simple method of linear regression by assuming that all campaigns’ performance is linearly dependent on the market place performance, which can be expressed by

$$\begin{array}{@{}rcl@{}} y^{j}= a_{i}\cdot x_{i}^{j} + b_{i}, i=1,2,\ldots m, j=1,2,\ldots n, \end{array} $$
(2)

where \(a_{i}\) and \(b_{i}\) are the parameters for the i-th campaign and m denotes the number of campaigns in the market place. These parameters can be learned by the least squares method on the set \(V_{i}\) of time slots with existing values, which gives

$$\begin{array}{@{}rcl@{}} a_{i} &=& \frac{\sum_{j \in V_{i}} x_{i}^{j}y^{j}-|V_{i}|\,\bar{x_{i}}\,\bar{y}}{\sum_{j \in V_{i}} \left(x_{i}^{j}\right)^{2} - |V_{i}|\,\bar{x_{i}}^{2}} \end{array} $$
(3)
$$\begin{array}{@{}rcl@{}} b_{i} &=& \bar{y} - a_{i} \bar{x_{i}} \end{array} $$
(4)

where \(\bar {x_{i}}\) and \(\bar {y}\) are the means of the existing values in the set \(V_{i}\) for the i-th campaign and the market place, respectively. Once the parameters \(a_{i}\) and \(b_{i}\) are learned, the missing values for the i-th campaign are easily calculated from Eq. (2). For example, the missing value ? in the j-th slot for the i-th campaign in Table 2 can be estimated by \((y^{j} - b_{i})/a_{i}\).

Finally, we sum the cost over all time slots to obtain the total cost of the campaign in the period. This is the ideal budget for the campaign, because the cost represents how much the campaign is charged in paid search.
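The whole budget estimate can be sketched end to end on a toy table. The code follows Eq. (2) with \(a_{i}\) as the slope and \(b_{i}\) as the intercept, uses `None` for the missing value ?, and treats every entry as a per-slot cost. The function names and the numbers are illustrative, and the sketch assumes each campaign has at least two distinct existing values and a nonzero fitted slope.

```python
def market_average(table):
    """Column-wise mean over existing values; None if every campaign is missing."""
    averages = []
    for j in range(len(table[0])):
        present = [row[j] for row in table if row[j] is not None]
        averages.append(sum(present) / len(present) if present else None)
    return averages


def fit_line(xs, ys):
    """Least-squares fit of y = a * x + b (a is the slope, b the intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def impute_row(row, y):
    """Fill one campaign's missing slots with x = (y - b) / a."""
    known = [j for j, x in enumerate(row) if x is not None and y[j] is not None]
    a, b = fit_line([row[j] for j in known], [y[j] for j in known])
    filled = []
    for j, x in enumerate(row):
        if x is None and y[j] is not None:
            x = (y[j] - b) / a
        filled.append(x)  # stays None if the market place value is missing too
    return filled


# Cost per time slot for three campaigns; campaign 0 ran out of budget in the last slot.
table = [
    [10.0, 20.0, 30.0, None],
    [20.0, 40.0, 60.0, 80.0],
    [30.0, 60.0, 90.0, 120.0],
]
y = market_average(table)
filled = impute_row(table[0], y)
budget = sum(filled)
print(filled, budget)
```

Here the market place average in the last slot is computed from the two campaigns that still had budget, the fitted line for campaign 0 is y = 2x, and the imputed cost of 50 brings the suggested budget to 110.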

To evaluate this approach, we check the campaigns’ performance in the next period. Suppose there are m campaigns with budget issues in Table 2, of which \(m_{1}\) campaigns accept the estimated budget and have unconstrained performance in the next period (i.e., no missing values). The satisfaction ratio \(m_{1}/m\) then serves as the evaluation metric for the budget optimization approach; the higher, the better.

Practical tools in the ad platform

Advertiser optimization is very useful for improving the satisfaction of advertisers and increasing the revenue of search engines at the same time. Ad platforms provide many tools of this kind; in this section we introduce two typical ones: the opportunity tool and the keyword planner tool.

The opportunity tool is a group of services that give suggestions for improving advertisers’ performance, mainly comprising bid opportunity, keyword opportunity, and budget opportunity. Bid opportunity suggests competitive bids for reaching an expected position, such as the best position, a mainline position, or a side-bar position; this service adopts the bid suggestion method described above. Keyword opportunity (or broad match opportunity) suggests that advertisers add new keywords or change the matchtype to broad match; this service is usually based on the keyword recommendation method described above. Budget opportunity derives from the budget optimization method described above and helps the campaign set a sufficient budget. Both Microsoft BingAds (Endnote 9) and Google AdWords (Endnote 10) offer this kind of tool, and Fig. 5 shows an example of the opportunity tool in BingAds.

Fig. 5 One example of the opportunity tool in BingAds

The keyword planner is an integrated tool that provides both keyword ideas and performance estimates, based mainly on the combination of bid suggestion and keyword recommendation. Compared with the opportunity tool, it supports location targeting and is more flexible. In addition, it provides historical statistics such as the search history for a keyword and how competitive that keyword is. This makes the tool very useful for new campaigns. It is provided by both Microsoft BingAds (Endnote 11) and Google AdWords (Endnote 12).

Conclusions

In this paper, we conducted an intensive study of adaptive modeling for large-scale advertiser optimization in paid search, a topic that has attracted much attention in both industry and the research community and has advanced tremendously over the recent decade. We first discussed paid search and the challenges of advertiser optimization, along with its importance and usefulness. We then reviewed related works on optimizing paid search revenue. Finally, we introduced adaptive models for advertiser optimization from three aspects, namely bid suggestion, keyword recommendation, and budget optimization; the related tools in commercial ad platforms demonstrate their effectiveness, though they are not perfect.

There are several directions for future work. The first is to refine the adaptive models to perform optimization for multiple advertisers simultaneously. In particular, we will consider changes in other advertisers’ settings and traffic fluctuation in the market place. Accounting for such fluctuation should improve the performance of adaptive models.

The second direction is to propose a unified optimization framework that optimizes keywords, bids, and budget simultaneously. The advertiser optimization methods introduced in this paper are isolated from one another, whereas keyword, bid, and budget are interrelated and jointly affect the advertiser’s performance in paid search. Taking full advantage of a unified optimization framework, we can improve performance more effectively.

The third direction is to apply more advanced machine learning algorithms to this problem, such as matrix completion [61] for budget optimization. Matrix completion has been widely used in collaborative filtering [62], and how to exploit it for budget optimization is well worth studying.

The fourth direction is to consider user experience in the advertiser optimization models. Paid search is a three-player game in which the search user is one of the key players. Therefore, we should consider user experience in addition to relevance and competitiveness to obtain a comprehensive evaluation of the entire ecosystem.

Endnotes

1 Usually a reserve score is set and the ads whose scores are greater than the reserve score are shown.

2 BingAds: https://bingads.microsoft.com/

3 AdWords: https://adwords.google.com/home/

4 Matchtype: https://help.bingads.microsoft.com/apex/index/3/en-us/50822

5 More clicks usually bring more conversions and more revenue for those advertisers.

6 GFP mechanism: https://en.wikipedia.org/wiki/Generalized_first-price_auction

7 VCG mechanism: https://en.wikipedia.org/wiki/Vickrey-Clarke-Groves_mechanism

8 Here, this bid is the minimum bid for this ad position, which is used as cost-per-click (CPC).

9 Opportunity in BingAds: https://help.bingads.microsoft.com/apex/index/3/en-us/51103

10 Opportunity in AdWords: https://support.google.com/adwords/answer/3448398

11 Keyword Planner in BingAds: https://advertise.bingads.microsoft.com/en-us/solutions/tools/keyword-planner

12 Keyword Planner in AdWords: https://support.google.com/adwords/answer/2999770