Abstract
This paper presents the results of the Dynamic Pricing Challenge, held on the occasion of the 17th INFORMS Revenue Management and Pricing Section Conference on June 29–30, 2017 in Amsterdam, The Netherlands. For this challenge, participants submitted algorithms for pricing and demand learning whose numerical performance was analyzed in simulated market environments. This allows consideration of market dynamics that are not analytically tractable or cannot be empirically analyzed due to practical complications. Our findings indicate that the relative performance of algorithms varies substantially across different market dynamics, which confirms the intrinsic complexity of pricing and learning in the presence of competition.
Notes
https://www.profitero.com/2013/12/profiterorevealsthatamazoncommakesmorethan25millionpricechangeseveryday/, visited on December 12, 2017.
https://camelcamelcamel.com/HolyBibleJamesVersionBurgundy/product/0718015592, visited on December 12, 2017.
The figure for the shopper segment is omitted to save space since it is very similar to Fig. 7, except that the revenues decrease in the shopper share.
A plot of wls vs bgrid is omitted to save space, but is very similar to the plot of wls vs bbucket.
References
Aghion, P., M.P. Espinosa, and B. Jullien. 1993. Dynamic duopoly with learning through market experimentation. Economic Theory 3 (3): 517–539.
Akcay, Y., H.P. Natarajan, and S.H. Xu. 2010. Joint dynamic pricing of multiple perishable products under consumer choice. Management Science 56 (8): 1345–1361.
Alepuz, M.D., and A. Urbano. 1999. Duopoly experimentation: Cournot competition. Mathematical Social Sciences 37 (2): 165–188.
Anufriev, M., D. Kopányi, and J. Tuinstra. 2013. Learning cycles in Bertrand competition with differentiated commodities and competing learning rules. Journal of Economic Dynamics and Control 37 (12): 2562–2581.
Araman, V., and R. Caldentey. 2009. Dynamic pricing for nonperishable products with demand learning. Operations Research 57 (5): 1169–1188.
Belleflamme, P., and F. Bloch. 2001. Price and quantity experimentation: A synthesis. International Journal of Industrial Organization 19 (10): 1563–1582.
Bergemann, D., and J. Valimaki. 1996. Market experimentation and pricing, Cowles Foundation Discussion Paper 1122, Cowles Foundation for Research in Economics at Yale University, https://ideas.repec.org/p/cwl/cwldpp/1122.html. Accessed 5 June 2018.
Bertsimas, D., and G. Perakis. 2006. Dynamic pricing: A learning approach. In Mathematical and Computational Models for Congestion Charging, 45–79. New York: Springer.
Besbes, O., and A. Zeevi. 2009. Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms. Operations Research 57 (6): 1407–1420.
Bischi, G.I., C. Chiarella, and M. Kopel. 2004. The long run outcomes and global dynamics of a duopoly game with misspecified demand functions. International Game Theory Review 06 (03): 343–379.
Bischi, G.I., A.K. Naimzada, and L. Sbragia. 2007. Oligopoly games with local monopolistic approximation. Journal of Economic Behavior & Organization 62 (3): 371–388.
Boissier, M., R. Schlosser, N. Podlesny, S. Serth, M. Bornstein, J. Latt, J. Lindemann, J. Selke, and M. Uflacker. 2017. Data-driven repricing strategies in competitive markets: An interactive simulation platform. In Proceedings of the Eleventh ACM Conference on Recommender Systems, RecSys ’17, 355–357.
Broder, J., and P. Rusmevichientong. 2012. Dynamic pricing under a general parametric choice model. Operations Research 60 (4): 965–980.
Brousseau, V., and A. Kirman. 1992. Apparent convergence of learning processes in misspecified games. In Game Theory and Economic Applications, ed. B. Dutta, D. Mookherjee, T. Parthasarathy, T.E.S. Raghavan, D. Ray, and S. Tijs, 303–331. Berlin: Springer.
Chen, Y., and V. Farias. 2013. Simple policies for dynamic pricing with imperfect forecasts. Operations Research 61 (3): 612–624.
Cheung, W.C., D. Simchi-Levi, and H. Wang. 2013. Dynamic pricing and demand learning with limited price experimentation, Working paper, http://web.mit.edu/wanghe/www/research.html. Accessed 2 February 2018.
Chung, B.D., J. Li, T. Yao, C. Kwon, and T.L. Friesz. 2012. Demand learning and dynamic pricing under competition in a state-space framework. IEEE Transactions on Engineering Management 59 (2): 240–249.
Cooper, W.L., T. Homem-de-Mello, and A.J. Kleywegt. 2014. Learning and pricing with models that do not explicitly incorporate competition, Working paper, University of Minnesota. http://www.menet.umn.edu/~billcoop/competitive2014.pdf.
Cyert, R.M., and M.H. DeGroot. 1970. Bayesian analysis and duopoly theory. Journal of Political Economy 78 (5): 1168–1184.
Dasgupta, P., and R. Das. 2000. Dynamic pricing with limited competitor information in a multi-agent economy. In Cooperative information systems. Lecture notes in computer science, vol. 1901, eds. P. Scheuermann, and O. Etzion, 299–310. Berlin, Heidelberg: Springer.
den Boer, A.V. 2015. Dynamic pricing and learning: Historical origins, current research, and new directions. Surveys in Operations Research and Management Science 20 (1): 1–18.
den Boer, A., and B. Zwart. 2014. Simultaneously learning and optimizing using controlled variance pricing. Management Science 60 (3): 770–783.
den Boer, A., and B. Zwart. 2015. Dynamic pricing and learning with finite inventories. Operations Research 63 (4): 965–978.
DiMicco, J.M., P. Maes, and A. Greenwald. 2003. Learning curve: A simulation-based approach to dynamic pricing. Electronic Commerce Research 3 (3–4): 245–276.
Dimitrova, M., and E.E. Schlee. 2003. Monopoly, competition and information acquisition. International Journal of Industrial Organization 21 (10): 1623–1642.
Farias, V., and B. van Roy. 2010. Dynamic pricing with a prior on market response. Operations Research 58 (1): 16–29.
Fisher, M., S. Gallino, and J. Li. 2017. Competition-based dynamic pricing in online retailing: A methodology validated with field experiments. Management Science 64 (6): 2496–2514.
Fishman, A., and N. Gandal. 1994. Experimentation and learning with network effects. Economics Letters 44 (1–2): 103–108.
Friesz, T. L., C. Kwon, T.I. Kim, L. Fan, and T. Yao. 2012. Competitive robust dynamic pricing in continuous time with fixed inventories. arXiv:1208.4374 [math.OC].
Gallego, A.G. 1998. Oligopoly experimentation of learning with simulated markets. Journal of Economic Behavior & Organization 35 (3): 333–355.
Greenwald, A.R., and J.O. Kephart. 1999. Shopbots and pricebots. In Agent mediated electronic commerce II, ed. A. Moukas, C. Sierra, and F. Ygge, 1–23. Berlin: Springer.
Harrington, J.E. 1995. Experimentation and learning in a differentiated-products duopoly. Journal of Economic Theory 66 (1): 275–288.
Harrison, J.M., N.B. Keskin, and A. Zeevi. 2012. Bayesian dynamic pricing policies: Learning and earning under a binary prior distribution. Management Science 58 (3): 570–586.
Isler, K., and H. Imhof. 2008. A game theoretic model for airline revenue management and competitive pricing. Journal of Revenue and Pricing Management 7 (4): 384–396.
Johnson Ferreira, K., D. Simchi-Levi, and H. Wang. 2016. Online network revenue management using Thompson sampling. Working paper, https://ssrn.com/abstract=2588730. Accessed 23 August 2018.
Jumadinova, J., and P. Dasgupta. 2008. Firefly-inspired synchronization for improved dynamic pricing in online markets. In Second IEEE International Conference on Self-Adaptive and Self-Organizing Systems (SASO ’08), 403–412. IEEE.
Jumadinova, J., and P. Dasgupta. 2010. Multi-attribute regret-based dynamic pricing. In Agent-mediated electronic commerce and trading agent design and analysis. Lecture notes in business information processing, vol. 44, eds. W. Ketter, H. La Poutré, N. Sadeh, O. Shehory, and W. Walsh, 73–87. Berlin: Springer.
Keller, G., and S. Rady. 2003. Price dispersion and learning in a dynamic differentiated-goods duopoly. The RAND Journal of Economics 34 (1): 138–165.
Keskin, N.B., and A. Zeevi. 2014. Dynamic pricing with an unknown demand model: Asymptotically optimal semi-myopic policies. Operations Research 62 (5): 1142–1167.
Kirman, A. 1983. On mistaken beliefs and resultant equilibria. In Individual forecasting and aggregate outcomes, ed. R. Frydman, and E.S. Phelps, 147–166. New York: Cambridge University Press.
Kirman, A.P. 1975. Learning by firms about demand conditions. In Adaptive economics, ed. R.H. Day, and T. Graves, 137–156. New York: Academic Press.
Kirman, A.P. 1995. Learning in oligopoly: Theory, simulation, and experimental evidence. In Learning and rationality in economics, ed. A.P. Kirman, and M. Salmon, 127–178. Cambridge, MA: Basil Blackwell.
Könönen, V. 2006. Dynamic pricing based on asymmetric multi-agent reinforcement learning. International Journal of Intelligent Systems 21 (1): 73–98.
Kutschinski, E., T. Uthmann, and D. Polani. 2003. Learning competitive pricing strategies by multi-agent reinforcement learning. Journal of Economic Dynamics and Control 27 (11–12): 2207–2218.
Kwon, C., T.L. Friesz, R. Mookherjee, T. Yao, and B. Feng. 2009. Non-cooperative competition among revenue maximizing service providers with demand learning. European Journal of Operational Research 197 (3): 981–996.
Li, J., T. Yao, and H. Gao. 2010. A revenue maximizing strategy based on Bayesian analysis of demand dynamics. In Proceedings of the 2009 SIAM conference on mathematics for industry, society for industrial and applied mathematics, eds. D.A. Field, and T.J. Peters, 174–181. Philadelphia.
Mirman, L.J., L. Samuelson, and A. Urbano. 1993. Duopoly signal jamming. Economic Theory 3 (1): 129–149.
Perakis, G., and A. Sood. 2006. Competitive multiperiod pricing for perishable products: A robust optimization approach. Mathematical Programming 107 (1): 295–335.
Ramezani, S., P.A.N. Bosman, and H. La Poutré. 2011. Adaptive strategies for dynamic pricing agents. In 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, 323–328. IEEE.
Rassenti, S., S.S. Reynolds, V.L. Smith, and F. Szidarovszky. 2000. Adaptation and convergence of behavior in repeated experimental Cournot games. Journal of Economic Behavior & Organization 41 (2): 117–146.
Schinkel, M.P., J. Tuinstra, and D. Vermeulen. 2002. Convergence of Bayesian learning to general equilibrium in misspecified models. Journal of Mathematical Economics 38 (4): 483–508.
Sutton, R.S., and A.G. Barto. 1998. Reinforcement learning: An introduction, vol. 1. Cambridge: MIT Press.
Tesauro, G., and J.O. Kephart. 2002. Pricing in agent economies using multi-agent Q-learning. Autonomous Agents and Multi-Agent Systems 5 (1): 289–304.
Tuinstra, J. 2004. A price adjustment process in a model of monopolistic competition. International Game Theory Review 6 (3): 417–442.
Acknowledgements
Chris Bayliss and Christine Currie were funded by the EPSRC under Grant Number EP/N006461/1. Andria Ellina and Simos Zachariades were part funded by EPSRC as part of their PhD studentships. Asbjørn Nilsen Riseth was partially funded by EPSRC Grant EP/L015803/1.
Additional information
R. van de Geer, A. V. den Boer: dynamic pricing challenge organizer, lead author;
C. Bayliss, C. S. M. Currie, A. Ellina, M. Esders, A. Haensel, X. Lei, K. D. S. Maclean, A. Martinez-Sykora, A. N. Riseth, F. Ødegaard, S. Zachariades: dynamic pricing challenge participant, co-author.
Appendix: Competitor algorithms
Competitor logit
This competitor models the demand according to a finite mixture logit model, where the mixture is taken over the number of possible customer arrivals. Thus, a probability distribution over the number of arrivals in a single period is estimated and, for each possible number of arrivals, a different multinomial logit model is estimated as well. Each multinomial logit model induces a probability distribution over the competitors, i.e., it specifies with which probability an arriving customer purchases from each competitor (including a no-purchase option). In doing so, it is assumed that the utility of buying from competitor i is of the form \(a-bp_i\), where \(p_i\) is the price posted by competitor i and a and b are assumed to differ across the mixture components.
In practice, this competitor uses the first 100 time periods to estimate the maximum number of arriving customers in a single time period. This is done by setting a price of 0 for the first period and, for each of the following 99 periods of this exploration phase, setting the price as the minimum of the prices observed in the previous period. After these 100 periods, an upper bound on the number of arrivals in a single period is taken as the maximum realized demand in a single period multiplied by \((m+1)\), i.e., the number of competitors plus one. Subsequently, an Expectation-Maximization algorithm is used to estimate a probability distribution over the number of arrivals, as well as the parameters of the multinomial logit models. All these parameters are updated every 20 time periods.
To optimize prices, in every period the competitors’ prices for the period to come are predicted. For this purpose, it is assumed that the sorted prices of the competitors follow a multivariate normal distribution, where the sorted prices are used to mitigate the effect of price symmetries. Subsequently, 1000 competitor price vectors are sampled from the multivariate normal distribution and the revenue function is approximated by averaging over these realizations. To optimize the price, a crude line search over a discretization of the assumed price space (0, 100) is executed and the price with the highest revenue is chosen.
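This optimization step can be sketched as follows. The logit coefficients, the parameters of the multivariate normal, and the price grid below are illustrative assumptions, not the values estimated by the competitor:

```python
import numpy as np

def optimize_price(price_samples, a=5.0, b=0.1, grid=None):
    """Crude line search: for each candidate own price, average the logit
    purchase probability over sampled competitor price vectors and pick
    the revenue-maximizing price.  a and b are illustrative coefficients."""
    grid = np.linspace(1, 100, 100) if grid is None else grid
    comp_attr = np.exp(a - b * price_samples).sum(axis=1)   # (n_samples,)
    own_attr = np.exp(a - b * grid)                         # (n_grid,)
    # purchase probability per grid price, averaged over the samples;
    # the "+ 1" in the denominator is the no-purchase option
    probs = (own_attr[:, None]
             / (1.0 + own_attr[:, None] + comp_attr[None, :])).mean(axis=1)
    return float(grid[int(np.argmax(grid * probs))])

rng = np.random.default_rng(0)
# 1000 competitor price vectors from a multivariate normal, sorted within
# each vector as in the appendix (distribution parameters are made up)
samples = np.sort(
    rng.multivariate_normal([40.0, 60.0], [[25.0, 5.0], [5.0, 25.0]], size=1000),
    axis=1)
best = optimize_price(samples)
```

Averaging the revenue over the sampled vectors before searching, rather than optimizing against a single point forecast, is what makes the price robust to uncertainty in the competitors’ behavior.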
Competitor ols
The approach of this competitor to pricing is to favor simplicity. The view is taken that competitors’ actions cannot be controlled and that, for all intents and purposes, they are random. Thus, they are modeled as an aggregate source of random “noise” and the focus is on how the competitor’s own price influences demand in this environment. The algorithm is split into an exploration segment and a “running” segment. The exploration segment lasts for the first 40 periods and the running segment for the remaining 960 periods.
In the exploration segment, the algorithm explores the field to ensure sufficient variation in the data. In each period, a price is sampled uniformly from the interval (0, 100). After the exploration period, the algorithm enters the running segment. In the running segment, the majority of the time consists of estimating a demand curve based only on the competitor’s own historical prices and optimizing accordingly. To do so, four linear regression models are fit, taking all four combinations of log-transforming the independent (price) and dependent (demand) variables, and the model with the highest \(R^2\) value is chosen (ols is an acronym for ordinary least squares). Using this model, the price is optimized using a crude line search and, subsequently, a small perturbation is added to the price for further exploration.
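The model-selection step can be sketched as below; the synthetic price-demand data and the small constant guarding the logarithm are assumptions made for illustration:

```python
import numpy as np

def fit_best_ols(prices, demand):
    """Fit the four regressions obtained by log-transforming neither,
    either, or both of price and demand, and keep the one with the
    highest R^2 (a sketch; the 1e-9 guard against log(0) is assumed)."""
    best = None
    for log_x in (False, True):
        for log_y in (False, True):
            x = np.log(prices) if log_x else prices
            y = np.log(demand + 1e-9) if log_y else demand
            X = np.column_stack([np.ones_like(x), x])
            coef = np.linalg.lstsq(X, y, rcond=None)[0]
            resid = y - X @ coef
            r2 = 1.0 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))
            if best is None or r2 > best[0]:
                best = (r2, coef, log_x, log_y)
    return best

rng = np.random.default_rng(1)
prices = rng.uniform(1, 100, 40)                     # exploration-phase prices
demand = 120 - 1.1 * prices + rng.normal(0, 3, 40)   # toy linear demand
r2, coef, log_x, log_y = fit_best_ols(prices, demand)
```

Note that comparing \(R^2\) values across different transformations of the dependent variable is a pragmatic rather than statistically rigorous selection rule, in keeping with this competitor’s stated preference for simplicity.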
Finally, in each period in the running segment there is a 5% chance of further exploration and a 1% chance of “competitive disruption”. Here, “competitive disruption” is an action designed to intentionally confuse competitors who attempt to predict competitor prices or who use competitor prices in their models. When this action is initiated, the algorithm sets the price to zero in an attempt to confuse competitors via extreme actions.
Competitor bgrid
bgrid is adapted from the \(\varepsilon\)-greedy multi-armed bandit algorithm (Sutton and Barto 1998). It assumes a bandit framework with ten arms, where the arms pertain to the prices \(10, 20, \ldots , 100\) (bgrid is an acronym for bandit on a grid). Thus, selecting the first arm means posting a price of 10. This algorithm neglects competition and simply keeps track of the average revenue under each arm. With probability \(\varepsilon\), an arm is selected randomly, whereas with probability \(1-\varepsilon\), the arm with the highest observed average revenue is selected. The exploration parameter \(\varepsilon\) is set to 0.2, so that on average 200 time periods are used for exploration and 800 for exploitation.
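A minimal sketch of bgrid, with a toy linear demand curve standing in for the simulated market environment:

```python
import random

class BGrid:
    """epsilon-greedy bandit over the price grid 10, 20, ..., 100,
    tracking the average revenue observed under each arm."""
    def __init__(self, epsilon=0.2):
        self.epsilon = epsilon
        self.arms = [10 * (i + 1) for i in range(10)]
        self.counts = [0] * 10
        self.means = [0.0] * 10

    def select(self):
        if random.random() < self.epsilon:
            return random.randrange(10)                       # explore
        return max(range(10), key=lambda i: self.means[i])    # exploit

    def update(self, arm, revenue):
        self.counts[arm] += 1
        # incremental update of the arm's average revenue
        self.means[arm] += (revenue - self.means[arm]) / self.counts[arm]

random.seed(0)
bandit = BGrid()
for _ in range(1000):
    arm = bandit.select()
    price = bandit.arms[arm]
    demand = 12 - 0.1 * price          # toy demand curve; revenue peaks at 60
    bandit.update(arm, price * demand)
best_price = bandit.arms[max(range(10), key=lambda i: bandit.means[i])]
```

With the toy demand above, the bandit settles on the revenue-maximizing grid price of 60 after its exploration pulls have sampled every arm.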
Competitor bbucket
This competitor considers the problem of learning and pricing in a multi-armed bandit framework similar to that of bgrid. In doing so, the optimal price is assumed to be contained in the interval (0, 100], which is split into ten intervals of even length, i.e., into the price buckets \((0, 10], (10, 20], \ldots , (90, 100]\). Each of these price buckets pertains to one arm, and selecting a specific arm means posting a price that is uniformly sampled from the corresponding price bucket (bbucket is an acronym for bandit with buckets).
To incorporate the competitors’ prices, it is assumed that the arms’ values, i.e., revenues, depend on the prices posted by the other competitors. More precisely, in each time period the competitors’ modal price bucket is forecast using exponential smoothing. The modal price bucket is the bucket that is predicted to contain most of the competitors’ prices. We assume that the optimal choice of price to offer is dependent on this modal price bucket.
In practice this works as follows. At each time step, with probability \(\varepsilon\) an exploration step is performed in which an arm is selected randomly. Alternatively, with probability \(1-\varepsilon\), an exploitation step is undertaken. In this case, the algorithm selects a price from the price bucket with the highest observed average revenue for the predicted modal price bucket. The exploration parameter \(\varepsilon\) is set to 0.2, so that on average 200 time periods are used for exploration and 800 for exploitation.
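The contextual bookkeeping can be sketched as follows; the smoothing constant and the exact way bucket frequencies are smoothed are assumptions of this sketch, not details given by the competitor:

```python
import random

NB = 10  # ten price buckets (0, 10], ..., (90, 100]

def bucket(price):
    return min(NB - 1, int(price // 10))

class BBucket:
    """Average revenue is tracked per (forecast modal competitor bucket,
    own bucket) pair; the modal-bucket forecast exponentially smooths
    the observed bucket frequencies (illustrative choice)."""
    def __init__(self, epsilon=0.2, alpha=0.3):
        self.epsilon, self.alpha = epsilon, alpha
        self.smoothed = [1.0 / NB] * NB          # smoothed bucket frequencies
        self.means = [[0.0] * NB for _ in range(NB)]
        self.counts = [[0] * NB for _ in range(NB)]

    def observe(self, comp_prices):
        share = [0.0] * NB
        for p in comp_prices:
            share[bucket(p)] += 1.0 / len(comp_prices)
        self.smoothed = [(1 - self.alpha) * s + self.alpha * f
                         for s, f in zip(self.smoothed, share)]

    def modal(self):
        return max(range(NB), key=lambda i: self.smoothed[i])

    def select_price(self):
        m = self.modal()
        if random.random() < self.epsilon:
            arm = random.randrange(NB)                            # explore
        else:
            arm = max(range(NB), key=lambda a: self.means[m][a])  # exploit
        return arm, random.uniform(10 * arm, 10 * (arm + 1))

    def update(self, arm, revenue):
        m = self.modal()
        self.counts[m][arm] += 1
        self.means[m][arm] += (revenue - self.means[m][arm]) / self.counts[m][arm]
```

Conditioning the arm values on the forecast modal bucket is what distinguishes bbucket from bgrid: the same own-price bucket can have different estimated values depending on where the competition is expected to cluster.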
Competitor bmodel
This competitor advocates a bandit formulation of the problem as well, although its design differs conceptually from that of bbucket and bgrid. Where the aforementioned two competitors assign prices (or price buckets) to arms, here an arm pertains to a demand model (bmodel is an acronym for bandit with models). The demand models that constitute the four arms are the following:

Demand Model 1: (Bargain hunters) assumes that the distribution of customers’ willingness to pay (WTP) is normally distributed and that customers select a competitor’s price from the subset of prices that fall below their WTP with probability proportional to \(\left( \frac{{\rm WTP}-p_{i}}{{\rm WTP}}\right) ^{b}\), where \(p_i\) is the price being offered by competitor i and b is a parameter that influences customers’ price sensitivity. Values of \(b>1\) (\(b<1\)) capture customer populations that are highly sensitive (insensitive) to prices close to their reserve price. In general, this first demand model captures bargain hunters, as in all cases customers tend to choose low prices where possible.

Demand Model 2: (Quality seekers) is a variant of the first demand model in which the selection probability is instead proportional to \(\left( 1-\frac{{\rm WTP}-p_{i}}{{\rm WTP}}\right) ^{c}\). This model captures customers who use price as an indicator of quality. The parameter c has a similar interpretation to b above.

Demand Model 3: (Cheapest price subset) assumes that each customer sees a different random subset of the available prices. Customers are assumed to select the cheapest price that is visible to them. The subset is assumed to include a random number of options uniformly distributed between d and e, which are parameters that can be estimated from the demand data.
The fourth arm alone is used for the first 100 time periods with a relatively high exploration rate to provide sufficient data for estimating the parameters a, b, c, and d of the three demand models by means of simulated annealing. After 100 time periods, the reward vectors are reset and the four-armed bandit assumes control of pricing. Similar to the previous two bandit algorithms, with probability \(\varepsilon\) an arm is selected randomly and otherwise the most profitable arm is selected.
Optimal prices are chosen based on a forecast of the competitor price (duopoly) or the profile of competitor prices (oligopoly), where we define the competitor price profile to be an ordered list of competitors’ prices. For the oligopoly the competitor price profile is forecast for the next time period using exponential smoothing with trend. In order to estimate the optimal price to charge under each demand model, the algorithm generates a set of potential prices and the projected revenue is evaluated at each price, for the forecast competitor price profile. The price with the highest predicted revenue is assumed to be the best price for this demand model. In an exploitation step, the algorithm selects the arm with the highest predicted revenue and offers the best price for this arm.
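The choice rule of Demand Model 1 (bargain hunters) can be sketched as follows for a single customer; the value of b and the sampling mechanics are illustrative assumptions:

```python
import random

def bargain_hunter_choice(prices, wtp, b=2.0):
    """Sketch of Demand Model 1: among the prices below the customer's
    willingness to pay, choose seller i with probability proportional
    to ((wtp - p_i) / wtp) ** b.  Returns the chosen seller index, or
    None when no price is affordable (no purchase).  b is illustrative."""
    affordable = [(i, p) for i, p in enumerate(prices) if p < wtp]
    if not affordable:
        return None
    weights = [((wtp - p) / wtp) ** b for _, p in affordable]
    r = random.random() * sum(weights)
    for (i, _), w in zip(affordable, weights):
        r -= w
        if r <= 0:
            return i
    return affordable[-1][0]   # guard against floating-point round-off
```

With \(b=2\) and prices 20 and 80 against a WTP of 100, the cheap seller is chosen with weight \(0.8^2 = 0.64\) against \(0.2^2 = 0.04\), i.e., roughly 94% of the time, which is the bargain-hunting behavior the model is meant to capture.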
Competitor ml
The approach of this competitor is to rely on machine learning techniques to predict demand and to optimize prices accordingly (ml is an acronym for machine learning). Much emphasis is put on learning the demand characteristics, as the algorithm dynamically switches back and forth between exploration and exploitation mode over time. In exploration mode, which lasts forty time periods, prices are set according to a cosine function around the observed mean price level, to test a variety of price levels and, possibly, confuse competitors. After this learning cycle, demand is modeled using own prices and the competitor prices as covariates by means of a variety of regression models (least-squares, ridge regression, Lasso regression, Bayesian ridge regression, stochastic gradient descent regression, and random forest), and the best model, in terms of demand prediction, is selected through cross-validation.
Subsequently, the chosen model is used during an exploitation cycle of variable length: the length is sampled uniformly between 70 and 150 periods; however, if the revenue earned deteriorates too quickly, a new exploration cycle is initiated immediately. The price is optimized by discretizing the price space and computing the revenue for all prices. When a new exploration cycle starts, either because the exploitation cycle finished or because the revenue deteriorated significantly, all historical data is discarded in order to capture shifts and shocks in the market as adequately as possible.
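The cross-validated model selection can be sketched as below. For brevity only two of the six candidate regressions are shown, the data is synthetic, and mean squared error stands in for whichever scoring rule the competitor actually used:

```python
import numpy as np

def cv_mse(fit, X, y, k=5):
    """k-fold cross-validated mean squared error of a linear-in-
    parameters demand model returned by `fit`."""
    idx = np.arange(len(y))
    errs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        coef = fit(X[train], y[train])
        pred = np.column_stack([np.ones(len(fold)), X[fold]]) @ coef
        errs.append(np.mean((pred - y[fold]) ** 2))
    return float(np.mean(errs))

def ols_fit(X, y):
    A = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(A, y, rcond=None)[0]

def ridge_fit(X, y, lam=10.0):
    A = np.column_stack([np.ones(len(X)), X])
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y)

rng = np.random.default_rng(2)
X = rng.uniform(1, 100, size=(200, 3))   # own price and two competitor prices
y = 100 - 0.8 * X[:, 0] + 0.3 * X[:, 1] + 0.3 * X[:, 2] + rng.normal(0, 2, 200)
scores = {"ols": cv_mse(ols_fit, X, y), "ridge": cv_mse(ridge_fit, X, y)}
best_model = min(scores, key=scores.get)
```

The same `cv_mse` loop extends to any of the other candidate model families, as long as each exposes a fit function producing coefficients for the common prediction step.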
Competitor greedy
This competitor advocates a particularly simple strategy: set the price to the minimum price observed in the previous time period. To avoid a “race to the bottom” with another competitor, the following safeguard is implemented: if the minimum price observed in the previous period is lower than the 10th percentile of all the prices observed in the last 30 time periods, then the price for the coming period is set to the maximum of this percentile and 5 (i.e., if this 10th percentile is smaller than 5, the price is set to 5).
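A sketch of this rule, where `history` is assumed to be a list of per-period lists of observed prices:

```python
import numpy as np

def greedy_price(history, floor_price=5.0, window=30, pct=10):
    """Match last period's minimum price, unless it falls below the
    10th percentile of all prices seen over the last `window` periods;
    in that case charge that percentile, but never less than
    `floor_price`."""
    last_min = min(history[-1])
    recent = np.concatenate(history[-window:])
    p10 = np.percentile(recent, pct)
    if last_min < p10:
        return max(p10, floor_price)
    return last_min

# In a stable market the rule simply matches last period's minimum
greedy_price([[40, 60]] * 30 + [[50, 55]])   # returns 50
```

The percentile floor only activates after an outlier undercut, so the rule tracks the market closely in calm periods while refusing to follow a price war all the way down.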
Competitor wls
The characterizing feature of this competitor is that it aims to maximize its own revenue relative to its competitors. More precisely, it attempts to maximize own revenue minus the revenue of the competitor that earns the most revenue. In doing so, it is assumed that the demand of competitor k equals \(d(p_k,p_{\bar{k}})\), where \(p_k\) is the price of competitor k, \(p_{\bar{k}}\) is the \((m-1)\)-vector with the prices of the competitors of k, and where the notion of time is suppressed. In addition, it is assumed that \(d(\cdot ,\cdot )\) is invariant under permutations of its second argument, i.e., of the vector \(p_{\bar{k}}\). Thus, this algorithm aims to obtain the price that maximizes own revenue compared to the competitors, that is, to solve in each time step
\[ \max _{p_1} \left\{ p_1\, d(p_1, p_{\bar{1}}) - \max _{k\in \{2,\ldots ,m\}} p_k\, d(p_k, p_{\bar{k}}) \right\} , \qquad (2) \]
where the competitors are indexed 1 to m (the number of competitors) and wls is indexed 1. Note that \(p_1\in p_{\bar{k}}\) for \(k\in \{2,\ldots ,m\}\).
The demand function is assumed to be of the form \(d(x,y) = a+bx+c\sum _{k=1}^{m-1} y_k\) and the parameters a, b, and c are estimated using weighted least squares (hence the name wls). To capture different time-dependent aspects of demand, various schemes for the weighting of observations are considered and evaluated based on the Median Absolute Error of their historical demand predictions. The best weighting scheme is used in (2) to optimize the price. For this purpose, the price of competitor k in the period to come is predicted based on the median of its historical prices over some window, where the window length is chosen to minimize the Median Absolute Error of historical price predictions.
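The weighted least squares fit can be sketched as follows; the exponential-decay weights shown are one plausible candidate among the weighting schemes the competitor evaluates, and the synthetic data is illustrative:

```python
import numpy as np

def fit_wls(own_prices, comp_price_sums, demand, weights):
    """Weighted least squares estimate of d(x, y) = a + b*x + c*sum(y).
    `weights` is one candidate observation-weighting scheme."""
    X = np.column_stack([np.ones_like(own_prices), own_prices, comp_price_sums])
    W = np.diag(weights)
    # solve the weighted normal equations (X' W X) beta = X' W d
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ demand)

rng = np.random.default_rng(3)
T = 200
own = rng.uniform(10, 90, T)
comp_sum = rng.uniform(30, 270, T)            # sum of m - 1 = 3 competitor prices
d = 50 - 0.6 * own + 0.1 * comp_sum + rng.normal(0, 1, T)
w = 0.99 ** np.arange(T)[::-1]                # recent observations weigh more
a, b, c = fit_wls(own, comp_sum, d, w)
```

Because the demand model only depends on the competitors’ prices through their sum, the permutation invariance assumed above holds by construction.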
Finally, for the purpose of exploration, prices are randomized during the first ten periods to guarantee sufficient variance in the observations to estimate the demand models. In addition, if after these ten periods this competitor’s own price is constant for three consecutive periods, the price is randomized in the next period to induce exploration.
Cite this article
van de Geer, R., den Boer, A.V., Bayliss, C. et al. Dynamic pricing and learning with competition: insights from the dynamic pricing challenge at the 2017 INFORMS RM & pricing conference. J Revenue Pricing Manag 18, 185–203 (2019). https://doi.org/10.1057/s41272-018-00164-4