Rational Bidding Using Reinforcement Learning

An Application in Automated Resource Allocation
  • Nikolay Borissov
  • Arun Anandasivam
  • Niklas Wirström
  • Dirk Neumann
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5206)


The use of autonomous agents for the provisioning and usage of computational resources is an attractive research field. Methods and technologies from artificial intelligence, statistics and economics work together to i) achieve autonomic provisioning and usage of computational resources, ii) devise competitive bidding strategies for widely used market mechanisms and iii) incentivize consumers and providers to use such market-based systems.

The contributions of the paper are threefold. First, we present a framework that supports consumers and providers in technical and economic preference elicitation and in the generation of bids. Second, we introduce a consumer-side reinforcement learning bidding strategy that enables rational behavior through the generation and selection of bids. Third, we evaluate and compare this bidding strategy against a truth-telling bidding strategy for two kinds of market mechanisms: one centralized and one decentralized.
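The contrast between a learning bidder and a truth-telling bidder can be illustrated with a minimal sketch. It is not the strategy from the paper: the function `simulate`, the candidate bid fractions, the uniformly random price, and the pay-your-bid rule are all assumptions chosen to keep the example small. The learner keeps one utility estimate per candidate bid and selects bids epsilon-greedily.

```python
import random


def simulate(episodes=5000, valuation=10.0, epsilon=0.1, alpha=0.1, seed=0):
    """Toy comparison of a learning bidder and a truth-telling bidder.

    All market details here are illustrative assumptions, not the
    mechanisms evaluated in the paper.
    """
    rng = random.Random(seed)
    fractions = [0.2, 0.4, 0.6, 0.8, 1.0]  # candidate bids as a share of the valuation
    q = [0.0] * len(fractions)             # estimated utility of each candidate bid
    learner_total = 0.0
    truthful_total = 0.0

    for _ in range(episodes):
        # Hypothetical market: a uniformly random price that a bid must meet.
        price = rng.uniform(0.0, valuation)

        # Epsilon-greedy selection: explore occasionally, otherwise exploit.
        if rng.random() < epsilon:
            action = rng.randrange(len(fractions))
        else:
            action = max(range(len(fractions)), key=lambda i: q[i])
        bid = fractions[action] * valuation

        # Assumed pay-your-bid rule: a winning bidder earns valuation - bid.
        reward = valuation - bid if bid >= price else 0.0

        # Incremental update of the utility estimate for the chosen bid.
        q[action] += alpha * (reward - q[action])
        learner_total += reward

        # The truth-telling bidder bids its full valuation, so under
        # pay-your-bid its utility is zero whenever it wins.
        truthful_bid = valuation
        truthful_total += valuation - truthful_bid if truthful_bid >= price else 0.0

    return q, learner_total / episodes, truthful_total / episodes


if __name__ == "__main__":
    q_values, learner_avg, truthful_avg = simulate()
    print("estimated utilities:", q_values)
    print("learner average utility:", learner_avg)
    print("truth-telling average utility:", truthful_avg)
```

Under the assumed pay-your-bid rule the truth-telling bidder earns zero utility by construction, while the learner can earn a positive utility by shading its bid. Under a different pricing rule, such as a second-price rule, truth-telling could perform quite differently, so the sketch says nothing about the results reported in the paper.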


Keywords: Bid Generation, Reinforcement learning, Service Provisioning and Usage, Grid Computing





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Nikolay Borissov (1)
  • Arun Anandasivam (1)
  • Niklas Wirström (2)
  • Dirk Neumann (3)
  1. Information Management and Systems, University of Karlsruhe, Karlsruhe
  2. Swedish Institute of Computer Science, Kista, Sweden
  3. University of Freiburg, Platz der Alten Synagoge, Freiburg, Germany
