Speeding Up the Metabolism in E-commerce by Reinforcement Mechanism Design

  • Hua-Lin He
  • Chun-Xiang Pan
  • Qing Da
  • An-Xiang Zeng
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11053)


In a large E-commerce platform, all participants compete for impressions under the allocation mechanism of the platform. Existing methods mainly focus on the short-term return based on current observations rather than the long-term return. In this paper, we formally establish a lifecycle model for products, defining the introduction, growth, maturity and decline stages and their transitions throughout the whole life period. Based on this model, we further propose a reinforcement learning based mechanism design framework for impression allocation, which incorporates a first-principal-component-based permutation and a novel experience-generation method, to maximize both the short-term and the long-term return of the platform. With the power of trial-and-error, it becomes possible to recognize in advance the potentially hot products in the introduction stage as well as the potentially slow-selling products in the decline stage, so that the metabolism can be sped up by an optimal impression allocation strategy. We evaluate our algorithm on a simulated environment based on one of the largest E-commerce platforms, and a significant improvement has been achieved in comparison with the baseline solutions. Code related to this paper is available at:
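To make the lifecycle model concrete, the following is a minimal, purely illustrative sketch of the setting the abstract describes: products move through the introduction, growth, maturity and decline stages, and the platform allocates a fixed impression budget each round. The stage-transition rule, per-stage conversion rates, and the uniform baseline allocator are all simplifying assumptions for exposition, not the paper's actual formulation.

```python
import random

# The four lifecycle stages defined in the paper, in order.
STAGES = ["introduction", "growth", "maturity", "decline"]

# Assumed per-stage conversion rates (probability an impression yields a sale);
# these numbers are hypothetical.
CONVERSION = {"introduction": 0.02, "growth": 0.08, "maturity": 0.05, "decline": 0.01}

class Product:
    def __init__(self, name):
        self.name = name
        self.stage = 0          # index into STAGES
        self.total_sales = 0

    def receive_impressions(self, n, rng):
        # Each impression independently converts with the stage's rate.
        sales = sum(rng.random() < CONVERSION[STAGES[self.stage]] for _ in range(n))
        self.total_sales += sales
        # Toy transition rule: enough cumulative sales pushes the product
        # into the next lifecycle stage.
        if self.stage < len(STAGES) - 1 and self.total_sales > 10 * (self.stage + 1):
            self.stage += 1
        return sales

def allocate_uniform(products, budget):
    """Baseline mechanism: split the impression budget evenly."""
    share = budget // len(products)
    return {p.name: share for p in products}

rng = random.Random(0)
products = [Product(f"item_{i}") for i in range(4)]
plan = allocate_uniform(products, budget=400)
revenue = sum(p.receive_impressions(plan[p.name], rng) for p in products)
print(revenue)
```

In the paper's framework, a reinforcement learning agent would replace `allocate_uniform`, learning to shift impressions toward potentially hot products in the introduction stage and away from slow-selling products in decline, trading a little short-term revenue for long-term return.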


Keywords: Reinforcement learning · Mechanism design · E-commerce



Copyright information

© Springer Nature Switzerland AG 2019
