A Dynamic Pricing Mechanism in IoT for DaaS: A Reinforcement Learning Approach

  • Binpeng Song
  • Jinze Song
  • Jian Ye
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1075)


With the rapid development of the Internet of Things (IoT), a large amount of data has accumulated, and making full use of it has become a new problem. In this paper, we focus on developing data resources with a smart data pricing (SDP) approach. We establish a B2B data marketplace for integrating, storing, and analyzing business data, and simulate the interactions between service providers and enterprises in that marketplace. Because the service provider's decision problem has the Markov property, we adopt the Q-learning algorithm to solve the model. Experimental results show that Q-learning enables every participant in the market to obtain its optimal profit.
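To make the Q-learning step concrete, the following is a minimal sketch of tabular Q-learning applied to a toy pricing problem. It is not the authors' marketplace model: the demand levels, the linear demand rule, the alternating demand states, and the names `q_update` and `train` are all assumptions introduced for illustration. Only the update rule itself, Q(s,a) ← Q(s,a) + α[r + γ max Q(s',·) − Q(s,a)], is standard Q-learning.

```python
import random


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning step and return the updated Q(s, a)."""
    best_next = max(q[next_state])
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
    return q[state][action]


def train(prices, demand_levels, episodes=2000, steps=20, epsilon=0.1, seed=0):
    """Learn a price per demand state with epsilon-greedy tabular Q-learning.

    Hypothetical toy market (an assumption, not the paper's model):
    the state is the index of the current demand level, the action is the
    index of the posted price, and demand states simply alternate in a cycle.
    """
    rng = random.Random(seed)
    q = [[0.0] * len(prices) for _ in demand_levels]
    for _ in range(episodes):
        state = rng.randrange(len(demand_levels))
        for _ in range(steps):
            # Epsilon-greedy action selection over the price grid.
            if rng.random() < epsilon:
                action = rng.randrange(len(prices))
            else:
                action = max(range(len(prices)), key=lambda a: q[state][a])
            # Assumed demand rule: units sold fall linearly with price.
            sold = max(0.0, demand_levels[state] - prices[action])
            reward = prices[action] * sold  # provider's profit for this step
            next_state = (state + 1) % len(demand_levels)  # assumed dynamics
            q_update(q, state, action, reward, next_state)
            state = next_state
    return q


if __name__ == "__main__":
    prices = [1.0, 2.0, 3.0]
    table = train(prices, demand_levels=[4.0, 6.0])
    for s, row in enumerate(table):
        best = max(range(len(prices)), key=lambda a: row[a])
        print(f"demand state {s}: greedy price {prices[best]}")
```

In this toy setting the next demand state does not depend on the chosen price, so the best action in each state is simply the price with the highest immediate profit (2.0 when demand is 4.0, 3.0 when demand is 6.0), which gives an easy sanity check on the learned table. A realistic marketplace would need a state and reward definition taken from the paper's own model.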


Dynamic pricing, Markov decision process, Q-learning



This work is supported by the National Key Research and Development Program of China (2016YFB1001100).



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
  2. University of Chinese Academy of Sciences, Beijing, China
  3. Northeast University, Shenyang, China
