
On Optimizing Operational Efficiency in Storage Systems via Deep Reinforcement Learning

  • Sunil Srinivasa (corresponding author)
  • Girish Kathalagiri
  • Julu Subramanyam Varanasi
  • Luis Carlos Quintela
  • Mohamad Charafeddine
  • Chi-Hoon Lee
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11053)

Abstract

This paper deals with the application of deep reinforcement learning to optimizing the operational efficiency of a solid-state storage rack. Specifically, we train an on-policy, model-free policy gradient algorithm called the Advantage Actor-Critic (A2C). We deploy a dueling deep network architecture to extract features from the rack's sensor readings and devise a novel utility function that is used to guide the A2C algorithm. Experiments show performance gains greater than 30% over the default policy for both deterministic and random data workloads.
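The abstract names three ingredients: the A2C policy gradient algorithm, a dueling network architecture over the rack's sensor readings, and a utility (reward) function. The paper's own code is not reproduced here; the PyTorch sketch below is only a minimal illustration of how these pieces fit together. All names and dimensions (sensor_dim, n_actions, hidden sizes, the reward signal) are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch: dueling feature network + one-step A2C update.
# Dimensions, names, and the reward signal are illustrative assumptions.
import torch
import torch.nn as nn
from torch.distributions import Categorical


class DuelingActorCritic(nn.Module):
    def __init__(self, sensor_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        # Shared feature extractor over the rack's sensor readings.
        self.features = nn.Sequential(
            nn.Linear(sensor_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Dueling decomposition: state-value and advantage streams,
        # recombined with the mean-advantage baseline (Wang et al. [17]).
        self.value_stream = nn.Linear(hidden, 1)
        self.adv_stream = nn.Linear(hidden, n_actions)
        # Actor head: categorical policy over discrete actions
        # (e.g. fan-speed settings).
        self.policy_head = nn.Linear(hidden, n_actions)

    def forward(self, obs):
        h = self.features(obs)
        value = self.value_stream(h).squeeze(-1)           # V(s)
        adv = self.adv_stream(h)                           # A(s, a)
        q = value.unsqueeze(-1) + adv - adv.mean(-1, keepdim=True)
        return Categorical(logits=self.policy_head(h)), q, value


def a2c_update(net, optimizer, obs, action, reward, next_obs, gamma=0.99):
    """One-step advantage actor-critic update (illustrative only)."""
    dist, _, value = net(obs)
    with torch.no_grad():
        _, _, next_value = net(next_obs)
        target = reward + gamma * next_value               # bootstrapped return
    advantage = target - value                             # TD error as advantage
    policy_loss = -(dist.log_prob(action) * advantage.detach()).mean()
    value_loss = advantage.pow(2).mean()
    entropy = dist.entropy().mean()                        # exploration bonus
    loss = policy_loss + 0.5 * value_loss - 0.01 * entropy
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Hypothetical usage with made-up dimensions:
net = DuelingActorCritic(sensor_dim=16, n_actions=5)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
```

In a setup like the one the abstract describes, the reward passed to a2c_update would come from the devised utility function over the rack's operating measurements; the paper evaluates this only at a high level, so the scalar placeholder above stands in for it.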

Keywords

Data center · Storage system · Operational efficiency · Deep reinforcement learning · Actor-critic methods


Acknowledgements

We want to thank the Memory Systems lab team, Samsung Semiconductors Inc., for providing us with an SSD storage rack, workloads, data, and the fan control API for running our experiments. We also thank the software engineering team at Samsung SDS for developing a DRL framework [24] that was used extensively for model building, training, and serving.

References

  1.
  2. Cisco Global Cloud Index: Forecast and Methodology, 2016–2021 White Paper, February 2018. https://www.cisco.com/c/en/us/solutions/collateral/service-provider/global-cloud-index-gci/white-paper-c11-738085.html
  3. Arulkumaran, K., Deisenroth, M.P., Brundage, M., Bharath, A.A.: A brief survey of deep reinforcement learning. IEEE Sig. Process. Mag. 34(6), 26–38 (2017)
  4. Shuja, J., Madani, S.A., Bilal, K., Hayat, K., Khan, S.U., Sarwar, S.: Energy-efficient data centers. Computing 94(12), 973–994 (2012)
  5. Sun, J., Reddy, A.: Optimal control of building HVAC systems using complete simulation-based sequential quadratic programming (CSBSQP). Build. Environ. 40(5), 657–669 (2005)
  6. Ma, Z., Wang, S.: An optimal control strategy for complex building central chilled water systems for practical and real-time applications. Build. Environ. 44(6), 1188–1198 (2009)
  7. Evans, R., Gao, J.: DeepMind AI Reduces Google Data Centre Cooling Bill by 40%, July 2016. Blog: https://deepmind.com/blog/deepmind-ai-reduces-google-data-centre-cooling-bill-40/
  8. Li, Y., Wen, Y., Guan, K., Tao, D.: Transforming Cooling Optimization for Green Data Center via Deep Reinforcement Learning (2017). https://arxiv.org/abs/1709.05077
  9. Peters, J., Schaal, S.: Reinforcement learning of motor skills with policy gradients. Neural Netw. 21(4), 682–697 (2008)
  10. Watkins, C.J.C.H., Dayan, P.: Q-learning. Mach. Learn. 8(3–4), 279–292 (1992)
  11. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, 2nd edn. MIT Press, Cambridge (2017)
  12. Sutton, R.S., McAllester, D., Singh, S., Mansour, Y.: Policy gradient methods for reinforcement learning with function approximation. In: Advances in Neural Information Processing Systems, vol. 12, pp. 1057–1063 (2000)
  13. Williams, R.J.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn. 8(3–4), 229–256 (1992)
  14. Greensmith, E., Bartlett, P.L., Baxter, J.: Variance reduction techniques for gradient estimates in reinforcement learning. J. Mach. Learn. Res. 5, 1471–1530 (2004)
  15. Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)
  16. Mnih, V., et al.: Asynchronous methods for deep reinforcement learning. In: International Conference on Machine Learning, pp. 1928–1937 (2016)
  17. Wang, Z., Schaul, T., Hessel, M., van Hasselt, H., Lanctot, M., de Freitas, N.: Dueling network architectures for deep reinforcement learning. In: International Conference on Machine Learning, vol. 48 (2016)
  18. Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., Riedmiller, M.: Deterministic policy gradient algorithms. In: International Conference on Machine Learning (2014)
  19. Lillicrap, T.P., et al.: Continuous control with deep reinforcement learning. US Patent Application No. US20170024643A1. https://patents.google.com/patent/US20170024643A1/en
  20.
  21.
  22.
  23. O'Donoghue, B., Munos, R., Kavukcuoglu, K., Mnih, V.: Combining policy gradient and Q-learning. In: International Conference on Learning Representations (2017)
  24. Parthasarathy, K., Kathalagiri, G., George, J.: Scalable implementation of machine learning algorithms for sequential decision making. In: Machine Learning Systems, ICML Workshop, June 2016

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Sunil Srinivasa (1, corresponding author)
  • Girish Kathalagiri (1)
  • Julu Subramanyam Varanasi (2)
  • Luis Carlos Quintela (1)
  • Mohamad Charafeddine (1)
  • Chi-Hoon Lee (1)

  1. Samsung SDS America, San Jose, USA
  2. Samsung Semiconductors Inc., San Jose, USA
