Policy Gradient Reinforcement Learning for I/O Reordering on Storage Servers

  • Kumar Dheenadayalan
  • Gopalakrishnan Srinivasaraghavan
  • V. N. Muralidhara
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10634)


Deep customization of storage architectures to the applications they support is often undesirable: the nature of application data is dynamic, applications are replaced far more often than storage systems are, and usage patterns change over time. A continuously learning software intervention that dynamically adapts to changing workload patterns is the easiest way to bridge this gap. As borne out by our experiments, the overhead induced by such software interventions turns out to be negligible for large-scale storage systems. Reinforcement learning offers a way to learn continuously from a data stream and take actions that optimize a future goal. We adapt policy gradient reinforcement learning to learn a policy that minimizes I/O wait time, which in turn maximizes I/O throughput. A set of discrete actions, each a switch between scheduling schemes, is used to dynamically reorder client-specific I/O operations. Results show that the I/O reordering policy learned through reinforcement learning yields a significant improvement in overall I/O throughput.
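The approach described above can be sketched with a minimal REINFORCE-style policy gradient loop. This is a hypothetical illustration, not the paper's implementation: the state features, the three scheduling-scheme actions, the linear softmax policy, and the toy environment are all assumptions. The key idea matches the abstract: the reward is the negative observed I/O wait time, so maximizing return minimizes wait time, and each discrete action is a switch to a different scheduling scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

N_FEATURES = 4   # e.g. queue depth, read/write mix, ... (assumed features)
N_ACTIONS = 3    # e.g. three candidate I/O scheduling schemes (assumed)

# Linear-softmax policy parameters, one column of weights per action.
theta = np.zeros((N_FEATURES, N_ACTIONS))

def policy(state):
    """Softmax action probabilities for a given state vector."""
    logits = state @ theta
    logits -= logits.max()          # numerical stability
    p = np.exp(logits)
    return p / p.sum()

def simulated_wait_time(state, action):
    """Stand-in environment: scheme 0 gives the lowest wait in this toy workload."""
    return 1.0 + 0.5 * action + 0.1 * rng.standard_normal()

alpha = 0.05  # learning rate
for episode in range(2000):
    state = rng.random(N_FEATURES)
    probs = policy(state)
    action = rng.choice(N_ACTIONS, p=probs)
    reward = -simulated_wait_time(state, action)   # lower wait => higher reward
    # REINFORCE update: grad log pi(a|s) = outer(s, one_hot(a) - probs)
    grad_log_pi = np.outer(state, -probs)
    grad_log_pi[:, action] += state
    theta += alpha * reward * grad_log_pi
```

After training on this toy environment, the policy concentrates probability on the scheme with the lowest expected wait time; in a real filer the reward would come from measured per-interval wait-time counters rather than a simulator.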


Keywords: Policy gradient · Filer · I/O reordering · Overload · Throughput



This research was supported in part through a research grant from NetApp Advanced Technology Group.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Kumar Dheenadayalan¹
  • Gopalakrishnan Srinivasaraghavan¹
  • V. N. Muralidhara¹

  1. International Institute of Information Technology Bangalore, Bengaluru, India
