Abstract
Reducing the power consumption of GPU clusters in large-scale stream computing brings benefits such as lower operating costs and a smaller environmental impact. We formulate power consumption as a constrained optimization problem that minimizes the power states of cluster nodes while guaranteeing system performance and reliability. The proposed control model, based on Model Predictive Control (MPC), is designed so that a comprehensive metric of the GPU cluster achieves the expected performance, energy efficiency, and reliability. This differs from previous models, which treat power consumption as the sole control objective. An event-triggering mechanism is introduced to reduce control overhead. It decouples the sampling of cluster status signals from the control model, so the controller does not need to periodically interrupt the computing process to solve for optimal solutions. Finally, we evaluate and compare this control model with previous control models using synthetic and real-world workloads. The experimental results show that the proposed control model outperforms existing techniques.
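The abstract describes the controller only at a high level. The following is a minimal Python sketch, not the authors' model, of how an event-triggered, receding-horizon (MPC-style) controller might select node power states. Everything concrete in it is a hypothetical assumption: three power states per node with invented wattages and capacities, a per-node switching penalty, a naive persistence forecast, and a relative demand threshold that acts as the triggering event. Performance appears only as a capacity constraint; the reliability part of the comprehensive metric is not modeled.

```python
# Hypothetical per-node power states: name -> (power in watts, throughput capacity).
# These numbers are illustrative assumptions, not values from the paper.
STATES = {"sleep": (20.0, 0.0), "low": (150.0, 60.0), "high": (250.0, 100.0)}
SWITCH_COST = 30.0  # hypothetical penalty per node that changes state


def configs(n_nodes):
    """All (n_low, n_high) splits of identical nodes; the remaining nodes sleep."""
    return [(lo, hi) for lo in range(n_nodes + 1) for hi in range(n_nodes + 1 - lo)]


def power(cfg, n_nodes):
    lo, hi = cfg
    sleep = n_nodes - lo - hi
    return sleep * STATES["sleep"][0] + lo * STATES["low"][0] + hi * STATES["high"][0]


def capacity(cfg):
    lo, hi = cfg
    return lo * STATES["low"][1] + hi * STATES["high"][1]


def switches(a, b, n_nodes):
    """Minimum number of nodes that must change state to go from config a to b."""
    def counts(c):
        return (n_nodes - c[0] - c[1], c[0], c[1])  # (sleep, low, high)
    kept = sum(min(x, y) for x, y in zip(counts(a), counts(b)))
    return n_nodes - kept


def mpc_first_action(current, forecast, n_nodes):
    """Backward DP over the horizon: minimize power plus switching cost subject to
    capacity >= forecast demand at every step, then return only the first config."""
    cfgs = configs(n_nodes)
    max_cap = capacity((0, n_nodes))  # all nodes in the high state
    feasible = [[c for c in cfgs if capacity(c) >= min(d, max_cap)] for d in forecast]
    value = {c: power(c, n_nodes) for c in feasible[-1]}
    for t in range(len(forecast) - 2, -1, -1):
        value = {c: power(c, n_nodes)
                    + min(SWITCH_COST * switches(c, nxt, n_nodes) + v
                          for nxt, v in value.items())
                 for c in feasible[t]}
    return min(value, key=lambda c: SWITCH_COST * switches(current, c, n_nodes) + value[c])


class EventTriggeredController:
    """Samples every step but re-solves the MPC problem only when an event fires."""

    def __init__(self, n_nodes, threshold=0.1, horizon=3):
        self.n = n_nodes
        self.threshold = threshold
        self.horizon = horizon
        self.cfg = (0, 0)        # start with every node asleep
        self.ref_demand = None   # demand observed at the last trigger
        self.solves = 0

    def step(self, demand):
        # Event: demand has drifted more than `threshold` (relative) since the last solve.
        triggered = (self.ref_demand is None
                     or abs(demand - self.ref_demand) > self.threshold * self.ref_demand)
        if triggered:
            # Plan for headroom so that untriggered samples stay within capacity.
            target = demand * (1 + self.threshold)
            forecast = [target] * self.horizon  # naive persistence forecast (assumption)
            self.cfg = mpc_first_action(self.cfg, forecast, self.n)
            self.ref_demand = demand
            self.solves += 1
        return self.cfg


ctrl = EventTriggeredController(n_nodes=4, threshold=0.1, horizon=3)
trace = [ctrl.step(d) for d in [100, 105, 95, 300, 310, 50]]
print(trace)        # [(2, 0), (2, 0), (2, 0), (1, 3), (1, 3), (1, 0)]  as (n_low, n_high)
print(ctrl.solves)  # 3
```

In the sample trace, six samples lead to only three optimizations, which is the kind of overhead reduction the event trigger is meant to provide. Planning for `demand * (1 + threshold)` keeps capacity sufficient for every sample that does not fire a trigger, as long as that target stays below the cluster's maximum capacity.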
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 60970012), the Shandong Provincial Natural Science Foundation, China (No. ZR2017MF050), the Project of Shandong Province Higher Educational Science and Technology Program (No. J17KA049), and the Shandong Province Key Research and Development Program of China (No. 2018GGX101005, 2017CXGC0701, 2016GGX109001).
About this article
Cite this article
Wang, H., Cao, Y. Event driven power consumption optimization control model of GPU clusters. Cluster Comput 22, 965–979 (2019). https://doi.org/10.1007/s10586-018-02886-x