Task Scheduling of GPU Cluster for Large-Scale Data Process with Temperature Constraint
With the development of general-purpose GPU computing, heterogeneous GPU clusters have become a widely used parallel processing solution for large-scale data. Temperature management and control has become a new research topic in the high-performance computing field. This paper builds a novel task scheduling model for GPU clusters under a temperature constraint, in order to balance the heat distribution and prevent temperature hotspots from occurring. A scheduling index is introduced that combines GPU utilization and temperature, and a state matrix is designed to monitor the GPU cluster and provide status information to the scheduler. When the temperature exceeds a specified threshold, the scheduler can increase the fan speed to reduce the temperature. The experimental results show that the proposed scheduler balances the heat distribution and prevents temperature hotspots. Compared with the benchmark scheduling model, the loss of scheduling performance is within an acceptable range.
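The abstract does not give the formula for the scheduling index or the layout of the state matrix, so the following is only a minimal sketch of how such a scheduler might be organized. The weighted-sum index, the `GpuState` fields, the threshold and limit values, and the fan step are all illustrative assumptions rather than the authors' definitions.

```python
from dataclasses import dataclass


@dataclass
class GpuState:
    """One row of a hypothetical state matrix: the monitored status of one GPU."""
    gpu_id: int
    utilization: float  # fraction of GPU in use, in [0, 1]
    temperature: float  # degrees Celsius
    fan_speed: float    # fraction of maximum fan speed, in [0, 1]


def scheduling_index(gpu, t_max=85.0, weight=0.5):
    """Assumed index: weighted sum of utilization and normalized temperature.

    Lower values mark a GPU that is both less loaded and cooler.
    The weighted-sum form is an assumption, not the paper's formula.
    """
    return weight * gpu.utilization + (1.0 - weight) * (gpu.temperature / t_max)


def pick_gpu(state_matrix, t_threshold=80.0):
    """Return the GPU with the lowest index among those below the threshold.

    Returns None when every GPU is at or above the threshold.
    """
    candidates = [g for g in state_matrix if g.temperature < t_threshold]
    if not candidates:
        return None
    return min(candidates, key=scheduling_index)


def adjust_fans(state_matrix, t_threshold=80.0, step=0.2):
    """Raise the fan speed (capped at 1.0) of every GPU above the threshold."""
    for g in state_matrix:
        if g.temperature > t_threshold:
            g.fan_speed = min(1.0, g.fan_speed + step)


if __name__ == "__main__":
    cluster = [
        GpuState(0, utilization=0.9, temperature=70.0, fan_speed=0.4),
        GpuState(1, utilization=0.2, temperature=82.0, fan_speed=0.4),
        GpuState(2, utilization=0.3, temperature=60.0, fan_speed=0.4),
    ]
    adjust_fans(cluster)
    target = pick_gpu(cluster)
    print("next task goes to GPU", target.gpu_id if target else None)
```

In this sketch a lightly loaded but hot GPU is skipped in favor of a cooler one, which is the hotspot-avoidance behavior the abstract describes; the price is that tasks are not always placed on the least-utilized device.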
Keywords: GPU cluster, Task scheduling model, Temperature limitation, Large-scale data
This project is supported by the Shandong Provincial Natural Science Foundation, China (No. ZR2017MF050), the Project of Shandong Province Higher Educational Science and Technology Program (No. J17KA049), the Shandong Province Key Research and Development Program of China (No. 2018GGX101005, 2017CXGC0701, 2016GGX109001), and Shandong Province Independent Innovation and Achievement Transformation, China (No. 2014ZZCX02702).