Applying Operations Management Principles on Optimisation of Scientific Computing Clusters
We apply operations management principles of production scheduling and allocation to computing clusters and their storage resources to increase the throughput and reduce the lead time of scientific computing jobs. In addition, we study how this approach affects the amount of energy consumed by a computing job comprising hundreds of calculation tasks. Methodologically, we use the design science approach, applying domain knowledge of operations management and efficient resource allocation to the management of computing resources. Using a test cluster, we collected data on CPU and memory utilisation along with energy consumption under different ways of allocating the jobs. We challenge the traditional method of scheduling scientific clusters, one job per processor core, with parallel processing and bottleneck management. We observed that increasing the utilisation rate of the cluster memory increases throughput and decreases energy consumption. We also studied scheduling methods that run multiple tasks per CPU core and that schedule based on the amount of free memory available. The test results showed that, at best, these methods both decreased energy consumption down to 45% and increased throughput by up to 100% compared to the standard practices used in scientific computing. The results are being further tested to eventually support LHC computing at CERN.
Keywords: Large Hadron Collider, Computing Cluster, Schedule Method, Decrease Energy Consumption, Memory Utilisation