Optimization of Execution Time under Power Consumption Constraints in a Heterogeneous Parallel System with GPUs and CPUs

  • Paweł Czarnul
  • Paweł Rościszewski
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8314)


The paper proposes an approach for parallelizing computations across a collection of clusters whose heterogeneous nodes contain both GPUs and CPUs. The proposed system partitions input data into chunks and assigns them to particular devices for processing with user-defined OpenCL kernels. The system minimizes the execution time of the application while keeping the power consumption of the utilized GPUs and CPUs below a given threshold. We present real measurements of the performance and power consumption of various GPUs and CPUs used in a modern parallel system. Furthermore, we show, for a parallel application for breaking MD5 passwords, how the execution time of the real application changes with various upper bounds on power consumption.
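The trade-off described above can be illustrated with a minimal sketch. It assumes that each device has a fixed integer power draw and a fixed throughput, and that the goal is to pick the device subset with the highest combined throughput under a power cap, which is a 0/1 knapsack problem. The abstract does not state which algorithm the paper uses, so this is only one plausible formulation. The names `select_devices` and `estimated_time`, the device labels, and all numbers are hypothetical.

```python
from typing import List, Tuple

# Hypothetical device record: (name, power draw in watts, throughput in chunks/s).
# Values used with this sketch are illustrative, not measurements from the paper.
Device = Tuple[str, int, float]


def select_devices(devices: List[Device], power_cap: int) -> List[Device]:
    """Return the device subset with the highest total throughput whose
    summed power draw is at most power_cap (0/1 knapsack, dynamic programming)."""
    # best[p] = (total throughput, chosen devices) using at most p watts
    best = [(0.0, [])] * (power_cap + 1)
    for dev in devices:
        _, watts, speed = dev
        # Iterate downwards so that each device is selected at most once.
        for p in range(power_cap, watts - 1, -1):
            candidate = best[p - watts][0] + speed
            if candidate > best[p][0]:
                best[p] = (candidate, best[p - watts][1] + [dev])
    return best[power_cap][1]


def estimated_time(selected: List[Device], n_chunks: int) -> float:
    """Estimate the run time in seconds if chunks are spread over the selected
    devices in proportion to their throughput (assumes perfect load balance)."""
    total = sum(speed for _, _, speed in selected)
    return float("inf") if total == 0 else n_chunks / total
```

Calling `select_devices` with increasing values of `power_cap` and passing each result to `estimated_time` gives a rough picture of how the estimated execution time shrinks as the power bound is relaxed. A real system would also have to account for data transfer costs and for power draw that varies with load, which this sketch ignores.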


parallel computing, GPGPU, OpenCL, heterogeneous environments, performance, power consumption





Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Paweł Czarnul (1)
  • Paweł Rościszewski (1)
  1. Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology, Poland
