Journal of Signal Processing Systems, Volume 87, Issue 1, pp 33–48

Energy-Awareness and Performance Management with Parallel Dataflow Applications

  • Simon Holmbacka
  • Erwan Nogues
  • Maxime Pelcat
  • Sébastien Lafond
  • Daniel Menard
  • Johan Lilius


Applications have traditionally been executed as fast as possible (Race-to-Idle) and mapped to as many cores as possible (Fair scheduling) to minimize energy consumption. On modern hardware, this method has become inefficient because of the power characteristics of the platforms. Instead, applications should use an optimal combination of clock frequency and number of cores to balance dynamic and static power. Such approaches have been difficult to achieve because resource allocation is based only on CPU utilization: resources are allocated to prevent over-utilization rather than to follow software performance requirements. By adjusting the clock frequency directly according to software requirements and activating CPU cores according to application parallelism, significant energy can be saved by lowering the average power dissipation. To enforce these recommendations, this paper provides means of expressing performance and parallelism in applications, enabling tighter integration with power management to balance execution speed and mapping on multi-core systems. An interface between the applications and the hardware resources is provided in combination with a novel power management runtime system called Bricktop. A signal processing case study demonstrates real-world energy savings of up to 50 % without performance degradation.
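
The trade-off between Race-to-Idle and a balanced setting of clock frequency and core count can be illustrated with a minimal sketch. It is not the Bricktop runtime and does not reproduce the power model of the paper. It assumes a hypothetical platform where each active core draws a constant static power plus a dynamic power that grows with the cube of the clock frequency, where speed-up follows Amdahl's law, and where the application must finish within a deadline. All function names and constants are illustrative assumptions.

```python
"""Toy energy model: Race-to-Idle versus a searched (frequency, cores) setting.

Every constant and formula here is an illustrative assumption,
not the model or the measurements used in the paper.
"""


def exec_time(work_cycles, freq_hz, cores, parallel_fraction):
    """Amdahl-style execution time in seconds (assumed speed-up model)."""
    serial_fraction = 1.0 - parallel_fraction
    return (work_cycles / freq_hz) * (serial_fraction + parallel_fraction / cores)


def energy(work_cycles, freq_hz, cores, parallel_fraction, deadline_s,
           static_w=0.3, dyn_coeff=2.5e-28, idle_w=0.05):
    """Energy in joules over one deadline period, or None if the deadline is missed.

    Assumed power model: each active core draws static_w + dyn_coeff * f^3
    while the application runs; the whole chip draws idle_w afterwards.
    """
    busy_s = exec_time(work_cycles, freq_hz, cores, parallel_fraction)
    if busy_s > deadline_s:
        return None
    active_w = cores * (static_w + dyn_coeff * freq_hz ** 3)
    return active_w * busy_s + idle_w * (deadline_s - busy_s)


def best_config(work_cycles, freqs_hz, core_counts, parallel_fraction, deadline_s):
    """Exhaustively search the grid; return (energy_j, freq_hz, cores) or None."""
    best = None
    for freq_hz in freqs_hz:
        for cores in core_counts:
            e = energy(work_cycles, freq_hz, cores, parallel_fraction, deadline_s)
            if e is not None and (best is None or e < best[0]):
                best = (e, freq_hz, cores)
    return best


if __name__ == "__main__":
    freqs = [0.4e9, 0.8e9, 1.2e9, 1.6e9]   # hypothetical frequency steps
    core_counts = [1, 2, 3, 4]             # hypothetical quad-core platform
    work, p, deadline = 1.0e9, 0.9, 1.0    # cycles, parallel fraction, seconds

    race = energy(work, max(freqs), max(core_counts), p, deadline)
    best = best_config(work, freqs, core_counts, p, deadline)
    print("Race-to-Idle energy [J]:", race)
    print("Best (energy [J], frequency [Hz], cores):", best)
```

Because the Race-to-Idle setting is one point of the searched grid, the selected configuration can never consume more energy than Race-to-Idle under this model. How much it saves depends entirely on the assumed constants.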


Keywords: Power management, Dataflow, Parallelism, Multi-core



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Simon Holmbacka (1)
  • Erwan Nogues (2)
  • Maxime Pelcat (2)
  • Sébastien Lafond (3)
  • Daniel Menard (2)
  • Johan Lilius (3)

  1. Turku Centre for Computer Science, Turku, Finland
  2. IETR Image Group, INSA de Rennes, Rennes, France
  3. Faculty of Science and Engineering, Åbo Akademi University, Turku, Finland
