Wake-up latencies for processor idle states on current x86 processors

  • Special Issue Paper
Computer Science - Research and Development

Abstract

Over the last decades, various low-power states have been implemented in processors. The operating system can use them to reduce power consumption. The power saving mechanisms applied include load-dependent frequency and voltage scaling as well as the temporary deactivation of unused components. These techniques reduce power consumption and thereby improve energy efficiency when the system is not used to full capacity. However, inappropriate use of low-power states can significantly degrade performance, because the time required to re-establish full performance can be substantial. Deep idle states are therefore occasionally disabled, especially if applications have real-time requirements. In this paper, we describe how low-power states are implemented in current x86 processors. We then measure the wake-up latencies that occur when a processor core is reactivated from various low-power states. Finally, we compare our results to the vendors' specifications that are exposed to the operating system.
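
To illustrate what "exposed to the operating system" can look like in practice, the following minimal sketch lists the idle states that the Linux cpuidle framework reports for one core, together with the exit latency (in microseconds) and the disable flag found under the sysfs path given in the notes. The choice of Linux sysfs is an assumption made for this example, since the abstract does not name the interface. The helper name `read_idle_states` is hypothetical, and the reported latencies depend on the active cpuidle driver.

```python
from pathlib import Path


def read_idle_states(cpu_dir):
    """Return one dict per idle state found under <cpu_dir>/cpuidle.

    cpu_dir is a per-CPU sysfs directory such as
    /sys/devices/system/cpu/cpu0 (hypothetical helper, not from the paper).
    """
    state_dirs = sorted(
        Path(cpu_dir, "cpuidle").glob("state[0-9]*"),
        key=lambda p: int(p.name[len("state"):]),  # numeric order: state2 < state10
    )
    states = []
    for state_dir in state_dirs:
        def read(attr):
            # Each attribute is a small text file; return None if it is absent.
            f = state_dir / attr
            return f.read_text().strip() if f.is_file() else None

        latency = read("latency")
        states.append({
            "state": state_dir.name,
            "name": read("name"),
            # Exit latency as reported by the kernel, in microseconds.
            "latency_us": int(latency) if latency is not None else None,
            # "1" in the disable file means the state is currently disabled.
            "disabled": read("disable") == "1",
        })
    return states
```

Calling `read_idle_states("/sys/devices/system/cpu/cpu0")` on a Linux system with cpuidle enabled returns the states for core 0; on systems without that directory the function returns an empty list. The reported values are what measured wake-up latencies would be compared against.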


Notes

  1. The variable core clock is typically derived from a constant reference clock via frequency multiplying circuits, e.g., phase-locked loops (PLL).

  2. The PCI register readings are:

    D18F4 0x118: 0x0107000B - settings for C1 (15:0) and C2 (31:16)

    D18F4 0x11C: 0x00000000 - C3 (15:0) not configured

    D18F4 0x128: 0x00005500 - C1 cache flush timer = 28h (11:5)

    D18F3 0x0DC: 0x05475632 - C2 cache flush timer = 28h (25:19).

  3. /sys/devices/system/cpu/cpu*/cpuidle/state*/disable.

  4. D18F3 xA8[31:29] PopDownPstate = D18F3 xDC[10:8] HwPstateMaxVal.

    Fig. 6: C6 states for Bulldozer processor

  5. Each compute unit has its own frequency domain while all CUs share one voltage domain.

  6. The processing of instructions is stopped during the frequency transition as there is no stable clock signal.
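
The bit-field values quoted in notes 2 and 4 can be checked with a few lines of Python. This is a minimal sketch: the helper `bits` and the variable names are hypothetical, while the register values and bit ranges are copied from the notes.

```python
def bits(value, high, low):
    """Extract the inclusive bit field value[high:low] from a register value."""
    width = high - low + 1
    return (value >> low) & ((1 << width) - 1)


# Register readings as listed in note 2.
D18F4x118 = 0x0107000B
D18F4x11C = 0x00000000
D18F4x128 = 0x00005500
D18F3xDC = 0x05475632

c1_settings = bits(D18F4x118, 15, 0)     # C1 settings field
c2_settings = bits(D18F4x118, 31, 16)    # C2 settings field
c3_settings = bits(D18F4x11C, 15, 0)     # zero, i.e. C3 not configured
c1_flush_timer = bits(D18F4x128, 11, 5)  # C1 cache flush timer
c2_flush_timer = bits(D18F3xDC, 25, 19)  # C2 cache flush timer
hw_pstate_max = bits(D18F3xDC, 10, 8)    # HwPstateMaxVal field from note 4

print(f"C1 settings:       {c1_settings:#06x}")
print(f"C2 settings:       {c2_settings:#06x}")
print(f"C1 flush timer:    {c1_flush_timer:#x}")
print(f"C2 flush timer:    {c2_flush_timer:#x}")
print(f"HwPstateMaxVal:    {hw_pstate_max}")
```

Both flush timer fields decode to 0x28, which matches the 28h stated in note 2.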


Acknowledgments

This work has been funded by the Bundesministerium für Bildung und Forschung via the research projects CoolSilicon (BMBF 16N10186) and Score-E (BMBF 01IH13001).

Author information

Corresponding author

Correspondence to Robert Schöne.

About this article

Cite this article

Schöne, R., Molka, D. & Werner, M. Wake-up latencies for processor idle states on current x86 processors. Comput Sci Res Dev 30, 219–227 (2015). https://doi.org/10.1007/s00450-014-0270-z

  • DOI: https://doi.org/10.1007/s00450-014-0270-z
