Energy Efficient Configuration for QoS in Reliable Parallel Servers

  • Dakai Zhu
  • Rami Melhem
  • Daniel Mossé
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3463)

Abstract

Redundancy is the traditional technique used to increase system reliability. With modern technology, slack time can serve not only as temporal redundancy but can also be used by energy management schemes to scale down the processing speed and supply voltage and thus save energy. In this paper, we consider a system that consists of multiple servers for providing reliable service. Assuming that servers have self-detection mechanisms to detect faults, we first propose an efficient parallel recovery scheme that processes service requests in parallel to increase the number of faults that can be tolerated and thus the system reliability. Then, for a given request arrival rate, we explore the optimal number of active servers needed either to minimize system energy consumption while achieving k-fault tolerance or to maximize the number of faults that can be tolerated within a limited energy budget. Analytical results are presented to show the trade-off between the energy savings and the number of faults being tolerated.
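The trade-off described in the abstract can be illustrated with a small sketch. This is a toy model written for illustration, not the model or algorithm from the paper: it assumes that `w` requests arrive per interval of length `T`, that each request takes `c` time units at the maximum (normalized) speed of 1, that up to `k` faulty requests are re-executed in parallel rounds across the `m` active servers, and that each active server draws a constant static power plus a dynamic power that grows with the cube of its speed. All function names, parameters, and numbers below are hypothetical.

```python
import math


def min_speed(m, w, k, c, T):
    """Lowest normalized speed at which m servers finish w requests
    and still leave room to re-execute k faulty ones within T.
    Assumption: work and recovery both proceed in parallel rounds."""
    rounds = math.ceil(w / m) + math.ceil(k / m)
    return rounds * c / T


def energy(m, w, k, c, T, p_static, p_dynamic):
    """Fault-free energy per interval for m active servers, or None
    if k-fault tolerance is infeasible even at full speed.
    Assumption: dynamic power is p_dynamic * f**3 while busy."""
    f = min_speed(m, w, k, c, T)
    if f > 1.0:
        return None
    busy_time = w * c / f  # total busy time summed over all servers
    return m * p_static * T + busy_time * p_dynamic * f ** 3


def best_server_count(n, w, k, c, T, p_static, p_dynamic):
    """Try every count from 1 to n and return (m, energy) for the
    cheapest feasible configuration, or None if none is feasible."""
    best = None
    for m in range(1, n + 1):
        e = energy(m, w, k, c, T, p_static, p_dynamic)
        if e is not None and (best is None or e < best[1]):
            best = (m, e)
    return best


if __name__ == "__main__":
    # Made-up parameters: 8 servers available, 12 requests per
    # interval of length 10, tolerate 2 faults.
    print(best_server_count(8, 12, 2, 1, 10, 0.1, 1.0))
```

With these made-up parameters the search settles on four active servers: fewer servers must run faster to preserve recovery slack, which raises dynamic energy, while more servers add static energy without lowering the required speed enough to compensate.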

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Dakai Zhu (1)
  • Rami Melhem (2)
  • Daniel Mossé (2)
  1. University of Texas at San Antonio, San Antonio, USA
  2. University of Pittsburgh, Pittsburgh, USA
