PicoServer: Using 3D Stacking Technology to Build Energy Efficient Servers

  • Taeho Kgil
  • David Roberts
  • Trevor Mudge
Part of the Integrated Circuits and Systems book series (ICIR)


With power and cooling increasingly contributing to the operating costs of a datacenter, energy efficiency is the key driver in server design. One way to improve energy efficiency is to adopt innovative interconnect technologies such as 3D stacking. Three-dimensional stacking technology introduces new opportunities for future servers to become low power, compact, and possibly mobile. This chapter introduces an architecture called PicoServer that employs 3D technology to bond one die containing several simple, slow processing cores with multiple DRAM dies that together are sufficient to serve as primary memory. This use of 3D stacks readily facilitates wide, low-latency buses between processors and memory. These buses remove the need for an L2 cache, allowing its area to be reallocated to additional simple cores. The additional cores allow the clock frequency to be lowered without impairing throughput. A lower clock frequency means that thermal constraints, a concern with 3D stacking, are easily satisfied. PicoServer is intentionally simple, requiring only the simplest form of 3D technology, in which dies are stacked on top of one another. Our intent is to minimize the risk of introducing a new technology (3D) to implement a class of low-cost, low-power, compact server architectures.
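The trade-off between core count and clock frequency described above can be illustrated with a first-order model. The sketch below is not taken from the chapter: the class and function names are hypothetical, the core counts, frequencies, and voltages are made-up illustrative values, and it assumes perfectly parallel requests, no memory stalls, and a supply voltage that can be lowered along with the clock. Under those assumptions it compares a few fast cores with many slow cores at equal aggregate throughput.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreConfig:
    """Hypothetical chip configuration; all values are illustrative."""
    cores: int        # number of identical cores
    freq_ghz: float   # clock frequency per core
    vdd: float        # supply voltage in volts


def relative_throughput(cfg: CoreConfig) -> float:
    # Assumption: requests are perfectly parallel and never stall on
    # memory, so throughput scales with cores * frequency.
    return cfg.cores * cfg.freq_ghz


def relative_dynamic_power(cfg: CoreConfig, cap: float = 1.0) -> float:
    # First-order CMOS dynamic power: P ~ N * C * V^2 * f.
    # `cap` is an arbitrary per-core switched capacitance.
    return cfg.cores * cap * cfg.vdd ** 2 * cfg.freq_ghz


# Illustrative numbers only; they are not measurements from the chapter.
few_fast = CoreConfig(cores=4, freq_ghz=2.0, vdd=1.2)
many_slow = CoreConfig(cores=16, freq_ghz=0.5, vdd=0.9)

same_work = relative_throughput(few_fast) == relative_throughput(many_slow)
power_ratio = relative_dynamic_power(many_slow) / relative_dynamic_power(few_fast)
print(f"equal throughput: {same_work}, power ratio: {power_ratio:.2f}")
```

In this toy model the saving comes entirely from the assumed voltage reduction: at a fixed voltage, the two configurations would draw the same dynamic power. Leakage, interconnect, and memory power are ignored.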


Keywords: Access Latency · Client Request · NAND Flash · Disk Cache · Server Workload



This work is supported in part by the National Science Foundation, Intel, and ARM Ltd.



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Intel, Hillsboro, USA
  2. University of Michigan, Ann Arbor, USA
