PicoServer: Using 3D Stacking Technology to Build Energy Efficient Servers
With power and cooling contributing increasingly to the operating costs of a datacenter, energy efficiency is the key driver in server design. One way to improve energy efficiency is to adopt innovative interconnect technologies such as 3D stacking. Three-dimensional stacking technology opens new opportunities for future servers to become low power, compact, and possibly mobile. This chapter introduces an architecture called PicoServer that employs 3D technology to bond one die containing several simple, slow processing cores with multiple DRAM dies that together are large enough to serve as primary memory. This use of 3D stacks readily facilitates wide, low-latency buses between processors and memory. These buses remove the need for an L2 cache, allowing its area to be reallocated to additional simple cores. The additional cores allow the clock frequency to be lowered without impairing throughput. A lower clock frequency in turn means that thermal constraints, a concern with 3D stacking, are easily satisfied. PicoServer is intentionally simple, requiring only the simplest form of 3D technology, in which dies are stacked on top of one another. Our intent is to minimize the risk of introducing a new technology (3D) to implement a class of low-cost, low-power, compact server architectures.
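The trade-off between core count and clock frequency described above can be illustrated with a minimal sketch. It assumes the standard first-order CMOS dynamic power model, P = N · C · V² · f, and an idealized workload in which requests parallelize perfectly across cores. The function names, the capacitance constant, and both design points are hypothetical values chosen for illustration; they are not figures from the chapter.

```python
def dynamic_power(cores, freq_ghz, volts, cap=1.0):
    """First-order dynamic CMOS power, P = N * C * V^2 * f (arbitrary units)."""
    return cores * cap * volts ** 2 * freq_ghz


def throughput(cores, freq_ghz):
    """Idealized throughput: perfectly parallel work, proportional to N * f."""
    return cores * freq_ghz


# Hypothetical design points (assumptions, not data from the chapter).
# The lower supply voltage at the lower frequency is also an assumption.
designs = {
    "few fast cores": {"cores": 4, "freq_ghz": 2.0, "volts": 1.2},
    "many slow cores": {"cores": 16, "freq_ghz": 0.5, "volts": 0.9},
}

for name, d in designs.items():
    work = throughput(d["cores"], d["freq_ghz"])
    power = dynamic_power(d["cores"], d["freq_ghz"], d["volts"])
    print(f"{name}: throughput={work:.1f}, dynamic power={power:.2f}")
```

Under these assumptions both designs deliver the same aggregate throughput, while the many-core, low-frequency design draws less dynamic power because the supply voltage can be lowered along with the clock. Real workloads rarely scale perfectly, so this is best read as an upper bound on the benefit.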
Keywords: Access Latency, Client Request, NAND Flash, Disk Cache, Server Workload
This work is supported in part by the National Science Foundation, Intel, and ARM Ltd.