Science China Information Sciences, Volume 55, Issue 7, pp 1493–1508

SCautz: a high performance and fault-tolerant datacenter network for modular datacenters

  • Feng Huang
  • XiCheng Lu
  • DongSheng Li
  • YiMing Zhang
Research Paper

Abstract

Modular datacenters (MDCs) use shipping containers, encapsulating thousands of servers, as large pluggable building blocks for mega datacenters. The MDC’s “service-free” model places stricter demands on the fault tolerance of the modular datacenter network (MDCN). Based on the “scale-out” principle, we propose SCautz, a novel hybrid intra-container network for MDCs. SCautz comprises a base Kautz topology, created by interconnecting servers, and a small number of COTS (commercial off-the-shelf) switches. Each switch connects a specific number of servers to form a “cluster”; these clusters, acting as logical nodes, form multiple higher-level logical Kautz structures. SCautz’s hybrid structure has several advantages. First, it supports multiple running modes for the MDC, and its full structure increases network capacity twofold. Second, it retains its throughput for one-to-x traffic in the presence of failures. Finally, its network performance degrades much more gracefully than computation and storage capacity do. Results from theoretical analysis and simulations show that SCautz is more viable for intra-container networks.
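
For readers unfamiliar with the base topology named in the abstract, the following is a minimal sketch of the standard Kautz digraph definition. It is not the authors' SCautz construction: the switches and clusters are not modelled. The function names and the parameters `d` (out-degree) and `n` (label length) are assumptions chosen for illustration; the abstract does not state the values used in the paper. A node is a length-`n` label over `d + 1` symbols with no two adjacent symbols equal, and each node links to the labels obtained by shifting left and appending a symbol different from its last one.

```python
from collections import deque
from itertools import product


def kautz_nodes(d, n):
    """All length-n labels over d+1 symbols with no two adjacent symbols equal."""
    symbols = range(d + 1)
    return [w for w in product(symbols, repeat=n)
            if all(a != b for a, b in zip(w, w[1:]))]


def kautz_successors(node, d):
    """Shift the label left and append any symbol that differs from the last one."""
    return [node[1:] + (x,) for x in range(d + 1) if x != node[-1]]


def diameter(d, n):
    """Longest shortest-path length over all ordered node pairs, found by BFS."""
    nodes = kautz_nodes(d, n)
    worst = 0
    for src in nodes:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in kautz_successors(u, d):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        # Every node must be reachable from every other node.
        assert len(dist) == len(nodes)
        worst = max(worst, max(dist.values()))
    return worst
```

With these definitions the digraph has (d + 1)·d^(n−1) nodes, each of out-degree `d`, and its diameter equals the label length `n`. This combination of a small diameter and a constant degree is the usual reason a Kautz graph is considered for server-to-server interconnection.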

Keywords

modular datacenter network container Kautz fault-tolerant 

Copyright information

© Science China Press and Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Feng Huang (1)
  • XiCheng Lu (1)
  • DongSheng Li (1)
  • YiMing Zhang (1)
  1. National Lab for Parallel and Distributed Processing, School of Computer, National University of Defense Technology, Changsha, China
