1 Introduction

The need for secure and efficient hosting of digital information in converged networks (data, voice, image and video) has driven the rapid evolution of data centers (DCs) around the world. The emergence of the Web 2.0 environment, with its rich enabled applications, turned data into every organization's most valued asset, to be hosted with the highest degree of confidentiality, integrity and availability. The prevailing models of electronic data interchange (EDI), which require corporations to depend heavily on data, made DCs the live wire of the global economy [1] and the foundation upon which cloud computing was established [2]. The adoption of the cloud computing paradigm has provided the much needed avenue for data centricity of services, as seen in Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS) and Software-as-a-Service (SaaS) [3]. In our Internet-driven millennium, the sustainability of the systems responsible for web communication is vital and depends on uninterrupted power supply, more so as the commodity price of energy is rising faster than expected.

A data center can be defined as a facility that hosts computing resources in a nexus of communication infrastructure for data storage and applications [1, 4]. The capital expenditure (CAPEX) of the initial DC setup is enormous, yet it is often dwarfed by the operational expenditure (OPEX) [5]. The latter is needed to maintain the quality of service (QoS) defined in the service level agreement (SLA) and to give users a good quality of experience (QoE). Hence, striking a balance between appropriate service delivery, e.g., provisioning more bandwidth and low-latency links between communicating nodes, and reduced energy consumption goes a long way in cutting down OPEX.

Green DCs are designed to ensure utmost energy efficiency and a minimum environmental footprint [6, 7]. Recent environmental and social policies imposed on DC operators have therefore shaped the evolution of the modern DC toward improved QoS and energy efficiency, alongside breakthroughs that resulted from competition among operators to cut down OPEX. This is visible in the proficiency of IT giants such as Google, Facebook, Microsoft, IBM and Amazon, which became progenitors of cloud computing through continuous improvement and developmental strategies that make their offerings attractive. Green DC research is therefore part of the continuous improvement needed to design DCs with lower CAPEX for the core components. This is achieved by deploying energy-efficient DCNs in a bid to further lower the 10% of OPEX spent on energy [8]. The approach examined here introduces an energy coefficient into the design of the DCN as a critical complementary consideration for the qualitative performance and throughput of the network modules. A data center network topology can be switch-centric, server-centric or hybrid (dual centric), each with its specific energy consumption characteristics [4]. However, studies have shown that energy utilized to process workloads in switch-centric topologies is more profitable, as switches are by default equipped with intelligent routing algorithms and connected to servers through a single port [9], making such networks very responsive. A highly responsive variant of the switch-centric DCN architecture is therefore a potential answer to the increasing demands of cloud computing DCs and can help eliminate challenges faced by legacy DCN architectures.

In this article, we present an analysis of DC performance and efficiency based on the amount of supplied energy that is actually turned into computing power. It further emphasizes the effect of bandwidth provisioning and throughput on the energy proportionality of the two most common switch-centric DCN topologies: three-tier (3T) and fat tree (FT). The design objective is to accommodate scalability, cost-effectiveness, resilience and end-to-end aggregation of the provisioned bandwidth at a reasonable energy spend. We have implemented models of the 3T and FT topologies in a modified version of the ns-2-based GreenCloud simulator [8]. We present our evaluation results and compare the performance and power-related metrics [10], bandwidth oversubscription ratio (BOR), communication network energy efficiency (CNEE) and network power usage effectiveness (NPUE) of the two DCN architectures. The energy consumption was matched against network traffic (or workload) to assess the energy awareness of the two DCN architectures.

The main contributions of the article are:

  • An implementation of FT DCN architecture using GreenCloud simulator.

  • Performance evaluation of 3T and FT based on the power-related metrics BOR, CNEE and NPUE. The focus of this study is on intra-DC network traffic, which can generate computationally intensive workloads (CIW), data-intensive workloads (DIW) or balanced workloads (BW) [11].

  • A comparison of the 3T and FT architectures based on a real-world scenario (power budget).

  • Introduction of an energy coefficient into the design and layout of DCN architectures for smaller businesses as a critical complementary consideration for the qualitative performance and throughput of the network modules.

The remainder of the article is organized as follows. Section 2 provides information on the core business value of DCs, with a focus on the topologies available for DCN implementation and the legacy techniques used to evaluate energy efficiency. Section 3 discusses the method used to improve energy efficiency, with emphasis on the simulation of information technology equipment (ITE) to understand DCN energy consumption in line with greenness initiatives. Parameters from ITE and workloads were simulated to obtain suitable energy-aware operation scheduling and adaptations. Prior to this, the choice of data center simulator is justified, and the final part enumerates the data collection strategies and experimental methods used. Section 4 presents and evaluates the simulation results obtained with the modified GreenCloud simulator in light of real-world scenarios; the analysis and performance evaluation of the DCN components are considered in terms of topology, server and network workloads. Section 5 offers an in-depth analysis of the simulation results. Finally, Sect. 6 concludes the article.

2 Background

The design framework of a green DC focuses on the actualization of a scalable, efficient, extensible and cost-effective DCN architecture [12]. The legacy tree-based 3T and the emerging new-fabric FT, which appear to satisfy the aforementioned criteria of a green DC, are exemplars of such architectures. A large percentage of existing DCs implement the traditional 3T topology at the core of their network. This has resulted in enormous energy consumption and budget increases along with the exponential growth of DCs. This is further illustrated in Table 1, where past, present and projected operations, bandwidth demands and power utilization for high-performance systems are shown [13].

Table 1 Projection of power consumption for Internet services [13]

2.1 Data center network

A typical 3T DCN architecture is a hierarchy of three layers of switches (core, aggregation and access/edge) arranged in a tree-based topology, with the two upper layers built from enterprise network devices (see Fig. 1). We use access and edge interchangeably in this article. The Layer 3 (L3) switches or routers at the core and aggregation layers are energy hungry by nature and therefore cannot easily be energy-managed. Because of their importance, core switches cannot be dynamically put into a sleep state, although they consume a great deal of energy due to their large switching capabilities, i.e., equal-cost multi-path (ECMP) forwarding activities. As a result, core switches operate at maximum transmission rates of around 85% of full load even when the DC is idle [14]. Core switches are high-capacity switches located in the backbone network that provide access to a wide area network or the Internet. Servers typically operate at 66% of full-load energy consumption when the DC is idle, making dynamic power management (DPM) and dynamic voltage and frequency scaling (DVFS) approaches selective [11, 15,16,17].

Fig. 1 Three-tier data center network topology

However, end-of-row (EOR) aggregation-level switches attached to idle racks can be powered down. This layer is utilized as heavily as the core; hence, packet losses are higher at the aggregation layer than at any other layer [9]. Most DCs run at around 30% of their computational capacity [18]; shutting down inactive servers, with prior consideration for load fluctuations that can be absorbed by the less idle servers, has always been an energy-aware decision. It was observed in [19, 20] that traffic flow and packet utilization within the two upper layers are higher than at the access layer, more so because the top-of-rack (TOR) switches that inhabit this lowest layer are inexpensive, low-power commodity types.

Considering the traffic characteristics in DCNs, network traffic associated with DCs is either inter-DC or intra-DC in nature. The focus of this study is on intra-DC network traffic, which can generate computationally intensive workloads (CIW), data-intensive workloads (DIW) or balanced workloads (BW) [11]. Intra-DC network traffic is further categorized into long (elephant) flows in need of high throughput and short (mice) control flows in need of low latency. A further analysis of existing tree-based topologies in [19] suggests the following traffic flow pattern in organizations:

  • The majority of flows in DCs are small in size, with durations of less than a few hundred milliseconds.

  • In cloud-based DCs, about 75% of traffic remains within the rack.

  • University and private enterprise DCs have 40–90% of their traffic leaving the rack and traversing the network.

Oversubscription, the ratio between the aggregate incoming and aggregate outgoing bandwidth of end hosts, is introduced to reduce CAPEX during the design phase. Oversubscription is considered a drawback of 3T implementations. The typical oversubscription of a 3T topology is 2.5:1 to 8:1 [21], which results from allocating 10 Gbps communication links for inter-networking between the 10 Gigabit Ethernet (GE) switches in the upper layers (see Fig. 1). In addition, the multi-rooted core switches in large DCs demand multiple-path routing, creating oversubscription, limiting the available routes and adding lookup delay due to enormous routing tables.

The introduction of a new fabric with a flat network topology resolves most of the 3T architecture's limitations. The FT DCN, presented as a folded Clos-based network fabric [5] in Fig. 2, integrates inexpensive Ethernet commodity switches to build a k-ary FT in which the links connecting each layer are provisioned with the same bandwidth. Consequently, a bandwidth oversubscription ratio (BOR) of 1:1 is available from the core layer down to the servers. FT can be implemented in a two-layered spine–leaf configuration, as seen in Cisco's Massively Scalable DC (MSDC) [22], with an additional layer above the spine functioning in a dual capacity as load balancer and control plane. The latter is specifically designed for enhanced (ECMP) routing between two end nodes. The control plane is provisioned with a pair of L3 switches to reduce the large switch count of this FT fabric when compared with a full-fledged three-layered FT DCN, thus opposing the network universal theorem that "for a given number of switches, the most optimal network exists" [22]. Moreover, the spine–leaf FT topology is scalable enough to support the explosion of east/west data traffic in web communication and the drift toward the software-defined data center.

Fig. 2 Fat-tree data center network topology

The existence of an L3 lookup at the leaf nodes in MSDC enhances the selection of an ideal egress port at the leaf. An intelligent routing architecture reduces potential congestion in the network by minimizing packet collisions when several packets head toward a single egress port.

To move from 3T switching to an FT fabric, fiber connections are established to replace the high-end switches, with strong attention given to the link losses likely to occur in the channels. A low-loss connection increases the number of possible connections in the channels. Host-to-host (server-to-server) communication is most efficient when virtualization is employed using virtual machine (VM) techniques without additional switch hops. Virtualization brings about more server-to-server data flow and storage-to-storage-area-network (SAN) traffic, as in Storage-as-a-Service. Virtualization is considered an important technique for achieving a green DC, a concept that works with consolidation to reduce power consumption in DCs when its principles are fully adopted [23]. The concepts of flattening DC networks and the emergence of virtualization are therefore essential.

2.2 Power-related metrics in data center network

In order to ensure optimum energy efficiency and a minimum environmental footprint [6], as suggested by the green DC initiative [1, 24], it is necessary to apply power-related metrics to evaluate the energy efficiency characteristics of DCNs. Two main existing metrics are applicable to switch-centric DCNs:

  • Communication Network Energy Efficiency (CNEE): the energy required to convey one bit of information.

  • Network Power Usage Effectiveness (NPUE): the ratio of the overall IT power to the power utilized by the network modules.

Although BOR is not directly power related, its computation is necessary to estimate the minimum non-blocking bandwidth available to each server. When the servers produce network traffic above the provisioned bandwidth, the edge and aggregation switches become congested, suffer buffer overflows and begin to drop packets [10]. The ensuing retransmissions significantly increase energy consumption and degrade the network performance of cloud applications.

Furthermore, DENS [11] recommends the following energy model for switches in a green DC:

$$\begin{aligned} P_\mathrm{switch}= P_\mathrm{chassis}+ n_\mathrm{linecards} \cdot P_\mathrm{linecard}+\sum \limits _{r=0}^{R} n_\mathrm{ports,r} \cdot P_\mathrm{r} \end{aligned}$$
(1)

where \(P_\mathrm{r}\) is the power utilized by an active port transmitting at rate r, \(P_\mathrm{chassis}\) is the power utilized by the switch base hardware, and \(P_\mathrm{linecard}\) is the power utilized by an operating linecard. \(P_\mathrm{r}\) scales with the transmitting rate of the switch, but the advantages of rate-adaptive designs are limited because the switch transceivers account for only 3–15% of the total energy used by the switch. On the other hand, \(P_\mathrm{chassis}\) and \(P_\mathrm{linecard}\) depend solely on the power status of the device and are affected only when the device is powered down for lack of network traffic [14].

The server energy consumption model is given in [14, 25] as:

$$\begin{aligned} P = P_\mathrm{fixed}+ P_{f} * f^{3} \end{aligned}$$
(2)

where \(P_\mathrm{fixed}\) is the power consumed by memory modules, disks and I/O resources, i.e., the portion of power that does not scale with the operating frequency f; \(P_{f}\) is the frequency-dependent power consumed by the CPU; and f is the operating frequency.
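To make the two power models concrete, the short sketch below evaluates Eqs. (1) and (2) for representative inputs; all numeric parameters are illustrative placeholders rather than the values configured in GreenCloud.

```python
# Illustrative evaluation of the switch power model (Eq. 1) and the server power
# model (Eq. 2). The numeric parameters are placeholders for demonstration only;
# they are not the figures used in the GreenCloud configuration.

def switch_power(p_chassis, n_linecards, p_linecard, ports_per_rate):
    """Eq. (1): P_switch = P_chassis + n_linecards*P_linecard + sum_r n_ports,r * P_r.

    ports_per_rate maps a rate label to (number of active ports, power per port at that rate).
    """
    port_power = sum(n_ports * p_r for n_ports, p_r in ports_per_rate.values())
    return p_chassis + n_linecards * p_linecard + port_power


def server_power(p_fixed, p_f, f):
    """Eq. (2): P = P_fixed + P_f * f^3; only the CPU term scales with frequency f."""
    return p_fixed + p_f * f ** 3


if __name__ == "__main__":
    # Hypothetical aggregation switch: chassis, 2 linecards, 48 active 1 Gbps ports, 2 active 10 Gbps ports.
    p_agg = switch_power(p_chassis=146.0, n_linecards=2, p_linecard=36.0,
                         ports_per_rate={"1G": (48, 0.42), "10G": (2, 5.0)})
    # Hypothetical server running at 60% of its maximum frequency (f normalised to [0, 1]).
    p_srv = server_power(p_fixed=150.0, p_f=100.0, f=0.6)
    print(f"P_switch = {p_agg:.1f} W, P_server = {p_srv:.1f} W")
```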

3 Methodology

In [2], DCN architectures were presented as multi-layered graph models of diverse DCNs while analyzing the characteristics of structurally robust DCNs [18]. This is similar to the model considered here, where ITE such as computational servers, network and storage devices form the vertices of the graph, while the interconnecting network links are its edges.

3.1 Network management scheme for simulation in GreenCloud

The scheme considers two switch-centric network architectures: the 3T and FT DCNs. Specifically, 3T is a tree-based, hierarchical three-layered configuration, whereas FT is a Clos-based, hierarchical three-layered configuration, with the core or spine, aggregation and access/edge (TOR) layers constituting the layout. The layout also caters for redundancy to forestall single points of failure in the connection. The two DCNs to be modeled are configured such that:

  • Each caters for network and server workload consolidation in the tree- and Clos-based hierarchical topologies considered.

  • The same number of computing servers (S) is considered for task execution, computational workload and energy consumption comparison.

  • The core layer switches vary between the two networks: in 3T, 10 Gbps GE links are used between core, aggregation and edge switches (C\(_{1}\)–C\(_{2}\)–C\(_{3}\)) and 1 Gbps links between edge switches and computing servers (C\(_{3}\)–S), whereas FT uses 1 Gbps GE through all layers.

  • Aggregation and access/edge network layers are configured with Layer 3 (L3) and Layer 2 (L2) switches, respectively, in 3T architecture.

  • Commodity switches are deployed in the upper layers of the FT architecture, and the topology is sometimes referred to as a two-layered spine–leaf network [5].

Table 2 illustrates the configuration of the models simulated and compared in terms of energy and cost efficiency, scalability and fault tolerance, while Table 3 gives an example of a real-world configuration of these models.

Table 2 Topology description of modeled 3T and FT DCN
Table 3 Description of physical topology for 3T and FT DC architectures

3.2 Network simulation

The attributes listed in Table 4 are considered for the DC load, task scheduler and architecture. The task scheduling techniques defined in [11] are considered:

  • Green: A unified or consolidated scheduler, designed for resolution of computational workload, allowing idle servers and network modules to be powered down.

  • RoundRobin: Allocates computational and communication jobs equally among servers and switches in a circular order. Computational servers are not overloaded, and the resulting network traffic is balanced. Hence, no ITE is powered down, since idleness does not occur.

  • BestDENS: An architecture-specific technique with best-fit server selection. It attains workload consolidation for energy efficiency while averting server and switch overload. Hence, there are more active ITEs.

Table 4 Data center simulation attributes

The 3T simulation settings are shown in Table 5.

Table 5 3T simulation setup

The 3T DCN architecture is made up of four core switches interconnected with eight aggregation and sixteen TOR switches through seventy-two 10 Gbps links (C\(_{1}\)–C\(_{2}\), C\(_{2}\)–C\(_{2}\) and C\(_{2}\)–C\(_{3}\)), and a total of 64 computing servers connected to the TOR switches with a 1 Gbps uplink each (64 Gbps in total) from host to edge switches. Figure 3 depicts the schematic of the modeled 3T DCN architecture.

Fig. 3 Schematic of 3T DCN model

In the FT simulation, the link connectivity is designed so that the three switch layers, spine/core, aggregation and leaf (TOR), all have the same number of ports, designated as an even number n [5]:

  • TOR(s) connects with \(\frac{n}{2}\) ports to \(\frac{n}{2}\) servers.

  • The remaining \(\frac{n}{2}\) TOR ports connect to \(\frac{n}{2}\) aggregation switches.

  • Aggregation switch connects with \(\frac{n}{2}\) ports to the TOR switches.

  • The remaining \(\frac{n}{2}\) ports on the aggregation switch connect to spine switches.

  • The FT comprises \(\frac{n^{3}}{4}\) servers, \(\frac{n^{2}}{2}\) aggregation switches, \(\frac{n^{2}}{2}\) edge (TOR) switches and \(\left(\frac{n}{2}\right)^{2}\) core (spine) switches.

Simply put, we have \(\left(\frac{n}{2}\right)^{2}\) spine switches for \(n^{2}\) pod switches and \(\frac{n^{3}}{4}\) servers (\(\frac{n^{2}}{4}\) per pod), as illustrated in Fig. 4.
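As a cross-check of these counts, the sketch below enumerates the components of a standard k-ary FT built from n-port commodity switches, following the construction in [5]; for \(n = 4\) it reproduces the topology of Fig. 4. The helper name is ours, introduced for illustration.

```python
# Component counts of a k-ary fat tree built from n-port commodity switches,
# following the standard construction in [5]. For n = 4 this matches Fig. 4.

def fat_tree_counts(n):
    assert n % 2 == 0, "switch port count n must be even"
    pods = n
    servers_per_pod = (n // 2) ** 2                   # n^2 / 4
    return {
        "pods": pods,
        "edge_switches": pods * (n // 2),             # n^2 / 2
        "aggregation_switches": pods * (n // 2),      # n^2 / 2
        "core_switches": (n // 2) ** 2,               # n^2 / 4
        "servers": pods * servers_per_pod,            # n^3 / 4
    }

print(fat_tree_counts(4))
# {'pods': 4, 'edge_switches': 8, 'aggregation_switches': 8, 'core_switches': 4, 'servers': 16}
```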

Fig. 4 Illustration of FT (\(k=4\)) architecture with assigned IP. Adapted from [2]

A desirable benefit of the FT network is the ability to create large interconnections using small-scale switches. This is mainly because its connection capacity depends on the number of core layer switches: increasing the number of deployed core switches proportionally improves the connection capacity, but likewise increases the cost of the network [26].

When establishing connection routes, all concurrent connections from an access switch compete for the same cluster of core switches, increasing the burden on the core switches whenever congestion occurs at the access switch. This congestion is due to simultaneous real-time requests from all server-edge network interface cards (NICs) at full bandwidth capacity (e.g., 1 Gbps multiplied by the number of servers in the rack). Congestion at the TOR and the non-uniformity of multicast traffic are responsible for the expense associated with non-blocking multicast FT DCNs.

The FT topology achieves non-blocking unicast communication with a small number of core switches, but non-blocking multicast, an imperative communication pattern utilized in most DCs, still requires a large number of core switches due to the non-uniformity of multicast traffic [26]. Use cases such as redirecting search queries to index servers and replicating file chunks across distributed servers benefit from non-blocking multicast communication. It is thus of utmost importance to decrease the cost involved in FT DCs; otherwise, the upper layers become as costly as the energy hungry high-end switches of the traditional 3T architecture. Network module and server redundancy in high-availability (HA) DCs with six nines (99.9999%) availability can be exploited to lessen the cost of a non-blocking multicast FT DCN: re-arranging and re-assigning non-blocking commodity switches to replace the core adequately equips the network to handle various forms of multicast traffic while providing high network bandwidth.

The commodity switches act as bouncing switches, implementing digit-reversal bouncing (DRB), a load-balancing algorithm proposed in [27], with adequate routing conditions to control the traffic path within the DCN to the end host, thereby complementing ECMP in splitting traffic among multiple paths. The packet routing interaction between servers 0 and 15 in Fig. 5 is an example of a spine switch bouncing a packet along a uniquely determined route, emphasizing the custom addressing and routing scheme used in FT deployments. In essence, ECMP is used by Clos-based networks to break up traffic [28]. However, hash collisions prevent ECMP from taking advantage of the full bisection bandwidth, resulting in undesirable delays even with moderate network traffic [29]. The non-blocking switches, on the other hand, do not cause contention in the network, enhancing the ability of the FT DCN to achieve full bisection bandwidth.
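The contrast between per-flow ECMP hashing and per-packet spreading can be illustrated with a toy example. The sketch below hashes the 5-tuple for ECMP and uses a plain per-source round-robin counter for spreading; it is a simplification of the idea behind DRB, not the digit-reversal algorithm of [27].

```python
# Toy comparison of per-flow ECMP hashing with per-packet round-robin spreading.
# This is a simplification of the spreading idea behind DRB [27], not the actual algorithm.
import hashlib
from collections import Counter

CORE_SWITCHES = 4  # candidate uplink paths through the core

def ecmp_path(flow_5tuple):
    """Per-flow ECMP: hash the 5-tuple so every packet of a flow follows the same path."""
    digest = hashlib.md5(repr(flow_5tuple).encode()).hexdigest()
    return int(digest, 16) % CORE_SWITCHES

class RoundRobinSpreader:
    """Per-packet spreading: successive packets from a source rotate over all paths."""
    def __init__(self):
        self.counter = 0
    def next_path(self):
        path = self.counter % CORE_SWITCHES
        self.counter += 1
        return path

flows = [("10.0.0.1", "10.2.0.1", 6, 5000 + i, 80) for i in range(8)]
ecmp_load = Counter(ecmp_path(f) for f in flows)      # hash collisions leave some paths overloaded
rr = RoundRobinSpreader()
rr_load = Counter(rr.next_path() for _ in range(8))   # packets spread evenly over all paths
print("ECMP per-flow load:", dict(ecmp_load))
print("Round-robin per-packet load:", dict(rr_load))
```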

Fig. 5 FT with DRB. Adapted from [27]

Table 6 FT simulation setup

The FT simulation setup is illustrated in Table 6. The number of switches in the core/spine equals the total number of commodity switches in every pod (aggregation + access), which equals the total number of servers in each pod; all are interconnected with 1 Gbps links, as illustrated in Fig. 6.

Fig. 6 Schematic of the modeled FT DCN architecture

4 Results

The 3T and FT DCN architecture models were simulated using the modified GreenCloud simulator. Table 7 provides a summary of the results based on the 3T and FT simulation setups discussed in the previous section.

A total of 64 computing servers (S) was considered for both DCN architectures, resulting in a total DC computing capacity of 2.256e8 MIPS for each of the eighteen simulated models. One cloud user was considered.

Fig. 7 DC computational load comparison among three task schedulers in 3T and FT

Overprovisioning the DCN architecture for peak load and fault tolerance leaves DCNs mostly underutilized, at an average load of 5–25%. Such a scenario can be exploited for energy efficiency [30]. The DC load of 50% (half of the DC load capacity) depicted in Fig. 7 is considered the best reference point to analyze the two DCN architectures, as DCs are collocated to redistribute workload. Typically, idle servers and network modules consume up to 66 and 85%, respectively, of their peak-load consumption [8, 31]. Furthermore, due to the variability of DC workloads, servers and network modules are deliberately overprovisioned to cope with such fluctuations or maximum load variations.

The 50% DC load is chosen as a more realistic representation of the workload of a real operational DC; it comprises the actual regular workload plus the workload associated with the ITE overprovisioning needed to cope with expected upsurges. Similarly, an earlier study in [8] has shown that the average load occupies about 30% of DC resources [32], while the remaining 70% is mostly put to sleep.

One-third of the load, i.e., an idle (30%) DC load, was also simulated (see Table 7). It creates wasted energy and inappropriate OPEX. For instance, the I/O buses, memory and disks account for a total of 14,798 tasks at an average rate of 231.2 tasks per server while consuming 843.2 W*h of energy, which, according to an earlier study, can be seen as idle servers consuming energy to the tune of two-thirds of peak load [15].

Both DVFS and dynamic network shutdown (DNS) power management were implemented in the servers, and only DVFS was implemented in the switches as typical energy-aware scheduling solutions to:

  • Consolidate workloads onto the smallest number of machines; about 80% reduction of IT load is possible with virtualization and consolidation [33].

  • Maximize the number of resources that can be put into a sleep state [34].

Table 7 Summary of DCN simulation results for 3T and FT

The performance of the schedulers with regard to network load optimization, task execution time and energy consumed per task performed suggests that the Green scheduler is the most responsive method. The choice of the Green scheduler ensures that incoming workloads are consolidated onto the minimum number of servers based on the jobs' processing requirements.
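A minimal sketch of this consolidation logic is shown below. It uses greedy first-fit placement with a per-server capacity derived from the 2.256e8 MIPS total over 64 servers reported above; it is illustrative only, not the GreenCloud scheduler implementation.

```python
# Greedy first-fit consolidation in the spirit of the Green scheduler: pack incoming
# task loads onto as few servers as possible so the rest become eligible for sleep.
# Illustrative sketch only, not the GreenCloud implementation.

SERVER_CAPACITY_MIPS = 2.256e8 / 64   # per-server share of the total capacity reported in Sect. 4

def consolidate(task_loads_mips, n_servers, capacity=SERVER_CAPACITY_MIPS):
    servers = [0.0] * n_servers                      # MIPS currently allocated per server
    for load in task_loads_mips:
        for i in range(n_servers):
            if servers[i] + load <= capacity:        # first server with enough headroom
                servers[i] += load
                break
        else:
            raise RuntimeError("insufficient capacity for the submitted workload")
    active = sum(1 for used in servers if used > 0)
    idle = n_servers - active                        # candidates for DPM sleep / powering down links
    return active, idle

active, idle = consolidate([500_000] * 200, n_servers=64)
print(f"active servers: {active}, servers eligible for sleep: {idle}")
```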

Table 8 DCN link utilization in 3T and FT

For 3T, we considered the G-50%-3T Green task scheduler, which has the lowest power consumption among the schedulers and a higher number of tasks performed: 874 W*h produced a total of 29,596 tasks at an average of 462.4 tasks per server. For FT, we considered G-50%-FT, which has the second lowest power consumption [35] but a higher number of tasks performed. The lowest power consumption, 1618.1 W*h, is produced by the BestDENS task scheduler (D-50%-FT), compared with Green's (G-50%-FT) 1620.3 W*h, but the total tasks performed and average tasks per server of D-50%-FT (26,143 and 408.5) are lower than those of G-50%-FT (29,596 and 462.4). This emphasizes that D-50%-FT is a best-fit scheduling algorithm for data-intensive workloads (DIW) [11]. It is assumed that the GE links are Green Ethernet (LPI enabled). The link utilization is illustrated in Table 8.

For comparison, the same number of computing nodes, 64 servers, was used for both topologies while the network links to the switches varied. The 3T DCN provides 10 Gbps links among the core, aggregation and access layers, whereas in FT the three layers are interconnected with 1 Gbps links. Thus, the bandwidth of the C\(_{1}\)–C\(_{2}\) and C\(_{2}\)–C\(_{3}\) links in 3T is ten times higher than that of the corresponding links in FT. The relationship between downlink and uplink bandwidth capacity at each switch layer (BOR) of 3T is such that:

  • The edge switch has two 10 Gbps links to the aggregation network and 48 ports with 1 Gbps downlinks to support 48 servers:

    $$\begin{aligned} \frac{48\,\mathrm{Gbps}}{20\,\mathrm{Gbps}}\ \hbox {provides a BOR of}\ 2.4{:}1 \end{aligned}$$

and a corresponding per server bandwidth of:

$$\begin{aligned} \frac{1\,\mathrm{Gbps}}{2.4} \approx 416\,\mathrm{Mbps}\ \hbox {at maximum load} \end{aligned}$$

The BOR for FT's C\(_{1}\)–C\(_{2}\), C\(_{2}\)–C\(_{3}\) and C\(_{3}\)–S links is 1:1 due to the 1 Gbps links at all levels of the network. The latency experienced on all links of both topologies is 0.0033 ms.
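The same arithmetic can be wrapped in a small helper, shown below for the 3T edge-switch example (48 server ports against two 10 Gbps uplinks) and the 1:1 FT case; the function names are ours, introduced for illustration.

```python
# Bandwidth oversubscription ratio (BOR) and the resulting per-server bandwidth,
# reproducing the 3T edge-switch example above and the FT 1:1 case.

def bor(downlink_gbps, n_down, uplink_gbps, n_up):
    """Aggregate server-facing bandwidth divided by aggregate uplink bandwidth."""
    return (downlink_gbps * n_down) / (uplink_gbps * n_up)

def per_server_mbps(nic_gbps, oversubscription):
    return nic_gbps * 1000 / oversubscription

bor_3t = bor(1, 48, 10, 2)   # 48 x 1 Gbps down vs 2 x 10 Gbps up -> 2.4:1
print(f"3T edge: BOR = {bor_3t:.1f}:1, ~{per_server_mbps(1, bor_3t):.0f} Mbps per server")

bor_ft = bor(1, 4, 1, 4)     # equal 1 Gbps provisioning up and down -> 1:1
print(f"FT:      BOR = {bor_ft:.1f}:1, ~{per_server_mbps(1, bor_ft):.0f} Mbps per server")
```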

Support for ECMP routing [36] was assumed and made available in 3T through the use of high-end switches at the core and aggregation layers [37] and the availability of 10 Gbps links between them, which caters for the BOR. Extending the 10 Gbps links to the access network further improves throughput and brings ECMP routing closer to the hosts to reduce the possibility of congestion.

Similarly, it is assumed that ECMP with DRB's per-packet round-robin routing was implemented as the adequate load-balancing condition in FT, exploiting the 1:1 BOR and avoiding congestion. The core switches act as bouncing switches to perform the routing [27], and commodity switches are utilized in the aggregation and access layers. The same routing scheme is assigned based on the number of nodes in each pod of the FT with \(k = 4\).

Table 9 presents the analysis of the server and network module layout of both architectures, while Figs. 7 and 8 illustrate their energy usage.

Table 9 ITE module layout in 3T and FT

4.1 Power utilization in information technology equipment

Fig. 8 Energy ratios for ITE module in 3T

Equations (1) and (2) are employed in the simulator to compute the power utilization of servers and switches. Note that the server power factor, i.e., \(P_\mathrm{fixed}\), is the same throughout, as the same server specification is used. The power factors of the chassis, linecards and port transfer rates of the core and aggregation switches are the same within 3T but differ in FT, where commodity switches are utilized. The power factor of the access switches is the same for the two architectures.

The share of energy consumption allocated to ITE is approximately 40% of the whole DC infrastructure [8]. The distribution varies with the composition of the simulated components. As depicted in Figs. 8 and 9 for 3T and FT, respectively, the ratio of energy consumed by network modules (all switches) to servers is approximately 4:1 in 3T and 9:1 in FT. The higher network share in FT is a result of the k-ary pod arrangement, which requires a larger number of commodity switches to accommodate the 64 servers.

Fig. 9 Energy ratios for ITE module in FT

The energy consumption of the servers, L2 and L3 switches considered in G-50%-3T is displayed in Fig. 10. It can be observed that 93.8% of the total energy was consumed by 40 of the 64 computing servers (62.5% of the servers), as shown in Fig. 10b. The remaining 24 servers (37.5%) consumed less than 50% of the computing energy, i.e., 179.3 W. The network energy management policies of DVFS and DPM were responsible for the varying energy use across the racks, following the availability of workloads across the network [11, 38]. The core and aggregation switches operated at approximately 95% of full energy in 3T [10] (see Fig. 10c, d). These layers are needed for ECMP routing, and the DNS technique is not encouraged as it may degrade network performance; the layers are also overprovisioned for this purpose. Network module overprovisioning accounted for the larger portion of the power consumed by the upper layers, as shown in Fig. 8.

Fig. 10 Energy consumption of ITE in 3T G-50%

The energy consumption of the servers, L2 and L3 switches in G-50%-FT is displayed in Fig. 11. The distribution of energy usage among the 64 servers in FT is similar to that in 3T, as shown in Fig. 11b. However, the commodity switches that replace the energy hungry enterprise switches of the 3T upper layers are larger in quantity and are actively involved in the end-to-end aggregation of bandwidth to the host servers [9, 10, 26, 27, 39], resulting in an increased energy consumption of the network module in FT (see Fig. 11c, d).

Both 3T and FT have the same energy utilization at the access level, with 95% energy consumption and 1 Gbps of bandwidth provisioned for each link to the computing servers.

Fig. 11 Energy consumption of ITE in FT G-50%

4.2 Uplink and downlink in information technology equipment

To obtain the corresponding power factors, the switch setup parameters were changed to account for the low-power profiles of the large number of commodity switches and the resulting port density.

Uplink comparison:

The uplink network traffic summary for 3T (Fig. 12) illustrates the effect of bandwidth oversubscription in the upper layers of the topology [10], with 60% of the core–access links actively utilized (see Fig. 12a) and core usage substantially higher than anticipated owing to the smaller number of links multiplexing traffic from the layers below [9, 19].

For an IT load of 50%, the host-to-rack (NIC-TOR) connectivity experienced a decreasing link load from 90 to 10% of the 1 Gbps bandwidth apportioned, with only 61% of the 64\(\times \) 1 Gbps links to the servers active, i.e., a BOR of \(\frac{64\times 1}{32\times 10} = 0.2:1\), that is, 200 Mbps of per-server bandwidth (see Fig. 12a). Likewise, the EOR network experienced approximately 60% link load on 62.5% of the 16\(\times \) 1 Gbps links supported by the 10 Gbps aggregation layer links, i.e., a TOR–EOR BOR of \(\frac{4\times 10}{4\times 10} = 1:1\) (see Fig. 12b).

The core layer, with four core switches providing a total of 40 Gbps to the TORs, i.e., 4\(\times \) 10 Gbps links to C\(_{2}\)/EOR, experienced a 93.3% link load with about 75% of the 4\(\times \) links utilized (see Fig. 12c); the BOR is \(\frac{64\times 1}{4\times 10} = 0.4:1\). The BOR of this link favors upper-layer oversubscription, with traffic queues experienced on the 200 Mbps host-to-rack links (see Fig. 12d–f). DVFS and DPM are responsible for the nonlinear variation in the provisioned bandwidth resources across each layer [11, 14].

Fig. 12 3T DC network uplink

Fig. 13 FT DC network uplink

The uplink network traffic summary shown in Fig. 13 illustrates the effect of the 1:1 BOR across all layers of the FT architecture [5, 29]. For an IT load of 50%, the NIC-TOR connectivity experienced a decreasing link load from 90 to 10% of the 1 Gbps bandwidth apportioned, with only 61% of the 64\(\times \) 1 Gbps links to the servers active, i.e., a BOR of \(\frac{4}{4} = 1:1\). That is, 2 Gbps of per-server bandwidth is available, but usage is limited to the 1 Gbps capacity of the NIC (see Fig. 13a). Therefore, the link capability is 10 times that of 3T, and this provides full bisection bandwidth between the hosts and the rack [5], where servers within the network are able to communicate with one another arbitrarily at the full bandwidth of their NICs.

Fig. 14 3T DC network downlink

Likewise, the EOR network experienced approximately 87.5% link load on 62.5% of the 16\(\times \) 1 Gbps links at the aggregation layer, i.e., a TOR–EOR BOR of \(\frac{4}{2} = 2:1\), in line with the k-ary pod tree [5] (see Fig. 13b). With higher packet losses expected at the aggregation layer [9], a link BOR of 2:1 is appropriate for the network, since the \(k/2\) layout of the access and aggregation layers in the Clos topology is the same. There are two access layer and two aggregation layer switches in each pod, which guarantees 4\(\times \) 1 Gbps links within this layer of a pod.

The core layer is made up of 8 core switches, with 4\(\times \) 1 Gbps links connected to one of the two aggregation switches in a pod. The rack-to-core link ratio is \(\left( {\frac{k}{2}} \right) ^{2}\), i.e., 4\(\times \) 1 Gbps links available per pod for 4 computing servers, as depicted in Fig. 6. Therefore, the rack-to-core links experienced 87.5% (14 out of 16) link load, utilized by 62.5% of the links, i.e., 5 out of 8 links (see Fig. 13c); the BOR is \(\frac{8\times 1}{8\times 1} = 1:1\). The flow diffusion optimization available with this link state prevents local congestion by assigning traffic to ports on a per-flow rather than per-host basis [5].

The flow scheduling mitigates global (DCN-wide) congestion and prevents elephant flows in need of high throughput [9] from sharing a link by assigning such flows to different links. A negligible traffic queue lasting less than 0.0033 ms was experienced during the simulation (see Fig. 13d–f).
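A simplified version of this flow-to-link assignment is sketched below: elephant flows are greedily placed on the currently least-loaded uplink. It captures the intuition of the scheduling described above rather than the exact mechanism of [5].

```python
# Greedy assignment of elephant flows to the least-loaded uplink, an illustrative
# simplification of the flow scheduling described above (not the mechanism of [5]).
import heapq

def schedule_flows(flow_demands_gbps, n_links):
    links = [(0.0, link_id) for link_id in range(n_links)]   # (current load, link id)
    heapq.heapify(links)
    placement = {}
    # Place the largest flows first so no two of them end up sharing a link.
    for flow_id, demand in sorted(enumerate(flow_demands_gbps), key=lambda x: -x[1]):
        load, link_id = heapq.heappop(links)                 # least-loaded link
        placement[flow_id] = link_id
        heapq.heappush(links, (load + demand, link_id))
    return placement

elephants = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]                   # demands in Gbps
print(schedule_flows(elephants, n_links=4))                  # the four largest land on distinct links
```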

Downlink comparison:

The downlink network traffic in 3T, depicted in Fig. 14, is such that a quarter of the 40 Gbps total bandwidth was utilized through three of the four links to the aggregation layer (see Fig. 14a). The aggregation layer has abundant bandwidth, with 40 Gbps of links from the 4\(\times \) upper-level switches and the same downlink bandwidth provisioned to the TORs, i.e., a BOR of \(\frac{40}{40} = 1:1\), so that 62.5% of the 16 TOR switches utilize only 10% of the total link load at the aggregation layer (see Fig. 14b). On the rack-to-host downlink, only 25% of the link load is utilized by 59% of the computing servers (see Fig. 14c). Under increasing load, the 0.2:1 link BOR is insufficient, as it offers only approximately 200 Mbps of per-server bandwidth, which is low compared with the BOR in the upper layers. For CIWs, where the computing servers produce traffic at the non-blocking bandwidth of the NIC (1 Gbps), which is more than the available bandwidth, congestion is likely to occur at the TOR and aggregation switches [10]. For the BWs considered in this work, link utilization is of equal importance, as DIWs emphasize the throughput of network paths. The competition for core layer bandwidth by the TOR switches and their associated servers is driven by requests broadcast at the full bandwidth capacity of the NICs [26], making the energy spent to support higher bit-rates enormous, even though these higher bit-rates cannot be utilized by the hosts or computing servers [10]. This bottleneck of end-to-end aggregate bandwidth, a.k.a. cross-section bandwidth, degrades network performance in 3T [4]. Moreover, TCP incast congestion can develop at the access switch in an intra-DCN many-to-one traffic mode, when multiple senders transmit toward a single server within the rack and throughput is low [40].

Fig. 15 FT DC network downlink

In the FT DCN downlink illustrated in Fig. 15, approximately 50% of the link load was utilized by 62.5% (\(\frac{5}{8}\)) of the 8\(\times \) 1 Gbps links per pod between C\(_{1}\)–C\(_{2}\) (see Fig. 15a). The same occurred on the C\(_{2}\)–C\(_{3}\) links, i.e., \(\frac{10}{16} = 62.5\%\), as shown in Fig. 15b. The rack-to-host downlink, however, recorded a link load utilization of about 20% by 59.3% (\(\frac{39}{64}\)) of the servers (see Fig. 15c). This indicates that the throughput between any two hosts equals 1 Gbps under ECMP routing in FT, i.e., an identical-bandwidth path at any available bisection [41, 42].

4.3 Power-related metric comparison in information technology equipment

This part focuses on the application of performance and energy-efficiency metrics targeted at communication systems in DCs. The 64 computing servers scheduled with balanced workloads have different per-server bandwidths: 416 Mbps for 3T and 1 Gbps for FT. The CNEE in Joules/bit (J/bit) and the NPUE for the two DCN topologies are therefore calculated as in [10] and are derivable from Figs. 8 and 9.

$$\begin{aligned} \hbox {CNEE}&= \frac{\hbox {Power consumed by network equipment (all hardware involved in information delivery)}}{\hbox {Effective network throughput capacity (maximum end to end)}} \end{aligned}$$
(3)
$$\begin{aligned} \hbox {NPUE}&= \frac{\hbox {Total power consumed by ITE}}{\hbox {Power consumed by network equipment}} \end{aligned}$$
(4)

Assuming the GE links are energy efficient (Green Ethernet [43]) and Power over Ethernet (PoE) capable, and using the values at 50% DC load, the power-related metrics are calculated as:

$$\begin{aligned} \hbox {CNEE:}\ \mathrm{3T}&= \frac{874.3}{416}, \quad \mathrm{FT} = \frac{1620.6}{1000}\\ \hbox {NPUE:}\ \mathrm{3T}&= \frac{874.3}{695.0}, \quad \mathrm{FT} = \frac{1620.6}{1441.3} \end{aligned}$$
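The sketch below reproduces this arithmetic directly from the simulated 50% load totals quoted above (874.3 and 695.0 W*h for 3T, 1620.6 and 1441.3 W*h for FT, with 416 Mbps and 1 Gbps of per-server bandwidth).

```python
# Power-related metrics computed from the simulated 50% load totals used above,
# following Eqs. (3) and (4) as applied in the text.

def cnee(total_ite_power, per_server_bandwidth_mbps):
    """CNEE as computed above: consumed power over effective per-server throughput."""
    return total_ite_power / per_server_bandwidth_mbps

def npue(total_ite_power, network_power):
    """Eq. (4): total ITE power over the power drawn by the network equipment."""
    return total_ite_power / network_power

for name, total, net, bw in [("3T", 874.3, 695.0, 416), ("FT", 1620.6, 1441.3, 1000)]:
    print(f"{name}: CNEE = {cnee(total, bw):.2f} J/bit, NPUE = {npue(total, net):.2f}")
# 3T: CNEE = 2.10 J/bit, NPUE = 1.26
# FT: CNEE = 1.62 J/bit, NPUE = 1.12
```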

5 Discussion

Having briefly analyzed the results of the simulation, the discussion now focuses on the application of the energy management policies set up for the DCN in terms of management roles, planning rules and beliefs, and on some of the performance and energy-efficiency metrics targeted at communication systems in DCs. Considering the theory and practice of network management policies in [44], which also encompasses the DVFS, DNS and DPM methodologies, the findings of this study suggest the following:

(1) The 3T architecture is notorious for expensive, high-end, energy hungry switches at both the core and aggregation layers due to its physical layout as a multi-rooted tree-based structure. To improve the per-server bandwidth of existing 3T deployments, the aggregation-to-access layer links were provisioned with 10 Gbps links. However, the BOR limitation from the upper layers still significantly constrains the server uplink capability, bounded by the 1 Gbps maximum bandwidth of the NIC and that of the TOR switch. This is responsible for the higher CNEE, and the limitation persists even with two NICs per server.

CIW and BW jobs will cause random congestion at the C\(_{3}\)–S layer at likely peak DC load due to the bandwidth oversubscription in the upper layers. Scalability is also difficult, as the core–aggregation layer is rigidly structured and unlikely to route tasks to servers outside the TOR network, which also makes it fault intolerant.

At idle DC load, the energy hungry core switches cannot be energy-managed as they are responsible for ECMP routing; thus, operators of such DCs accept this spending as part of OPEX. Only aggregation switches serving idle racks can be powered down, with a minimum wakeup time set in case of a load upsurge. This can still cause performance degradation in the network and poorer CNEE for the topology. Consequently, unused servers in a 50% loaded DC are harder to localize because the topology is not compartmentalized, and consolidation of active servers into fewer racks becomes more difficult. An idle CPU still runs at 66% of peak load, so DVFS is not applicable; outright shutdown with DPM is preferred, but waking a server in an idle rack drains a considerable amount of energy.

Lastly, with reference to the link utilization in Table 8, unused links are automatically powered down together with their switches and servers, except where the port speed is stepped down in the aggregation layer, e.g., from 10 to 1 Gbps, to save energy instead of applying DNS, so as to cater for traffic fluctuations and preserve a minimum level of QoS.

The downlink in 3T, as shown in Fig. 14, confirms the cross-section bandwidth problem attributed to the end-to-end aggregation bandwidth bottleneck, alongside the scalability, cost and non-agility issues discussed earlier. The overall effect on cooling is enormous, as power hungry switches have high heat dissipation, posing additional power requirement challenges to the heating, ventilation and air-conditioning (HVAC) system. Table 11 illustrates the energy and cost implications of the simulated 3T models.

(2) The FT is equally switch-centric in nature, with the ability to handle oversubscription and provide full bisection bandwidth. As given in Table 8, the symmetric end-to-end bandwidth utilization between any two nodes in the network has a BOR of 1:1, equal to 1 Gbps, making it suitable for BW jobs. The choice of the Green scheduler ensures that incoming workloads are consolidated onto the minimum number of servers based on the jobs' processing requirements. ECMP in the spine–leaf network segment, together with the assumed adequate condition (a customized addressing scheme) that enables bouncing switching in the DRB routing algorithm, conveys a single bit of information at the lowest possible energy level, i.e., a CNEE of 1.62 J/bit compared to 3T's 2.10 J/bit.

Table 10 Power-related metric evaluation

The FT architecture is switch laden; the larger number of switches, i.e., 40 inexpensive commodity switches compared with the 28 enterprise switches used in 3T (Table 9), accounted for 1441.3 W*h of energy, although as commodity switches they individually consume less energy.

A considerable share of the large number of commodity switches, and of the resulting port density, is put to sleep using the DPM scheme, as DNS would degrade network performance. Furthermore, the power factor of the commodity switches is more than 50% less than that of 3T's core/aggregation layer formation. Table 11 illustrates the energy and economic benefit of the FT architecture using real-world DCN interconnectivity.

However, the spine–leaf network organization, regarded as a folded Clos, supports a high port count at the spine and can reduce the topology to two substantive layers. The uppermost layer above the spine implements an L3-based routing protocol that acts as a control plane or load balancer for traffic, minimizing latency and providing congestion avoidance. The L3 routing table efficiently routes packets from spine to source, with egress port selection performed at the leaf, i.e., the L3 lookup exists at the node. This scenario is given in Table 3. Utilizing multiple 10 Gbps links for the spine–leaf connection instead of a single 40 Gbps fiber link reduces power consumption by more than 10 times.

Table 11 Power budget for 3T and FT DCN architectures

(3) The k-ary pods help consolidate traffic on fewer racks and add agility and scalability, as commodity switches can be added to any layer to extend the computational capacity of the fabric, resulting in a more cost-effective fabric with lower energy consumption, as shown in the NPUE comparison in Table 10. The overall effect of using commodity switches is reduced CAPEX, lower energy for the network modules and lower heat dissipation, which also reduces the OPEX on cooling.

Cabling complexity and increased cable cost can be observed in FT, which, as given in Table 9, has 160 links compared with 136 interconnections in 3T. The Green Ethernet (IEEE 802.3az) assumed for the links is expected to surmount issues regarding link energy. Given the challenge of port count on Green Ethernet switches, turning off idle devices provides instantaneous savings. It is estimated that an 80% power saving is possible with consolidation and virtualisation; longevity of the network devices is further ensured in the absence of incessant heat dissipation.

Most ITE operates at 2/3 of its designed power rating; for example, HP servers provide dynamic power capping tools available through the integrated lights-out (iLO) user interface or set through the HP Insight Control power management module. At different load variations, the network management roles, planning rules and beliefs apply. OPEX on energy is roughly proportional to the DC load, and likewise on cooling. The DVFS and DPM power management techniques were used to optimize energy efficiency while maintaining QoS/SLA. CAPEX remains constant for a while, sustained by ITE efficiency and dependent on the mean time between failures (MTBF) of the ITE. From Table 11, we observe that FT uses 23.2 watts less energy to support the same number of computational servers, though its initial total cost of ownership (TCO) is higher. Every watt of power expended generates 3.412 BTU/h of heat, and the heat dissipated is roughly proportional to the workload. Reduced thermal overrun [45] through consolidation, virtualization and powering down inactive ITE consequently minimizes the energy consumed by the computer room air-conditioning (CRAC) unit in cooling the server room. In the real operation of DCs scaling to tens of thousands of servers, reducing the cooling load yields significant OPEX savings.
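Using the 3.412 BTU/h-per-watt conversion quoted above, the thermal impact of the 23.2 W difference from Table 11 can be worked out directly; the sketch below merely restates that conversion.

```python
# Heat-load arithmetic using the 3.412 BTU/h per watt conversion quoted above.
WATT_TO_BTU_PER_HOUR = 3.412

def heat_btu_per_hour(it_load_watts):
    return it_load_watts * WATT_TO_BTU_PER_HOUR

saving_watts = 23.2   # FT vs 3T difference reported in Table 11
print(f"Cooling load avoided: {heat_btu_per_hour(saving_watts):.1f} BTU/h")   # ~79.2 BTU/h
```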

Reduced utilization, oversubscription, manual configuration and scale-out versus scale-up considerations, e.g., per-port charges, cabling complexity and expandable cooling, are challenges faced when trying to attain the DCN design goals of scalable interconnection bandwidth, throughput and load balancing at low power and cooling cost.

5.1 Related work

The energy consumption results obtained in the experiment comparing the 3T and FT DCN architectures are similar to those of [5, 9, 54]. In [54], it was concluded that network module energy consumption (approximately 4–12%) should not be ignored even though the majority of energy is consumed by the servers; energy-saving policies also influenced the outcomes, and FT showed a higher percentage of energy utilization. In [9], it was demonstrated using the ns-2 simulator that data center TCP (DCTCP) congestion control with TCP incast/outcast in FT is better both for elephant flows in need of high throughput and for mice flows that need low delays. The focus was on how DCTCP deploys explicit congestion notification to augment the TCP congestion control algorithm. This allows optimal performance in FT, leveraging the 1:1 BOR across all layers and preventing incast and outcast bottlenecks, making FT a DCN of choice for large networks. The analysis by Al-Fares et al. [5] pioneered the study of the FT DCN architecture as a cost-effective topology built from cheap commodity switches, with scalable bisection bandwidth and lower power consumption and thermal output, as shown in Fig. 16, where 3T is regarded as the hierarchical design. The analysis in Fig. 17 was obtained from the power budget for the 3T and FT DCN architectures presented in Table 11, and the result is similar to that of [5] in Fig. 16.

Fig. 16 Comparison of total power consumption and heat dissipation. Adapted from [5]

Fig. 17 Comparison of total power consumption, thermal rating and CAPEX

It is worth mentioning that FT is a DCN architecture that is both symmetric, i.e., it has organized packaging and simple physical wiring, and recursively defined in nature, i.e., the number of levels or layers is not fixed but grows with the topology size [55]. These two factors are the attributes of scalability possessed by FT. Furthermore, the scalability and deterministic nature of FT made possible the variants of the architecture implemented by two IT giants: Google's FT [5] in 2008 and Facebook's FT [56] in 2014. The application of these FT variants was partly responsible for Google cutting its PUE from between 1.15 and 1.17 in 2010 to 1.06 in 2014 [57,58,59], and for the PUE of 1.08 recorded by Facebook in 2014 [60,61,62].

6 Conclusion

In this article, we compared the energy-related performance of the two most popular switch-centric DCN architectures: three-tier and fat tree. We also compared their CAPEX, thermal and power consumption costs using real-world scenarios. The FT is equally switch-centric in nature, with the ability to handle oversubscription and provide full bisection bandwidth. The k-ary pods help consolidate traffic on fewer racks and add agility and scalability, as commodity switches can be added at any layer to extend the computational capacity of the fabric, resulting in a more cost-effective and less energy-consuming DC architecture. The overall effect of using commodity switches is reduced CAPEX, lower energy for the network modules and lower heat dissipation, which also decreases the OPEX on cooling.