Introduction

In recent years, concerns regarding climate change and environmental sustainability have increased. Energy savings and environmental protection are crucial issues that the world faces today (Zhang et al. 2015; Song and Wang 2017; Lucivero 2020). Big data is viewed as an efficient resource that helps reduce material consumption; however, managing and storing such data require considerable energy (McKinsey 2016; Karnama et al. 2019; Rahmani et al. 2020). The big data revolution has created data center sustainability issues, whose solutions require the simultaneous consideration of economic and environmental factors (Corbett 2018; Herman et al. 2018; Singh and El-Kassar 2019; Kheybari et al. 2020). According to Corbett (2018), the power consumption of IT equipment for storage was 27.8 kWh per TB of data, and the total power consumption of data centers was 46.33 kWh per TB of data per year, which corresponds to approximately 35 kg of CO2 emissions per TB of data per year. These estimates are disconcerting given that business needs continue to increase data center workloads. The processing of large quantities of data requires massive and efficient computing resources, which raises concerns regarding the harmful environmental effects of big data (Ramli et al. 2017). With the development of the big data industry, high cost, high energy consumption, and high carbon emissions have become increasingly common (Rong et al. 2016; Möbius et al. 2014; Shuja et al. 2016; Lucivero 2020). According to the Climate Group, 76 million metric tons of CO2 (MtCO2) were emitted by worldwide data centers in 2002. This figure could reach 259 MtCO2 by 2020 even after implementing advanced technologies in virtualization, data center cooling, and power supply (Webb 2008). The data center industry was estimated to account for 1.3% of the world’s power consumption and 2% of the USA’s power consumption (Nadjahi et al. 2018).

The assessment and management of energy consumption in big data centers are crucial challenges for data center operations. Currently, there is no conceptual framework available to evaluate the sustainability level of big data centers, and most evaluation criteria include only crisp-value ratings and no qualitative linguistic ratings. To the best of our knowledge, numerous data centers currently use power usage effectiveness (PUE) as the only indicator to assess their energy consumption. The PUE value is closely related to the efficiency of the electricity grid and an accurate calculation of the IT load, and a data center with a lower PUE value and higher server utilization is more efficient (Brady et al. 2013). However, IT equipment energy consumption is but a part of the data center’s total energy consumption (Whitehead et al. 2014; Ahmad and SMK 2015). To comprehensively evaluate the degree of sustainability of a data center, many factors must be considered in addition to PUE. Data center sustainability depends on not only the room layout, energy sources, and energy consumption of the IT equipment but also the features of the big data processed (e.g., its volume, variety, and velocity). Few studies have attempted to develop a holistic evaluation method for a sustainable data center. Therefore, a flexible and comprehensive evaluation method must be developed, as a new frontier of research, to solve energy problems in data centers and reduce carbon emissions. To this end, we investigated and reviewed the literature on data center sustainability to develop a four-factor evaluation index system (comprising the factors of big data, equipment level, room level, and data center level) for assessing data center sustainability.

Given the nature of multiple-criteria decision-making, sustainability assessments of large data centers include both qualitative and quantitative criteria. Specifically, experts present linguistic ratings for the qualitative criteria and crisp-value ratings for the quantitative criteria. In many cases, crisp values are not available for the qualitative criteria or the data is insufficiently precise to be used directly in the evaluation process because the judgments and preferences of experts are ambiguous or uncertain, making it impossible to establish an assessment with accurate values (Mehrjerdi 2012). In response, fuzzy theory has been used to address the subjective and imprecise assessments of experts and to capture the ambiguity and vagueness arising from incomplete information in the decision-making process (Zadeh 1965). In this study, by combining qualitative and quantitative measures, a fuzzy technique for order preference by similarity to an ideal solution (TOPSIS) was used to develop a comprehensive performance rating on a data center’s sustainability.

The remainder of this paper is structured as follows. Section 2 provides a review of some recent studies on the sustainability and energy consumption evaluation of big data centers. Section 3 introduces the proposed methodology, including the analytic hierarchical process (AHP), the analytic network process (ANP), and its fuzzy extension. In Section 4, we propose a fuzzy multiple-criteria approach for evaluating data center sustainability. In Section 5, a numerical example is used to examine the proposed framework. The conclusions are provided in Section 6.

Literature review

Theoretical background

The increased energy demand from data centers has encouraged operators to take measures to decrease energy consumption and enhance energy efficiency. Accordingly, numerous researchers have attempted to formulate approaches aimed at decreasing energy consumption and enhancing energy efficiency. However, research has indicated that sustainable data centers can be achieved only upon the adoption of a system-wide procedure that involves the holistic management of different data center components. Guitart (2017) proposed a comprehensive management strategy for sustainable data centers that involves introducing energy as a driving force for operation programs. Recent research on the design and architecture of green data centers has mainly focused on the improvement of data center building design, computer room design, floor layout, the cooling and refrigeration system, the electrical system, and the IT equipment. This paper summarized the results of the aforementioned research, noting the four crucial factors of data center level (site selection and energy management), room level (computer room design and heat dissipation), equipment level (IT equipment and electrical systems), and data level (big data resource characteristics and service).

In its site selection and energy management, the data center should be situated where power transmission loss can be reduced, power supply is convenient (Sheme et al. 2018; Daim et al. 2013), and natural cooling can be achieved at low location temperatures. As a new frontier of research, the replacement of traditional energy with renewable energy in data centers has been receiving increasing attention. Data centers using renewable energy have low greenhouse gas emissions due to the decreased use of fossil fuels (Shuja et al. 2016). Oró et al. (2015) proposed numerical models by analyzing renewable energy use in data centers. Furthermore, because most of the electrical supply of data center servers is dissipated as thermal energy, the management of the energy conversion process has become a new and crucial aspect for achieving data center sustainability (Jones and Fleischer 2014). Ideally, waste heat can be used to offset heating costs (Haywood et al. 2012), and heating facilities in cold regions can partly meet energy demands through the intermittent generation of renewable energy (Woodruff et al. 2014; Lin 2018; Kwon 2020). The main challenge involved in consolidating data center resources for achieving energy efficiency is that maximizing the quality of services conflicts with minimizing the energy consumption of the data center.

With regard to computer room design and heat dissipation, the indoor temperature has a considerable effect on big data center energy consumption. When run at full load, the refrigeration system increases energy consumption while maintaining the temperature in the computer room (Rong et al. 2016). Through suitable cooling methods, the energy efficiency and thermal management of big data centers can be enhanced to various degrees; thus, most data centers have begun to introduce cooling technologies to enhance cooling efficiency (Li et al. 2016; Ni and Bai 2017; Arora and Bala 2020). These technologies are classified into air-cooled, liquid-cooled, and two-phase technologies (Jones and Fleischer 2014). A layout that provides higher airflow is crucial to making the room temperature more uniform, in addition to enhancing sustainable development and the efficiency of task implementation (Zhang et al. 2014). Technologies for controlling the computer room environment and for ensuring that big data centers operate stably and securely are being developed. The aforementioned research developments can contribute to the effective operation and energy savings of data centers.

IT equipment and power distribution systems consume approximately 40% and 15% of the total energy of a data center, respectively (Boru et al. 2015). Industry specifications and the scholarly literature have proposed several metrics for estimating the energy consumption and performance of computing systems. The most important metrics focus on the efficiency of IT facilities, communication systems, and refrigeration and power distribution systems (Fiandrino et al. 2017; Uddin and Rahman 2012; Arora and Bala 2020). PUE is currently the most popular evaluation indicator in many data centers (Capozzoli and Primiceri 2015); it is defined as the ratio of a facility’s total power demand to the power drawn by its IT equipment. An increasing awareness of the energy crisis and the importance of IT productivity have shifted managers’ view toward a sustainable energy strategy. However, energy-saving IT equipment has had limited use in practice, and many companies lack awareness on whether they should establish large-scale sustainable data centers. Considerably reducing the energy consumption of servers is an important requirement for sustainability (Möbius et al. 2014) because high energy consumption does not entail high performance (Dargie et al. 2011). Data center networks have been evolving to meet the high demand for services and applications (Wang et al. 2013), and the construction of a data center network directly influences the scalability, agility, and power consumption of a data center. Hammadi and Mhamdi (2014) classified most data center network structures into switch-centric and server-centric topologies. However, no unified and standardized set of ordinances has been formulated for data center design and improvement. Virtual machine migration has also been used to solve load balancing in data centers to eliminate overloaded equipment (Yu et al. 2018). Furthermore, in most recent studies, data centers have widely adopted the multi-objective coevolutionary algorithm and a simulation method for achieving an enhanced green-oriented scheduling strategy (Ham et al. 2015; Lei et al. 2016; Rahmani et al. 2020). Equipment, optimization, integrated design, and other issues related to the data center architecture should be viewed as necessary components of a sustainability evaluation of data centers.

With respect to big data characteristics and information service, massive amounts of data can be regarded as the “raw material” of data centers. Massive data volumes are created daily at record-high rates from heterogeneous provenances (e.g., from healthcare, government, social networks, and commerce) (Oussous et al. 2018; Addo-Tenkorang and Helo 2016; Lucivero 2020), and data centers are the backbone for communication, cloud services, data computing, and other computational services (Weihl et al. 2011). Such data allow for data-driven decision-making and the extraction of actionable insights. In general, big data refers to a dataset that cannot be captured, stored, managed, and processed efficiently by normal computers within a reasonable duration (Hashem et al. 2015). Laney (2001) noted volume, velocity, and variety to be the three primary characteristics of big data. In addition to these traits, veracity and value were also proposed as important characteristics of big data (Kacfah Emani et al. 2015). These features of big data affect the carbon emissions of a data center. At present, researchers and practitioners have committed to extracting actionable intelligence and insight from big data in applications such as E-health (Dimitrov 2016), Internet of Things (Zhong et al. 2016), supply chain management and logistics (Sanders and Ganeshan 2018), and retail (Fisher and Raman 2018). A data center provides a platform for data analytics (e.g., descriptive, predictive, exploratory, and prescriptive analytics), allowing for advanced analytic techniques.

Currently, physical-level improvement is the main subject of research in big data (Todorovic and Kim 2014). Many of the practices and measures adopted by firms to successfully improve the energy consumption of data centers involve improvements to IT equipment and the cooling system (Priyadumkol and Kittichaikarn 2014; Kim et al. 2017; Kunkel et al. 2019). However, few studies have addressed the sustainable development of big data itself. Big data may not convey valuable information, and additional data do not entail better decisions (Corbett 2018). Approximately 90% of the data generated on the market is never used, and 60% of these data become useless within milliseconds (Johnson 2015). In this study, we incorporated the features of data in evaluating the sustainability of a big data center. Moreover, although many studies have examined the energy consumption of traditional data centers (e.g., PUE), none has systematically appraised the sustainability of big data centers. Hence, this study formulated a comprehensive method that evaluates the source, consumption, and output (e.g., waste heat utilization) of energy in a data center through accounting for the four aforementioned factors. Furthermore, in this study on the sustainability of a big data center, fuzzy ANP was used to analyze responses to the qualitative and quantitative criteria adopted. Generally, big data centers should be constructed and developed in a sustainable and cautious manner.

Sustainability evaluation

Big data have been characterized by 5 Vs: volume, variety, velocity, veracity, and value (Cheng et al. 2017). The details of the five aspects are as follows: (1) volume: the amount of data is enormous; (2) variety: the range of data types and sources is large (e.g., multiple sources with multiple dimensions), and the data are mixed, heterogeneous, and unstructured; (3) velocity: data change quickly and accumulate rapidly; (4) veracity: data quality is unreliable because the data may be inconsistent, incomplete, or of uncertain accuracy; (5) value: the insights and benefits obtained from the data, where maximizing value often means gaining insights in real time (e.g., fraud detection). Big data centers aim to obtain insights or benefits from massive amounts of changing data. The huge volume and rapid variation of the data make it harder to benefit from them. Even a simple search operation is much more difficult on big data than on ordinary data. Big data cannot be handled by typical databases, normal computers, or traditional software; they require the massive parallel processing power of computer clusters. Big data may appear chaotic, but hidden knowledge can be discovered in them. However, the operations of big data centers consume a huge amount of energy and generate enormous carbon emissions. Of the 5 Vs, the first three (i.e., volume, variety, and velocity) are most closely related to the sustainability of data centers.

Researchers have studied the energy consumption and performance of data centers from different aspects, such as computing power, cooling, and network-related factors. Many companies are still using PUE-based assessment methods. However, sustainability evaluation involves the examination of various characteristics of a data center; among these characteristics, uncertain criteria are challenging to measure. Hence, few studies have successfully established a holistic system for evaluating data center sustainability. Through an extensive literature review and discussions with experts in the data center industry, the criteria presented in Table 1 were determined to be important to data center sustainability. The criteria were divided into four categories: big data level, equipment level, room level, and data center level. The big data level comprises criteria on the volume, velocity, and variety of big data. The equipment level comprises criteria on PUE, the network, storage devices, power sourcing equipment, the server, and the quantity of IT racks. The room level comprises criteria on room monitoring and management, layout and ventilation, the computer room environment, and the refrigeration system. The data center level comprises criteria on renewable energy, waste heat utilization, data center location, and energy consumption per unit area.

Table 1 List of criteria for evaluating sustainability

Methods

We propose an integrated approach that combines the ANP and fuzzy TOPSIS to evaluate the sustainability of big data centers. The ANP is used to weight the relative importance of the dimensions and subdimensions; the performance scores and weights are then combined using fuzzy TOPSIS. This approach is particularly useful when evaluations are uncertain and imprecise, as is the case for big data center sustainability, where linguistic assessments are vague and the experts’ comparisons are expressed as fuzzy numbers.

ANP

The AHP was first proposed by Saaty (1980) for hierarchically structured decision problems. Based on expert judgments, this method is implemented by comparing measurements obtained through absolute scales on tangible and intangible criteria (Saaty 1980). However, many real decision problems are not hierarchically structured: their structure includes interdependent relationships and feedback among the components and decision levels. Therefore, the ANP is more appropriate to such problems, where a network is treated as having clusters of elements rather than a hierarchy (Saaty and Vargas 2006). The ANP has been widely applied in many decision-making problems in the last decade (Seyhan and Mehpare 2010).

Fundamental scale

To balance between competing multiple objectives and criteria, decisions are usually made qualitatively to determine the numerical value that should represent relative importance. As in the AHP, we should make pair-wise comparisons using a scientific and carefully designed approach. The fundamental scale used for decision-making is presented in Table 2. In this research, the ANP was applied to weight the relationships among the main criteria and subcriteria with respect to the goal.

Table 2 Fundamental evaluation scale

Main steps of the ANP

Model construction

The purpose of the data center evaluation problem should be determined, and the elements affecting the goal of the model must be divided into main criteria and subcriteria. This division results in a three-layer hierarchical model structure. According to the opinions of decision-makers, the relationships among the elements in each cluster are marked by arrows. Finally, the alternative data centers are incorporated into the model as the fourth layer (e.g., Fig. 2).

Pair-wise comparison

The elements of every cluster are compared pair-wise according to their importance with respect to their control criteria. Judges provide the pair-wise comparison values. For every pair-wise comparison matrix, the consistency ratio (CR) is used to check logical consistency. A CR value < 0.1 indicates a consistent pair-wise comparison matrix.

$$ \mathrm{CR}=\frac{\lambda_{\max }-n}{\left(n-1\right)\times \mathrm{RI}} $$

where λmax is the principal eigenvalue of the pair-wise comparison matrix (estimated as the average of the entries of the weighted sum vector divided by the corresponding priority weights), n is the number of criteria compared, and RI is the random index, that is, the average consistency index of randomly generated pair-wise comparison matrices of order n.
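The consistency check can be illustrated with a minimal Python sketch. The function name, the example matrices, and the use of NumPy are assumptions made for illustration and are not part of the original study (which used the Super Decisions software); the RI values are the commonly tabulated Saaty random indices.

```python
import numpy as np

# Commonly tabulated random index (RI) values for matrix orders 1 to 10.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def consistency_ratio(matrix):
    """Return CR = (lambda_max - n) / ((n - 1) * RI) for a pair-wise matrix."""
    a = np.asarray(matrix, dtype=float)
    n = a.shape[0]
    if n <= 2:
        # Matrices of order 1 or 2 are always consistent (RI = 0).
        return 0.0
    # For a positive reciprocal matrix the principal eigenvalue is real
    # and has the largest real part.
    lambda_max = max(np.linalg.eigvals(a).real)
    return (lambda_max - n) / ((n - 1) * RANDOM_INDEX[n])


# Hypothetical example 1: a perfectly consistent matrix (a_ik = a_ij * a_jk).
cr_consistent = consistency_ratio([[1, 2, 4],
                                   [1 / 2, 1, 2],
                                   [1 / 4, 1 / 2, 1]])

# Hypothetical example 2: slightly inconsistent expert judgments.
cr_judged = consistency_ratio([[1, 3, 5],
                               [1 / 3, 1, 3],
                               [1 / 5, 1 / 3, 1]])

print(cr_consistent, cr_judged, cr_judged < 0.1)
```

A matrix whose CR exceeds 0.1 would normally be returned to the expert for revision before its priorities are entered into the supermatrix.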

Formation of the supermatrix

To obtain the global priorities, the local priority vectors are entered into the appropriate columns of the supermatrix. This forms a partitioned matrix in which each block represents the relationship between two clusters. However, this supermatrix is unweighted and must be transformed to obtain the limiting priorities. For this purpose, a weighted supermatrix is obtained by normalizing each column of the unweighted supermatrix.
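The column normalization, together with the limiting step described in the next subsection, can be sketched as follows. This is a toy illustration under stated assumptions: the 2 × 2 matrix, the function names, and the repeated-squaring stopping rule are hypothetical, and convergence by simple powers is only expected when the weighted supermatrix is primitive (cyclic structures need an averaged limit instead).

```python
import numpy as np


def weight_supermatrix(unweighted):
    """Normalize each column so that it sums to 1 (column-stochastic)."""
    m = np.asarray(unweighted, dtype=float)
    col_sums = m.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # leave all-zero columns unchanged
    return m / col_sums


def limit_supermatrix(weighted, tol=1e-12, max_iter=100):
    """Raise the weighted supermatrix to successive powers until it stabilizes."""
    m = np.asarray(weighted, dtype=float)
    for _ in range(max_iter):
        nxt = m @ m  # repeated squaring: M, M^2, M^4, ...
        if np.max(np.abs(nxt - m)) < tol:
            return nxt
        m = nxt
    raise RuntimeError("supermatrix powers did not converge")


# Hypothetical unweighted supermatrix with two mutually dependent elements.
unweighted = [[1.0, 1.0],
              [3.0, 1.0]]

weighted = weight_supermatrix(unweighted)
limit = limit_supermatrix(weighted)

# Every column of the limit holds the same priority vector.
print(weighted)
print(limit[:, 0])
```

In this toy case each column of the limiting matrix equals (0.4, 0.6), which is the stationary vector of the weighted matrix.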

Acquiring the weights of criteria

The weighted supermatrix is converted to a limiting supermatrix, and the relative weights of the elements are then read from the rows of the limiting supermatrix. Finally, the overall score Sj of decision alternative j is computed as follows.

$$ {S}_j=\sum \limits_i{w}_i{r}_{ij} $$

Where wi = the weight for criterion i, rij = the rating for criterion i and decision alternative j.
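As a small numerical illustration of the weighted sum above, the following sketch computes Sj for three alternatives. The weights and ratings are hypothetical values chosen for readability, not data from the case study.

```python
import numpy as np


def overall_scores(weights, ratings):
    """Return S_j = sum_i w_i * r_ij for every alternative j.

    weights: length-n vector of criterion weights w_i.
    ratings: n x k matrix, row i = criterion, column j = alternative.
    """
    w = np.asarray(weights, dtype=float)
    r = np.asarray(ratings, dtype=float)
    return w @ r


# Hypothetical weights for three criteria and ratings of three alternatives.
weights = [0.5, 0.3, 0.2]
ratings = [[7, 5, 6],
           [4, 8, 6],
           [9, 3, 5]]

scores = overall_scores(weights, ratings)
print(scores)  # approximately [6.5, 5.5, 5.8]
```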

Fuzzy set theory

Decision-making is difficult in an ambiguous and uncertain environment. Zadeh’s fuzzy set theory (Zadeh 1965) can be used to handle such ambiguity and uncertainty. Fuzzy set theory can improve the synthesis and rationalization of the decision-making process (Chen 2000), and it has been applied to resolve ambiguities in human perception and decision-making. The use of fuzzy sets enables the importance of criteria to be evaluated in linguistic terms.

Establishing the fuzzy number

To enable decision-makers to precisely assess criteria, a triangular fuzzy number is commonly introduced in most fuzzy applications (Lan 2016). Figure 1 illustrates a triangular fuzzy number M (Shaw et al. 2012). We represent a fuzzy number as (a, m, b), and the function of membership is presented in Eq. (1). The lower and upper bounds of the fuzzy number M are a and b, respectively, where m is the mode of M (Lee et al. 2009).

$$ {\mu}_{\tilde{M}}(z)=\begin{cases}\left(z-a\right)/\left(m-a\right), & a\le z\le m\\ \left(b-z\right)/\left(b-m\right), & m\le z\le b\\ 0, & \mathrm{otherwise}\end{cases} $$
(1)

Fig. 1 Triangular fuzzy number
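The membership function in Eq. (1) can be evaluated directly. The sketch below is illustrative only; the function name and the example number (4, 5, 6) are assumptions, and the guard for a = m or m = b (a degenerate side of the triangle) is an addition not discussed in the text.

```python
def triangular_membership(z, a, m, b):
    """Membership degree of z in the triangular fuzzy number (a, m, b), Eq. (1)."""
    if a <= z <= m:
        # Rising edge; if a == m the left side is vertical, so membership is 1.
        return 1.0 if m == a else (z - a) / (m - a)
    if m < z <= b:
        # Falling edge; b > m is guaranteed here because m < z <= b.
        return (b - z) / (b - m)
    return 0.0


# Hypothetical fuzzy number M = (4, 5, 6).
print(triangular_membership(5.0, 4, 5, 6))  # mode of M -> 1.0
print(triangular_membership(4.5, 4, 5, 6))  # halfway up the rising edge -> 0.5
print(triangular_membership(7.0, 4, 5, 6))  # outside the support -> 0.0
```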

Let two positive triangular fuzzy numbers M1 and M2 be parameterized by (a1, m1, b1) and (a2, m2, b2), respectively. The arithmetic operations on triangular fuzzy numbers used in this study are as follows.

$$ {M}_1+{M}_2=\left({a}_1,{m}_1,{b}_1\right)+\left({a}_2,{m}_2,{b}_2\right)=\left({a}_1+{a}_2,{m}_1+{m}_2,{b}_1+{b}_2\right) $$
(2)
$$ {M}_1\times {M}_2=\left({a}_1,{m}_1,{b}_1\right)\times \left({a}_2,{m}_2,{b}_2\right)=\left({a}_1{a}_2,{m}_1{m}_2,{b}_1{b}_2\right) $$
(3)
$$ {M}_1\div {M}_2=\left({a}_1,{m}_1,{b}_1\right)\div \left({a}_2,{m}_2,{b}_2\right)=\left({a}_1/{a}_2,{m}_1/{m}_2,{b}_1/{b}_2\right) $$
(4)

The distance between M1 and M2 is defined as follows (Krohling and Campanharo 2011).

$$ d\left({M}_1,{M}_2\right)=\sqrt{\left[{\left({a}_1-{a}_2\right)}^2+{\left({m}_1-{m}_2\right)}^2+{\left({b}_1-{b}_2\right)}^2\right]/3}. $$
(5)
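Equations (2) to (5) translate into a few lines of code. The following sketch represents a triangular fuzzy number as a plain tuple (a, m, b); the function names are hypothetical, and the operations follow the component-wise forms exactly as written in Eqs. (2) to (4), which assume positive fuzzy numbers.

```python
import math


def tfn_add(x, y):
    """Eq. (2): component-wise addition."""
    return tuple(xi + yi for xi, yi in zip(x, y))


def tfn_mul(x, y):
    """Eq. (3): component-wise multiplication (positive fuzzy numbers)."""
    return tuple(xi * yi for xi, yi in zip(x, y))


def tfn_div(x, y):
    """Eq. (4): component-wise division as written in the text."""
    return tuple(xi / yi for xi, yi in zip(x, y))


def tfn_distance(x, y):
    """Eq. (5): vertex distance between two triangular fuzzy numbers."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)) / 3)


# Hypothetical fuzzy numbers.
m1 = (1, 2, 3)
m2 = (2, 3, 4)

print(tfn_add(m1, m2))       # (3, 5, 7)
print(tfn_mul(m1, m2))       # (2, 6, 12)
print(tfn_distance(m1, m2))  # 1.0
```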

Determining the linguistic variables

Linguistic terms are part of the subjective labels of the linguistic variable. We defined the value of a linguistic variable to be a word or phrase. In this study, nine basic linguistic terms were used in the pair-wise comparisons of the sustainability of two data centers (Gumus 2009). These terms are listed in Table 3.

Table 3 Linguistic variables for pair-wise comparisons of criteria

Fuzzy TOPSIS

TOPSIS is generally used to solve ranking problems with a finite set of alternatives. The method depends on the principle that the selected alternative should have the shortest distance from the positive-ideal solution and the farthest distance from the negative-ideal solution (Sun 2010; Gumus 2009). In this study, we evaluated the sustainability of data centers using TOPSIS. The steps in the fuzzy TOPSIS procedure are as follows (Kuo et al. 2007; Önüt and Soner 2008; Kaya and Kahraman 2011).

  1. Step 1:

    Weight the evaluation criteria. In this study, fuzzy ANP was used to determine the fuzzy preference weights.

  2. Step 2:

    Construct the fuzzy decision matrix and select suitable linguistic variables for the alternatives in relation to the criteria. Specifically,

    $$ {\displaystyle \begin{array}{l}\kern2.04em {C}_1\kern0.86em {C}_2\kern1.5em \cdots \kern0.5em {C}_q\\ {}\tilde{D}=\begin{array}{c}{A}_1\\ {}{A}_2\\ {}\vdots \\ {}{A}_p\end{array}\left[\begin{array}{llll}{\tilde{z}}_{11}& {\tilde{z}}_{12}& \cdots & {\tilde{z}}_{1q}\\ {}{\tilde{z}}_{21}& {\tilde{z}}_{22}& \cdots & {\tilde{z}}_{2q}\\ {}\vdots & \vdots & \ddots & \vdots \\ {}{\tilde{z}}_{p1}& {\tilde{z}}_{p2}& \cdots & {\tilde{z}}_{pq}\end{array}\right]\end{array}}, $$
    (6)

    where \( {\tilde{z}}_{ij}^k \) is the rating given by the kth expert, describing the performance of alternative Ai with respect to criterion Cj, with \( {\tilde{z}}_{ij}^k=\left({a}_{ij}^k,{m}_{ij}^k,{b}_{ij}^k\right) \).

  3. Step 3:

    Construct the normalized fuzzy decision matrix. The matrix is expressed as follows:

    $$ R={\left[{\tilde{r}}_{ij}\right]}_{p\times q},\kern0.5em i=1,2,\cdots, p;j=1,2,\cdots q. $$
    (7)

    The normalization method can be implemented using the following equations:

    $$ {r}_{ij}=\left({a}_{ij}/{b}_j^{+},{m}_{ij}/{b}_j^{+},{b}_{ij}/{b}_j^{+}\right),\kern0.5em {b}_j^{+}=\underset{i}{\max }{b}_{ij}\kern0.5em \left(\mathrm{when}\ {C}_j\ \mathrm{is}\ \mathrm{a}\ \mathrm{benefit}\ \mathrm{criterion}\right) $$
    (8)
    $$ {r}_{ij}=\left({a}_j^{-}/{b}_{ij},{a}_j^{-}/{m}_{ij},{a}_j^{-}/{a}_{ij}\right),\kern0.5em {a}_j^{-}=\underset{i}{\min }{a}_{ij}\kern0.5em \left(\mathrm{when}\ {C}_j\ \mathrm{is}\ \mathrm{a}\ \mathrm{cost}\ \mathrm{criterion}\right) $$
    (9)
  4. Step 4:

    Calculate the weighted normalized fuzzy decision matrix U by multiplying the normalized fuzzy decision matrix (R) by the evaluation criteria weights (wj).

    $$ U={\left[{u}_{ij}\right]}_{p\times q},\kern0.5em i=1,2,\cdots, p;j=1,2,\cdots q $$

    The weighted normalized value uij is calculated using Eq. (10).

    $$ {u}_{ij}={r}_{ij}\odot {w}_j $$
    (10)
  5. Step 5:

    Define the fuzzy positive-ideal solution (FPIS) and fuzzy negative-ideal solution (FNIS). The component uij is a normalized positive triangular fuzzy number whose limits are within [0, 1]. The FPIS S+ and FNIS S− can be determined using the following.

    $$ {S}^{+}=\left({u}_1^{+},\cdots, {u}_j^{+},\cdots, {u}_q^{+}\right) $$
    (11)
    $$ {S}^{-}=\left({u}_1^{-},\cdots, {u}_j^{-},\cdots, {u}_q^{-}\right) $$
    (12)
  6. Step 6:

    Calculate the distance of each alternative from S+ and from S−.

    $$ {D}_i^{+}=\sum \limits_{j=1}^qd\left({u}_{ij},{u}_j^{+}\right),\kern0.5em i=1,2,\cdots, p;j=1,2,\cdots, q $$
    (13)
    $$ {D}_i^{-}=\sum \limits_{j=1}^qd\left({u}_{ij},{u}_j^{-}\right),\kern0.5em i=1,2,\cdots, p;j=1,2,\cdots, q $$
    (14)

    Here, d(·,·) represents the distance between two fuzzy numbers, which can be calculated using Eq. (5).

  7. Step 7:

    Calculate the relative closeness of each alternative to the ideal solution. The relative closeness is as follows.

    $$ C{C}_i={D}_i^{-}/\left({D}_i^{+}+{D}_i^{-}\right) $$
    (15)
  8. Step 8:

    Rank the alternatives in descending order according to their relative closeness. The basic principle is that the best alternative has the highest closeness coefficient and is thus closest to the FPIS and farthest from the FNIS.
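The eight steps above can be sketched compactly with NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function name and the small data set are hypothetical, the criteria weights are taken as crisp values, the ratings are assumed to be already aggregated over experts, and the ideal solutions are fixed at (1, 1, 1) and (0, 0, 0) for every criterion, a common simplification given that Eq. (9) already inverts cost criteria.

```python
import numpy as np


def fuzzy_topsis(ratings, weights, is_benefit):
    """Rank alternatives with fuzzy TOPSIS (Steps 2 to 8).

    ratings:    array of shape (p, q, 3); entry [i, j] is the triangular
                fuzzy rating (a, m, b) of alternative i on criterion j.
    weights:    length-q crisp criteria weights (assumption).
    is_benefit: length-q booleans; False marks a cost criterion.
    """
    z = np.asarray(ratings, dtype=float)
    w = np.asarray(weights, dtype=float)
    benefit = np.asarray(is_benefit, dtype=bool)

    # Step 3: normalize with Eq. (8) for benefit and Eq. (9) for cost criteria.
    r = np.empty_like(z)
    b_plus = z[:, :, 2].max(axis=0)   # b_j^+ = max_i b_ij
    a_minus = z[:, :, 0].min(axis=0)  # a_j^- = min_i a_ij
    for j in range(z.shape[1]):
        if benefit[j]:
            r[:, j, :] = z[:, j, :] / b_plus[j]
        else:
            # (a^-/b, a^-/m, a^-/a): reverse the components before dividing.
            r[:, j, :] = a_minus[j] / z[:, j, ::-1]

    # Step 4: weighted normalized matrix, Eq. (10).
    u = r * w[None, :, None]

    # Steps 5 and 6: distances to FPIS (1, 1, 1) and FNIS (0, 0, 0), Eq. (5),
    # summed over the criteria as in Eqs. (13) and (14).
    d_plus = np.sqrt(((u - 1.0) ** 2).mean(axis=2)).sum(axis=1)
    d_minus = np.sqrt((u ** 2).mean(axis=2)).sum(axis=1)

    # Steps 7 and 8: closeness coefficient, Eq. (15), and descending ranking.
    cc = d_minus / (d_plus + d_minus)
    return cc, np.argsort(-cc)


# Hypothetical data: 3 alternatives, criterion 1 is benefit, criterion 2 is cost.
ratings = [[(7, 8, 9), (1, 2, 3)],
           [(4, 5, 6), (2, 3, 4)],
           [(1, 2, 3), (5, 6, 7)]]
cc, ranking = fuzzy_topsis(ratings, weights=[0.6, 0.4], is_benefit=[True, False])
print(cc, ranking)
```

In this toy data set the first alternative dominates the second, which dominates the third, so the closeness coefficients decrease in that order.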

Proposed model

The proposed model comprises four basic stages. In the first stage, the objective of evaluation is determined. This determination is crucial for applied big data research. In the second stage, the criteria to be used in the model are identified through a literature review and through consultations with experts. The problem examined in this study had a decision-making structure that was hierarchical and networked, as illustrated in Fig. 2. For three data centers, the network of their performance assessment can be clearly visualized. We selected four main criteria (B1 to B4) and 17 subcriteria (C1 to C17) for constructing the model. Among the 17 subcriteria, C4 and C14 are cost criteria, whereas the other subcriteria are benefit criteria.

Fig. 2 Multiple-criteria decision-making model for evaluating the sustainability of big data centers

After constructing the framework, we calculated the individual priority weights for each criterion using the ANP. Subsequently, the relative weights of the criteria in the supermatrix were obtained and the matrix of the weighted decision was constructed. In the last stage, alternatives were evaluated and ranked using fuzzy TOPSIS. By measuring, in linguistic terms, the decision-makers’ opinions on the performance of alternatives, fuzzy TOPSIS was used to rank the data centers.

Application of the proposed model

To evaluate alternative solutions and help managers of big data centers improve sustainability, data center sustainability was evaluated using the proposed model in a case study. The alternatives (A1, A2, and A3) are three big data centers located in universities, one in north China (Beijing), one in south China (Shenzhen), and one in central China (Wuhan). The main service items of the big data centers include scientific research, school–enterprise cooperation, managed rental services, bandwidth access, and security services. An increase in data processing volume and energy consumption has resulted in considerable pressure on the implementation of green practices in traditional big data centers. The case study was used to illustrate how data center sustainability is affected when different aspects are integrated into the evaluation problem.

Calculation of the criteria weights

After developing the research evaluation framework, the Super Decisions software was used to calculate the criteria weights according to the ANP. Four groups of pair-wise matrices were determined for the interdependent relationships and relative importance among the main criteria, subcriteria, and alternatives. Expert opinions for the pair-wise comparison were solicited from professors and practitioners in the fields of big data and data analytics. The first pair-wise matrices were used for assessing the influence of the main criteria on the evaluation objectives, and the comparison results of the criteria are presented in Table 4. For the second group of matrices, we analyzed the interdependence between B1 (Big data), B2 (Equipment), B3 (The room), B4 (Data center), and their subcriteria. The last group of pair-wise matrices describes the effect of the subcriteria (C4 to C17) on the alternatives (A1 to A3). The weighted supermatrix is presented in Table 5, and the limited supermatrix is presented in Table 6. From the results of the limited supermatrix, we obtained the relative weights of each subcriterion.

Table 4 Comparison results for the criteria and the criteria’s relative weights
Table 5 Weighted supermatrix
Table 6 Limited supermatrix

Evaluation of alternatives and determination of the final rank

This case study focused on evaluating the sustainability of three data centers. Three experts used a nine-item scale (Table 4) to judge the relative importance of the selection criteria when performing the pair-wise comparisons. The opinions of all experts were of equal weight, and the mean value was used to represent the overall fuzzy value of the experts’ collective judgment for the same evaluation dimensions. Subsequently, linguistic variables, such as “Perfect,” “Absolute,” “Very good,” “Fairly good,” “Good,” “Preferable,” “Not bad,” “Weak advantage,” and “Equal” were used to evaluate the performance of the data center with respect to the criteria in Table 4. The decision matrix was also constructed by comparing the alternatives in relation to each subcriterion (see Tables 5 and 7). The weight of each sub-criterion is also presented in Fig. 3. The assessments of data center performance by the three experts are presented in Table 7.

Table 7 Ratings of the three alternatives under 14 criteria
Fig. 3 Weights of subcriteria

First, the decision matrix was normalized with the triangular fuzzy numbers using Eqs. (8) and (9), which cover the benefit and cost criteria. For criterion C1 of alternative A1, the normalized value r11 is obtained as follows:

$$ {z}_{11}=\left(4+6+7,\ 5+7+8,\ 6+8+9\right)/3=\left(5.67,\ 6.67,\ 7.67\right) $$
(16)
$$ {r}_{11}={z}_{11}/9=\left(0.63,\ 0.74,\ 0.85\right) $$
(17)
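
The arithmetic in Eqs. (16) and (17) can be reproduced with a short sketch. The three expert ratings are read from the terms of Eq. (16); the function names are illustrative, and the divisor 9 is taken directly from Eq. (17).

```python
def mean_fuzzy(ratings):
    """Component-wise mean of triangular fuzzy numbers (l, m, u)."""
    n = len(ratings)
    return tuple(sum(r[k] for r in ratings) / n for k in range(3))


def scale_fuzzy(tfn, divisor):
    """Divide each component of a triangular fuzzy number by a constant."""
    return tuple(x / divisor for x in tfn)


# Ratings of alternative A1 on criterion C1 by the three experts, per Eq. (16).
expert_ratings = [(4, 5, 6), (6, 7, 8), (7, 8, 9)]

z11 = mean_fuzzy(expert_ratings)   # aggregated fuzzy rating, Eq. (16)
r11 = scale_fuzzy(z11, 9)          # normalized fuzzy rating, Eq. (17)

print(tuple(round(x, 2) for x in z11))
print(tuple(round(x, 2) for x in r11))
```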

After we obtained the fuzzy evaluation matrix, the fuzzy weighted decision table was calculated. We then obtained the weighted evaluation matrix, as presented in Eq. (10), using the criteria weights computed with the ANP (Table 5). The fuzzy weighted decision matrix is presented in Table 8.

$$ {u}_{11}={r}_{11}\times {w}_{11}=\left(0.63,\ 0.74,\ 0.85\right)\times 0.066=\left(0.042,\ 0.049,\ 0.056\right) $$
(18)
Table 8 Weighted evaluation matrix
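
The weighting step in Eq. (18) amounts to multiplying each component of the normalized fuzzy rating by the crisp ANP weight. A minimal sketch, using only the values quoted in Eqs. (17) and (18) and an illustrative function name, is as follows:

```python
def weight_fuzzy(tfn, weight):
    """Multiply a triangular fuzzy number (l, m, u) by a crisp weight."""
    return tuple(x * weight for x in tfn)


r11 = (0.63, 0.74, 0.85)  # normalized rating from Eq. (17)
w11 = 0.066               # ANP weight used in Eq. (18)

u11 = weight_fuzzy(r11, w11)
print(tuple(round(x, 3) for x in u11))
```

Applying the same function to every cell of the normalized decision matrix, with the weight of the corresponding subcriterion, would give a matrix of the kind reported in Table 8.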

As shown in Table 8, the elements \( {u}_{ij} \), ∀i, j, are normalized positive triangular fuzzy numbers with values within [0, 1]. Therefore, the FPIS and FNIS were assigned as \( {u}_i^{+} \) = (1, 1, 1) and \( {u}_i^{-} \) = (0, 0, 0), respectively, for the benefit criteria, and as \( {u}_i^{+} \) = (0, 0, 0) and \( {u}_i^{-} \) = (1, 1, 1), respectively, for the cost criteria. The distances of each alternative from the FPIS and FNIS under each criterion were calculated using Eq. (5).

$$ {d}_{u_{11}}^{+}=\sqrt{\left[{\left(1-0.042\right)}^2+{\left(1-0.049\right)}^2+{\left(1-0.056\right)}^2\right]/3}=0.951 $$
(19)
$$ {d}_{u_{11}}^{-}=\sqrt{\left[{\left(0-0.042\right)}^2+{\left(0-0.049\right)}^2+{\left(0-0.056\right)}^2\right]/3}=0.049 $$
(20)
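
The two distances in Eqs. (19) and (20) can be checked with a short sketch of the vertex distance between two triangular fuzzy numbers. The input is the weighted value u11 from Eq. (18); the function and variable names are illustrative.

```python
import math


def vertex_distance(a, b):
    """Vertex distance between two triangular fuzzy numbers (l, m, u)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / 3)


u11 = (0.042, 0.049, 0.056)   # weighted fuzzy rating from Eq. (18)
FPIS = (1.0, 1.0, 1.0)        # ideal solution for a benefit criterion
FNIS = (0.0, 0.0, 0.0)        # negative-ideal solution for a benefit criterion

d_plus = vertex_distance(u11, FPIS)
d_minus = vertex_distance(u11, FNIS)

print(round(d_plus, 3), round(d_minus, 3))
```

For a cost criterion, the two reference points would be swapped, as stated in the text.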

The distances of each alternative from the FPIS and FNIS were then determined by adding, over all criteria, the results obtained from Eqs. (13) and (14). Similar computations were performed for the other alternatives, and the results of the fuzzy TOPSIS analyses are presented in Table 9. The last stage involved computing the similarity of each alternative to the ideal solution using Eq. (15). As indicated by the data in Table 9, the final ranking was A1 > A3 > A2.

Table 9 Result of the fuzzy TOPSIS analyses
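
The final ranking step can be sketched as follows, assuming that Eq. (15) takes the standard TOPSIS form CCi = di− / (di+ + di−), where di+ and di− are the distances summed over all criteria. The distance totals below are hypothetical placeholders for three alternatives labeled X1 to X3; they are not the values in Table 9.

```python
def closeness(d_plus, d_minus):
    """Standard TOPSIS closeness coefficient (assumed form of Eq. (15))."""
    return d_minus / (d_plus + d_minus)


# Hypothetical summed distances (d_plus, d_minus) per alternative; not Table 9.
distances = {
    "X1": (13.10, 0.95),
    "X2": (13.40, 0.62),
    "X3": (13.25, 0.80),
}

cc = {name: closeness(dp, dm) for name, (dp, dm) in distances.items()}

# A larger closeness coefficient means the alternative is nearer the FPIS.
ranking = sorted(cc, key=cc.get, reverse=True)
print(ranking)
```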

Results and discussion

The sustainability of big data centers was evaluated and CCi values were obtained. According to the CCi values, the alternatives, in descending order, were A1, A3, and A2. A1 was the best performing data center, having the highest CCi of 0.1402. The weight of each element was obtained from a previous study, which used the ANP to evaluate the pair-wise comparison matrix (Table 6). In the evaluation of data center sustainability using the ANP, C10 (refrigeration system), C11 (layout and ventilation), C15 (data center location), C1 (data volume), and C8 (server) were determined to be the five most important criteria. Conversely, C16 (waste heat utilization), C14 (renewable energy), C12 (room monitoring and management), and C4 (power sourcing equipment) were determined to be the four least important criteria. Finally, we calculated the CR of the pair-wise comparison matrix to be 0.0622, which was less than 0.1, indicating that consistent weights were obtained and could be used in the evaluation process.
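
The consistency check mentioned above can be illustrated with a minimal sketch, assuming the usual definitions CI = (λmax − n)/(n − 1) and CR = CI/RI, with Saaty's random index values. The 4 × 4 pair-wise comparison matrix is hypothetical and is not the matrix elicited from the experts; the principal eigenvalue is estimated by power iteration to keep the example free of external libraries.

```python
# Saaty's random index (RI) values for matrix orders 3 to 9.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def principal_eigenvalue(matrix, iterations=200):
    """Estimate the largest eigenvalue of a positive matrix by power iteration."""
    n = len(matrix)
    v = [1.0 / n] * n
    lam = float(n)
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(w)              # v sums to 1, so sum(A v) estimates lambda_max
        v = [x / lam for x in w]  # renormalize so that v sums to 1 again
    return lam


def consistency_ratio(matrix):
    """CR = CI / RI, with CI = (lambda_max - n) / (n - 1)."""
    n = len(matrix)
    ci = (principal_eigenvalue(matrix) - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


# Hypothetical reciprocal pair-wise comparison matrix for four criteria.
A = [
    [1.0, 2.0, 1 / 3, 1 / 2],
    [1 / 2, 1.0, 1 / 4, 1 / 3],
    [3.0, 4.0, 1.0, 2.0],
    [2.0, 3.0, 1 / 2, 1.0],
]

cr = consistency_ratio(A)
print(cr < 0.1)  # judgments are usually accepted when CR is below 0.1
```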

Rong et al. (2016) provided an energy consumption distribution diagram for data centers, in which the two most important factors were refrigeration system energy consumption (40%) and storage device energy consumption (40%). The results of this study’s proposed method indicated that the refrigeration system and servers are the main contributors to energy consumption in a data center, consistent with Rong et al. (2016). As expected, the main concerns in a sustainable data center are related to refrigeration and hardware, particularly the server. Moreover, in the construction of data centers, a suitable location as well as an appropriate layout and ventilation design are used to lower cooling costs. Another factor that requires attention is the volume of big data, which is a significant indicator of the data handling capabilities of a data center. Data centers must reasonably distribute data processing tasks to appropriate IT equipment to avoid idle hosts, energy wastage, or overload due to large data volumes. Furthermore, sustainable energy techniques have been challenging for the industry to exploit. Nevertheless, data center operators should pay more attention to renewable energy use and waste heat utilization, which have considerable potential for improving data center sustainability.

Like any new technology, big data has sweeping externalities, both good and bad. On the positive side, big data enables real-time monitoring or periodic updating of water, electricity, and other resource use at greater granularity, which results in a greater variety, volume, and velocity of data. By sharing such big data and information with managers, Disney has reduced its electricity usage by nearly billion kWh annually. A Caterpillar customer found that running more generators at lower output is better than running fewer generators at maximum power (Corbett 2018).

Big data might help people make better decisions; however, large-scale big data centers have concomitant downsides and unintended or detrimental consequences: they can consume too much energy or generate too many carbon emissions. From the perspective of carbon emissions, our study indicates that data volume is more important than variety and velocity. However, a large volume of data might not be the right data for decision making. Sometimes, we need to ask what data are not being collected (Sachs 2012). Even when large volumes of data are available for analysis, they might not be correct, which is a veracity concern. How these data are collected or reported needs to be verified. The more accurate the carbon emission data, the lower the emissions a big data center can achieve (Melville et al. 2017). In addition, by the time a large volume of data becomes available, we might be past the time for prevention and into the time for risk mitigation, as in the case of the COVID-19 pandemic. Thus, we must also concentrate on making multi-criteria decisions with a small rather than a large volume of data.

What is alarming is that most stored data are considered waste or dark data. These gigantic amounts of data simply sit in storage: only 0.5% of the data in the world is used, and 99.5% is dark data, of which 32% is trivial, obsolete, and redundant (Cohen 2018). A substantial amount of the energy used for big data can be saved if dark data are reduced. Thus, we must find a way to curtail unrestrained data growth across the stages of collection, reporting, storage, distribution, and disposal.

Big data can help reduce material or energy consumption, but running big data centers entails physical processes (e.g., server, equipment, and room) that consume energy. The biggest data centers in the world increasingly depend on renewable energy. Apple’s and Google’s data centers run on 100% renewable energy (Corbett 2018; Kwon 2020). Firms that invest in green IT achieve lower energy consumption and higher profits (Khuntia et al. 2018). Thus, it is necessary to start gauging and abating the environmental costs of big data centers by investing in renewable energy.

Conclusions

This study proposes a data center sustainability evaluation model in which multiple-criteria decision-making methods are integrated using the ANP and fuzzy TOPSIS. The ANP is used to obtain the relative criteria weights for determining their interdependent relationships, and fuzzy TOPSIS is used to rank the performance of alternatives. The applicability of the proposed model was examined using a case study.

Numerous studies have examined technologies for reducing the energy consumption of data centers. However, few theoretical studies have investigated how energy consumption can be reduced by managing data characteristics, energy input (power source), and energy output (location and waste heat treatment). This study proposes a comprehensive strategy, based on a holistic architecture, for the evaluation of the energy consumption of a data center. The results suggest that designers of a data center should pay attention to the center’s carbon footprint and the various factors—such as data characteristics, IT equipment, refrigeration system, and waste heat utilization—affecting energy consumption. Refrigeration system, layout and ventilation, data center location, data volume, and server power consumption were determined to be the five most significant elements in evaluating data center sustainability. The areas that require further development are renewable energy use, waste heat utilization, and how tasks should be optimally allocated to various IT equipment.

To the best of our knowledge, this study is the first to assess data center sustainability through multiple-criteria decision-making methods, where fuzzy theory is used to evaluate the imprecise and subjective judgments of decision makers. This study formulated a systematic evaluation framework based on qualitative and quantitative criteria and comprising the four factors of big data level, equipment level, room level, and data center level. This framework provides a method that managers can use to enhance sustainability when constructing new data centers or upgrading and optimizing existing ones. The managers of data centers must strive to optimize the refrigeration system, layout and ventilation system, data center location, and arrangement of tasks to increase the energy efficiency of IT equipment. They should also better understand clean energy use and waste heat utilization. Moreover, the research results indicate that achieving sustainability and energy consumption reduction for a data center is a systematic problem influenced by factors such as the efficiency of equipment, airway layout, location, data characteristics, and renewable energy. Big data is like unclean oil, which is not only a powerful force for economic development, but also a root cause of environmental harm. We need to make certain that big data centers operate in a sustainable way.