Estimating the energy consumption of model-view-controller applications

For information and communication technology to reach its goal of zero emissions in 2050, power consumption must be reduced, including the energy consumed by software. To develop sustainability-aware software, green metrics have been implemented to estimate the energy consumed by the execution of an application. However, they cause a rebound energy consumption effect, because they require an application to be executed to estimate the energy consumed after each change. To address this problem, it is necessary to construct energy estimation models that do not require the execution of applications. This work does so by constructing a green model, based on size, complexity and duplicated lines, to estimate the energy consumed by model-view-controller applications without executing them. This article defines a model constructed from 52 applications. The model was evaluated on twelve applications; its joule estimations were very close to the measured values, avoiding the energy consumed by executing the applications.


Introduction
In 2020, the carbon footprint of information and communication technology (ICT) was estimated at 1.43 Gt (billion tonnes) of CO2e [1]. Currently, ICT generates around 2.5% of global greenhouse gas emissions, which poses a challenge to reaching the goal of zero emissions by 2050 [2]. In addition, mobile, fixed and data centre operators must meet the goal of limiting global warming to 1.5 °C by 2050 in order to reduce the risks and effects of climate change [3,4]. ICT has positive effects, such as substituting mobility and travel with teleworking and virtual meetings [2]. However, it also has negative effects on sustainability [5]. The rapid growth in electronic devices, computers and server connections, together with the execution of their applications, contributes to a huge increase in energy consumption [6]. Therefore, ICT and energy consumption play critical roles in meeting current sustainability challenges [7]. Consequently, ICT solutions must support not only sustainability BY ICT, but also sustainability IN ICT [8].
Currently, sustainable hardware and communications have been achieved by producing hardware devices and communication protocols that greatly reduce power consumption [9,10]. However, much work is still needed in the software field [1] to reduce the additional power that the execution of software applications consumes on the hardware devices on which they are deployed [11]. Hence, in the technical dimension of sustainability, it is necessary to construct new mechanisms, such as practices, criteria and metrics, to support green-aware activities during the construction and evolution of software [12,13]. Green-aware mechanisms aim to reduce the complexity of software to improve its quality, performance and energy efficiency. Reducing software complexity is important because the way in which software is coded influences both energy consumption and energy efficiency [14], owing to the input/output (I/O) hardware instructions it generates [15].
The reduction of software complexity, and by extension its energy efficiency, depends on (i) improving the logic of classes and methods, the data types of variables and parameters, and the control structures at the code level; and (ii) using good design practices and architectural patterns and styles that improve performance, scalability, maintainability and resource allocation to minimise data interchange [16]. However, in current software development, changes are rapid and continuous because of the wide adoption of agile methodologies and fast time-to-market, which lead software engineers to make sub-optimal green code and/or design decisions that may result in less sustainable applications. This problem should be addressed by providing green-oriented development mechanisms that support sustainability-aware decisions during coding and/or design activities in the development and maintenance processes. To that end, it is necessary to guide software engineers during changes and decision-making to guarantee the construction and maintenance of green software. This guidance can be provided by green estimation models.
Green estimation models are based on green metrics that are used to estimate the energy consumed by applications when they are being executed [17].
However, the problem with green metrics is that, by estimating the energy consumed while executing applications, they generate an energy consumption rebound effect [18]. This problem is even greater in agile methodologies, which are open to changes and in which the design and code are modified iteration after iteration [19]; this greatly increases the need to recalculate the green metrics to guide the required changes, and with it the energy consumption rebound. There are estimation models that determine the energy consumption of applications before their execution based on instructions or method calls [17], which avoids the rebound effect. However, they do not consider architectural design or the key role of software quality in green software [15], even though architectural patterns and software quality may influence energy consumption. Therefore, it is necessary to develop quality- and architectural design-based models to estimate the energy consumption of applications without the need for their execution. This work addresses this problem following the recommendation of Seo et al. [20] to construct one power consumption estimation model for each specific architectural pattern or style, since the wide variability of architectural patterns and styles hampers the construction of a common power consumption estimation model. Specifically, this work presents a green model to estimate, without the need for execution, the energy consumption of applications whose software architectures implement one of the most commonly used architectural patterns, the model-view-controller (MVC) [21-23]. The green model, the model-view-controller complexity and code smell energy model (MVC-CCsEM), is based on the quality metrics of complexity (Cyclomatic Complexity [24]), maintainability (Duplicated Lines [25]) and Size.
This article describes the construction process of this multiple linear regression green model for MVC applications, which software engineers can use to estimate the power consumption of applications during software construction and maintenance without having to execute them and incur power consumption rebound effects. The MVC-CCsEM was constructed using a training corpus of 52 applications. In addition, a study was performed to determine how well the model estimates the energy consumption of twelve different MVC applications. The results of the study demonstrated that the energy estimations of the model are valuable and accurate enough to conclude that it would be useful for guiding energy-aware changes during the development and maintenance of MVC applications.
The rest of this article is structured as follows: Sect. 2 examines related work about green metrics and estimation models. In Sect. 3, we describe the process used to construct the MVC-CCsEM based on the analysis of 52 applications. Section 4 presents a study of twelve applications in which the MVC-CCsEM energy consumption model is used to estimate energy consumption. The threats to the validity of this work are discussed in Sect. 5. Finally, Sect. 6 concludes the paper and recommends directions for future research.


Related work
Both industry and academia have obtained successful results in constructing sustainable hardware and communications by reducing their power consumption [9,10]. Because ICT solutions comprise both hardware and software, not only must the hardware be sustainability-aware, but the software must also reduce its power consumption in order to maximise the sustainability of ICT solutions [11]. The evaluation of ecological software depends on the application's structure and the hardware infrastructure used for its deployment [17]. The energy consumed by an application cannot be fully isolated from the basal consumption of the hardware on which it is executed. It is possible to measure the hardware's basal consumption without executing the application, and then to measure the application's execution while excluding peripheral devices and other background processes. Hence, obtaining the specific energy consumption of an application requires a calculation, which is in fact an estimation. Therefore, green metrics are considered estimations rather than measurements. Green metrics and models are used to measure different applications, or versions of the same application, executed on the same hardware to determine the greenest software solution.

Green metrics: estimation from software execution
Green metrics estimate the energy consumption of applications when they are being executed. Several metrics described in the literature are reviewed in this section.
To measure the energy consumption and hardware utilisation of software, Guldner et al. [26] defined a method for evaluating "energy consumption" and "processor utilisation". To collect measurements using green metrics, Michanan et al. [27] developed an interface based on a collection of classes using a dynamic data structure called GreenC5 to evaluate applications that used different workloads.
The collection of green metrics is supported by tools that are used to implement dynamic analysis techniques by applying calculation models to obtain only the specific power consumption of an application. Some well-known tools that measure energy consumption are RAPL [28], Microsoft Joulemeter [29], jRAPL [30], PowerAPI [31], and Jalen [32]. In this work, we used Microsoft Joulemeter to measure the energy consumption of the applications. Sehgal et al. [33] also used this tool for experimentation in their research.
Lago et al. [34] collected 66 software metrics and models to measure and estimate energy consumption, 17 of which were related to software architecture [20]. Other works have used these metrics and tools to estimate the energy consumption of specific kinds of software architecture, such as product line architectures [35] or cloud software architectures. Specifically, sustainable cloud software architectures present several open challenges [36]; previous studies have attempted to define algorithms and energy-aware techniques [37,38] that use green metrics to support sustainable solutions.
On the other hand, some previous works have used green metrics to demonstrate the relationship between the quality of a code and power consumption. Cairo et al. [39] described how smells and cyclomatic complexity influence the appearance of errors in applications. Sehgal et al. [33] evaluated the impact of refactoring smells on total energy consumption. These previous works revealed that Cyclomatic Complexity and smells are key factors in energy consumption.
These works were based on green metrics, which required calculations to execute the software application and generate an energy consumption rebound [18]. In this work, we go beyond these previous studies by developing a solution to avoid the rebound power consumption of green metrics.

Green models: estimation from green metrics
The measurements of green metrics obtained from application execution allow for the construction of energy consumption estimation models (also called metrics themselves), which build on those measurements and other kinds of metrics.
Chatzigeorgiou and Stephanides [17] proposed three metrics to estimate the energy consumption of software: the executed instruction count measure (EIC), the memory access count measure (MAC) and software energy (SEM). These metrics use external meter instruments to measure at runtime. Specifically, EIC measures the number of instructions of an application executed by the processor, MAC measures the number of memory accesses of an instruction, and SEM calculates the average energy cost of executing an instruction. Dufour et al. [40] proposed three "hot spot" metrics based on the methods and classes of an application: the total execution frequency (TEF) of a method, the class invoked frequency (CIF), and the class invoked time (CITC). The energy wasting rate [41] is another metric based on measurements taken during execution of elements of the software's logical structure (number of classes, methods, and structures for data exchange). The values obtained from the metric let the programmer know which Java classes or methods need to be changed or refactored to save power. These previous works revealed that the size of an application (source lines of code (SLOC), number of classes, methods, etc.) is a key factor in its energy consumption. Therefore, size is part of our estimation model. However, these models focus on size and do not take the quality of the code into account in their estimations.
In this regard, Fu et al. [42] estimated the power consumed by an application using statistical algorithms and machine learning. However, although this work was based on quality attributes, it had the disadvantage that the application must be executed to obtain the estimation, generating a rebound energy consumption effect. Because our goal was also to reduce the power consumption of our solution, we avoided this negative effect by defining an estimation model based on Size and quality attributes without the need to execute the application.
Regarding design patterns, Feitosa et al. [43] confirmed that design decisions also affect energy consumption. The authors used crossover experiments to estimate the energy consumption of solutions that applied the GoF design patterns state/strategy and template method. To validate the estimated energy measurements, they applied statistical models and the agglomerative hierarchical clustering technique, using SLOC and message passing coupling (MPC) as metrics. This was an innovative early estimation work because it was based on design patterns and considered the role of architectural components, not only Size (SLOC). They also considered a coupling quality attribute (MPC). However, it is important not only to use one quality metric, but to use those that previous studies have revealed to be key factors in power consumption, such as Cyclomatic Complexity and smells [33].
Several previous studies have addressed the energy consumption estimation of software architectures. Seo et al. [20] acknowledged the need to create specialised early estimation models so that estimations for different architectural patterns and styles can be precise. This capability also enables an engineer to employ energy cost predictions to determine the most appropriate architectural style for a given distributed application before the implementation of the system. This provides the opportunity to compare (i) the power consumption of different architectural patterns or styles applied to the same application and (ii) the power consumption of different applications designed with the same architectural pattern or style. Seo et al. [20] presented 17 architectural consumption metrics that estimate the energy consumption of components and connectors. They showed that, because components and connectors play diverse roles in an architecture, they incur different energy consumption. Among the 17 metrics, it is important to emphasise the generic energy cost model, because it measures a complete architecture by distinguishing the energy consumed by the components (i.e. computational elements) from that consumed by the connectors (i.e. interaction elements). The generic energy cost model is specialised into other metrics, such as the client-server energy cost model and the pub-sub energy cost model, which facilitate the early energy estimation of architectural patterns. However, these estimation metrics are based on Size, and quality attributes are not included in the estimation.

Green models for estimating power consumption during software life cycle
Fast time-to-market and the wide adoption of agile methods have led software engineers to apply a fast design decision-making process that may result in sub-optimal sustainable decisions for software applications, if suitable sustainable tools are not adopted as part of the process. Ardito et al. [44] provided guidelines for correctly measuring software applications and describing the techniques, tools and models that can be used, whereas Georgiou et al. [45] drew attention to this problem throughout the entire software life cycle. They provided guidance for using tools and techniques during the requirements analysis, design, implementation, testing and maintenance of software applications. Ournazi et al. [46] also emphasised this problem and provided a solution for addressing sustainability in software requirements and measuring power consumption during software construction to fulfil green requirements. However, these tools are based on green metrics that require software execution, which leads to an undesirable power consumption rebound effect throughout the entire software life cycle.
In summary, based on the relevant literature, it may be concluded that there are no early green models that do not require the execution of a software application to estimate its energy consumption by considering its design and quality attributes. To develop an integrated solution that takes into account the needs identified in our review of related studies (see Table 1), in this work an estimation model was developed to determine the power consumption of MVC software architectures based on not only size but also quality attributes, such as complexity and smells (i.e. code smells and/or Duplicated Lines). In addition, this early estimation model can be applied as often as necessary during software construction and maintenance processes without generating rebound power consumption effects. Finally, this early estimation model was specialised in the MVC pattern to facilitate comparison of the power consumption of different applications or versions designed in the MVC pattern.

Construction of the MVC-CCsEM model
To construct the MVC-CCsEM, we defined and followed a rigorous process, formalised using the Software and Systems Process Engineering Meta-Model (SPEM) standard [47], to guarantee its replication and reuse for creating new estimation models based on quality and energy metrics (see Fig. 1). The process of constructing the MVC-CCsEM comprises five phases: scope definition, building, profiling, analysis and construction. Each phase includes a set of tasks as well as their inputs and outputs. Tasks related to energy estimation metrics required highly rigorous testing to avoid bias and obtain precise results. Therefore, the process was based on the energy efficiency evaluation of Mancebo et al. [48] to define the energy consumption measurement tasks. Each phase, its tasks and its results are described in the following subsections.

Phase I: scope definition
This phase consists of two tasks: specifying the requirements and defining the goal of the model. The requirements specification consists in defining the scope and context of the model, as well as the inclusion and exclusion criteria. The goal of the model is defined as a set of hypotheses and/or research questions (see Fig. 1). The "scope and context" input and the "inclusion and exclusion criteria", "hypothetical model" and "hypotheses" outcomes are described in the following subsections.

Table 1 (needs identified in the related work):
• Estimating power consumption throughout the software life cycle [45,46]
• Avoiding the rebound effects of executing applications to estimate power consumption [18,43]
• Creating specialised early estimation models for different architectural patterns or architectural styles [20,34,36-38,43]
• Considering the Size of an application (source lines of code (SLOC), number of classes, methods, etc.) as a key factor in the power consumption estimation [40,41]
• Considering quality attributes, in general, and Cyclomatic Complexity and smells in particular, as key factors in the power consumption estimation [33,39,43,45]

Scope and context
In our previous work, we defined the CCsEM model to estimate the energy consumption of software components without executing them (see Eqs. 1 and 2) [49]. It is important to consider that the obtained results are normalised by a logarithmic transformation. Therefore, to obtain the estimation value without normalisation, it is necessary to apply the inverse of log(CCsEM), that is, the exponential function. Hence, the power consumption estimation of the CCsEM model is obtained by applying the exponential function, as shown in Eq. 2.
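Equations 1 and 2 are not reproduced legibly in this extraction; based on the description above, their general shape is as follows (this is a reconstruction of the form only; the exact regressors of the component-level CCsEM are given in [49]):

```latex
% Eq. 1 estimates the normalised (log-transformed) energy of a component
% as a linear combination of its quality metrics x_i; Eq. 2 undoes the
% normalisation with the exponential function.
\log(\mathrm{CCsEM}) = \beta_0 + \sum_i \beta_i \, x_i \qquad (1)
\mathrm{CCsEM} = e^{\log(\mathrm{CCsEM})} = e^{\beta_0 + \sum_i \beta_i x_i} \qquad (2)
```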
The CCsEM model is based on the generic energy cost model metric [20], which estimates the energy consumption of software architectures in terms of the energy costs of components and connectors. The CCsEM model was narrowed to the energy consumption estimation of components and simple interactions, in order to avoid the uncertainty generated by complex connectors. Connectors vary from simple interactions [50] between components to complex orchestrators that implement coordination protocols among components [51]. In fact, complex connectors that orchestrate a high number of components with complex policies may have an impact on energy consumption even higher than that of the components. Complex connectors are applied in a wide variety of architectural patterns [23] and technologies [52-54], which influence energy consumption to different degrees. Therefore, they require a separate, specific study to define an appropriate estimation model and determine their energy consumption, depending on their role, technology, protocol and number of orchestrated components, among other properties. In the present work, we went beyond the CCsEM model by considering both components and connectors, that is, the complete architecture. However, because the variability of complex connectors and their derived uncertainty must be avoided, this work was constrained to a specific architectural pattern, following the recommendation of Seo et al. [20] to construct one power consumption estimation model for each specific architectural pattern or style. In this work, the model is specific to software architectures that implement the MVC pattern. Therefore, the training of the model is also constrained to the execution of applications that implement MVC software architectures.
Guaman et al. [55] revealed the metrics most frequently used by static analysis tools to examine software architectures and applications. In addition to size, these metrics include complexity and maintainability. Maintainability is usually supported by tools that provide technical debt (TD) analysis; these TD tools measure smells, such as code smells and duplicated lines. In addition, Sehgal et al. [33] found that cyclomatic complexity and smells are vital factors in power consumption. This base of knowledge was used to construct a quality-aware energy estimation model to predict energy consumption before execution. In this work, the metrics measured to construct the MVC-CCsEM are as follows:
• Size: measured by the application Size and SLOC.
• Complexity: measured by Cyclomatic Complexity.
• Maintainability: measured by smells, namely Duplicated Lines and code smells.
These decisions allowed the model to fulfil the needs identified in the software power consumption estimation field (see Table 1). In addition, it is important to emphasise the application context of the model. The MVC-CCsEM model was defined to support sustainable software construction and maintenance without the need for execution. This means that the model can be calculated as many times as the engineer requires without incurring additional power consumption. This calculation is performed locally by an engineer on a computer during software construction and maintenance, obtaining the quality and size measurements from tools such as SonarQube [56,57]. Therefore, the model is independent of the computer or IoT device on which the application is deployed, and MVC-CCsEM-improved, greener code developed during software construction fosters greener ICT solutions if it is deployed on power-efficient computers and IoT devices in the real setting. These greener ICT solutions can then additionally be measured using green metrics and models (see Sects. 2.1 and 2.2) in their real setting, covering different devices, communications and user connections.

Inclusion and exclusion criteria
The criteria are based on the fact that the model only estimates the energy consumption of MVC applications, avoiding applications with complex connectors by constraining the variability to a single architectural pattern. Based on this premise, we selected applications by applying the following inclusion criteria (IC) and exclusion criteria (EC):
• IC1: Executable Java or C# MVC applications on which the SonarQube tool can perform a static analysis to extract the metrics size, source lines of code, cyclomatic complexity, duplicated lines and code smells.
• EC1: Non-MVC applications.
• EC2: Java and C# MVC applications with compilation errors.
• EC3: Java and C# MVC applications with execution errors.
• EC4: Applications not included in IC1.

Goal
The main goal of this work was to construct a green model for estimating the energy consumption of MVC software architectures in terms of size, maintainability and complexity, avoiding software execution (see Table 1). To address this goal, two hypotheses (H1 and H2) and a null hypothesis (H0) were defined:
• H1: The power consumption of applications that implement an MVC software architecture can be estimated accurately, without their execution, from their Size, Complexity and Maintainability.
• H2: Complexity and Maintainability cannot be disregarded in the power consumption estimation of applications that implement an MVC software architecture without being executed, because both are significant.
• H0: ¬(H1 ∧ H2).
To validate these hypotheses, this work also defined an initial hypothetical model named the model-view-controller complexity and code smells energy model (MVC-CCsEM). This model is based on the metrics (variables) defined in the "Scope and Context" input of the "requirements specification" task of the process (see Fig. 1 and Sect. 3.1.1), i.e. Size, Maintainability and Complexity. Because the model required several variables to be considered, the MVC-CCsEM was specified as a multiple linear regression model. The MVC-CCsEM is the value to be calculated, that is, the dependent variable of the multiple linear regression model, whereas the variables used to calculate it (SIZE, SLOC, CC, CS and DL) are the independent variables. As a result, the validation of the multiple linear regression model allowed for the determination of the independent variables that are significant for its calculation. Based on these variables, the formula of the multiple linear regression model was defined, as shown in Eq. 3.

Let β0 be the value of the dependent variable when the rest of the variables are set to zero; βi the regression coefficient of independent variable i, that is, the average effect of a unit increment of that independent variable on the dependent variable, with i = SIZE, SLOC, CC, CS or DL; and e the model error.
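From these definitions, the regression of Eq. 3, which is not reproduced legibly in this extraction, can be reconstructed as follows (the log transform on the dependent variable is assumed here, by analogy with Eqs. 1 and 2):

```latex
% Reconstructed general form of Eq. 3, using the coefficients defined above.
\log(\mathrm{MVC\text{-}CCsEM}) = \beta_0 + \beta_{SIZE}\,SIZE + \beta_{SLOC}\,SLOC
  + \beta_{CC}\,CC + \beta_{CS}\,CS + \beta_{DL}\,DL + e \qquad (3)
```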
H2 is validated during the construction of the model: if the two Maintainability variables are not significant, or CC is not significant, H2 is rejected. If H2 is accepted, H1 is then validated with the model execution by determining whether the estimation results of the MVC-CCsEM are accurate enough for its adoption. To address our goal, both H1 and H2 must be fulfilled, which implies the rejection of H0.
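As an illustration of how such a multiple linear regression can be fitted on a training corpus and then applied without executing an application, the following sketch uses synthetic data and invented coefficients (NumPy's least-squares routine stands in for the statistical tooling; none of these numbers come from the paper):

```python
import numpy as np

# Hypothetical illustration of the MVC-CCsEM form: a multiple linear
# regression on five metrics (SIZE, SLOC, CC, CS, DL) predicting the
# log-transformed energy consumption of an application.
rng = np.random.default_rng(0)
X = rng.uniform(1, 100, size=(52, 5))        # 52 synthetic "applications"
X1 = np.hstack([np.ones((52, 1)), X])        # prepend intercept column

true_beta = np.array([0.5, 0.005, 0.001, 0.01, 0.002, 0.005])  # invented
log_energy = X1 @ true_beta                  # noiseless synthetic target

# Least-squares estimate of the regression coefficients (beta_0 first)
beta_hat, *_ = np.linalg.lstsq(X1, log_energy, rcond=None)

# Estimating a new application's energy from its static metrics alone:
# apply the fitted model, then invert the log transform.
new_app = np.array([1.0, 40, 2500, 180, 35, 120])  # hypothetical metrics
estimated_joules = np.exp(new_app @ beta_hat)
```

On noiseless synthetic data the fit recovers the coefficients exactly; in the paper's setting, the statistical significance of each coefficient is what decides H2.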

Phase II: building
To construct the model, a training corpus data set is required, composed of a set of MVC applications and their quality and energy measurements. This data set is created in the building phase, which consists of two tasks: "SW architecture search and selection" and "Build applications" (see Fig. 1). The tasks are described in the following subsections.

SW architecture search and selection
This first task aims to gather the training corpus of MVC applications from public, accessible repositories. Taking into account the inclusion criteria (IC) and exclusion criteria (EC), a search for C# and Java MVC applications was performed on GitHub (https://github.com/), where the source code is accessible and was written by different programmers, avoiding human and programming bias. The applications were then downloaded from GitHub and stored locally. This search yielded 74 applications. The second step consisted of checking the downloaded applications against EC1: each application was opened in its corresponding integrated development environment (IDE) to confirm that its architecture implemented the MVC pattern. At this point, 19 applications were excluded by applying EC1, and 55 applications were preserved for the data set.

Build applications
This task comprises two actions: (1) compiling and executing the MVC applications and (2) verifying that they do not fulfil EC2 or EC3. To that end, the applications were compiled and executed using the corresponding IDE. This execution could include pre-processing the code, such as updating dependencies or preparing the JAR files of Java applications. In this task, three applications were removed from the data set by applying EC2, EC3 and EC4, resulting in a data set of 52 MVC applications (see Tables 2 and 3). These applications encompass a range of sizes: 36.5% XS, 34.6% S, 1.9% M, 25% L and 1.9% XL. Of these, 73.1% are written in Java and 26.9% in C#, and each addresses a different domain, as shown in Table 2 and Fig. 2.
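The reported proportions are internally consistent with a corpus of 52 applications, as a quick arithmetic check shows (percentages taken from the text):

```python
# Recovering the corpus composition from the percentages reported in the
# text; all counts must add up to the 52 selected applications.
TOTAL = 52
size_pct = {"XS": 0.365, "S": 0.346, "M": 0.019, "L": 0.25, "XL": 0.019}
size_counts = {k: round(p * TOTAL) for k, p in size_pct.items()}
assert sum(size_counts.values()) == TOTAL    # 19 + 18 + 1 + 13 + 1

lang_counts = {"Java": round(0.731 * TOTAL), "C#": round(0.269 * TOTAL)}
assert sum(lang_counts.values()) == TOTAL    # 38 Java + 14 C#
```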

Phase III: profiling
The Profiling phase consists of two tasks: "Static Analysis" and "Dynamic Analysis" (see Fig. 1). The Static Analysis is conducted to calculate the size, complexity and maintainability metrics from the selected applications. The Dynamic Analysis is conducted to measure the energy consumption of the applications during their execution. These measurements constitute the training corpus of the model. These two tasks are described in the following subsections.

Static analysis
The static analysis was performed using the SonarQube tool [57]. Each application was loaded in SonarQube, and its SIZE, SLOC, CC, DL and CS were collected by executing the sonar-scanner command. The collected data were then stored in a MySQL repository by SonarQube. We also stored the data in an Excel file for processing.
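As a sketch of how these measurements can be collected programmatically after a sonar-scanner run, the snippet below parses a response shaped like the output of SonarQube's measures Web API (the JSON here is a mock; metric keys such as `ncloc`, `complexity`, `duplicated_lines` and `code_smells` are SonarQube's usual keys, but treat the exact shape as an assumption rather than a guaranteed schema):

```python
import json

# Mock of an /api/measures/component response from SonarQube, holding
# static metrics used by the model. In a real setting this JSON would be
# fetched over HTTP after the sonar-scanner analysis finishes.
response_text = """
{"component": {"key": "my-mvc-app", "measures": [
  {"metric": "ncloc", "value": "2500"},
  {"metric": "complexity", "value": "180"},
  {"metric": "duplicated_lines", "value": "120"},
  {"metric": "code_smells", "value": "35"}
]}}
"""

payload = json.loads(response_text)
# Flatten the measures list into a {metric: value} dictionary
metrics = {m["metric"]: float(m["value"])
           for m in payload["component"]["measures"]}
```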

Dynamic analysis
The dynamic analysis was performed using Microsoft Joulemeter, which measures the energy consumed by the execution of applications and stores the measurements in a CSV file. Specifically, the measurements provided by Microsoft Joulemeter are CPU, MONITOR, RAM, BASE, Application Power (CPU only), and Total Power (see Fig. 3). To obtain valuable data from the dynamic analysis, the context of execution and the collection procedure must be the same for all the applications under measurement [48]. Because all the applications were measured under the same execution conditions, Joulemeter based its calculations on the same basal consumption, which guaranteed that only the power consumption of the application was compared and that the measurement was independent of the computer. Next, we define

the context of execution and the collection procedure followed during the dynamic analysis:

• Context of Execution: both the hardware and software configurations must be specified.

• Collection Procedure: this is composed of several steps, detailed as follows:

1. Calibrating: Microsoft Joulemeter was calibrated running on batteries with the defined software stack (power cord disconnected). Subsequently, the hardware and software established in the context of execution could not be updated until the collection procedure was finished.

2. Determining the measuring time of the applications: during the execution of an application, energy consumption fluctuates depending on the functionality being executed at that moment. Hence, to ensure that the measurement of an application is representative, the execution time must allow all of its functionalities to be exercised. (a) The applications categorised with the two highest SIZE values of the training corpus were selected in order to determine the number of functionalities provided for execution. In this case, we selected applications categorised as XL and L and calculated their number of functionalities. For example, if an application provided the create-read-update-delete (CRUD) operations of a specific element of the software system, it was recorded as having four functionalities.

3. Establishing the execution context: unnecessary peripheral devices were disconnected and background processes were stopped. Only Microsoft Joulemeter and the software required to execute the application were left running. In addition, to begin the measurement at the same starting point in applications that managed databases, all databases were emptied beforehand.

4. Measuring power consumption: each application was measured during the time determined in step 2, which was 10 min. It is recommended to follow the same interaction pattern to avoid neglecting any functionality during execution. Small applications whose functionalities were all executed before the established time were run until the time was completed, repeating the interaction pattern as many times as necessary.

5. Storing power consumption measurements: after completing step 4 for each application, the obtained measurements were stored.
Steps 4 and 5 were performed iteratively. When they were executed for all applications, the training corpus of the model was obtained, which in this case was composed of 52 MVC applications (see Fig. 3).
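The per-sample subtraction of Eq. 4 and the accumulation of power samples into an energy figure over the 10-minute window can be sketched as follows. This is a minimal illustration, assuming evenly spaced 1 Hz samples; Joulemeter's actual CSV layout and sampling interval may differ.

```python
def application_power(total_cpu_watts, basal_cpu_watts):
    """Eq. 4: application power (CPU only) = CPU - basal CPU consumption,
    applied sample by sample."""
    return [total - basal for total, basal in zip(total_cpu_watts, basal_cpu_watts)]

def energy_joules(power_watts, interval_s=1.0):
    """Integrate power samples (watts) over time to obtain joules.
    Assumes evenly spaced samples (1 Hz here, an assumption)."""
    return sum(p * interval_s for p in power_watts)

# Toy example: three samples of total CPU power with a constant 5 W basal load.
app_power = application_power([7.0, 8.0, 9.0], [5.0, 5.0, 5.0])
print(energy_joules(app_power))  # 2 + 3 + 4 = 9.0 joules
```

In practice the same accumulation would run over the full 600-second measurement window of step 4.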

Phase IV: analysis
The Analysis phase consists of two tasks: "Data Analysis" and "Data Normalization" (see Fig. 1). They are described in the following subsections.

Data analysis
In this task, we prepared and selected the energy consumption values provided by Microsoft Joulemeter and used them to train the regression model (see Fig. 4). Microsoft Joulemeter calculates power consumption using its mathematical estimation model, which is based on the CPU, screen, memory and storage shares of total power [58]. The Total Power and CPU metrics cannot be used as estimation variables because they include measurements not only from the execution of the application under study but also from other processes executing at the same time. However, Joulemeter's power consumption estimation model also calculates the "application power (CPU only)"; that is, it subtracts the basal CPU consumption of the computer from the total CPU consumption (see Eq. 4). Hence, the representative metric is "application power (CPU only)" because it captures the energy consumption of the MVC application during its execution, that is, the strict power consumption of the application under evaluation. In addition, the "data analysis" task (see Fig. 1) requires analysing the relationships between the variables to identify two possible problems: (P1) some independent variables present non-linear relationships with the dependent variable, and (P2) there are collinearity problems between the independent variables.
Application Power (CPU only) = CPU − Basal CPU Consumption    (4)

To analyse the data distribution, we used the GGally package [59], which can plot a data set with multiple variables. Figure 2 shows the GGally results for the data set as histograms and scatter plots. They show linear relationships among the variables because the values in the scatter plots between pairs of independent variables lie close to the line; thus, problem P1 regarding non-linear relationships was avoided. The data distribution also showed that some variables were linearly dependent (see Fig. 2).
In addition, the correlation coefficient of each pair of variables was calculated. The results showed that some variables were highly correlated, such as source lines of code (SLOC) and cyclomatic complexity (CC), with a correlation of 0.955, and SLOC and code smells (CS), with a correlation of 0.886 (see Fig. 4). Therefore, when SLOC increased, CC and CS also increased, and vice versa. This indicated that collinearity problems might exist and that some independent variables might be discarded in the final model. In addition, as shown in Fig. 4, the results revealed a low positive correlation between the MVC-CCsEM and DL variables, which may imply a weak relationship between the two.
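The pairwise correlation screen described above (e.g. SLOC vs CC at 0.955) amounts to computing Pearson coefficients and flagging strongly related pairs. A minimal sketch follows; the toy data and the 0.8 cut-off are illustrative assumptions, not the paper's values.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def collinear_pairs(columns, threshold=0.8):
    """Flag variable pairs whose |r| exceeds the threshold
    (0.8 is an illustrative cut-off)."""
    names = list(columns)
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = pearson(columns[names[i]], columns[names[j]])
            if abs(r) > threshold:
                flagged.append((names[i], names[j], r))
    return flagged

# Toy metrics: SLOC and CC move together, DL does not.
data = {"SLOC": [100, 400, 900, 1600], "CC": [10, 41, 88, 161], "DL": [5, 2, 9, 1]}
print(collinear_pairs(data))
```

In the paper this screen is what motivates checking for problem P2 before fitting the regression model.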

Data normalisation
This step is conducted to determine whether the data require normalisation to be processed correctly and to ensure that valid results are obtained. This task is optional, depending on the kind of variables used and the data distribution obtained from the data analysis: normalisation is required when some variables do not have linear relationships, and it is not performed when all variables do. In this case, data normalisation was not required because there were no variables with non-linear relationships (see Fig. 4). However, the categorical variable SIZE had to be transformed into a quantitative variable by assigning continuous values [60]. Hence, the k categories were coded as the following continuous values: 1 (XS), 2 (S), 3 (M), 4 (L) and 5 (XL).
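The ordinal coding of the SIZE category described above is a direct mapping; a minimal sketch:

```python
# Ordinal encoding of the categorical SIZE variable, as described in the text.
SIZE_CODES = {"XS": 1, "S": 2, "M": 3, "L": 4, "XL": 5}

def encode_size(category: str) -> int:
    """Map a SIZE category to its continuous code; reject unknown labels."""
    try:
        return SIZE_CODES[category]
    except KeyError:
        raise ValueError(f"unknown SIZE category: {category!r}")

print([encode_size(c) for c in ["XS", "S", "M", "L", "XL"]])  # [1, 2, 3, 4, 5]
```

Ordinal codes preserve the size ordering, which is what allows SIZE to enter the linear regression as a quantitative predictor.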

Phase V: construction
In the final phase, the model is constructed. The regression model is applied to estimate the value of the dependent variable and to evaluate the influence of the predictors for use as an early estimation model (see Eq. 3). In our case, the MVC-CCsEM model was applied to predict the expected MVC-CCsEM of a new MVC software architecture whose energy consumption is not known.
Through the generation of the MVC-CCsEM regression model, hypothesis H2 was validated by determining whether the coefficients of the independent variables were statistically significant, that is, whether their p-values were less than 5% (0.05) [61,62].
To construct the MVC-CCsEM regression model that validated our hypotheses, we used the least squares method [63]. This method checks for the potential problem of multicollinearity (P2), and it allowed us to determine that this problem did not exist in the MVC-CCsEM model. In this method, an R² value (between 0 and 1) close to 1 indicates multicollinearity. Therefore, we calculated the R² coefficient of the MVC-CCsEM model to verify that it was not close to 1. To obtain the first version of the regression model and determine its coefficient values, the "lm" function of RStudio was used to fit linear models. The metric values of the 52 applications were loaded into RStudio as the training corpus of the model, and the dependent variable (MVC-CCsEM) and the independent variables (SIZE, SLOC, CC, CS, DL) were specified. At this point, "lm" was executed to generate the linear regression model using the least squares method. This execution provided the first version of the MVC-CCsEM model, presented in Eq. 5, which shows the coefficients of the MVC-CCsEM regression model. This calculation obtained an R-square of 0.5043 and an adjusted R-square of 0.4408. Since these values were not close to 1, they indicated that a multicollinearity problem was not present and that the fit of the data to the model was representative for energy estimation [64]. As shown in Table 3, the coefficients SIZE, SLOC, CC and DL were significant because their p-values were less than 0.05; those with the highest number of asterisks were the most representative. SIZE represents the relationship between MVC-CCsEM and SIZE with a p-value of 0.04794; SLOC represents the relationship between MVC-CCsEM and SLOC with a p-value of 0.00764; and DL represents the relationship between MVC-CCsEM and DL with a p-value of 0.02584 (< 0.05).
CC was the most significant coefficient, with the lowest p-value (0.0000485), which confirmed the high correlation between cyclomatic complexity and energy consumption. Overall, the coefficients SIZE, SLOC, CC and DL were significant because their p-values were less than 0.05, making them the most representative for the MVC-CCsEM model.
Moreover, the model's F-statistic had an acceptable p-value of 0.00003114. However, with a p-value of 0.37000, the coefficient CS was not significant and could be removed from the model. Nevertheless, variables that are insignificant in the initial model could become significant in other models in which one of the variables is removed. Hence, before definitively discarding CS, we applied the stepwise strategy to the model, following Akaike's information criterion (AIC) [65] of selecting only significant variables for its construction. This strategy consists of applying the least squares method to different combinations of our model, removing one or more variables from the initial model, and determining the best result. By applying this strategy, we determined that the significant variables were SIZE, SLOC, CC and DL (see Table 4). They formed the MVC-CCsEM model, obtaining an R-square of 0.4939 and an adjusted R-square of 0.4432, which indicated that the fit of the data to the model was representative for energy estimation [64]. In addition, this model clearly improved on the initial model by increasing the significance of the F-statistic, with a p-value of 0.00001324, and yielding the following coefficient p-values: SIZE = 0.03145, SLOC = 0.01039, CC = 0.000056 and DL = 0.03790. These values corresponded to the best model resulting from the AIC-based variable selection process. Based on this statistical evidence, hypothesis H2 was validated. In addition, it was demonstrated that size (SIZE, SLOC), complexity (CC) and maintainability (DL) are significant predictors for estimating energy consumption and cannot be discarded. Based on these results, the final model is presented in Eq. 6.
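The two ingredients of the stepwise fit, a least squares fit and the AIC score it is compared on, can be sketched in Python. This is a single-predictor illustration of the mechanics only (the paper's model is multivariate and was fitted with R's "lm"); the toy data are constructed so the fit recovers the true coefficients exactly, and AIC is computed with the Gaussian-likelihood form n·ln(RSS/n) + 2k.

```python
from math import log

def ols_fit(x, y):
    """Closed-form simple least squares: y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return intercept, slope

def rss(x, y, intercept, slope):
    """Residual sum of squares of the fitted line."""
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))

def aic(n, rss_value, k):
    """Gaussian-likelihood AIC used by stepwise selection: n*ln(RSS/n) + 2k."""
    return n * log(rss_value / n) + 2 * k

# Toy data: y = 3 + 2x plus residuals (1, -1, -1, 1); the residuals sum to
# zero and are orthogonal to x, so least squares recovers 3 and 2 exactly.
x = [1.0, 2.0, 3.0, 4.0]
y = [6.0, 6.0, 8.0, 12.0]
b0, b1 = ols_fit(x, y)
print(b0, b1, aic(4, rss(x, y, b0, b1), k=2))  # 3.0 2.0 4.0
```

In the stepwise strategy, each candidate model (with one or more variables removed) would be refitted this way and the variant with the lowest AIC retained.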

Estimating the energy consumption of MVC applications using the MVC-CCsEM
Once the MVC-CCsEM model was constructed and H2 was validated (see Phase I), it was necessary to validate H1 by evaluating the model's estimation capabilities and accuracy. To that end, an experiment was conducted in which the energy consumed by twelve MVC applications was estimated using the MVC-CCsEM model and compared with the power consumption measured by Joulemeter during the execution of each MVC application (i.e. application power (CPU only)). The characteristics of these twelve applications are described in detail in Table 5. To conduct this experiment, the Profiling phase was performed for all twelve applications according to the construction process of the model. The metrics SIZE, SLOC, CC and DL were collected with the SonarQube tool to determine the values of the independent variables of the MVC-CCsEM model. In addition, to collect data on the real consumption of the MVC applications, they were executed in the same context and with the same interaction pattern to extract the application power (CPU only) using Microsoft Joulemeter. The results are presented in Table 5. Figure 5 and Table 5 show the comparison in joules between the real energy consumption and the estimation of the MVC-CCsEM model. To measure the model's performance and determine the precision with which it predicted the dependent variable, the root mean squared error (RMSE) was used [66], calculated with R-Studio. In this case, the prediction over these twelve applications yielded RMSE = 0.2861, an acceptable value for the estimation [66][67][68]. The estimation of Application 2 was the most accurate, with a difference of e = 0.0041, whereas the least accurate estimation is identified in Table 5. Based on the results of this analysis, we concluded that the estimation of the model was sufficiently accurate to be a valuable tool for software engineers, as the joule estimation was very close to reality in most of the applications.
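The RMSE comparison between the measured and estimated joules is a standard computation; a minimal sketch follows, with purely illustrative values (not the paper's twelve applications).

```python
from math import sqrt

def rmse(measured, estimated):
    """Root mean squared error between measured and estimated joules."""
    n = len(measured)
    return sqrt(sum((m - e) ** 2 for m, e in zip(measured, estimated)) / n)

# Illustrative values only; every error here is 0.3, so RMSE is ~0.3.
measured = [1.2, 0.8, 2.5, 1.9]
estimated = [1.5, 0.5, 2.8, 1.6]
print(rmse(measured, estimated))
```

Applied to the twelve validation applications, this is the calculation that produced the reported RMSE = 0.2861.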
Thus, the need to execute an application to determine its power consumption, which itself generates extra consumption, was avoided. Based on these results, the MVC-CCsEM model could effectively support software engineers during software construction and maintenance by informing them about variations in power consumption due to code changes. Software engineers can therefore avoid performing explicit measurements, that is, adding new tasks to the software life cycle that execute an application to obtain this information and thereby generate power consumption rebound effects.

Threats to validity
To improve the internal validity of our results, we selected applications based on evidence described in the code or documentation of GitHub or academic repositories. More than one author checked the architecture to select applications that implemented the MVC pattern. Furthermore, to collect the metrics, we configured a dedicated local computer to collect and extract the metrics through the tool. Each application was downloaded from its repository, built and executed under the same context, including the database, which was empty in all cases. As a result, personal bias was avoided through automatic information management. Construct validity was addressed following Lago et al. [33], who proposed a set of metrics that can be used to estimate energy consumption. We also followed the guidelines proposed by the PowerAPI Reference Architecture [34] to collect, measure, extract and export metrics. The model was developed by considering the variables identified in the literature as essential for software power consumption estimation (see Table 1), since our aim was to provide a simple model that could be quickly adopted by software engineers. Regarding the regression model, we applied statistical methods to avoid bias and followed Kiers et al. [69] to evaluate correlations between variables by assigning continuous values to translate qualitative variables into quantitative ones.
Finally, to ensure external validity, the model was constructed using 52 MVC applications programmed in C# and Java with different sizes and quality features (see Table 2 and Fig. 3). Hence, this model is useful for Java and C# MVC applications that use MySQL and SQL Server databases. To generalise the applicability of the MVC-CCsEM model, it will be necessary to increase the training corpus, including applications in other languages, other database management systems and non-SQL databases.


Conclusion
In this work, we developed a green estimation model, MVC-CCsEM, which provides an integrated solution for the identified needs of software power consumption estimation throughout the software lifecycle. MVC-CCsEM estimates the energy consumption of MVC applications in terms of Size, Source Lines Of Code, Cyclomatic Complexity and Duplicated Lines without the need for execution and the resulting power consumption rebound effects. The adoption of this model is feasible during construction and maintenance decision-making for C# and Java applications of any size whose architecture implements an MVC pattern using MySQL and SQL Server databases. The model was constructed by analysing 52 applications and validated by estimating twelve applications. The results showed that this model is an advancement in the field. The results for the twelve applications showed an RMSE = 0.2861, indicating that the joule estimation was very close to reality while avoiding the extra energy consumed by application execution. Therefore, MVC-CCsEM can assist software engineers by saving them time during their development tasks and reducing the power consumption of their applications.
In addition, this work formalises in SPEM the process that was defined and followed to construct the MVC-CCsEM model, a reusable asset for the research community, emphasising optional and mandatory tasks and defining how to perform them to avoid bias. In future work, we will extend and improve the MVC-CCsEM model by enlarging the training corpus with a greater number of applications, examining how the application population and the use of MVC scaffolding mechanisms may influence power consumption, and measuring each application several times to increase the accuracy of the measurements. In addition, we plan to automate the calculation using SonarQube to construct a tool. We will examine other variables that can be extracted from static analysis to determine whether they influence the energy consumption of software applications. We will also determine how new languages and databases influence the model, to establish whether it is necessary to specialise it by programming language or data management system. Finally, we will assess the success of adopting the model during the software life cycle for the benefit of software engineers.