1 Introduction

Resource Planning Optimization (RPO) is a task that companies frequently face in order to obtain benefits such as budget improvements, run-time analyses, and a better organization of human resources (Halima 2017). To this end, RPO is often addressed with software products and tools, such as Mavenlink (Mavenlink 2020), Enterprise Resource Management (ERM) Software (2020), and Tempo Planner (Tempo 2020), which aim to provide a practical solution as quickly as possible. RPO is also crucial in industrial contexts because it can guarantee the operation of particular systems, such as Microgrids. At a high level, a Microgrid is defined as a group of distributed energy resources that act as a single controllable entity able to provide energy for a local community 24 hours a day (Jiang et al. 2013; Das et al. 2017; Joseph and Thomas 2013). More precisely, a Microgrid has two operating modes, called stand-alone mode and grid-connected mode, in which it manages energy by taking decisions that pursue the following three goals: reliability (physical and technological), sustainability (environmental considerations), and economics (cost optimization and efficiency) (Hirsch et al. 2018; Hossain et al. 2014). For this reason, several RPO models, defined as Mixed Integer Linear Programming (MILP) formulations, have been proposed for the optimal management of Microgrids (Yuan et al. 2017; Sedzro et al. 2018; Mohamed et al. 2019).

However, due to their combinatorial nature and the presence of several decision variables, MILP problems often require exponential time to compute optimal solutions (Bragin et al. 2019). Consequently, state-of-the-art MILP algorithms (e.g. the CPLEX solver) cannot always provide solutions efficiently because of several fundamental issues, such as problem size, model and data characteristics, and parameter settings (IBM 2018). For instance, Branch-and-Cut (B&C) based methods solve optimization problems by exploiting the convex hull, namely the smallest convex set containing the feasible solutions (Bragin et al. 2019). However, if the convex hull is difficult to obtain, these methods become computationally inefficient and might depend on heuristics, such as aggressive settings for cut generation, cuts based on model knowledge, and variable selection strategies (IBM 2018a). Additionally, even Lagrangian Relaxation (LR) based algorithms, often employed to exploit the separability of the master problem into several sub-problems, suffer from a slow overall convergence that might negatively affect the optimization process (Bragin et al. 2019). Hence, although the mathematical programming literature reports execution times for specific algorithms and problem instances (Meindl and Templ 2013), it is difficult to estimate the required time given only the generic formulation of a MILP problem (IBM 2018). For instance, S. Lehmann et al. investigated an RPO problem related to wind farm planning with multiple cable types (Lehmann 2017), while J. C. S. N. Pinheiro et al. investigated an RPO problem related to parallel machine scheduling (Pinheiro et al. 2020), in which each machine has a certain amount of resources to process a job. The related MILP formulations were solved through different proposed approaches, such as Simulated Annealing (SA) and Iterated Greedy (IG), and then compared to classical MILP solvers with a time limit of 1 hour. In both cases, due to the NP-hard nature of the considered problems, the comparisons showed that classical MILP solvers struggle to solve even small data instances in a short time: they become computationally inefficient and provide worse solutions than those derived by the proposed approaches.

On the other hand, Artificial Intelligence (AI) based approaches have been increasingly used in many scientific fields in the last decades, proving to be valid alternatives for solving complex problems, such as cancer diagnosis (Elia et al. 2020), testing of aerospace structures (D’Angelo and Rampone 2015), and malware detection (D’Angelo et al. 2020). More precisely, thanks to the well-known ability of AI to learn complex patterns, features, and relationships from huge amounts of data, a set of classification and knowledge elicitation mechanisms has emerged under the umbrella of Machine Learning (ML) techniques. In particular, Artificial Neural Networks (ANNs) are one of the best-known AI approaches and have been employed in several research areas, such as robotics (Li et al. 2019) and computer vision (Kanuri et al. 2018), to investigate their effectiveness in solving multi-label multi-class problems as classification tasks.

For this reason, the goal of this proposal is to investigate the use of multiple ANNs as an alternative approach for solving an RPO problem. More precisely, an optimization problem related to the scheduling of different Combined Heat & Power (CHP) generators in a real Microgrid system is presented. Then, experimental results, achieved by considering only the input demands and the corresponding output schedules through several multi-label multi-class classification tasks, are discussed and compared with those of the best-known ML-based approaches.

The rest of the paper is organized as follows. Section 2 reports an overview of related works. Section 3 introduces background concepts related to the developed application. Section 4 describes the optimization problem to be solved, which is formalized through a MILP model. Finally, Section 5 reports the experimental results, while Section 6 presents the conclusions and future work.

2 Related works

Since RPO problems are often faced in industrial contexts, Microgrid systems are a particular instance in which resources may be scheduled 24 hours a day to ensure the correct operation of different types of mechanical equipment. To this purpose, several RPO model formulations, differing in their goals and characteristics, have been proposed (Yuan et al. 2017; Sedzro et al. 2018; Mohamed et al. 2019). In 2015 A. Khodaei et al. presented a microgrid planning problem decomposed into an investment master problem and an operational sub-problem. More precisely, the optimal planning decisions determined in the master problem are employed in the sub-problem, which examines the optimality of the master solution in the worst case (Khodaei et al. 2015). In 2016 W. Yuan et al. proposed a planning problem to coordinate resource allocation and minimize the system’s damages in case of natural disasters. More precisely, a robust optimization-based framework is presented to coordinate the planning of distribution systems by considering uncertain natural disaster occurrences (Yuan et al. 2016). In 2017 A. Khodaei proposed a new class of microgrids called provisional microgrids. More precisely, the microgrid planning model is defined and solved by considering the interactions among the provisional microgrid, the coupled microgrid, and the utility grid (Khodaei 2017). Also, R. D. Azevedo et al. proposed a multiagent-based control strategy to coordinate a Microgrid system as a set of several distinct entities. More precisely, the experimental results proved the effectiveness of the proposed approach, with a total cost increment of only 0.11% compared to a classical centralized Microgrid system (de Azevedo et al. 2017). In 2018 D. Neves et al. investigated an economic dispatch model with several optimization goals. More precisely, they considered four scenarios applied to a real Microgrid system located on Terceira Island (Portugal), achieving 1.9% savings on dispatch costs and emissions (Neves et al. 2018). Finally, in 2019, K. Antoniadou-Plytaria et al. proposed an optimization model for the optimal energy management of grid-connected microgrids with battery energy storage systems. More precisely, they considered a Microgrid system located at the Chalmers University of Technology (Sweden), and the results, derived by a CPLEX solver, showed a cost reduction of 4% compared to the costs actually incurred (Antoniadou-Plytaria et al. 2019).

Additionally, several approaches, such as Swarm Intelligence (SI) based solutions, have been explored to solve RPO problems. SI approaches consist of a population of simple agents that interact locally with each other and with their environment. The inspiration often comes from nature, and examples of swarm intelligence in natural systems include ant colonies, bird flocking, hawks hunting, animal herding, and bacterial growth. In 2015 L. Zuo et al. proposed a multi-objective optimization scheduling method based on the Ant Colony algorithm in a Cloud Computing environment. More precisely, the investigated optimization problem is solved in accordance with the user’s budget costs by using an improved version of the Ant Colony algorithm (Zuo et al. 2015). In 2019 L. Li et al. proposed a new particle swarm optimization algorithm to obtain an adaptive resource scheduling for a multi-objective problem. In particular, the problem is translated into a set of sub-problems by using a proposed hybrid decomposition approach (Li et al. 2019). In 2020, L. Zhang et al. proposed an adaptive strategy for Microgrids to optimize the droop control through particle swarm optimization. More precisely, a new fuzzy inference system is presented to adjust the learning factor and inertia weight of the employed algorithm (Zhang et al. 2020). However, since SI formulations are based on complex mathematical structures, they are rarely implemented in practice.

On the other hand, since Artificial Intelligence-based approaches have been increasingly used in many scientific fields, they have also proven to be a valid alternative to classical mathematical-based solutions (Elia et al. 2020; D’Angelo and Rampone 2015; D’Angelo et al. 2020). More precisely, in 2017, two contributions, related to an energy scenario and to an optimization problem respectively, have been presented. Firstly, A. Tesfaye et al. proposed a new wind power forecasting method based on the combination of measured data from SCADA and an Artificial Neural Network (ANN) model. The results achieved by the employed fully-connected neural network showed an average accuracy of 86% (Eseye et al. 2016). Secondly, G. Villarrubia et al. proposed the use of ANNs to approximate the objective function of optimization problems by using non-linear regression. More precisely, the authors proposed several experiments to minimize or maximize different objective functions, achieving an average accuracy of 97%.

Therefore, due to the issues related to MILP formulations and thanks to the great success of AI-based methods, we investigate the use of multiple ANNs as an alternative approach for solving an RPO problem related to a real Microgrid system.

3 Background

One of the main abilities of Artificial Intelligence (AI) based approaches, such as those based on Machine Learning, Deep Learning, and Data Mining, is learning complex patterns, features, and relationships from large amounts of data. More precisely, they represent the basis for the application of AI in knowledge discovery processes. Generally, an AI-based approach tries to learn information by imitating the actions of an expert, like a child who imitates the actions of an adult (D’Angelo et al. 2020). Consequently, the comparison between the goal to be achieved and the outcome derived from the machine state represents the core of the learning process of intelligent machines, which have proven to be effective in many research areas (D’Angelo and Palmieri 2020; D’Angelo et al. 2019, 2019a).

However, the training process of an ML or DL based model can be adversely affected by several issues, such as overfitting/underfitting and unbalanced datasets. Therefore, since we investigate the effectiveness of multiple ANNs in achieving resource scheduling for a Microgrid system, we report some background concepts that have been used to address these problems and improve the achieved results.

3.1 Dropout

Since deep ANNs are highly flexible models, overfitting is an issue that can often arise when training them. For this reason, it is often reduced through the usage of several Regularization techniques, which try to reduce the model’s variance, and consequently, obtain a model able to extract as many relevant features as possible (Bhagwat et al. 2019).

One of the best-known Regularization techniques is Dropout, which works by randomly removing nodes during the training phase. More precisely, Dropout assigns a probability value to each node to determine its chance of being included in the training at each iteration of the learning algorithm. This means that some nodes are not considered in the parameter-updating process, and consequently, the Backpropagation (BP) algorithm computes the derivatives on a smaller network (Bhagwat et al. 2019).
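The mechanism described above can be illustrated with a minimal sketch in plain Python. It assumes the common "inverted" variant of Dropout, in which surviving activations are rescaled by the keep probability; the function name `dropout` and its parameters are hypothetical and are not taken from the implementation used in this study.

```python
import random

def dropout(activations, drop_prob, rng, training=True):
    """Inverted dropout on a list of activations (illustrative sketch).

    During training each unit is zeroed with probability drop_prob and the
    survivors are rescaled by 1 / (1 - drop_prob), so that the expected
    activation is unchanged. At inference time the input is returned as-is.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return list(activations)
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]

rng = random.Random(0)           # fixed seed only for reproducibility
hidden = [0.5, 1.2, 0.0, 2.0, 0.7, 1.5]
print(dropout(hidden, 0.5, rng))                  # some units set to 0.0
print(dropout(hidden, 0.5, rng, training=False))  # unchanged at inference
```

Units that are zeroed contribute nothing to the forward pass, so their incoming weights receive no gradient in that iteration, which is the "smaller network" effect mentioned above.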

In this study, due to the unbalanced nature of the employed data, we use several Dropout layers with different probability values to reduce overfitting for each employed ANN and obtain the best possible results.

3.2 Weighted classification

Real-world scenarios are often described by unbalanced data, in which classes are not equally distributed. In most cases, such data can adversely affect the training process by causing overfitting. For this reason, when an employed dataset is unbalanced, several guidelines suggest trying different solutions, such as Weighted classification (Xu et al. 2020).

More precisely, unlike the other Resampling techniques (Brownlee 2017; Vidhya 2017; Ghorbani and Ghousi 2020), it tries to adapt ML and DL models by considering the frequency of each output class. Therefore, it is possible to set a frequency-weight value for each class without modifying the dataset structure and limiting the model generalization (Hashemi and Karimi 2018; Xu et al. 2020).
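As an illustration of frequency-based class weights, the sketch below uses the inverse-frequency heuristic `n_samples / (n_classes * class_count)`. This particular formula is an assumption chosen for the example; the text does not state which weighting scheme was used, and the function name `class_weights` is hypothetical.

```python
from collections import Counter

def class_weights(labels):
    """Inverse-frequency weight per class (illustrative heuristic).

    A class that appears rarely receives a weight above 1, a frequent class
    a weight below 1; a perfectly balanced dataset gives every class 1.0.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}

# Toy unbalanced label set: 8 samples of one mode, 2 of another.
labels = ["off"] * 8 + ["low"] * 2
print(class_weights(labels))   # {'off': 0.625, 'low': 2.5}
```

Such a dictionary can then be used to scale the per-sample loss during training, which leaves the dataset itself unmodified.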

Therefore, due to the unbalanced nature of the employed data, we combine weighted classification and Dropout in order to adapt each proposed ANN, reduce overfitting, and improve the achieved results.

3.3 K-Fold cross-validation

One of the best practices for evaluating ML and DL models is to consider several partitions of the employed dataset instead of dividing it only once into two mutually exclusive subsets. To this purpose, a commonly employed evaluation technique is the K-Fold cross-validation algorithm, which randomly splits the considered dataset into k subsets, or folds, of approximately equal size. More precisely, the first fold is initially used as the test set, and the model is trained on the remaining k - 1 folds. Then, a different fold is used as the test set, while the remaining k - 1 folds are employed as the training set, and the procedure is repeated until every fold has been used once as the test set. In practice, the K-Fold cross-validation algorithm is usually performed with k = 5 or k = 10 because they are the recommended values to achieve a good model validation (Bhagwat et al. 2019).
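The splitting procedure can be sketched in a few lines of plain Python. The function name `k_fold_indices` and the fixed shuffling seed are assumptions made for this illustration only.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)      # random assignment to folds

    # Fold sizes differ by at most one when n_samples is not divisible by k.
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for f in range(k):
        size = base + (1 if f < extra else 0)
        folds.append(indices[start:start + size])
        start += size

    # Each fold serves exactly once as the test set.
    for f in range(k):
        test = folds[f]
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, test

for train, test in k_fold_indices(23, 5):
    print(len(train), len(test))
```

With 23 samples and k = 5, three folds contain 5 samples and two contain 4, and every sample appears in exactly one test fold.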

Therefore, in order to evaluate and validate each ANN on as many training and testing set instances as possible, we use both the 70/30 hold-out criterion and the K-Fold cross-validation algorithm with k=5.

4 Microgrid optimization problem

Since Microgrid systems provide energy for a local community while pursuing several goals such as cost optimization and efficiency (Hirsch et al. 2018; Hossain et al. 2014), we chose to address an RPO problem related to the optimal scheduling of different CHP generators in a Microgrid system. More precisely, it can be stated as a Mixed-Integer Linear Programming (MILP) model characterized by a cost function J to be minimized, subject to several types of constraints, such as interaction, operating, and physical constraints. Consequently, the main goal of this RPO problem is to provide a resource plan for the employed generators by considering several input parameters and constraints. Input parameters are defined by predictions of the upcoming demand, the energy available, the energy prices, and the production from renewable energy units. Instead, constraints determine the imported/exported energy quantity from/to the grid, when to buy/sell the energy, and how/when to use generators.

Therefore, in order to provide a high-level definition of this problem, we report the definition of the cost function J, the interaction constraints, and the operating conditions of a MILP formulation proposed in Parisio and Glielmo (2011). For the sake of completeness, A. Parisio et al. later presented an update of this model that considers a multi-objective cost function (Parisio and Glielmo 2012). Many other extensions of Microgrid models can also be found in the literature. For instance, in 2016, L. Bolivar et al. considered a weighted objective function to minimize the operating costs and environmental impacts (Bolívar Jaramillo and Weidlich 2016). In 2020, Y. Wu et al. proposed power balance constraints to mitigate the risk of system instability within the scheduling horizon under uncertainty (Wu et al. 2020). Finally, in 2021, M. Javadi et al. considered new frequency constraints to ensure the stability of grid-connected Microgrids following islanding events (Javadi et al. 2021).

4.1 Cost function

Since resource optimization, related to a Microgrid system, is achieved by minimizing a cost function, the definitions of main decision variables are reported in order to introduce the function J.

Let k be a time instant, T the length of the prediction horizon, and \(N_g\) the number of generators. The state \(\delta \) of the \(i^{th}\) Distributed Generator (DG) and its power level P are defined as follows:

$$\begin{aligned} \delta _{i}(k)= & {} {\left\{ \begin{array}{ll} 1 &{} \text{ if } \text{ the } \text{ i } \text{- } \text{ th } \text{ DG } \text{ is } \text{ on } \\ 0 &{} \text{ otherwise } \end{array}\right. } \end{aligned}$$
(1)
$$\begin{aligned} P_{i}(k)\ge & {} 0 \end{aligned}$$
(2)

with \(i = 1,...,N_{g}\) and \(k = 0,...,T-1\).

Let \(c_{i}^{SU}\) and \(c_{i}^{SD}\) be the start-up and shut-down cost coefficients of the \(i^{th}\) generator, respectively. The start-up \(SU_i\) and shut-down \(SD_i\) costs are defined as follows:

$$\begin{aligned} SU_{i}(k)\ge & {} c_{i}^{SU}(k)[\delta _{i}(k) - \delta _{i}(k - 1)], \end{aligned}$$
(3)
$$\begin{aligned} SD_{i}(k)\ge & {} c_{i}^{SD}(k)[\delta _{i}(k - 1) - \delta _{i}(k)], \end{aligned}$$
(4)
$$\begin{aligned} SU_{i}(k)\ge & {} 0, \end{aligned}$$
(5)
$$\begin{aligned} SD_{i}(k)\ge & {} 0, \end{aligned}$$
(6)

with \(i = 1,...,N_{g}\) and \(k = 0,...,T-1\).
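Because J is minimized and both \(SU_i(k)\) and \(SD_i(k)\) enter it with a positive sign, inequalities (3)-(6) are tight at the optimum whenever the cost coefficients are non-negative, i.e. each cost equals the larger of zero and the corresponding right-hand side. The sketch below computes these tight values for a single generator. It assumes time-invariant coefficients and an explicit initial state `delta_init` standing for \(\delta_i(-1)\); both are simplifications introduced for the example, and the function name is hypothetical.

```python
def startup_shutdown_costs(delta, c_su, c_sd, delta_init=0):
    """Tight values of SU(k) and SD(k) for one generator.

    delta      : list of 0/1 on/off states, one per time step k
    c_su, c_sd : non-negative start-up / shut-down cost coefficients
    delta_init : assumed state before the horizon starts (delta(-1))
    """
    su, sd = [], []
    prev = delta_init
    for cur in delta:
        su.append(max(0.0, c_su * (cur - prev)))   # Eqs. (3) and (5)
        sd.append(max(0.0, c_sd * (prev - cur)))   # Eqs. (4) and (6)
        prev = cur
    return su, sd

su, sd = startup_shutdown_costs([0, 1, 1, 0, 1], c_su=10.0, c_sd=4.0)
print(su)   # [0.0, 10.0, 0.0, 0.0, 10.0]
print(sd)   # [0.0, 0.0, 0.0, 4.0, 0.0]
```

A start-up cost is paid only at the steps where the unit switches from off to on, and a shut-down cost only where it switches from on to off.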

Hence, the cost function J to be minimized, which includes the costs associated with energy production, start-up and shut-down decisions, and possible earnings and curtailment penalties, is defined as follows:

$$\begin{aligned} J := \sum _{k=0}^{T-1} \Bigg \{&\sum _{i=1}^{N_g}\big [C_{i}^{DG}(P_{i}(k)) + OM_{i}\delta _{i}(k) + SU_{i}(k) + SD_{i}(k)\big ] \nonumber \\&+ \ C^{grid}(k) + \rho _{c} \sum _{h=1}^{N_c}\beta _{h}(k)D_{h}^{c}(k) \Bigg \} \end{aligned}$$
(7)

with \(i = 1,...,N_{g}\), \(h = 1,...,N_{c}\), and \(k = 0,...,T-1\).

Therefore, according to the quadratic cost function J and parameters shown in Table 1, the following decisions could be taken by a Microgrid system:

  • when each generation unit should be started and stopped, and how much each unit should generate to meet the load at minimum cost. The cost of these choices is considered, at each time instant k, by the following sum over the generators: \(\sum _{i=1}^{N_g}[C_{i}^{DG}(P_{i}(k)) + OM_{i}\delta _{i}(k) + SU_{i}(k) + SD_{i}(k)]\).

  • when and how much energy should be purchased from or sold to the main grid. This cost is represented by \(C^{grid}(k)\), the quantity defined as \(C^{g}(k)\) in Subsect. 4.2.

  • when and which controllable loads must be shed/curtailed. The cost of these choices is considered, at each time instant k, by the following sum over the controllable loads: \(\rho _{c} \sum _{h=1}^{N_c}\beta _{h}(k)D_{h}^{c}(k)\).
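To make the structure of J concrete, the following sketch evaluates it for a given schedule. It reads the grid and curtailment terms as being summed over every time instant k, and it takes the generation cost \(C_i^{DG}\) as an arbitrary callable supplied by the caller; the function name, argument layout, and numeric values are all hypothetical and serve only as an illustration.

```python
def total_cost(fuel_cost, om, delta, power, su, sd,
               grid_cost, rho_c, beta, curtailable):
    """Evaluate J for a fixed schedule (illustrative sketch).

    fuel_cost[i]      : callable giving C_i^DG(P) for generator i
    om[i]             : operation and maintenance cost OM_i
    delta[i][k]       : on/off state, power[i][k] : power level P_i(k)
    su[i][k], sd[i][k]: start-up / shut-down costs
    grid_cost[k]      : cost of the energy exchanged with the grid
    rho_c             : curtailment penalty
    beta[h][k]        : curtailed fraction of controllable load h
    curtailable[h][k] : demand D_h^c(k) of controllable load h
    """
    J = 0.0
    for k in range(len(grid_cost)):
        for i in range(len(delta)):
            J += (fuel_cost[i](power[i][k]) + om[i] * delta[i][k]
                  + su[i][k] + sd[i][k])
        J += grid_cost[k]
        J += rho_c * sum(beta[h][k] * curtailable[h][k]
                         for h in range(len(beta)))
    return J

# One generator, one controllable load, two time steps (made-up numbers).
J = total_cost(
    fuel_cost=[lambda p: 2.0 * p], om=[1.0],
    delta=[[1, 0]], power=[[3.0, 0.0]],
    su=[[5.0, 0.0]], sd=[[0.0, 4.0]],
    grid_cost=[0.0, 6.0], rho_c=10.0,
    beta=[[0.0, 0.5]], curtailable=[[2.0, 2.0]],
)
print(J)   # 32.0
```

In the example, step 0 costs 12 (generation 6, maintenance 1, start-up 5) and step 1 costs 20 (shut-down 4, grid 6, curtailment 10).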

Table 1 Parameters

4.2 Interaction constraints

Generally, each Microgrid system has a grid-connected mode, in which it can purchase and sell energy continuously and take many high-level decisions (Jiang et al. 2013; Das et al. 2017; Joseph and Thomas 2013). When the grid-connected mode is on, a Microgrid system can exchange energy with the main grid subject to the interaction constraints. These constraints play a fundamental role because they define the rules applied to purchase/sell the energy from/to the main grid. To this purpose, the interaction constraints are defined, at each time instant k, by considering the importing/exporting power level \(P^{g}(k)\) from the main grid (Parisio and Glielmo 2011). More precisely, if \(P^{g}(k)\) is greater than zero, the energy is purchased and imported (\(\delta ^{g}(k) = 1\)); otherwise, the energy is sold and exported (\(\delta ^{g}(k) = 0\)).

$$\begin{aligned} P^{g}(k)= & {} {\left\{ \begin{array}{ll} > 0 &{} \text{ if } \text{ the } \text{ power } \text{ is } \text{ imported } \\ < 0 &{} \text{ otherwise } \end{array}\right. } \end{aligned}$$
(8)
$$\begin{aligned} \delta ^{g}(k)= & {} {\left\{ \begin{array}{ll} 1 &{} \text{ if } \text{ the } \text{ importing } \text{ mode } \text{ is } \text{ on } \\ 0 &{} \text{ otherwise } \end{array}\right. } \end{aligned}$$
(9)
$$\begin{aligned}&P^{g}(k) > 0 \iff \delta ^{g}(k) = 1 \end{aligned}$$
(10)

with \(k = 0,...,T-1\).

Consequently, let \(c^{P}\) and \(c^{S}\) be the purchasing and selling energy prices, respectively. The cost of the imported/exported energy \(C^{g}\) is defined as follows:

$$\begin{aligned} C^{g}(k) = {\left\{ \begin{array}{ll} c^{P}(k)P^{g}(k) &{} \text{ if } \delta ^{g}(k) = 1 \\ c^{S}(k)P^{g}(k) &{} \text{ otherwise } \end{array}\right. } \end{aligned}$$
(11)

with \(k = 0,...,T-1\).
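Equations (8)-(11) can be read as a simple piecewise rule, sketched below for a single time instant. The function name and the numeric prices are hypothetical; note that, with the sign convention of Eq. (8), an exported power is negative and therefore yields a negative cost, i.e. a revenue.

```python
def grid_exchange_cost(p_grid, price_buy, price_sell):
    """Cost C^g(k) of the energy exchanged with the main grid.

    p_grid > 0 means power is imported (delta^g = 1) and priced at c^P;
    otherwise power is exported and priced at c^S.
    """
    importing = p_grid > 0            # Eq. (10): P^g(k) > 0 <=> delta^g(k) = 1
    price = price_buy if importing else price_sell
    return price * p_grid             # Eq. (11)

print(grid_exchange_cost(4.0, price_buy=0.25, price_sell=0.125))    # 1.0
print(grid_exchange_cost(-4.0, price_buy=0.25, price_sell=0.125))   # -0.5
```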

4.3 Operating conditions

On the other hand, since a Microgrid system consists of several generators, several rules are established in order to define how and when each generator should be used. To this purpose, we define the minimum amount of time for which a generator must be kept on/off (minimum up/down times):

$$\begin{aligned} \delta _{i}(k)\ge & {} \delta _{i}(k - t_{up} - 1) - \delta _{i}(k - t_{up} - 2), \end{aligned}$$
(12)
$$\begin{aligned} 1 - \delta _{i}(k)\ge & {} \delta _{i}(k - t_{down} - 2) - \delta _{i}(k - t_{down} - 1), \end{aligned}$$
(13)

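with \(i = 1,...,N_{g}\), \(k = 0,...,T-1\), \(t_{up} = 0,...,\min (T_{i}^{up},k - T_{i}^{up} + 2)\), and \(t_{down} = 0,...,\min (T_{i}^{down},k - T_{i}^{down} + 2)\), where \(T_{i}^{up}\) and \(T_{i}^{down}\) denote the minimum up and down times of the \(i^{th}\) generator.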
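The intent of the minimum up/down time conditions can be illustrated by checking an already computed on/off schedule. The sketch below is not the inequality formulation of Eqs. (12)-(13); it is a simpler run-length check, introduced here only as an illustration, which flags every switch that ends an on (or off) period shorter than the required minimum. The function name and the treatment of the first run (ignored, because its true start lies before the horizon) are assumptions.

```python
def min_time_violations(delta, min_up, min_down):
    """Return the time steps k at which a switch ends a run too early.

    delta    : list of 0/1 on/off states of one generator
    min_up   : minimum number of consecutive steps the unit must stay on
    min_down : minimum number of consecutive steps the unit must stay off
    """
    violations = []
    run_start = 0
    for k in range(1, len(delta)):
        if delta[k] != delta[k - 1]:
            run_length = k - run_start
            required = min_up if delta[k - 1] == 1 else min_down
            # The first run is skipped: it may have started before k = 0.
            if run_start > 0 and run_length < required:
                violations.append(k)
            run_start = k
    return violations

schedule = [0, 1, 1, 0, 0, 0, 1, 0]
print(min_time_violations(schedule, min_up=2, min_down=2))   # [7]
```

In the example, the unit is switched on at k = 6 and off again at k = 7, so the on period lasts a single step and violates a minimum up time of 2.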

Finally, since this discussion is only intended to provide a high-level definition of the proposed problem, we refer the reader to Parisio and Glielmo (2011) for more details about the operating conditions discussed above, the definitions of the physical constraints, and other theoretical assumptions about the considered problem.

4.4 Problem instance

In the considered scenario, the resources to be scheduled are five generators: one Jenbacher, one Caterpillar, and three Chillers (Innio 2020; Caterpillar 2020). As reported in Table 2, each generator has five working modes employed to meet the energy demand. In this instance, the Jenbacher and Caterpillar generators are able to produce both electricity and thermal energy, while the three Chiller generators can only provide thermal energy. Therefore, the request to satisfy, at each hour, is formulated on the basis of the following three parameters: the required quantity of electricity, the required quantity of thermal energy, and the available quantity of electricity. More precisely, the required electricity and thermal quantities represent the upcoming demand, which is satisfied according to the presented interaction constraints and operating conditions. Instead, the available quantity of electricity is obtained from renewable energy sources and used to minimize the required costs. For this reason, according to the proposed scenario, the goal of this RPO problem is to compute an optimal schedule of the generators that satisfies the hourly energy request while minimizing the cost function J.

Table 2 Parameters of each employed generator

Fig. 1 Boxplot representation of input data distribution

Fig. 2 Distribution of electricity output categories of Jenbacher and Caterpillar generators

5 Experimental results

The reported experiment aims to demonstrate the effectiveness of multiple ANNs in obtaining a resource plan by considering only the input demands and the corresponding output schedules. To this purpose, the experiments have been carried out by using data extracted from a real scenario related to the presented Microgrid system.

5.1 Dataset and experimental setting

The dataset used during the experimental evaluation consists of historical data that have been scheduled, over two years and for the 5 generators described above, by a Matlab solver based on the presented MILP formulation (Parisio and Glielmo 2011) and developed by Italdata (S.p.A. Italdata 2020). In particular, more than 17000 rows have been stored, each as a sequence of 12 different values. The first 5 values represent the input demand, namely the available quantity of electricity, the required quantity of electricity, the required quantity of thermal energy, the hour of the day, and the day, respectively. Instead, the last 7 values represent the scheduled power mode for each generator, as reported in Table 2.

Therefore, since the employed dataset contains a large amount of data and several types of input/output values, an Exploratory Data Analysis (EDA) technique has been used in order to obtain a complete overview of them (Weng 2020; Prabhu 2020). This approach analyzes a dataset and summarizes its main characteristics by using several charts, as shown in Figures 1, 2, 3, and 4.

Fig. 3 Distribution of thermal output categories of Jenbacher and Caterpillar generators

Fig. 4 Distribution of thermal output categories of Chiller generators

Figure 1 shows the boxplot representation of the input data distribution. In particular, five boxplots are reported, one for each input value. The first one shows the data distribution of the available energy. The second and third ones report the distribution of the required electricity and thermal quantities, respectively. Finally, the last two show the hours and days when each demand has been satisfied. Instead, Figures 2, 3, and 4 show the frequency distributions of the output modes that have been scheduled for each generator. More precisely, Figure 2 shows the distribution of electricity output categories of the Jenbacher and Caterpillar generators, Figure 3 shows the distribution of thermal output categories of the Jenbacher and Caterpillar generators, and Figure 4 shows the distribution of thermal output categories of the Chiller generators. Additionally, these charts highlight the extremely unbalanced nature of the employed dataset and, for each involved generator, a finite number of output power categories.

Subsequently, we split the dataset in order to run the experiments. First, the whole dataset was subdivided into two mutually exclusive subsets, called training and testing datasets: we used 70% of the entire dataset for training and the remaining 30% for testing. Then, the K-Fold cross-validation algorithm, with k=5, was used to tune the hyper-parameters and provide an unbiased evaluation of each employed ANN. More precisely, we used k=5 because it is a recommended value (Bhagwat et al. 2019); consequently, the entire dataset was partitioned into five equally sized folds, each of which was used in turn as the testing set while the remaining four formed the training set. Finally, each ANN was trained on each training set and evaluated on the corresponding testing set.

Fig. 5 High-level architecture of each network

5.2 Proposed networks and evaluation metrics

Since the Jenbacher and Caterpillar generators are able to produce both electricity and thermal energy, each of them has two working modes to satisfy the required electricity and thermal quantities, respectively. To this purpose, seven ANNs, one for each work mode presented earlier in Subsect. 4.4, have been developed, each as a fully-connected neural network composed of two Dense layers with 800 neurons, activation=relu, and Dropout=0.5. Additionally, each network has a Dense output layer with activation=softmax and a number of nodes equal to the number of expected output categories. We used a softmax activation function to obtain a probability distribution over the output schedule categories, given a specified input request (Keras 2021).
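The role of the softmax output layer can be illustrated with a minimal plain-Python sketch: it maps the raw scores (logits) of the last layer to a probability distribution, and the scheduled category is the one with the highest probability. The function names and the example scores are hypothetical; subtracting the maximum score before exponentiating is a standard numerical-stability measure.

```python
import math

def softmax(logits):
    """Map raw output scores to probabilities that sum to 1."""
    m = max(logits)                               # avoids overflow in exp
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_category(logits):
    """Index of the output category with the highest probability."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)

scores = [0.3, 2.1, -1.0, 0.8, 0.0]   # made-up scores for five output modes
print(softmax(scores))
print(predict_category(scores))       # 1
```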

The proposed architecture has been derived from training and testing processes in order to obtain the best possible results. More precisely, we tested different hyper-parameters, namely:

  • numLayers: the number of layers for each neural network (2, 3, 4, 5);

  • numNeurons: the number of neurons for each layer (100, 200, 400, 800);

  • dropout: different values of dropout (0.2, 0.3, 0.4, 0.5);

  • activFunction: the typologies of activation functions (relu, tanh, sigmoid, and softmax);

  • batchSize: different values of batch_size (32, 64, 128, 256);

  • optimizer: the optimization algorithm used (Stochastic Gradient Descent - SGD, Adam, and Adamax);

  • lossFunction: the typologies of loss functions (Mean Absolute Error - MAE, Mean Squared Error - MSE, and categorical_crossentropy).
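The size of the search space defined by the values listed above can be made explicit with a short sketch. The text does not state whether every combination was actually evaluated, so the exhaustive enumeration below is only an assumption used to show the upper bound; the lowercase string identifiers are illustrative labels.

```python
from itertools import product

search_space = {
    "numLayers": [2, 3, 4, 5],
    "numNeurons": [100, 200, 400, 800],
    "dropout": [0.2, 0.3, 0.4, 0.5],
    "activFunction": ["relu", "tanh", "sigmoid", "softmax"],
    "batchSize": [32, 64, 128, 256],
    "optimizer": ["sgd", "adam", "adamax"],
    "lossFunction": ["mae", "mse", "categorical_crossentropy"],
}

def iter_configurations(space):
    """Yield every combination of hyper-parameter values as a dict."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))

n_configs = sum(1 for _ in iter_configurations(search_space))
print(n_configs)   # 9216 = 4^5 * 3^2
```

With 9216 candidate configurations per network, an exhaustive search is expensive, which is why in practice such spaces are often explored only partially.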

Figure 5 shows the high-level architecture of the proposed networks, while Table 3 summarizes their main information.

Table 3 Description of each network

To evaluate the classification quality of each employed network, we have derived the following metrics from the multi-class confusion matrix: Accuracy (Acc.), Sensitivity (Sens.), Specificity (Spec.), Precision (Prec.), Area Under the ROC Curve (AUC), and F-Measure (F-Meas or F-Score). More precisely, for each output mode, TPs (True Positives) are input demands correctly scheduled with the considered output mode, while TNs (True Negatives) are instances of other output modes that are not assigned to the considered one. On the other hand, FPs (False Positives) are instances of other categories incorrectly scheduled with the considered output mode, while FNs (False Negatives) are instances of the considered output mode incorrectly assigned to another category. Finally, we have derived the average values (Avg.) and standard deviation values (Dev.) in order to obtain a global validation.

5.3 Achieved results and discussion

The proposed ANNs have been trained and tested on a laptop PC equipped with an Intel 4-Core i5-8265U CPU @ 1.60GHz and 8 GB of RAM. Each employed neural network has been compiled with the Adam optimizer and the categorical_crossentropy loss function, which computes the cross-entropy loss between the labels and the derived predictions (Keras 2021a). Then, the networks have been trained with \(batch\_size=256\), the weighted classification technique, and 1000 epochs by using the 70/30 criterion and the K-Fold cross-validation algorithm with k=5. These hyper-parameters have been chosen according to the results achieved in the testing process. Table 4 reports the evaluation metrics for each ANN. Table 5 shows the average values (Avg.) and standard deviation values (Dev.) of each evaluation metric. Also, Tables 6, 7, 8, 9, 10, 11, and 12 report the multi-label confusion matrix of each ANN, while Table 13 summarizes the validation results obtained by performing the K-Fold cross-validation algorithm with k=5. Finally, Figures 6 and 7 show the loss function behaviour of each employed ANN, acquired on training and testing data.

Table 4 Evaluation metrics for each network
Table 5 Average and deviation values for each metric
Table 6 Multi-label confusion matrix related to Net 1
Table 7 Multi-label confusion matrix related to Net 2
Table 8 Multi-label confusion matrix related to Net 3
Table 9 Multi-label confusion matrix related to Net 4
Table 10 Multi-label confusion matrix related to Net 5
Table 11 Multi-label confusion matrix related to Net 6
Table 12 Multi-label confusion matrix related to Net 7
Table 13 K-Fold cross-validation global average results
Fig. 6 Loss function behaviour related to Net 1, Net 2, Net 3, and Net 4

Fig. 7 Loss function behaviour related to Net 5, Net 6, and Net 7

In order to show the effectiveness of the employed ANNs, they have been compared with the most notable ML-based approaches by using WEKA (Tempo 2020). More precisely, we used the Multi-Layer Perceptron (MLP) classifier, J48 trees (J48), and Naive Bayes (NB) to derive the classification metrics for each work mode identified by an Alias in Table 3. The achieved results are shown in Tables 14, 15, and 16, while Table 17 summarizes the comparison between the proposed ANNs and the ML-based methods under the 70/30 criterion.

These results show that the NB and MLP classifiers have achieved fair results, while J48 has obtained excellent evaluation metrics comparable with those obtained by the proposed approach. More precisely, the employed ANNs have achieved the best results in solving the considered RPO problem, with up to a 6% improvement in average accuracy over the Naive Bayes classifier, up to 12% over the Multi-Layer Perceptron classifier, and up to 13% over state-of-the-art ANNs for power forecasting. Finally, their evaluation metrics (like Precision, Sensitivity, and F-Score) are slightly higher than those obtained from the J48 decision trees, as already observed in several studies where neural networks have been compared with decision trees (Tharaha and Rashika 2017; Karakurt et al. 2013; Ahmad et al. 2017).

Moreover, the achieved results are confirmed by those derived from the K-Fold cross-validation algorithm, which has been applied to each WEKA method and each neural network during the hyper-parameter tuning process.
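The two evaluation protocols mentioned in this section, a 70/30 hold-out split and K-Fold cross-validation with k=5, can be sketched as index generators in plain Python. The function names, the shuffling seed, and the `evaluate` callback are hypothetical and serve only to illustrate the procedure; they do not reproduce the tooling actually used for the experiments.

```python
import random


def train_test_split_indices(n, test_ratio=0.3, seed=0):
    """Shuffle indices 0..n-1 and hold out test_ratio of them (70/30 by default)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_ratio)
    return idx[n_test:], idx[:n_test]


def k_fold_indices(n, k=5, seed=0):
    """Yield (train, validation) index lists for each of the k folds.

    Every sample appears in exactly one validation fold; the first
    n % k folds receive one extra sample when n is not divisible by k.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, validation
        start += size


def cross_validate(n, evaluate, k=5, seed=0):
    """Average the score returned by evaluate(train, validation) over k folds."""
    scores = [evaluate(tr, va) for tr, va in k_fold_indices(n, k, seed)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    train, test = train_test_split_indices(100)
    print(len(train), len(test))
    for fold, (tr, va) in enumerate(k_fold_indices(23, k=5)):
        print(fold, len(tr), len(va))
```

In practice, `evaluate` would train a fresh classifier on the training indices and return its accuracy on the validation indices, so that the averaged score plays the role of a global validation figure.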

As shown in Table 18, the employed ANNs have achieved up to a 6% and 12% improvement in average accuracy over the Naive Bayes and Multi-Layer Perceptron classifiers, respectively. However, since their average accuracy is 3% lower than that of the J48 trees, and their metrics under the 70/30 criterion are comparable to those of the same decision trees, ANNs are not the only valid approach for solving the discussed problem satisfactorily. For this reason, both ANNs and decision trees might also be employed to address many other RPO problems.

6 Conclusions and future works

In this paper, multiple Artificial Neural Networks (ANNs) have been employed for solving a Resource Planning Optimization (RPO) problem related to the scheduling of different Combined Heat & Power (CHP) generators for a Microgrid system. To this end, we have defined it as a Mixed-Integer Linear Programming (MILP) model characterized by a minimum cost function. Subsequently, we have employed seven ANNs by considering only the input demands and the corresponding output schedules. More precisely, each neural network has been validated through statistical metrics and compared to the best-known Machine Learning approaches provided by WEKA. The obtained results show that the proposed ANNs have achieved up to a 6% improvement in average accuracy over the Naive Bayes classifier, up to 12% over the Multi-Layer Perceptron classifier, and up to 13% over state-of-the-art ANNs, and consequently could be an alternative approach to solve the considered problem.

Table 14 Metrics for MLP classifier
Table 15 Metrics for J48 classifier
Table 16 Metrics for Naive Bayes classifier
Table 17 Comparison with most notable ML methods related to the 70/30 criterion
Table 18 Comparison with most notable ML methods related to K-Fold

For this reason, we would like to propose two possible directions for future work. First, in order to assess the effectiveness of the employed approach, we will explore new RPO problems by considering several critical scenarios. Second, since a specific ANN has been employed for each scheduled mode, we will investigate a dedicated neural network for each generator. For instance, Recurrent Neural Networks (RNNs) might be employed to consider temporal features, like the hour of the day. Moreover, Autoencoders (AEs) might be investigated to extract relevant features that could reduce the number of employed networks.