Enhancing the performance of smart electrical grids using data mining and fuzzy inference engine

This paper aims to enhance the smart grid by proposing a new hybrid feature selection method called feature selection-based ranking (FSBR). In general, feature selection excludes non-promising features from the data collected at the fog level; this can be achieved using filter methods, wrapper methods, or a hybrid of both. Our proposed method consists of two phases: a filter phase and a wrapper phase. In the filter phase, the whole dataset goes through three different ranking techniques (i.e., relative weight ranking, effectiveness ranking, and information gain ranking), and the resulting ranks are fed to a fuzzy inference engine to generate the final ranks. In the wrapper phase, features are selected based on the final ranks and passed to three different classifiers (i.e., Naive Bayes, Support Vector Machine, and Neural Network) to select the best feature subset according to the classifiers' performance. This process enhances the smart grid by reducing the amount of data sent to the cloud, decreasing computation time, and decreasing data complexity. Thus, the FSBR methodology enables user load forecasting (ULF) to make fast decisions, react quickly in short-term load forecasting, and provide high prediction accuracy. The suggested approach is explained via numerical examples, and two datasets are used in the experiments. On the first dataset, the proposed method was compared with six other methods and achieved the best accuracy of 91%. On the second, generalization dataset, the proposed method reported 90% accuracy compared to fourteen different methods.

FS is employed in many tasks across diverse fields, including Machine Learning (ML) [34], Pattern Recognition (PR), Image Processing (IP), and multimedia. FS is the process of selecting relevant and informative features so that redundant information can be avoided and ignored [1]. It mainly focuses on selecting a subset of features from the input dataset that can effectively describe it. FS can significantly reduce the detrimental effects of noise and irrelevant characteristics on the data [13,22]. Some dependent features may supply no additional information; in other words, most of the critical information can be obtained from a few unique features that provide class-discriminative information. As a result, removing dependent features that do not correlate with the classes is, in some cases, essential.
There are two mainstream categories for FS, and the associated taxonomy is illustrated in Table 1: label information and search strategy. Concerning the search strategy, FS algorithms can be categorized into three groups: (1) filter, (2) wrapper, and (3) hybrid. Filter methods apply different statistical tests to each feature, rank the features based on the resulting scores [22], and select a subset of features as a pre-processing step before classification [62]. Wrapper methods select a set of features, pass it to a classifier to check the accuracy, and repeat the same process with different feature sets until the maximum accuracy is reached [10]. They use a learning algorithm to evaluate feature subsets according to their predictive power and accuracy [18]. A hybrid method is a mixture of filter and wrapper methods: first, features are ranked using filter methods, and then only the top-scoring features are passed to the wrapper [22].
Moreover, according to the class label information, FS methods can be categorized into three types: (1) supervised FS, (2) semi-supervised FS, and (3) unsupervised FS. Supervised FS methods employ labeled data for feature selection and measure the correlation of the features with the class label to determine feature significance. To evaluate feature relevance, semi-supervised FS algorithms leverage label information from labeled data as well as the data distribution structure of both labeled and unlabeled data. Unsupervised FS methods assess feature relevance by their capability to preserve specific attributes of the data [27].
The objective of the current study is to filter the collected data, selecting only the effective features before sending the data from the fogs to the cloud for the next load prediction phase. A good feature selection methodology not only improves the model's prediction efficiency but also speeds up the forecasting process by considering fewer features. The main contribution of this paper is a new, effective hybrid feature selection technique named FSBR. Figure 1 represents the framework of the proposed method, which contains two phases. The first phase is the ranking phase: the features go through three different ranking methods, (1) relative weight ranking (R_FRW), (2) effectiveness ranking (R_FE), and (3) information gain ranking (R_IG). Then, the three ranks are passed to a fuzzy inference system to generate the final ranks based on a set of rules. The second phase is the wrapper phase: different sets of top-ranked features are passed through three different classifiers, (1) Naïve Bayes, (2) Support Vector Machine (SVM), and (3) Neural Network (NN), and only one set of features is chosen based on the performance of the three classifiers.
The contributions of the current study can be summarized as follows:
- Proposing a hybrid technique in the feature selection field, called FSBR.
- FSBR presents two phases: a feature ranking phase and a feature selection phase.
- The feature ranking phase presents three different ranking methods: relative weight ranking, effectiveness ranking, and information gain ranking.

Table 1 Taxonomy of FS methods

| Category | Type | Pros | Cons | Examples |
|---|---|---|---|---|
| Label information | Supervised | High performance, flexibility, reliability, and accuracy | Expensive; needs huge data to reach high performance and a long time to implement | Self-paced regularizer (SP-regularizer) terms [16] and deep neural network (NeuralFS) [24] |
| | Semi-supervised | Uses both labeled and unlabeled data; simple; high efficiency | Results are not stable and have low accuracy | Deep learning features [21] and local preserving logistic I-Relief [50] |
| | Unsupervised | Less complexity, minimized error, easy and fast to implement | | Chi-square [7], information gain [47,55], and ranking-based feature selection [9] |
| Search strategy | Wrapper | Simple; interacts with the classifier; models feature dependencies | Risk of overfitting, classifier-dependent selection, and computationally intensive | Crow search algorithm [44] and Grasshopper Optimization Algorithm [30] |
| | Hybrid | Merges filter and wrapper methods; achieves high results | Classifier-dependent selection | Weighted Gini index [28] and enhanced K-Nearest Neighbor [45] |

- The feature selection phase combines different machine learning classification algorithms (Naive Bayes, Support Vector Machine, and Neural Network).
- Applying the proposed method to two types of datasets.
- Comparing the current study with a set of state-of-the-art studies.

Paper organization
This paper is organized as follows: Section 2 discusses the previous efforts concerning the feature selection strategy. Section 3 presents the proposed user load forecasting strategy. Section 4 shows the experimental results. The conclusions and future work are discussed in Section 5.

Literature review
Initially, this section introduces a set of previous efforts in the field of FS in general. Then, it introduces previous efforts on smart grid systems and FS methods. Currently, there are many works related to the concept of fog computing that discuss its differences from cloud computing [37] and edge computing, its applications, emerging key technologies (i.e., communication and storage technologies), and the various challenges involved in fog technology [17,48].
Bellavista et al. [8] presented a survey on fog computing for the IoT, illustrating the architecture of fog and some fog-based applications. Javadzadeh et al. [25] provided a systematic survey with a different analytical evaluation of fog computing applications in smart cities. Another work presented a differential evolution approach that incorporates filter and wrapper methods into an enhanced local-knowledge computational search process based on fuzziness principles, to cope with both continuous and discrete datasets [22]. Another study by Mafarja et al. proposed an approach to solve FS problems using two incremental hill-climbing techniques (i.e., QuickReduct and CEBARKCC), hybridized with the binary ant lion optimizer in a feature selection model called HBALO [29]. Sayed et al. [44] suggested a metaheuristic optimizer, namely a chaotic crow search algorithm, to find an optimal feature subset that maximizes the classification performance and minimizes the number of selected features. Mafarja et al. [30] presented a binary grasshopper optimization algorithm for FS problems, recommending binary variants to select the best feature subset within a wrapper-based system for classification purposes. Cilia et al. [9] presented a ranking-based approach to FS for handwritten character recognition. Zhu et al. [63] suggested a supervised FS algorithm that simultaneously preserves the local structure of the data (through adaptive structure learning in a low-dimensional feature space of the original data) and its global structure (through a low-rank constraint). Bassine et al. [7] proposed an improved Arabic text classification system that used Chi-square FS, called ImpCHI, to improve the efficiency of classification. Verma et al.
[53] applied a new hybrid approach using three FS techniques, Chi-square, Information Gain, and Principal Component Analysis, and then merged them to select the best available subset of the collected data for skin disease classification [60].
Ahmed et al. [3] suggested a supervised machine-learning-based approach to detect covert cyber deception assaults in state estimation, with genetic-algorithm-based feature selection to improve detection accuracy. Hafeez et al. [20] provided two FS modules, random forests and Relief-F, which were merged to create a hybrid FS algorithm. Ahmad et al. [2] presented an artificial neural-network-based day-ahead load forecasting model for smart grids, which is made up of three modules: data preparation, FS, and forecasting. The data preparation module made the historical load curve compatible with the FS module to predict the future load based on the selected features. Niu et al. [35] developed a practical machine learning model for short-term load prediction based on FS using a binary-valued cooperation search algorithm and parameter optimization of a support vector machine. The related studies are summarized in Table 2 according to publication year in ascending order.
Based on what was mentioned in this section, some studies work only on filter methods, while others work only on wrapper methods. Other feature selection approaches have been proposed, but they suffer from limitations such as long running times and computational complexity. Our proposed method combines the filter and wrapper approaches, which yields higher accuracy when implemented and compared with other methods, and it uses equations that are easy to implement.

The suggested feature selection strategy
An efficient forecasting model makes acceptable use of the electric-load data with all its characteristics and also reduces its dimensionality. Load forecasting is classified into three types based on the time interval. The first is short-term load forecasting, which forecasts the load over a period of 24 hours to one week; it has made great progress recently thanks to the large amounts of data collected from smart meters [40]. The second is medium-term load forecasting, which anticipates the load for a week to a year. The third is long-term load forecasting, which forecasts the load over a period of one year to more than two years. With the presence of fog and the development of the centralized computing topology, we can train the load forecasting models and forecast the workloads at distributed smart meters, so that consumers' raw data is handled locally and then forwarded to a central cloud [46]. The suggested forecasting model [59] is built in an architecture that consists of three tiers. The first tier contains the IoT devices such as sensors, smart meters, monitoring systems, wireless communication devices, and demand response [6,26]. The data is collected and sent to the fog computing layer, which is the second tier. Fog is responsible for taking data from devices such as smart meters. In the fog layer, two operations are carried out: (1) a pre-processing layer and (2) a short-term prediction model layer, as shown in Fig. 2. The third tier is the cloud, which contains a set of integrated data that can be used in long-term prediction. Thus, we can obtain an improved electrical grid with accurate load prediction [38].
In this paper, we focus on the first fog layer. In summary, the current study works in the data pre-processing layer using the suggested FSBR approach. In this section, we go through the proposed feature selector in detail. It consists of two phases: first, the feature ranking phase (filter methods), using the proposed ranking methods (relative weight ranking, effectiveness ranking, and information gain ranking); second, the feature selection phase (wrapper method), which uses three machine learning classifiers (Naive Bayes, Support Vector Machine, and Neural Network). As shown in Fig. 3, data is collected from smart devices and sent to the fog to start the execution of the feature selection algorithm.

Feature ranking phase (FRP)
FSBR starts with the ranking phase, using three different filter methods: (1) feature relative weight ranking (R_FRW), (2) feature effectiveness ranking (R_FE), and (3) information gain (IG) [49]. Each of these methods produces a different ranking of the priority and importance of the features; to obtain only one ranking, a fuzzy inference system is used. The fuzzy system takes these ranks as input and produces a final ranking from the three methods, giving us the best features in order so they can be used in the second phase. The feature selection phase determines the best set of top-ranked features: different sets of top-ranked features are passed to a classifier, and based on the classifier's results, the best set is selected. This is represented graphically in Fig. 3.

Feature relative weight ranking (R FRW )
The first ranking method depends on the feature's impact on the output, where the number of unique states in each feature affects the priority of the feature and the strength of its bond with the output. The bond between the input features and the output is a true metric of the importance of each feature: the stronger the bond, the greater the feature's impact on the output. To achieve this, we assume that each data column, including input and output, consists of a finite number of unique states. Assume a feature F_i consists of n unique states S_1, S_2, …, S_n, and the output consists of m unique states O_1, O_2, …, O_m, where n and m are finite numbers. S_1 is repeated k times inside the feature's column, which means the corresponding output could be any of the output's states, and the same applies to the other states. To measure the bond's strength, we calculate the probability of the different output states occurring inside S_1 on its own, then for S_2, S_3, and so on. The formulas used for this method are presented in Eqs. (1) and (2), where i is the feature's index, j is the output state's index, P(S_n, O_j) is the probability of the output O_j occurring when the input's state is S_n, and R_FRW(F_i) is the feature relative weight ranking for feature F_i. The R_FRW calculation is presented in Algorithm 1 (Fig. 4). To simplify the illustration of this method, a sample dataset (Table 3) is used; Tables 4 and 5 illustrate how to execute the R_FRW method on it.
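Since Eqs. (1) and (2) are not reproduced above, the following Python sketch only illustrates the idea behind R_FRW: for each unique state of a feature, estimate the conditional probabilities of the output states and keep the strongest one, then summarize the column. The toy data and the final averaging step are our assumptions, not the paper's exact Eq. (2).

```python
from collections import Counter, defaultdict

def feature_relative_weight(column, output):
    """Score one feature column by the strength of the bond between its
    unique states and the output states (a sketch of R_FRW).

    For every unique input state S, estimate P(O_j | S) for each output
    state O_j and keep the largest of those probabilities: a state whose
    outputs concentrate on one class has a strong bond with the output.
    The final rank averages these maxima over all states (an assumed
    aggregation, standing in for the paper's Eq. (2))."""
    by_state = defaultdict(list)
    for s, o in zip(column, output):
        by_state[s].append(o)
    strengths = []
    for outs in by_state.values():
        counts = Counter(outs)
        strengths.append(max(counts.values()) / len(outs))  # max_j P(O_j | S)
    return sum(strengths) / len(strengths)

# Toy data in the spirit of Table 3: two features, one binary output.
season  = ["winter", "winter", "summer", "summer", "winter", "summer"]
weekday = ["mon", "tue", "mon", "tue", "mon", "tue"]
load    = ["high", "high", "low", "low", "high", "low"]

print(feature_relative_weight(season, load))   # perfectly predictive: 1.0
print(feature_relative_weight(weekday, load))  # weakly informative here
```

A feature whose every state maps to a single output state scores 1.0, while a feature whose states mix output states scores lower, matching the intuition of a strong versus weak bond.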

Feature effectiveness ranking
The second ranking method is driven by the popular Naïve Bayes classifier, presented in Eq. (3): P(O | F) = P(F | O) P(O) / P(F).
Here, P(O | F) is the probability of target O given attribute F, P(F | O) is the conditional probability of F given O, P(O) is the probability of the class, and P(F) is the probability of the attribute. The formula is used as a classifier by giving attributes as input and getting a probability as output: the higher the probability, the higher the chance of the output occurring. We have derived a new ranking technique from this probability based on the different states of each feature, as mentioned in the first ranking method. The newly proposed technique inherits the same formula, but instead of applying it once, using only one state of each feature to get the probability of the output, we apply it to each feature's state to get the probability of each output's state, as presented in Eq. (4).
Here, P(O_j | S_i) is the probability of getting output state O_j given the feature's state S_i, n is the number of the feature's states, and O_j is the required output's state. Then, the formula in Eq. (5) is used to get a final rank that represents the whole feature's column.
In Eq. (5), R_FE(F_i) is the feature effectiveness of the feature's column F_i, and P(O_1 | F_i) is the probability of getting output state O_1 given feature F_i. The R_FE calculation is presented in Algorithm 2 (Fig. 5), and the same is applied to the other features. To simplify the previous formulas and the ranking technique, the dataset shown in Table 3 is used again, and the results are reported in Table 6.
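A minimal sketch of the R_FE idea, assuming counts-based estimates of the Bayes terms. Since Eqs. (4) and (5) are not reproduced above, the final aggregation here (averaging each state's strongest probability) is a stand-in, not the paper's exact formula, and the sample data is hypothetical.

```python
from collections import Counter

def feature_effectiveness(column, output):
    """Sketch of the R_FE rank. For every pair of feature state S_i and
    output state O_j, Bayes' formula (Eq. 3) gives
    P(O_j | S_i) = P(S_i | O_j) * P(O_j) / P(S_i), estimated here from
    counts. The per-state probabilities are then summarized by averaging
    each state's strongest probability (an assumed aggregation)."""
    n = len(column)
    state_counts = Counter(column)            # for P(S_i)
    out_counts = Counter(output)              # for P(O_j)
    joint = Counter(zip(column, output))      # for P(S_i | O_j)
    best = {}
    for (s, o), c in joint.items():
        p_s_given_o = c / out_counts[o]
        p = p_s_given_o * (out_counts[o] / n) / (state_counts[s] / n)
        best[s] = max(best.get(s, 0.0), p)
    return sum(best.values()) / len(best)

weather = ["sun", "sun", "rain", "rain"]
load = ["low", "low", "high", "low"]
print(feature_effectiveness(weather, load))  # roughly (1.0 + 0.5) / 2
```

Note that the three Bayes terms cancel down to the direct conditional probability P(O_j | S_i); the code keeps them separate only to mirror the terms named in Eq. (3).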

Information gain
Information gain [15] is used as the third ranking method alongside the previous two; it measures the mutual dependence between two variables, such as the dependence between an input feature and the output. In our context, the higher the information gain, the stronger the relation between the input feature and the output. Hence, we aim to find which features have the highest values with respect to the output. Equation (6) shows the formula used to calculate IG.
Here, E(y) is the target's entropy and E(X | y) is the entropy of target y given variable X. The term entropy represents the measure of uncertainty or disorder, and has the formula shown in Eq. (7).
In Eq. (7), m is the number of the target's classes, P(y_i) is the probability of class y_i, and log2(P(y_i)) is the logarithmic value of the class's probability. To measure a feature's IG, we first measure the output entropy of the whole dataset. Then, we measure the entropy of the output given the input feature. Finally, we subtract the two values to get the final value of the IG. Table 7 shows the IG values for each feature of the dataset presented in Table 3. Thus, we have obtained the third and final rank of the proposed ranking types.
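The entropy and information-gain computation described above can be sketched directly from Eqs. (6) and (7); the toy data is hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """E(y) = -sum_i P(y_i) * log2(P(y_i))  (Eq. 7)."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def information_gain(column, output):
    """IG = E(y) - E(y | X): the drop in output entropy once the
    feature's state is known (Eq. 6). The conditional entropy weights
    the entropy of each state's outputs by the state's frequency."""
    n = len(column)
    groups = {}
    for s, o in zip(column, output):
        groups.setdefault(s, []).append(o)
    cond = sum(len(outs) / n * entropy(outs) for outs in groups.values())
    return entropy(output) - cond

season = ["winter", "winter", "summer", "summer", "winter", "summer"]
load = ["high", "high", "low", "low", "high", "low"]
print(information_gain(season, load))  # 1.0: the feature removes all uncertainty
```

A feature that fully determines the output recovers the entire output entropy (here 1 bit), while an uninformative feature yields an IG near zero.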

Fuzzy inference engine
The previous ranking methods lead to three different ranking tables; hence, it is necessary to merge them into one, and a fuzzy inference system is the best alternative for doing so. Fuzzy inference is based on simple IF-THEN rules merged with fuzzy logic operators (e.g., AND and OR) to enhance decision-making in a way similar to human reasoning; it is a process that maps an input to an output using fuzzy logic [43]. The process goes as follows: (1) the crisp input values are converted into fuzzy quantities, (2) these go through the fuzzy rules and fuzzy memberships to generate an output, and (3) the output is a fuzzy set that needs to be defuzzified to obtain crisp values once again. The inputs to the fuzzy system are the crisp values shown in Table 8; they need to be converted into fuzzy sets using the memberships shown in Fig. 6. To generate the corresponding memberships for the three ranking methods, we need to determine the α, β, and γ values as presented in Eqs. (9) to (11) [43].
After calculating the three values α, β, and γ, each value from each ranking is converted into a fuzzy input that can be small (S), medium (M), or large (L) using Eqs. (12) to (14).
Here, n is the number of input values for the methods, and value is the rank of feature x from a particular method. The output of the fuzzification process is the input to the fuzzy rule base. A set of rules is considered here in the form: if (X is A) AND (Y is B) THEN (Z is C), where X and Y are input variables (among R_FRW, R_FE, and R_IG), Z is the output, and A, B, and C are the corresponding linguistic values (e.g., small, medium, and large). The first part of the rule (before THEN) is called the antecedent; the second part (after THEN) is called the consequent. The input fuzzy sets go through the if-then rules to determine the output. In this paper, 27 different rules are used to determine the output, as shown in Table 9. Then the output goes through a defuzzification process to get crisp values back, representing the final ranking.
Defuzzification can be applied using different methods such as max-min, max criterion, center-of-gravity (COG), and the mean of maxima [5,39,43]. The max-min method chooses a min operator for the conjunction in the premise of the rule as well as for the implication function, and the max operator for aggregation [5]. The COG is the most popular method [5] and is the one used in the current study; it is identical to the formula for calculating the center of gravity in physics: the crisp output is the weighted average, or center of gravity, of the area bounded by the output membership function. Defuzzification is accomplished by the output membership function shown in step 3 of Table 10, assuming α = 3, β = 6, and γ = 9 with respect to Eqs. (9) to (11). Algorithm 3 (Fig. 7) presents the final ranking computed using the fuzzy inference engine; an example is illustrated in Table 10.
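The fuzzification, min-max rule firing, and COG defuzzification steps can be sketched as below. The triangular membership parameters and the two sample rules are illustrative assumptions only; the paper's actual memberships (Fig. 6, Eqs. 9 to 14) and its 27-rule base (Table 9) are not reproduced here.

```python
def tri(x, a, b, c):
    """Triangular membership with feet a, c and peak b (shoulders when
    the peak coincides with a foot)."""
    left = 1.0 if b == a and x >= a else (x - a) / (b - a) if b != a else 0.0
    right = 1.0 if c == b and x <= c else (c - x) / (c - b) if c != b else 0.0
    return max(0.0, min(left, right, 1.0))

# Assumed small/medium/large sets on a 0-12 rank scale (illustrative,
# not the paper's alpha/beta/gamma values).
SETS = {"S": (0, 0, 6), "M": (3, 6, 9), "L": (6, 12, 12)}

def fuzzify(x):
    return {name: tri(x, *p) for name, p in SETS.items()}

def infer(rfrw, rfe, rig, rules):
    """Mamdani-style inference: a rule's firing strength is the min
    (AND) of its antecedent memberships; rules sharing an output set
    are aggregated with max (OR)."""
    ins = (fuzzify(rfrw), fuzzify(rfe), fuzzify(rig))
    out = {"S": 0.0, "M": 0.0, "L": 0.0}
    for (a, b, c), z in rules.items():
        out[z] = max(out[z], min(ins[0][a], ins[1][b], ins[2][c]))
    return out

def cog(out, samples=121):
    """Centre-of-gravity defuzzification over the clipped output sets."""
    num = den = 0.0
    for i in range(samples):
        x = 12 * i / (samples - 1)
        mu = max(min(level, tri(x, *SETS[name])) for name, level in out.items())
        num += x * mu
        den += mu
    return num / den if den else 0.0
```

With a rule such as `{("L", "L", "L"): "L"}`, three large input ranks clip the large output set, and COG returns a crisp rank near the upper end of the scale.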

Feature selection phase (FSP)
The results of the previous phase are the ranks of all features, ordered from the most effective to the least. This is suitable, as we can drop the least effective features to reduce computation and complexity, ending with the most effective ones; but these could still be many, and for big systems they could be hundreds or thousands. Therefore, the question is: what is the smallest set of the most effective features that operates as well as the full feature set? This question is answered in this section. The wrapper phase is a trial-and-error phase where we use different classifiers with different numbers of top-ranked features, to end up with the least number of features that can do the same job as the whole feature set. In this work, we have used three different classifiers: (1) a simple Neural Network, (2) Naïve Bayes, and (3) Support Vector Machine. First, we determine the accuracy of each classifier so that we can compare the results for the different combinations of top-ranked features. Then, we determine the average accuracy of each combination, and only one set of features is chosen, based on the highest average accuracy. As an illustration, the dataset described in Table 3 is used, and the results are reported in Table 11. For the sake of simplicity, fewer computations, and faster decision-making, Top 3 is better than Top 5, as only three features can do the same job as the whole feature set.
In summary, we first discussed the feature ranking phase (FRP), which applies the filter feature selection methods; the final feature ranking is the output of the fuzzy method, and illustrative calculations were performed on a sample dataset (Table 3). The second stage, the feature selection phase (FSP), applies a wrapper feature selection method to select the best features from the ranking according to three classifiers: (1) Neural Network, (2) Naive Bayes, and (3) Support Vector Machine. Based on the fuzzy output, the order of features is [Season, Weather, Time, Holidays, Events, Weekday]. First, we test each classifier on the data multiple times: the first experiment calculates the accuracy with the first feature only, TOP1 [Season], and then the accuracy is calculated after adding the following feature, TOP2 [Season, Weather]. We continue adding the features gradually, obtaining six results per classifier. With the values as shown in Table 11, the average accuracy of TOP3 is 89.66% and of TOP5 is 88.33%; hence, the TOP3 features are the best for the sake of simplicity.
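The wrapper phase described above can be sketched with scikit-learn, the package the experiments use. The synthetic data, the ranking, and the tie-breaking rule (prefer fewer features at equal accuracy) are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

def select_top_k(X, y, ranking):
    """Wrapper phase sketch: grow the feature set one top-ranked
    feature at a time, score each candidate set with the three
    classifiers, and keep the smallest set whose average accuracy is
    the best observed."""
    clfs = [GaussianNB(),
            SVC(kernel="rbf"),
            MLPClassifier(hidden_layer_sizes=(16,), max_iter=500,
                          random_state=0)]
    best_k, best_acc = 1, -1.0
    for k in range(1, len(ranking) + 1):
        cols = ranking[:k]
        avg = np.mean([cross_val_score(c, X[:, cols], y, cv=3).mean()
                       for c in clfs])
        if avg > best_acc:  # strictly better only, so ties keep fewer features
            best_k, best_acc = k, avg
    return ranking[:best_k], best_acc

# Hypothetical data: 2 informative features followed by 4 noise columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
features, acc = select_top_k(X, y, ranking=[0, 1, 2, 3, 4, 5])
```

On this toy problem the two informative columns should survive while the noise columns are dropped, mirroring how TOP3 beat TOP5 in the example above.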

Evaluation and results
The current section shows the experiments, the reported results, and the corresponding discussions. The experiments were performed on Windows 10 using the Python programming language, with the NumPy, Pandas, Keras, and Scikit-Learn packages. The environment had an Intel Core i7 processor with 6 GB of RAM.

Datasets
The current study uses two datasets. The first is the EUNITE dataset while the second is the USPS dataset.

EUNITE dataset
European Network on Intelligent Technologies (EUNITE) [15] is a dataset that contains electrical loads for the Eastern Slovakian Electricity Corporation during the period between January 1, 1997, and December 31, 1998. It comes in the form of four columns that contain daily information: (1) date, (2) temperature, (3) holiday, and (4) load. The holiday column indicates whether the corresponding day is an annual holiday (e.g., Christmas or Easter). The first three columns are the features while the last one (load) is the desired output; for a complete test and representation of our proposed method, we have added three more columns based on the day's date: (1) weekday, (2) event, and (3) season. The event column indicates whether any occasional event happened. The resulting modified dataset consists of 730 samples and seven columns (i.e., six feature columns and one desired-output column). Samples from the modified dataset are shown in Table 12.
The data in its current form cannot be used, neither for the filter phase nor for the selection phase, because the columns have different formats and the numerical columns (temperature and load) have wide ranges. Hence, it is necessary to go through a preprocessing stage to put the data in an appropriate numerical form. This stage converts the date column into a number representing the day of the week; it converts the holiday, weekday, and event columns into 1 or 0, where 1 means Yes and 0 means No; and it normalizes the temperature and load columns, setting the temperature range to [−1, 1] and the load range to [0, 1]. Table 13 shows the results after the preprocessing stage.
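A sketch of this preprocessing stage using pandas; the column names and sample values are hypothetical stand-ins for Table 12.

```python
import pandas as pd

# Hypothetical rows shaped like the modified EUNITE data (Table 12);
# the actual values in the paper may differ.
df = pd.DataFrame({
    "date":        ["1997-01-01", "1997-01-02", "1997-01-03"],
    "temperature": [-7.6, -6.2, 1.4],
    "holiday":     ["Yes", "No", "No"],
    "weekday":     ["No", "Yes", "Yes"],
    "event":       ["No", "No", "Yes"],
    "season":      [1, 1, 1],
    "load":        [712.0, 740.0, 661.0],
})

# Date -> day-of-week number; Yes/No columns -> 1/0.
df["date"] = pd.to_datetime(df["date"]).dt.dayofweek
for col in ("holiday", "weekday", "event"):
    df[col] = (df[col] == "Yes").astype(int)

# Temperature scaled to [-1, 1], load scaled to [0, 1] (min-max).
t = df["temperature"]
df["temperature"] = 2 * (t - t.min()) / (t.max() - t.min()) - 1
l = df["load"]
df["load"] = (l - l.min()) / (l.max() - l.min())
```

After this step every column is numeric and bounded, which is what both the filter-phase statistics and the wrapper-phase classifiers require.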

USPS dataset
The handwritten digits USPS dataset [23] is a digital dataset created by the United States Postal Service by scanning envelopes automatically. It contains 9298 images divided into 7291 training images and 2007 test images (an 80%:20% split), with 256 features. The images have been deslanted and size-normalized, resulting in 16×16 grayscale images.
The evaluation metrics are accuracy, precision, recall, and F1-score, where TP (True Positive) is the number of samples correctly classified as positive, TN (True Negative) is the number of samples correctly classified as negative, FP (False Positive) is the number of samples wrongly classified as positive, and FN (False Negative) is the number of samples wrongly classified as negative.
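These four counts yield the reported metrics as follows (standard definitions; the counts in the example are hypothetical):

```python
def metrics(tp, tn, fp, fn):
    """Confusion-matrix metrics used throughout the evaluation."""
    accuracy  = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Example confusion matrix: accuracy 0.85, precision 0.90,
# recall about 0.82, F1 about 0.86.
print(metrics(tp=45, tn=40, fp=5, fn=10))
```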

Evaluation of EUNITE
The current subsection shows a comparison between the proposed method and other filter methods: (1) Chi-square [7], (2) mutual information [7], (3) feature importance [56], (4) ACC2 [42], and (5) AACC2 [42]. Table 15 shows the top-ranked features up to top-5. As shown, the top-1 feature is the same for all methods, whereas the other top-k features differ from one method to another. To prove that the proposed method ranks the features accurately, each method is tested separately in the second phase. Figures 8, 9, and 10 show the accuracy of each method using the Neural Network, Naïve Bayes, and Support Vector Machine, respectively. The results of this comparison are shown in Tables 16 and 17.
According to the obtained results and Figs. 12, 13, 14, and 15, the proposed feature selection method, FSBR, demonstrates the best "Accuracy", "Precision", "Recall", and "F1-score"; thus, FSBR is shown to be the most efficient methodology. On the other hand, the most-correlated-features approach delivers the worst performance in terms of "Accuracy", "Precision", "Recall", and "F1-score": for the most-correlated features and FSBR, the error reaches 22% and 9%, respectively. FSBR performs better because it is based on a hybrid technique that combines filter and wrapper methods. Finally, FSBR outperforms the other techniques at selecting the best subset of features, improving the performance of the classifier or prediction model using only three features.

Generalization using USPS
The USPS dataset is used to prove the applicability of our proposal on a big dataset. Table 18 shows a comparison between our proposed method for feature ranking and selection and the different feature selection algorithms in [31]. The comparison is made as follows: all features go through each feature selection algorithm, including the proposed one; then the top-ranked features are used to train a Neural Network; and finally, the test results are used to decide which method is best. Figure 16 illustrates the resulting accuracy, precision, and recall of the proposed method and of the different feature selection algorithms in Table 18; our proposed method reports the best results. As shown in Table 18, the least number of features needed to gain the maximum accuracy was 10; this number is the same for each method, but each method chooses its own features based on its own ranking.

Conclusions and future work
As a recap, the proposed method is a feature selection method based on filter and wrapper techniques. The filter phase consists of three different filters: relative weight ranking, effectiveness ranking, and information gain ranking. In the wrapper phase, we used three different classifiers to select the least number of top-ranked features (from the previous phase) without affecting the performance; the classifiers used were Neural Network, Naïve Bayes, and Support Vector Machine. Hence, our main contribution is to improve the smart electrical grid by optimizing the data being sent to the fog and the cloud: only important data is selected, while repetitive and irrelevant data is dropped, maintaining the performance of the system. We have proposed a new feature selection method that successfully chooses only important data, proving its correctness by applying it to two different datasets: EUNITE, which is related to the electrical field, and USPS, which is not related to the electrical field but consists of images. In both cases, the results were satisfying enough to put our proposed method in comparison with other methods. Experimental results have shown that the proposed feature selection technique provides more accurate results than existing methods in terms of accuracy, precision, recall, and F1-score: FSBR reached accuracy, precision, recall, and F1-score values of 91%, 90%, 91%, and 90%, respectively, on EUNITE, while 90%, 93%, and 88% were the best accuracy, precision, and recall, respectively, on the USPS dataset. In future work, we will work on the second layer, the user load forecasting strategy, and the possibility of saving electricity consumption by predicting the load per user in the short and long term.

Authors contribution
We the undersigned declare that this manuscript is original, has not been published before, and is not currently being considered for publication elsewhere. We confirm that the manuscript has been read and approved by all named authors and that there are no other persons who satisfied the criteria for authorship but are not listed. We further confirm that the order of authors listed in the manuscript has been approved by all of us. We understand that the Corresponding Author is the sole contact for the Editorial process. She is responsible for communicating with the other authors about progress, submissions of revisions, and final approval of proofs.
Funding Open access funding provided by The Science, Technology & Innovation Funding Authority (STDF) in cooperation with The Egyptian Knowledge Bank (EKB).

Declarations
Conflict of interest We wish to confirm that, there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.
Intellectual property We confirm that, we have given due consideration to the protection of intellectual property associated with this work and that there are no impediments to publication, including the timing of publication, concerning intellectual property. In so doing we confirm that we have followed the regulations of our institutions concerning intellectual property.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.