Short-term load forecasting method based on fuzzy optimization combined model of load feature recognition

With the continuous development of smart grid construction and the gradual improvement of power market operation mechanisms, the importance of power load forecasting is continually increasing. In this study, a short-term load prediction method based on the fuzzy optimization combined model of load feature recognition was designed to address the problems of weak generalization ability and poor prediction accuracy of the conventional feedforward neural network prediction model. First, the Douglas-Peucker algorithm and the fuzzy optimization theory of load feature recognition were analyzed, and the combined prediction model was constructed. Second, data analysis and pre-processing were performed based on the actual historical load data of a certain area and the corresponding meteorological and calendar rule information data. Finally, a practical example was used to test and analyze the short-term load forecasting effect of the fuzzy optimization combined model. The calculation results showed that the presented fuzzy optimization combined model of load feature recognition outperformed the conventional model in terms of computational efficiency and specific performance; therefore, the proposed model supports further development of actual power load prediction.


Introduction
The short-term forecasting of power load is essential in the stable economic operation of power systems [1]. As the intelligence degree of power grids continuously increases and the power marketing mechanism gradually improves, information interaction between the user and power grid sides becomes more frequent and wider in scope [2]. At the grid operation level, major adjustment characteristics of the load side must be acquired at the highest possible accuracy level to guarantee the reliable operation of the system through corresponding dispatching measures. Moreover, the market operation system is more concerned with the electricity consumption characteristics of users to reasonably adjust the generation schedule and enhance competitiveness in the electricity market environment [3][4]. Therefore, a power system under the new development background has more detailed requirements for short-term load forecasting, that is, it must not only fulfill the demand of grid production and scheduling but also describe the load characteristics with high accuracy.
The concept of artificial intelligence has been applied in many fields and the technology has been significantly developed; thus, machine learning and deep learning have been widely supported and applied in short-term load forecasting research. Furthermore, machine learning methods, such as neural network and support vector machine (SVM) regression, have been rapidly developed owing to their powerful data mining ability and superiority in solving complex nonlinear problems [5][6][7][8].
In the actual application of typical machine learning methods, such as neural networks and SVM regression, for short-term load forecasting, some problems might be encountered. First, the collected load information data are more intensive in time scale owing to the continuous refinement of load information collection devices; simultaneously, data sources affecting short-term load forecasting, including meteorological, daily type, and economic and social factors, are gradually increasing. Therefore, the input vector for training the machine learning model of short-term load forecasting is usually extremely large, which reduces the computational efficiency. Second, when the machine learning model trained on the historical load information data outputs a short-term load prediction curve, large prediction errors might appear at the characteristic points of the predicted load curve; therefore, prediction errors of the peak and trough of the load curve in a period are large [9]. In this study, the load characteristics primarily refer to the optimal load data subset in the original load data set that can reflect the trend law of the original load curve. Load characteristics reflect the key change trend of the load curve in a certain time interval. Indeed, we can gain insight into the internal characteristics of actual load by acquiring load characteristics. Because the load data points near the load characteristics will fluctuate and interfere within a certain numerical range of the load characteristics, the forecasting accuracy of the short-term load forecasting curve characteristics will inevitably be negatively affected. This not only restricts the further improvement of the overall accuracy of short-term load forecasting but also leads to the loss of key information on the short-term load forecasting curve. This is not conducive to the stable economic operation of the power grid and the long-term development of the power market.
In typical engineering applications of machine learning, feature selection methods can be roughly classified into filtering, wrapping, and embedding methods. Feature dimension reduction methods include the principal component and linear discriminant analyses [5][10][11][12][13]. Although machine learning feature engineering methods have general advantages in application, the load features that they finally form are not clearly interpretable or sufficiently intuitive. The Douglas-Peucker (DP) algorithm, as a classical method in the research of basic curve feature extraction and compression, has the advantages of high computational efficiency and strong visibility; thus, it is appropriate for the extraction and dimension reduction of load curve features. However, the threshold setting of the algorithm limits the rationality of feature extraction [14]. Therefore, this study proposes a short-term load forecasting method based on the fuzzy optimization load feature identification (recognition) combined model.
First, the load curves were analyzed using the fuzzy clustering analysis.The DP algorithm of fuzzy optimization with adaptive threshold adjustment was established to identify and extract the load characteristics of similar load curves with comparable load characteristics.
Second, the fuzzy optimization load feature recognition and machine learning models were combined to construct a combined prediction model, and the idea of classification prediction was integrated to predict the characteristics of future loads; thus, the typical predicted load curve was reconstructed.
Finally, the effectiveness of the proposed method was evaluated and discussed using an actual power system example; the obtained results verified that the proposed fuzzy optimization load feature identification combined model exhibited higher prediction performance and wider application prospects compared to those of the conventional neural network prediction model.

Fuzzy optimization load feature identification method
The machine learning method used in short-term load forecasting is supervised learning, and the process of training needs a considerable number of historical data samples. In addition, the core requirement of load feature extraction is to complete the accurate identification, extraction, and dimension reduction of load features on the premise of preserving the shape features of the original load curve. Fig. 1 shows a schematic of the relationship between the load curve and corresponding load characteristics. An appropriate clustering algorithm should be employed for classification to analyze the historical load curve data samples; therefore, an optimized load feature recognition method is used to construct an accurate adaptive load feature recognition model for each class under the premise of similar characteristics for all types of in-class load curves. The process of fuzzy optimization load feature identification is shown in Fig. 2.

Fuzzy cluster analysis method
The process of the fuzzy c-means (FCM) [15] clustering analysis is described as follows:
Step 1: Initialize. Set the total set of data samples as $X$. Meanwhile, $M$ initial cluster centers are set, denoted as $(c_1(k), c_2(k), \ldots, c_M(k))$, where $k$ is the iteration index.
Step 2: Calculate the Euclidean distance from sample $x_j$ to the cluster center $c_i$ using Eq. (1), where $t$ is the total number of clustering indicators of data samples.
Step 3: Calculate the membership $u_{ji}$ of sample $x_j$ for class $i$ using Eq. (2). According to the principle of minimum distance, $X$ is clustered. Suppose that Eq. (3) is satisfied:
$d(x_j, c_{i_0}(k)) = \min_{1 \le i \le M} d(x_j, c_i(k))$. (3)
Step 4: Update the cluster center using Eq. (4).
If the termination condition is not satisfied, set $k = k + 1$; subsequently, go to Step 2 and execute the process again; otherwise, the FCM clustering ends.
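The iteration above can be illustrated with a minimal sketch of the textbook FCM update rules. It is not the exact formulation of Eqs. (1)-(4): the fuzzifier `m`, the tolerance `tol`, the stopping rule based on the largest center shift, and the deterministic initialization from evenly spaced samples are all assumptions made for illustration, as are the function names.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def fcm(samples, n_clusters, m=2.0, max_iter=100, tol=1e-6):
    """Textbook fuzzy c-means; returns (centers, memberships, labels)."""
    n = len(samples)
    dim = len(samples[0])
    # Assumption: start from evenly spaced samples instead of a random draw.
    centers = [list(samples[(i * n) // n_clusters]) for i in range(n_clusters)]
    memberships = []
    for _ in range(max_iter):
        # Membership of every sample in every class (standard FCM formula).
        memberships = []
        for x in samples:
            dists = [euclidean(x, c) for c in centers]
            if min(dists) == 0.0:
                # Sample coincides with a center: give it full membership there.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                memberships.append([h / sum(hits) for h in hits])
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
                memberships.append([v / sum(inv) for v in inv])
        # Center update: membership-weighted mean of the samples.
        new_centers = []
        for i in range(n_clusters):
            weights = [u[i] ** m for u in memberships]
            total = sum(weights)
            new_centers.append(
                [sum(w * x[d] for w, x in zip(weights, samples)) / total
                 for d in range(dim)]
            )
        shift = max(euclidean(c, nc) for c, nc in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:  # assumed stopping rule
            break
    # Hard assignment: class with the largest membership for each sample.
    labels = [max(range(n_clusters), key=lambda i: u[i]) for u in memberships]
    return centers, memberships, labels
```

For daily load curves, each sample would be a vector of the 96 normalized load points, possibly extended with the clustering indicators used in the study.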

Douglas-Peucker algorithm
The load curves obtained using the FCM clustering exhibit large differences between classes and small differences within classes. Therefore, an appropriate curve feature recognition method is needed to identify and extract all similar load curves in a certain class one by one.
The flowchart of the classical DP algorithm [14] can be summarized as follows:
Step 1: Connect the first and last points of the target curve with a straight line and find the vertical distance between the other points on the current target curve and the straight line.
Step 2: Set the threshold value for the DP algorithm and compare the maximum vertical distance calculated in Step 1 with the threshold value. If the value is greater than the threshold, the data point corresponding to the maximum vertical distance is retained. Otherwise, all data points between the two ends of the line are discarded.
Step 3: Based on the retained data point, the target curve is divided into two parts, and each part is treated as a new target curve. Steps 1 and 2 are repeated recursively on each part, that is, the maximum vertical distance is again compared with the threshold value, until no further point can be discarded. Finally, the feature points of the curve that fulfill the predetermined accuracy threshold requirement are obtained, and the other points are dropped to complete the feature extraction of the target curve.
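The three steps can be sketched as a short recursive routine. This is a minimal illustration of the classical DP algorithm rather than the implementation used in the study; the function names are hypothetical, and the distance is taken as the perpendicular distance from a point to the chord, which is an assumption about what "vertical distance" means here.

```python
import math


def perpendicular_distance(point, start, end):
    """Distance from `point` to the straight line through `start` and `end`."""
    (x0, y0), (x1, y1), (x2, y2) = point, start, end
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0.0:
        # Degenerate chord: fall back to the point-to-point distance.
        return math.hypot(x0 - x1, y0 - y1)
    return abs(dy * x0 - dx * y0 + x2 * y1 - y2 * x1) / length


def douglas_peucker(points, threshold):
    """Return the feature points of `points` for the given threshold."""
    if len(points) < 3:
        return list(points)
    start, end = points[0], points[-1]
    # Step 1: find the interior point farthest from the chord.
    max_dist, max_idx = 0.0, 0
    for idx in range(1, len(points) - 1):
        dist = perpendicular_distance(points[idx], start, end)
        if dist > max_dist:
            max_dist, max_idx = dist, idx
    # Step 2: keep that point only if it exceeds the threshold.
    if max_dist > threshold:
        # Step 3: split at the retained point and recurse on both halves.
        left = douglas_peucker(points[: max_idx + 1], threshold)
        right = douglas_peucker(points[max_idx:], threshold)
        return left[:-1] + right  # drop the duplicated split point
    return [start, end]
```

A larger threshold discards more points, so the same curve yields fewer feature points; this is the sensitivity that motivates an adaptive threshold.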

A two-stage fuzzy optimization method for load feature recognition and extraction
This study adopts a two-stage fuzzy improved DP (TFIDP) algorithm. First, the improved DP algorithm based on the fuzzy optimization threshold is used for the initial feature extraction of load. Subsequently, the secondary feature extraction is carried out based on the primary feature extraction following the idea of statistical frequency distribution. Finally, the feature recognition and extraction of the load is completed.

Initial feature extraction based on the fuzzy optimization DP algorithm
In practical applications, the value of the threshold depends on complicated factors that are difficult to quantify. Moreover, the threshold must adapt to different original data sets. Therefore, it is of great practical significance to construct a fuzzy optimal DP model with an adaptive threshold adjustment.
Based on practical experience, the threshold in the classical DP algorithm is usually regarded as a certain value in the range of [0,1]. However, for a series of actual curves with similar shape features, a reasonable threshold should exist such that the curve features extracted from this cluster of similar curves can fulfill practical requirements. Therefore, the threshold setting of the DP algorithm is fuzzy, which is in line with the basic idea of fuzzy mathematics to describe and model fuzzy concepts through accurate mathematical means and solve practical problems properly. In summary, the classical DP algorithm is improved by introducing the concept of fuzzy mathematics to form the DP algorithm with a fuzzy optimization threshold $\varepsilon$. The threshold $\varepsilon$ in the DP algorithm is the key control factor for the final feature set extraction of curves. The introduction of a reasonable fuzzy mathematical concept to the fuzzy optimization of the threshold $\varepsilon$ can make the curve feature recognition and extraction process of the DP algorithm more universal and generalized.
The universe $E = [0,1]$ is defined as the threshold value region of the DP algorithm feature extraction. This study sets the threshold value in the interval [0,1] with a step of 0.1 to simplify the operation. Moreover, $\mathrm{Sat}(\varepsilon)$ represents the membership of the threshold value of the DP algorithm to a cluster of similar curves. The threshold membership $\mathrm{Sat}(\varepsilon)$ is composed of two parts and can be seen as the overall curve feature extraction satisfaction of the specified similar curve cluster under a certain threshold; $D(\varepsilon)$ is the average matching degree between the curve features extracted from the specified similar curve cluster and the original curve, which verifies whether the extracted curve features can completely reflect the shape features of the original curve; $Z(\varepsilon)$ is the percentage ratio of the number of curve feature points to the number of original curve points, that is, the average percentage of the original curve compressed by curve feature extraction; and $a$ and $b$ are the corresponding proportional coefficients.
(1) Average dynamic time warping (DTW) matching degree $D(\varepsilon)$
Because the time dimension of the curve features is reduced compared with the original curve, the two are no longer in a one-to-one mapping relationship; this study therefore introduces the DTW algorithm to calculate the matching degree between the original curve and the curve features. Let the original and feature sequences of the curve be $X$ and $Y$, with sequence lengths $L_X$ and $L_Y$, respectively. A regular (warping) path $w$ aligns the points of the two sequences and needs to fulfill the boundary, continuity, and monotonicity conditions. The regular path cumulative distance $F_w(X, Y)$ between the original sequence $X$ and feature sequence $Y$ is then accumulated along the path. The cumulative distance reaches its minimum under the optimal regular path $w^*$, and this minimum cumulative distance is the DTW distance.
Based on the idea of dynamic programming, the cumulative distance of the optimal regular path is calculated iteratively to obtain the DTW distance between $X$ and $Y$. In the extreme case, the curve features include only the first and last endpoints of the original curve, and $Y$ is the straight line connecting these two endpoints. With this extreme case as the reference, the average matching degree $D(\varepsilon)$ between the curve features and the original curves obtained from the identification and extraction of a similar curve cluster is calculated over the $n$ curves contained in the current similar curve cluster.
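The dynamic-programming calculation of the DTW distance can be sketched as follows. This is the standard DTW recursion, given here as an illustration under stated assumptions: the local cost is the absolute difference between two load values, and the function name is hypothetical.

```python
def dtw_distance(x, y):
    """DTW distance between two numeric sequences of possibly different length."""
    n, m = len(x), len(y)
    inf = float("inf")
    # acc[i][j] is the minimum cumulative distance aligning x[:i] with y[:j].
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0  # boundary condition: both sequences start together
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])  # assumed local distance
            # Continuity and monotonicity: only three predecessor cells are allowed.
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]
```

Because the feature sequence has fewer points than the original curve, several original points are matched to one feature point, which is exactly the many-to-one mapping that motivates the use of DTW.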
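(2) Average compression ratio $Z(\varepsilon)$
The average compression ratio $Z(\varepsilon)$ between the curve features and the original curves obtained from the recognition and extraction of a similar curve cluster is calculated as follows:
$Z(\varepsilon) = \dfrac{1}{n} \sum_{k=1}^{n} \dfrac{\mathrm{Num}(Y_k)}{\mathrm{Num}(X_k)} \times 100\%$,
where $X_k$ and $Y_k$ are the original and feature sequences of the $k$-th curve in the cluster, and $\mathrm{Num}(\cdot)$ represents the amount of data in the given sequence.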

Quadratic feature extraction based on statistical frequency distribution
The DP algorithm based on the fuzzy optimization threshold completes the initial extraction of load characteristics for all load curves in a certain type of load curve cluster; the statistical frequency distribution is then applied as a second extraction stage to obtain the overall characteristics of this type of load curve cluster. The number set of all $m$ distinct load characteristics generated in the process of load feature identification and extraction of this type of load curve cluster is $G = [g_1, g_2, \ldots, g_i, \ldots, g_m]$.
Subsequently, the statistical frequency of each load characteristic of this type of load curve cluster is calculated using Eq. (12). Next, the appropriateness of the load characteristic as one of the overall characteristics of the load curve cluster is determined using Eq. (13). If Eq. (13) is satisfied, its corresponding $I_i$ is added to the updated characteristic number set $I$ of this type of load curve cluster, that is, $I_i \in I$.
Finally, the set $I$ obtained by the first feature extraction of the DP algorithm using the fuzzy optimization threshold and the second feature extraction of statistical frequency distribution is used as the overall feature mark of this type of load curve cluster.

Combined short-term load forecasting model based on machine learning

Nonlinear autoregressive with external input (NARX) neural network model
After the two-stage fuzzy optimization of load feature recognition and extraction, the output unified load feature set can be combined with a variety of machine learning models, such as the neural network model and SVM regression model, to form a combined forecasting model. Among them, the NARX neural network model is more competitive than other typical machine learning models because of its reasonable structural performance and excellent ability to capture nonlinear time series; in addition, its parallel distributed training mode improves the fault tolerance and stability of the model [16][17].
The structure of the NARX neural network is shown in Fig. 3, which indicates the connection weight coefficients from the input to the hidden layer and from the hidden to the output layer, as well as the excitation functions of the hidden and output layer neurons. The input-output relationship of the NARX neural network model follows the formulation in [18]. Therefore, the NARX model can completely consider the historical information contained in the time series, and it can be used to finely depict the state of the time series at the prediction time.

Fig. 3 Structure of the NARX neural network (input layer, hidden layer, and output layer)

Markov chain model
When all machine learning models are successfully trained and short-term load forecasting is carried out, the classification of the day to be forecasted is difficult to determine owing to the unknown load-related information. In this study, the Markov chain model [19] was used to solve the problem of classification and discrimination of the days to be forecasted.
To apply the Markov chain model, the size of the state space of the studied system must be determined. Suppose that the category code sequence of the historical load contains a total of $r$ states, which are recorded as $[s_1, s_2, \ldots, s_i, \ldots, s_j, \ldots, s_r]$. When the number of load FCM clusters is $M$, the value range of any state $s_i$ is $[1, M]$. According to the Markov chain definition, the probability of state $s_i$ transferring to state $s_j$ after $n$ steps is calculated as
$p_{ij}^{(n)} = \dfrac{T_{ij}^{(n)}}{T_i}$, (15)
where $T_{ij}^{(n)}$ represents the number of times that state $s_i$ in the historical state sequence transfers to state $s_j$ after $n$ steps, and $T_i$ represents the total number of occurrences of state $s_i$ in the historical state sequence.
The n-step state transition probability matrix of the Markov chain can be calculated repeatedly using Eq. (15), as expressed in Eqs. (16) and (17). Subsequently, according to the Markov chain definition, it can be determined that the state $s_u$ is the maximum probability state of step $r + 1$. Therefore, the classification of the date to be forecasted can be reasonably distinguished.
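A minimal sketch of this counting procedure is given below. It estimates the transition probabilities by dividing transition counts by the total number of occurrences of the source state, as described in the text, and then picks the most probable class for the next day from the one-step matrix. The function names are hypothetical, and the use of the one-step matrix alone for the prediction is a simplifying assumption.

```python
def transition_matrix(states, n_states, step=1):
    """Estimate the step-ahead transition probabilities from a class sequence.

    states: historical class codes, each in 1..n_states.
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for current, later in zip(states, states[step:]):
        counts[current - 1][later - 1] += 1
    matrix = []
    for i in range(n_states):
        occurrences = states.count(i + 1)  # total occurrences of state i + 1
        matrix.append([c / occurrences if occurrences else 0.0 for c in counts[i]])
    return matrix


def most_likely_next_state(states, n_states):
    """Class with the largest one-step transition probability from the last state."""
    row = transition_matrix(states, n_states, step=1)[states[-1] - 1]
    return row.index(max(row)) + 1
```

Because the last element of the sequence has no successor, the row of its state may sum to slightly less than one; dividing by the outgoing transition count instead would be an equally simple alternative.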

Smoothing spline fitting model
After load characteristics were predicted using the proposed model, the smoothing spline (SP) model was employed to reconstruct the original dimensions of the load and complete the short-term load forecasting. The execution process of SP fitting is as follows [20].
Provided the corresponding discrete sequence $[(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)]$, consider the determination of the fitting function $s(x)$ that satisfies Eq. (18) in the set of all fitting functions with second-order continuous derivatives:
$\min_{s} \; p \sum_{i=1}^{n} \omega_i \left( y_i - s(x_i) \right)^2 + (1 - p) \int \left( s''(x) \right)^2 \, dx$, (18)
where $n$ represents the number of sequence data pairs, $\omega_i$ represents the error weight, and $p$ represents the smoothing coefficient of the SP fitting model, whose value interval is [0,1]. The first term in Eq. (18) is the error penalty, which measures the similarity between the fitting and original data; the second term is the roughness penalty, which measures the degree of curvature fluctuation of the fitting curve. Eq. (18) minimizes the total penalty function, that is, it balances the fitting error against the fitting roughness. For a smoothing factor value $p = 0$, the fitting result becomes the least-squares straight line fitting of the data; for $p = 1$, the fitting is the cubic spline interpolation of the data. The SP fitting does not need to specify the location of nodes and has strong adaptability. Moreover, the SP fitting comprehensively considers the fitting error and fitting roughness, making its smooth fitting results more appropriate for the actual engineering needs than those of the conventional fitting methods.

General modeling concept
In the load characteristic analysis stage, the historical data of multi-energy load and historical information of meteorological, daily type, and other load influencing factors are obtained; then, data cleaning, normalization, and other preprocessing tasks are carried out. Subsequently, the fuzzy clustering analysis is implemented. Based on the clustering results, the proposed TFIDP algorithm is used to extract the load characteristics in two stages, to obtain the input of the NARX prediction model, and the Markov chain transition probability model is used to determine the training samples. Finally, the predicted load characteristics are reconstructed using the SP fitting model to complete short-term load forecasting.

Analysis of a practical case study
The power load data used in this study were derived from the actual power system of a university campus. These are the daily load curve data of the power supply area from July 2019 to July 2020. The sampling interval of load data is 15 min. The daily load curve consists of 96 load points in the time range of 00:00-23:45. In this study, the load influencing factors include the meteorological and daily type rule information. The meteorological information data include the daily maximum temperature, minimum temperature, average temperature, humidity, rainfall, and wind speed in the power supply area from July 2019 to July 2020. The day-type rule data include working days, weekends, and holidays. The computer hardware parameters were as follows: 3.00 GHz clock speed, Intel Core i7 processor, and 16 GB memory; the modeling was performed using the MATLAB R2018b platform.
In this case study, the daily load and load-influencing factor data were used as the training set. The data in August 2020 were used as a test set to evaluate the prediction performance of the proposed model. The reference input of the load at time t of the day to be forecasted, Xp, was composed of the load at time t (considering similar days) of the load class that it belonged to, load at t-1 and t+1 (considering similar times), and meteorological and daily type rule information of similar and prediction days.

Data cleaning
Abnormal load data points objectively exist in the actual data terminal system; thus, reasonable data cleaning is needed. To guarantee the consistency of the original data, abnormal data points should be corrected as accurately as possible to avoid affecting the authenticity of the original data and introducing other errors. Therefore, this study adopted a data-cleaning method of visual joint discrimination of load wavelet decomposition coefficient and RGB transformation. Moreover, according to the sensitivity of wavelet decomposition, the abnormal load data points in the massive load data were accurately checked. Additionally, the visualization operation was carried out according to the linear mapping transformation of wavelet coefficients and RGB values to help analyze the location of abnormal load data points [21].
(1) Wavelet decomposition of load
Fig. 4 shows the wavelet decomposition coefficient diagram of the fifth-layer db4 wavelet decomposition using load data as the original signal. The steep points in the detailed signals obtained by decomposition can, to some extent, reflect the data abnormality degree of the original load signal.
(2) RGB conversion visualization
The wavelet decomposition coefficients can be converted into RGB values in the interval [0,64] using the linear mapping relationship in Eq. (19) such that the anomaly degree of data points can be determined using the RGB graph of wavelet coefficients, as shown in Fig. 5. The two bounds of the mapping in Eq. (19) are the minimum and maximum coefficients of all wavelet coefficients, respectively.
(3) Fitting and correction of abnormal data
According to the visual joint judgment of load wavelet decomposition coefficient and RGB transformation, the wavelet coefficients of each layer corresponding to normal data are less than 2 under normal conditions. With this as the standard, the locations of abnormal data points are determined and corrected separately. The correction method takes the five normal data points before and the five normal data points after the location of the abnormal data for cubic polynomial fitting and uses the fitted polynomial value at the location of the abnormal data to correct it. A comparison of load data before and after cleaning is shown in Fig. 6. It can be seen that the abrupt abnormal load data are reasonably corrected by implementing the data cleaning process of visual joint discrimination of load wavelet decomposition coefficient and RGB transformation. The cleaned load data eliminate the interference of equipment and human failures and are more consistent with the original situation of real load data. Therefore, the load data used in the subsequent analysis of this case study were subject to the data cleaning process.

Data normalization
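In this study, the short-term load forecasting data were normalized based on the linear normalization method. The normalized interval was [0,1], as expressed in Eq. (20):
$x^{*} = \dfrac{x - x_{\min}}{x_{\max} - x_{\min}}$, (20)
where $x$ and $x^{*}$ are the original and normalized values, and $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the corresponding data series, respectively.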
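As an illustration, linear normalization to [0,1] and its inverse can be sketched as follows. This is the usual min-max form; the function names and the handling of a constant series are assumptions of the sketch.

```python
def min_max_normalize(values):
    """Scale a numeric series linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant series maps to all zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def denormalize(scaled, lo, hi):
    """Map normalized values back to the original range [lo, hi]."""
    return [s * (hi - lo) + lo for s in scaled]
```

The minimum and maximum of the training data would need to be stored so that the predicted values can be mapped back to MW with `denormalize`.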

FCM cluster analysis of load
In this study, the FCM method was used to cluster the load data; the process was thoroughly described in Subsection 2.1. The daily load data from July 2019 to July 2020 as well as the corresponding historical meteorological information data and calendar rule information were used as data samples for FCM clustering. To determine the reasonable number of FCM clusters, the sum of the distances between the load curves and their FCM cluster centers in all classes was recorded as $D_{l,c}$ to investigate the similarity of the elements in the FCM cluster results:
$D_{l,c} = \sum_{i=1}^{M} \sum_{j=1}^{S} d(l_{i,j}, c_i)$, (21)
where $d(\cdot,\cdot)$ is the distance between a load curve and a cluster center, $M$ is the number of cluster centers, $S$ is the number of load curves included in the $i$-th class, $c_i$ is the cluster center of the $i$-th class, and $l_{i,j}$ is the $j$-th load curve in the $i$-th class.
In addition, the total distance between every two cluster centers is recorded as $D_{c,c}$ to evaluate the degree of difference between clusters in the FCM clustering results, as expressed in Eq. (22). As the number of clusters gradually increases in the interval [1,20], the relative distance between classes becomes larger; thus, the value of $D_{c,c}$ increases.
Simultaneously, the relative distance between load curves in each category becomes smaller; thus, the value of $D_{l,c}$ decreases. Therefore, based on the change in the $D_{c,c}$ and $D_{l,c}$ values, it can be observed from Fig. 7 that Val decreases with an increasing number of clusters in the interval [1,20]. Before the number of clusters reaches six, Val decreases rapidly; after the number of clusters reaches six, the downward trend of Val is significantly attenuated. Therefore, based on the idea of the inflection point method, the optimal number of FCM clusters is six.
Subsequently, the clustering task was carried out according to the process described in Subsection 2.1. Fig. 8 shows the FCM clustering results of the load curve. Obvious differences among various types of FCM clustering results can be seen; indeed, each type has a certain number of compact load curves. Therefore, it is reasonable to select FCM clustering results with a cluster number of six.

Two-stage fuzzy optimization of load feature extraction
After the FCM clustering analysis of the load curve was completed, the initial feature extraction of various loads was carried out according to the DP algorithm flowchart of the fuzzy optimization threshold, as described in Subsection 2.3.1. The third type of load is taken as an example of the fuzzy optimization of the DP threshold setting. Fig. 9 shows the changes in the average DTW matching degree with the threshold; the threshold selected according to the membership $\mathrm{Sat}(\varepsilon)$ is the optimal threshold of the DP algorithm for this type of load curve cluster. Fig. 9 shows that the optimal threshold value of the DP algorithm for the third type of load after the fuzzy optimization analysis is 0.2. Based on the first feature extraction, the second feature extraction of the third type of load curve cluster was carried out using the statistical frequency distribution process, as described in Subsection 2.3.2. Fig. 10 shows the final feature recognition and extraction results of the third type of load curve cluster. Fig. 10 shows that after applying the proposed two-stage fuzzy optimization of load feature recognition and extraction, the load feature dimension of the third type of load curve cluster is reduced from the original 96 to 62 dimensions, and the average compression ratio is approximately 65.0%. The method not only effectively simplifies part of the inputs of the combined short-term load forecasting model but also completes the feature selection and dimension reduction. Therefore, the important original shape feature information of the load curve is retained, which provides a reliable supporting condition for the accurate prediction of the subsequent combined short-term load forecasting model.

Evaluation of the model performance
The performance evaluation indicators of the forecasting model were the relative error rate $E_i$ at load forecasting point $i$, the root mean square error ($E_{\mathrm{RMSE}}$), and the mean absolute percentage error ($E_{\mathrm{MAPE}}$), where $\hat{x}_i$ and $x_i$ are the predicted and actual values of power load at point $i$, respectively.
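These indicators can be computed with the short sketch below. It uses the common textbook definitions, which are assumed to match the ones intended here; in particular, the sign convention of the relative error (predicted minus actual) and the function names are assumptions.

```python
import math


def relative_errors(predicted, actual):
    """Relative error rate at each forecasting point, in percent."""
    return [(p - a) / a * 100.0 for p, a in zip(predicted, actual)]


def rmse(predicted, actual):
    """Root mean square error, in the unit of the load (e.g., MW)."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(actual))


def mape(predicted, actual):
    """Mean absolute percentage error, in percent."""
    ratios = [abs(p - a) / abs(a) for p, a in zip(predicted, actual)]
    return sum(ratios) / len(actual) * 100.0
```

The root mean square error is expressed in the unit of the load, whereas the percentage error is scale-free, which is why both are reported side by side.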

Load characteristic prediction
The load of the prediction calculation system on August 1, 2020, is taken as an example. The maximum transition probability state, calculated from the probability matrix of the Markov chain model in Section 2.2 combined with the historical state, indicates that the daily load most likely belongs to the third type of load curve cluster. Thus, the third type of load characteristics and its influencing factors were used as the source of training samples for that day.
At present, no complete theory guides the parameter setting of neural networks and support vector machines (SVMs). The parameter value intervals of the neural network are determined empirically, and specific parameters are set through experiments and comparisons [22][23]. The optimal parameters of the SVM model are determined based on grid search and cross-validation [24]. Because a three-layer neural network can approximate any complex continuous nonlinear function [25], the number of layers of the neural network was set to three, and the number of hidden layer neurons of the NARX neural network was determined by trial optimization, as shown in Fig. 11.
Fig. 11 shows that the optimal interval for trial based on an empirical equation is [11, 20]. To avoid the contingency of a single test, 10 rounds of calculation were conducted for each candidate number of hidden layer neurons, and the mean square error of the repeated tests on the training data samples was calculated. With 18 hidden layer neurons, the mean square error of every round remained at a low level and its mean value reached the minimum; therefore, the number of neurons in the hidden layer of the NARX neural network was finally set to 18. Table 1 lists the key parameter settings of the proposed model after exploratory optimization and search verification.
As shown in Fig. 12, based on the fuzzy clustering analysis, the FCM-TFIDP-BP, FCM-TFIDP-SVM, and FCM-TFIDP-NARX composite models were constructed by cascading the proposed two-stage fuzzy optimization method of load feature recognition and extraction with the BP neural network, SVM, and NARX neural network, respectively.
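The repeated-trial selection of the hidden layer size described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `select_hidden_size` and `toy_train_and_score` are hypothetical names, and the toy scoring function is a stand-in for actually training a NARX network, deliberately shaped so that its minimum lies at 18 to mirror the reported outcome.

```python
import random
import statistics

def select_hidden_size(candidates, train_and_score, rounds=10, seed=0):
    """Return the candidate with the lowest mean MSE over repeated rounds."""
    rng = random.Random(seed)
    mean_mse = {}
    for hidden in candidates:
        # Repeat training several times to avoid the contingency of a single test.
        scores = [train_and_score(hidden, rng.randrange(2**31)) for _ in range(rounds)]
        mean_mse[hidden] = statistics.mean(scores)
    best = min(mean_mse, key=mean_mse.get)
    return best, mean_mse

def toy_train_and_score(hidden, run_seed):
    """Hypothetical stand-in for training: bowl-shaped MSE plus small run-to-run noise."""
    noise = random.Random(run_seed).uniform(0.0, 0.01)
    return 0.05 + 0.02 * (hidden - 18) ** 2 + noise

# Candidate interval [11, 20] taken from the text; scores are synthetic.
best, table = select_hidden_size(range(11, 21), toy_train_and_score)
print(best)
for hidden, score in sorted(table.items()):
    print(hidden, round(score, 4))
```

In practice, `train_and_score` would train the network with the given number of hidden neurons and return the mean square error on the training samples; averaging over rounds reduces the influence of random weight initialization.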
Fig. 12 Prediction results of load characteristics
Fig. 12 shows that, compared with the typical BP neural network and SVM structures, the cyclic NARX neural network with time-delay feedback connections can better mine the correlation characteristics between load characteristic sequences with complex nonlinearity; thus, this network has the best overall prediction effect for load characteristics. The ERMSE of FCM-TFIDP-NARX is 1.028 MW and its EMAPE is 3.484%; these values are better than those of FCM-TFIDP-BP and FCM-TFIDP-SVM. Accurate load characteristics provide a decisive foundation for the subsequent improvement of the overall short-term load forecasting accuracy.

Evaluation of short-term load forecast
As shown in Fig. 13, based on the BP neural network, SVM, and NARX neural network, three approaches were compared on the short-term load forecasting results of the case study for three consecutive days from August 1 to August 3, 2020: the direct method, the simple load feature recognition combined forecasting model with a fixed threshold of 0.5, and the fuzzy optimization load feature recognition combined forecasting model. The predicted load characteristics were reconstructed to the original dimensions based on the SP fitting technology.
According to Fig. 13 and Table 3, obtaining reasonable results for short-term load forecasting based on a single machine learning forecasting model is difficult. In the process of continuous short-term load forecasting, the ERMSE of the BP neural network on working and rest days reached 4.243 and 4.524 MW, respectively, and its EMAPE reached 12.969% and 13.485%, respectively, owing to its tendency to fall into local minima and to overfit; the prediction effect was poor. Moreover, the typical SVM model exhibited insufficient training generalization ability for massive data samples; hence, its prediction performance could not fulfill the actual demand. By contrast, the NARX neural network exhibited the highest overall prediction performance owing to its cyclic delay feedback structure: its ERMSE and EMAPE are the lowest among the compared single machine learning forecasting models. However, owing to the complex nonlinear effects of load fluctuation and influencing factors, the NARX neural network yields large prediction errors at some load characteristic prediction points. Therefore, the overall prediction accuracy must be improved.
Fig. 13 and Table 3 reveal that the proposed load feature recognition forecasting model improves the overall forecasting accuracy compared with the single machine learning forecasting models because it forecasts the load features in more detail and uses an appropriate SP fitting technology for load reconstruction. The ERMSE of FCM-DP-BP on working and rest days decreased by 0.942 and 0.775 MW, respectively, compared with that of FCM-BP, and its EMAPE decreased by 2.408% and 1.113%, respectively. Similarly, the Acc of FCM-DP-SVM on working and rest days increased by 1.626% and 1.160%, respectively, compared with that of FCM-SVM, and the Acc of FCM-DP-NARX is 0.545% and 0.433% higher than that of FCM-NARX. Therefore, the combined forecasting model of load characteristics recognition with a fixed pattern is practical.
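The fixed-threshold feature recognition compared above builds on the Douglas-Peucker (DP) algorithm. The sketch below is a generic textbook version of that algorithm applied to a normalized load curve, given only to illustrate how the threshold controls the number of retained feature points. The perpendicular-distance measure, the function names, and the sample curve are assumptions; the fuzzy optimization stage of the proposed method is not reproduced.

```python
import math

def perpendicular_distance(point, start, end):
    """Distance from `point` to the straight line through `start` and `end`."""
    (x0, y0), (x1, y1), (x2, y2) = point, start, end
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(x0 - x1, y0 - y1)
    return abs(dy * (x0 - x1) - dx * (y0 - y1)) / length

def douglas_peucker(points, threshold):
    """Keep the end points and, recursively, points farther than `threshold` from the chord."""
    if len(points) < 3:
        return list(points)
    start, end = points[0], points[-1]
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], start, end)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= threshold:
        return [start, end]
    left = douglas_peucker(points[: index + 1], threshold)
    right = douglas_peucker(points[index:], threshold)
    return left[:-1] + right  # drop the duplicated split point

# Synthetic daily load curve with time and load both scaled to [0, 1].
curve = [(0.0, 0.2), (0.1, 0.25), (0.2, 0.3), (0.3, 0.6), (0.4, 0.9), (0.5, 1.0),
         (0.6, 0.8), (0.7, 0.5), (0.8, 0.45), (0.9, 0.3), (1.0, 0.2)]

coarse = douglas_peucker(curve, 0.5)   # fixed threshold used by the simple model
fine = douglas_peucker(curve, 0.05)    # smaller threshold keeps more feature points
print(len(curve), len(coarse), len(fine))
```

A larger threshold retains fewer feature points and therefore compresses the curve more strongly, at the cost of a coarser description of the load shape; this trade-off is what a fixed threshold cannot adapt to.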
This study proposed an adaptively improved fuzzy optimization load feature recognition combined forecasting model to alleviate the limitations of the fixed pattern load feature recognition combined forecasting model and further improve the accuracy of load forecasting. Compared with the fixed pattern model, the fuzzy optimization load feature recognition combined forecasting model is more sensitive to load feature prediction. Moreover, the adaptive two-stage fuzzy optimization method for load feature identification and extraction can effectively analyze the necessary load feature set to support the accurate prediction of future load features. In addition, the load feature dimension considered by the fuzzy optimization model is more reasonable than that of the fixed pattern model. Therefore, it not only helps the forecasting model efficiently mine the correlation between load characteristics and improve the accuracy of load characteristics forecasting but also enhances the rationality and effectiveness of load fitting and reconstruction, and thus effectively improves the final load forecasting accuracy.
According to the total calculation time statistics listed in Table 3, both the fixed-mode and fuzzy optimization load feature recognition combined forecasting models focus on the load characteristics, unlike the single machine learning prediction model. The input scale of the model is thereby effectively reduced, and the computational efficiency is improved. However, the calculation time of the fuzzy optimization model is longer than that of the fixed-mode model because of its more refined load feature mining. Therefore, the fuzzy optimization load feature recognition combined forecasting model can obtain a higher load forecasting accuracy within a reasonable calculation time.
Among the compared models, the FCM-TFIDP-NARX combined forecasting model exhibits the highest performance; it combines the structural advantages of the NARX neural network, whose cyclic delay feedback structure suits nonlinear time series problems, with the practical advantages of the adaptive two-stage fuzzy optimization of load feature recognition. Its ERMSE and EMAPE on rest days are only 1.101 MW and 3.200%, respectively, and those on weekdays are 1.073 MW and 3.587%, respectively. The overall prediction accuracy is 95.914%, and the calculation time is approximately 48% less than that of FCM-NARX. The model thus not only significantly reduces the computational complexity of the combined forecasting model but also effectively improves the accuracy of short-term load forecasting results, which can reliably meet the targeted demand under the new development background.

Conclusions
Based on a typical machine learning forecasting framework, this study introduced a combined forecasting model of fuzzy optimization load feature recognition. The proposed model was applied to an actual case study to validate its short-term load forecasting performance. Based on the theoretical analysis and verification of the calculation results, the following conclusions were obtained:
(1) Compared with conventional machine learning models, such as the BP neural network and SVM, the NARX neural network has the structural advantage of cyclic delay feedback, which better captures the dynamic characteristics of the load sequence and thus yields a higher prediction performance.
(2) Compared with the single machine learning prediction model, the load feature recognition combined forecasting model is more refined in the process of load feature prediction and load reconstruction; thus, the overall forecasting performance is improved.
(3) The combined forecasting model of fixed-mode load feature recognition is improved adaptively; the resulting fuzzy optimization model attains a higher forecasting accuracy within a reasonable calculation time.
Under the new power system development conditions, the proposed fuzzy optimization load feature recognition combined forecasting model can effectively consider various targeted load forecasting needs. In future studies, the proposed model will be improved to reasonably describe the characteristics of coupling, correlation, and transformation between multiple complex loads.

Fig. 1 Actual load curve and load characteristics

Fig. 2 Schematic of fuzzy optimization load feature recognition

(3) Proportional coefficients $a$ and $b$. The setting of the scale coefficients $a$ and $b$ is related to the average matching degree.

When the Markov transition probability matrix is used to predict the state of step $r+1$, the maximum transition probability of the state $s_r$ in row $r$ of the Markov transition probability matrix is first determined. Assume that this probability is

Fig. 6 Schematic of data cleaning comparison
power load data sequence; $x$ is the original power load data sequence; and $x_{\max}$ and $x_{\min}$ are the maximum and minimum values in the original power load data sequence, respectively.
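The symbol definitions above read like the where-clause of a min-max normalization, although the equation itself is not present in this part of the text. Assuming the conventional form that scales the load to [0, 1], a minimal Python sketch is given below; the function name and the handling of a constant sequence are illustrative choices rather than details taken from the paper.

```python
def min_max_normalize(series):
    """Scale a load sequence to [0, 1] using its own minimum and maximum."""
    lo, hi = min(series), max(series)
    if hi == lo:
        # A constant sequence has no spread; map every value to 0.0 by convention.
        return [0.0 for _ in series]
    return [(x - lo) / (hi - lo) for x in series]

# Illustrative load values in MW; these are not data from the case study.
print(min_max_normalize([10.0, 15.0, 20.0]))
```

Scaling the load to [0, 1] is consistent with the DP threshold being searched in the interval [0, 1].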

Fig. 9 Schematic of the third type of fuzzy optimization threshold of load
Fig. 9 shows that the number of features extracted for the first time decreases as the threshold of the DP algorithm for the third type of load gradually increases in the interval [0, 1]. Thus, the average compression ratio

Fig. 10 Characteristic dimension diagram of the third type of load curve cluster
load forecasting, average absolute percentage error rate EMAPE, and overall forecasting accuracy rate Acc, as expressed in Eqs. (24)-(27).

Fig. 11 Optimal setting of hidden layer neurons

Fig. 13 Prediction results of load characteristics
Table 3 lists a summary of the indicators used to evaluate the prediction performance of the proposed model. The evaluation indicators include ERMSE, EMAPE, Acc, and the total calculation time of the short-term load forecasting on working and rest days.
The RGB values of normal load data are close; thus, their colors are similar. The color of abnormal load data significantly differs from the color of normal data.

Table 1. Key parameter settings of models

Table 3. Summary of evaluation indices of the comparison forecasting models