
1 Introduction

Load prediction is a key link in power supply planning, as well as a basic feature of and an important calculation basis for intelligent power supply planning. In addition to traditional machine learning models, deep neural networks, currently the most popular intelligent research framework, have been widely applied by researchers to active distribution network load prediction. Active distribution network load data can be regarded as time series data, that is, data ordered chronologically. Time series analysis describes and interprets phenomena that change over time in order to derive predictive decisions. Deep learning neural networks can automatically learn arbitrarily complex mappings from input to output and support multiple inputs and outputs [1]. They provide many capabilities for time series prediction tasks, such as automatic learning of temporal dependence and automatic handling of trends and seasonality based on the time structure of the data.

Although deep neural networks can approximate arbitrarily complex functions and perform good non-linear modelling of a variety of data, in the historical data used for active distribution network load prediction the short-term load sequence has obvious approximately periodic characteristics, while the long-term load sequence shows variability and rich dynamic characteristics. Moreover, with the development of the Internet and big data technology, importing further kinds of time series data, such as market reports, production management data and other modalities, can improve the performance of active distribution network load prediction. LSTM (Long Short-Term Memory) and other RNN (recurrent neural network) structures are often not effective at predicting the difference between peak hours and minimum-consumption periods, and usually require a higher computational cost.

This paper proposes a multi-modal CNN-BiLSTM (Convolutional Neural Network-Bidirectional Long Short-Term Memory) architecture, which uses an improved shared-parameter parallel convolutional network to learn feature representations of the short-term load sequences, and an improved bidirectional attention LSTM network. The model captures the dynamic changes of the data under disturbances together with text features such as temperature and holidays. On a data set of 24 months of load and market report data, the method is compared with a convolutional neural network and a bidirectional long short-term memory network. The experimental results show that the model has advantages in both computational speed and accuracy.

The rest of this paper is organised as follows: Sect. 2 introduces the characteristics of load sequence data and the variables that may affect prediction. Sect. 3 introduces multi-modal deep learning. Sect. 4 details the structure of the proposed multi-modal model. The experiments and evaluation results are given in Sect. 5, and the last section concludes.

2 Load Feature Extraction and Prediction

2.1 Load Feature Extraction

Load types can be distinguished according to whether they respond to a guidance mechanism: controllable load and uncontrollable load, which are further divided into friendly load and non-friendly load. A load prediction model can be constructed by analysing the active load characteristics and energy storage characteristics, including friendly load, under the given constraint conditions [2]. Another approach is the bottom-up prediction method [3]: in small areas divided according to certain properties, load prediction is performed first, and the obtained load demand curves are then superimposed to obtain the complete load prediction result.

For example, a large amount of data can be processed in parallel on a cloud computing platform: the maximum entropy algorithm can be used to classify the data and distinguish abnormal data from usable data, and a locally weighted linear regression model can be combined with the MapReduce framework to realise active configuration of cloud computing [4].

The Spark platform can be used to partition all the obtained data and compute on it in parallel to speed up big data processing. First, the data is pre-processed through feature extraction to obtain input that meets the requirements of the model, which is then fed into multivariate L2-Boosting for training to obtain the final regression model [5]. The grey prediction method is another common load prediction method: secondary smoothing is applied to the historical data to eliminate its interference factors, and a Markov chain is combined with grey theory to predict the residual sequence and the sign of future residuals, which together revise the results [6].
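To make the grey prediction idea concrete, the following is a minimal sketch of the classical GM(1,1) grey model (accumulation, least-squares fitting of the development coefficient, and inverse accumulation); the smoothing and Markov residual correction of [6] are not reproduced here, and the series values are illustrative.

```python
import numpy as np

def gm11_forecast(x0, steps=1):
    """GM(1,1) grey model: fit on series x0, forecast `steps` values ahead.
    Illustrative sketch of the grey prediction method referenced above."""
    x0 = np.asarray(x0, dtype=float)
    x1 = np.cumsum(x0)                     # accumulated generating sequence
    z1 = 0.5 * (x1[:-1] + x1[1:])          # mean sequence of consecutive x1
    # Least-squares estimate of development coefficient a and grey input b
    # from the grey differential equation x0(k) + a*z1(k) = b
    B = np.column_stack([-z1, np.ones(len(z1))])
    a, b = np.linalg.lstsq(B, x0[1:], rcond=None)[0]
    n = len(x0)
    k = np.arange(n + steps)
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a   # time response
    x0_hat = np.diff(x1_hat, prepend=0.0)               # inverse accumulation
    return x0_hat[n:]
```

For a roughly exponential load series the model reproduces the growth trend closely; strongly fluctuating series are where the residual correction of [6] becomes necessary.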

2.2 Load Feature Prediction

As a type of time series data, load can also be predicted using neural network techniques. For monthly and quarterly time series, neural-network-based prediction has clear advantages over traditional statistical methods and manual judgement [7]. Mbamalu et al. treat load prediction as an autoregressive process and use iteratively re-weighted least squares to estimate the model parameters [8]. In a neural-network-based combination prediction model, the weights of the different prediction models in the combination are learned; the variable-weight-coefficient combination prediction model is shown in Eq. 1.

$${y}_{ij}=\sum_{t=1}^{K} {w}_{t}\left(i,j\right)\left({f}_{tij}+{e}_{tij}\right)$$
(1)

where \({y}_{ij}\) is the actual load of month i in year j, \({f}_{tij}\) is the predicted value of month i in year j by the t-th method, \({e}_{tij}={y}_{ij}-{f}_{tij}\), and the weights minimise \(w=\mathrm{Min}\sum_{i=1}^{n} \sum_{j=1}^{12} {\left[{y}_{ij}-g\left({f}_{1ij},{f}_{2ij},\dots ,{f}_{Kij}\right)\right]}^{2}\).
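As a concrete instance of the combination objective above, the special case of a linear combiner \(g(f_1,\dots,f_K)=\sum_t w_t f_t\) can be solved directly by least squares; this sketch (with made-up forecast values) illustrates that case, not the full neural combiner discussed next.

```python
import numpy as np

def fit_combination_weights(F, y):
    """Least-squares weights for a linear combination forecast
    g(f_1..f_K) = sum_t w_t * f_t, minimising the squared-error
    objective of Eq. 1.
    F: (n_samples, K) matrix of component forecasts; y: actual loads."""
    w, *_ = np.linalg.lstsq(F, y, rcond=None)
    return w

def combine(F, w):
    """Combined prediction for new component forecasts."""
    return F @ w

# Two hypothetical component forecasters and the actual loads
F = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([1.7, 1.3, 3.7, 3.3])
w = fit_combination_weights(F, y)
```

When the true relation between component forecasts and the load is non-linear, a neural network replaces the linear combiner, as described below.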

Since there is a relatively complicated non-linear relationship between the actual prediction input and the final output, a three-layer feed-forward neural network is used to fit the arbitrary function. Through continuous iteration of the network and gradient back-propagation updates, reasonable final parameters are obtained, and with these parameters the combined predicted value for any input is realised. Load forecasting with the Autoregressive Integrated Moving Average and Seasonal Autoregressive Integrated Moving Average models obtained mean absolute percentage errors of 9.13% and 4.36% respectively; a deep learning Long Short-Term Memory model reduces this to 2% [9].

3 Multi-modal Deep Learning

Deep neural networks have been widely used on single-modal data such as text, images or audio, with a variety of supervised and unsupervised deep feature learning architectures [10]. Multi-modal deep learning refers to training new deep networks to learn the features of multiple modalities. For example, in emotion recognition, fusing voice and text information can improve recognition performance [3]. Establishing a private-domain network (to extract individual features from the visual and audio information in short videos) and a public-domain network (to acquire joint features) can solve the short video classification problem [8].

The principle of multi-modal feature learning is that, when multiple modalities are available at the same time, one modality can be learned better than with a single-modal deep feature alone. Shared representations between multiple modalities can also be learned to further improve accuracy on specific tasks. Researchers have begun to study multi-modal models in various fields, such as a multi-modal model based on fuzzy cognitive maps [5], which first extracts subsets from the complete data, trains separately on each subset, then uses fuzzy cognitive maps for modelling and prediction, and finally fuses the outputs of the subsets through information granulation.

Widely available time series data, such as holidays and weather, can be used to jointly predict a city's traffic conditions [6]. First, holiday and weather features are extracted; the Prophet algorithm is selected to predict traffic flow characteristics during holidays, while a DCRNN network predicts traffic flow from the combination of road network structure data and flow data. Likewise, image and time series data are both indispensable in autonomous driving systems, where the time series are the speed series and the steering wheel angle series. A multi-modal network serving an autonomous driving system includes a CNN, an RNN, a lateral control network and a longitudinal control network. The time series data is input into the RNN for processing, and the image data is input into the CNN for feature extraction; the extracted features are fed into the lateral and longitudinal control networks respectively. Finally, predicted values of the steering wheel angle and speed are obtained to guide the vehicle.

4 An Improved Multi-modal CNN-LSTM Prediction Model

Although classic time series prediction algorithms can be used for load prediction, load fluctuation does not depend only on historical time series data. Due to the diversification of intelligent load management requirements, the load manifests as multi-modal data over time.

Fig. 1. Multi-modal CNN-BiLSTM network structure.

This paper proposes a multi-modal convolutional neural network-long short-term memory prediction method for load data; its primary structure is shown in Fig. 1. For the short-term load sequence, data such as temperature and holidays are introduced, and an improved shared-parameter parallel convolutional network learns the feature representation; an improved bidirectional attention long short-term memory network is then used in combination with the medium- and long-term load sequences and their effects. Relevant text data is introduced into the model to capture its dynamic change features.

In the multi-modal CNN-BiLSTM structure of Fig. 1, two parallel convolutional neural networks extract features from the original historical load sequence and the other modal data sequences, such as temperature and text. These convolutional networks share parameters. The first convolutional layer includes convolution kernels of sizes 4 × 4 and 5 × 5, with 64 kernels each, followed by a shared-connection structure in which some of the kernels of the previous convolutional layer are extracted to form the kernels of the current layer. The fully connected output is sent to the attention layer, trained according to the attention mechanism, and then output to the BiLSTM network, whose hidden state size is 64. The final outputs are the short-term and long-term load data sequence predictions.
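The parameter-sharing idea of the parallel branches can be sketched in a few lines: the same convolution kernel is applied to each modality's sequence before the features are fused. This is a minimal 1D illustration with made-up kernel and sequence values, not the paper's 4 × 4 / 5 × 5, 64-kernel configuration.

```python
import numpy as np

def conv1d(x, kernel):
    """Valid-mode 1D convolution (cross-correlation) of sequence x with kernel."""
    k = len(kernel)
    return np.array([x[i:i + k] @ kernel for i in range(len(x) - k + 1)])

# Shared-parameter parallel branches: the SAME kernel weights are applied
# to each modality's sequence, mirroring the shared-parameter parallel CNN
# described above (kernel and sequence values are illustrative only).
kernel = np.array([0.25, 0.5, 0.25])
load_seq = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
temp_seq = np.array([10.0, 12.0, 11.0, 13.0, 12.0])
load_feat = conv1d(load_seq, kernel)
temp_feat = conv1d(temp_seq, kernel)
features = np.concatenate([load_feat, temp_feat])  # fused feature vector
```

Sharing the kernels across branches keeps the parameter count independent of the number of modalities, which is part of the computational-cost advantage claimed for the model.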

Fig. 2. Shared-parameter convolutional neural network structure.

5 Experiments and Results

In this section, we introduce the experimental evaluation methods and results of the baseline systems and the above-mentioned improved method on existing data sets. The data set contains about 2 years of hourly load data of a city in North China, local daily maximum, minimum and average temperature and precipitation data, local public holiday dates, and local quarterly market operation reports within the 2 years. The maximum and minimum temperatures, the holiday information, and the entities and their types in the text are represented as vectors of length 128. The load values are divided into a short-term load data series and a long-term load data series by time span: the former contains the load data within a quarter, and the latter contains load data spanning more than one quarter. These data are used to predict the hourly load value over a specified time period.

The evaluation index is the mean absolute percentage error (MAPE) of the short-term and long-term load data series predictions; its calculation is shown in Eq. 2.

$$\mathrm{MAPE}=\frac{1}{N}\sum_{k=1}^{N} \left|\frac{\hat{v}(k)-v(k)}{v(k)}\right|\times 100\%$$
(2)

where \(N\) is the total number of samples in the test set, \(v(k)\) is the actual value, and \(\hat{v}(k)\) is the predicted value.
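Eq. 2 translates directly into code; this small sketch (with illustrative values) computes the metric exactly as defined.

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error per Eq. 2, in percent.
    Assumes all actual values are non-zero."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.mean(np.abs((predicted - actual) / actual)) * 100.0
```

Because each error is divided by the actual value, MAPE is scale-free, which is why it is usable across both short-term and long-term load series.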

Fig. 3. MAPE results of short-term load prediction.

The baseline systems adopt the weighted least squares (WLS), autoregressive moving average (ARMA), seasonal autoregressive integrated moving average (SARIMA) and CNN-LSTM architectures. The total of 731 days × 24 h of data is divided chronologically into training, validation and test data at a ratio of 4:2:4. Under the four baseline systems and the multi-modal CNN-BiLSTM model, the MAPE and mean absolute error (MAE) results of short-term and long-term load series prediction are obtained, as shown in Fig. 3 and Fig. 4 respectively. The figures show that the multi-modal CNN-BiLSTM method has clear advantages for both short-term and long-term load sequence prediction on the training and test sets. Compared with the CNN-LSTM architecture, it achieves a certain error reduction; especially for the long-term load series prediction, it has higher accuracy than for the short-term series.
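The chronological 4:2:4 split described above can be sketched as follows; splitting in time order (rather than randomly) avoids leaking future load values into training. The index array stands in for the real hourly records.

```python
import numpy as np

def chronological_split(data, ratios=(4, 2, 4)):
    """Split a time series into train/validation/test sets in time order,
    using the 4:2:4 ratio described above (integer cut points)."""
    n = len(data)
    total = sum(ratios)
    i1 = n * ratios[0] // total
    i2 = i1 + n * ratios[1] // total
    return data[:i1], data[i1:i2], data[i2:]

hours = np.arange(731 * 24)  # indices of 731 days of hourly load records
train, val, test = chronological_split(hours)
```

With 17544 hourly records this yields roughly 7017 training, 3508 validation and 7019 test samples, each block contiguous in time.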

Fig. 4. MAPE results of long-term load prediction.

6 Conclusion

Load prediction exhibits time trends, and loads differ obviously between seasons; precise prediction helps efficient decision-making and reasonable planning. This paper proposes a multi-modal convolutional neural network-bidirectional long short-term memory architecture that uses a shared-parameter parallel convolutional network and a bidirectional attention long short-term memory network to process the multi-modal data sequences of load, temperature and text, and to predict both the short-term and the long-term load data sequences. The experimental results verify that the network structure achieves a certain improvement in prediction accuracy compared with the other baseline systems.