1 Introduction

Transient stability has been widely regarded as one of the most critical concerns in modern power systems. In the last two decades, a number of large blackouts have occurred around the world due to loss of synchronism caused by cascading failures [1]. Insufficient online assessment capability and the lack of timely emergency controls, such as load shedding, generator tripping and proactive islanding, are reported to be common causes of those incidents [2]. Growing renewable energy integration further deteriorates the dynamic security of power systems, increasing operational risk [3]. However, the deployment of phasor measurement units (PMUs) offers a promising way to improve the situational awareness of control centers in disturbed operating scenarios. PMUs, the core infrastructure of the wide-area monitoring system (WAMS), measure synchronized phasor data at a much higher sampling rate than supervisory control and data acquisition (SCADA) systems [4], and their measurement accuracy is reported to be satisfactory. Since PMUs capture the instantaneous response of power systems when faults occur, how to exploit the resulting massive volume of disturbed trajectories has been extensively investigated in the last decade.

As WAMS are now deployed in many power systems, PMUs are playing an increasingly vital role in transient stability awareness [5]. A number of studies have evaluated transient stability using PMU data. Indicators based on PMU trajectories are regarded as efficient estimators of the dynamic features of power systems, especially during severe disturbances. For example, Alvarez et al. proposed seven trajectory-based indices suitable for fuzzy inference of real-time dynamic vulnerability [6]. A phasor-data-based energy function indicator was developed in [7] to monitor the dynamic status of power transfer paths. A real-time transient stability assessment (TSA) method based on center-of-inertia estimation from PMU records was reported in [8]. From the voltage stability perspective, a coupled single-port model was applied to establish a WAMS-based assessment indicator [9]. Furthermore, Makarov et al. [10] presented a review of PMU-based TSA, offering a clear roadmap for further development.

Machine learning techniques have been widely applied to TSA. Most existing works focus on binary prediction of global stability using clustering and classification. For example, support vector machines, decision trees and artificial neural networks (ANNs) are widely used to detect power system instability from a few cycles of post-fault dynamic data [11–13]. Guo and Milanović presented a probabilistic framework to evaluate the accuracy of data mining tools applied to online prediction of transient stability [14], enabling a comprehensive comparison of the performance of different implementations.

However, few machine learning techniques have considered the critical unstable generators (CUGs) of power systems. Most studies have focused on identifying the global system status, because a power system normally has hundreds of generators that generate massive volumes of data [15]. As a result, it is challenging for standalone machine learning techniques running on single computers to perform stability assessment that includes CUG identification. For example, Passaro et al. [16] employed an adaptive neural network to evaluate the stability of each generator, acknowledging that standalone neural networks can hardly solve the problem in a reasonable time. Therefore, applying advanced computing techniques to enable high-performance training and prediction with PMU-measured data has become a necessity.

Neural networks are well known to be well suited to classification tasks [17], and many researchers have employed them to achieve highly accurate classification in both academia and industry. References [18, 19] pointed out that the back propagation neural network (BPNN) suffers from low efficiency due to the large number of summation and sigmoid calculations. Some researchers have therefore sped up BPNN using cloud computing: Yuan et al. [20] implemented a parallel BPNN using cloud computing, and Ikram et al. [21] also employed cloud computing to parallelize BPNN in the training phase. Others addressed the issue using MPI [22]. However, these approaches are all based on data separation and do not consider the accuracy loss caused by simple data separation. Therefore, to improve the efficiency of BPNN while maintaining classification accuracy in predicting CUGs, this paper presents a MapReduce-based parallel BPNN algorithm. The algorithm first employs ensemble techniques [23] to compensate for the information loss caused by data separation. The mappers in a Hadoop cluster then train a number of sub-BPNNs. Finally, when fed with a few cycles of post-fault data, these sub-BPNNs classify instances and output the final prediction based on majority voting.

2 ANN-based CUGs prediction

2.1 Definition

CUGs are defined as the earliest group of generators whose rotor angles show a leading or lagging tendency relative to the remaining units after fault clearing. Here, the tendency is measured against a given threshold on the power angle difference between any pair of generators. Technically, CUGs are the most severely disturbed units and may lead to the ultimate loss of stability [24]. Consequently, they are the natural candidates for emergency tripping or corrective actions that can quickly reduce the instability risk of the power system. The clustering-based method of identifying CUGs is detailed in Section 2.2. Fig. 1 illustrates a few examples of CUGs in terms of rotor angle trajectories.

Fig. 1 Illustration of CUGs

Every CUG is an unstable generator, because its leading (or lagging) rotor angle relative to the other units must exceed the given threshold, which is usually set equal to or slightly smaller than the widely accepted instability criterion. For example, Fig. 1a and Fig. 1b illustrate rotor angle trajectories in which the CUGs include all the unstable generators. Fig. 1d shows a similar situation: all the generators are determined to be unstable at the end of the 150-cycle observation window, but before that, none of them reaches the CUG threshold. Therefore, the strict two-cluster instability pattern corresponds to the situation in which all the generators are CUGs, as in Fig. 1d. Unlike Fig. 1a, Fig. 1b and Fig. 1d, Fig. 1c presents a different pattern in which the CUGs are only part of the unstable units. Although the two generators indicated in Fig. 1c belong to the leading cluster, they move ahead of the other leading generators and meet the CUG identification criterion at the very beginning of the time window. These two units are therefore the most effective targets for further control actions.

The primary aim of this study is to enable fast CUG prediction using a large-scale parallelized BPNN learning method, providing more in-depth information for situational awareness of power system transient stability.

2.2 Clusterwise CUGs identification

It is not difficult to distinguish unstable generators from the remaining stable ones by inspecting their rotor angle trajectories over a few seconds, as shown in Fig. 1a and Fig. 1b. However, because there is no commonly accepted criterion for confirming CUGs, this paper presents a CUG identification method based on the k-means clustering algorithm. For each fault scenario, CUGs are confirmed by the following procedure.

Step 1: Collect the rotor angle trajectory of each generator over a few seconds.

Step 2: Calculate the rotor angle difference between any two generators i and j, cycle by cycle, starting from the post-fault instant:

$$ \Delta \delta_{ij} (t) = \delta_{i} (t) - \delta_{j} (t)\quad \forall i,j $$
(1)

where \( \Delta \delta_{ij}(t) \) is the angle difference between generators i and j at cycle t after the fault. If \( \Delta \delta_{ij}(t) \) exceeds a given threshold, e.g. 170°, at which the power system is regarded as critically unstable, record the time point t. Otherwise, the procedure terminates for the current scenario and moves on to the next round.

Step 3: Extract the rotor angle trajectory \( \delta_{i}(t+\Delta t) \) of every generator, i.e. the segment from t to t + ∆t, for further analysis. Here ∆t denotes the CUG validation interval. If ∆t is too large, e.g. 3 s, it becomes hardly possible to distinguish CUGs from generators that become unstable later. Empirically, ∆t is preferably set to 50 cycles, i.e. 1 s.

Step 4: Cluster all the \( \delta_{i}(t+\Delta t) \) trajectories into two groups, A and B, using the widely used k-means clustering algorithm, and then calculate the center of inertia (COI) of the rotors in each group:

$$ \delta_{\text{COI}}^{k} = \frac{1}{{M_{T}^{k} }}\sum\limits_{i = 1}^{{N_{k} }} {M_{i}^{k} \delta_{i}^{k} } ,\quad M_{T}^{k} = \sum\limits_{i = 1}^{{N_{k} }} {M_{i}^{k} } , \, k \in \{ A,B\} $$
(2)

where \( \delta_{i}^{k} \) and \( M_{i}^{k} \) are the rotor angle and inertia constant of generator i in group k, and \( N_{k} \) is the number of generators in group k. Since the number of clusters is fixed at two according to the first-swing behavior, replicate k-means, detailed in Appendix A, is employed to overcome the sensitivity of k-means to randomly selected initial centroids.
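The four steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: rotor angles are assumed to be sampled once per cycle and stored in an array `delta` of shape (generators, cycles) starting at the post-fault instant; the function names `replicate_kmeans` and `identify_clusters` are hypothetical; the replicate k-means is a generic restart-and-keep-best variant that may differ in detail from Appendix A; and because this section does not state which of the two groups is finally labeled as the CUGs, the sketch stops at the two clusters and their COI trajectories. Step 2 uses the fact that the largest pairwise difference \( \max_{i,j} \Delta\delta_{ij}(t) \) equals the largest angle minus the smallest angle at cycle t, so the pairs need not be enumerated.

```python
import numpy as np


def replicate_kmeans(x, k=2, replicates=10, max_iter=100, seed=0):
    """k-means restarted from several random initial centroids; the run with
    the lowest within-cluster sum of squares is kept (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    best_labels, best_cost = None, np.inf
    for _ in range(replicates):
        # Initial centroids: k distinct trajectories chosen at random.
        centroids = x[rng.choice(len(x), size=k, replace=False)].astype(float)
        for _ in range(max_iter):
            dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            new = np.array([x[labels == c].mean(axis=0) if np.any(labels == c)
                            else centroids[c] for c in range(k)])
            if np.allclose(new, centroids):
                break
            centroids = new
        # Final assignment and cost for this replicate.
        dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        cost = dist[np.arange(len(x)), labels].sum()
        if cost < best_cost:
            best_labels, best_cost = labels, cost
    return best_labels


def identify_clusters(delta, inertia, threshold_deg=170.0, window=50):
    """delta: (n_gen, n_cycles) post-fault rotor angles in degrees.
    inertia: (n_gen,) inertia constants M_i.
    Returns (t, labels, coi) or None if the threshold is never exceeded."""
    # Step 2: max over (i, j) of delta_i - delta_j equals max - min per cycle.
    spread = delta.max(axis=0) - delta.min(axis=0)
    hits = np.flatnonzero(spread > threshold_deg)
    if hits.size == 0:
        return None  # no CUGs in this scenario; move on to the next round
    t = int(hits[0])
    # Step 3: segment from cycle t to t + window (window = 50 cycles = 1 s).
    seg = delta[:, t:t + window + 1]
    # Step 4: two clusters, then the COI trajectory of each group, Eq. (2).
    labels = replicate_kmeans(seg, k=2)
    coi = {}
    for g in np.unique(labels):
        m = inertia[labels == g]
        coi[int(g)] = (m[:, None] * seg[labels == g]).sum(axis=0) / m.sum()
    return t, labels, coi
```

For two generators accelerating away from two stable ones, `identify_clusters` returns the first cycle at which the spread exceeds 170°, groups the accelerating pair together, and gives each group's inertia-weighted COI trajectory.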

2.3 Features selection

In previous works, a variety of dynamic parameters have been selected to compose the feature vector for training particular machine learning algorithms. Two basic types of features exist: time-series synchronized data, such as a few cycles of voltage trajectory [25], and dynamic performance indices, such as the kinetic energy indicator of rotors [11].

To avoid information loss, the features in this study are taken directly from the post-fault trajectories, including the voltage amplitude, rotor angle and rotor speed of each generator. In addition, the maximal kinetic energy, a widely used indicator closely related to the disturbance severity of a single generator, is also included as a feature. The feature vector of generator n for one sample with time interval ∆T is denoted as:

$$ F_{n} (\Delta T) = \{ \boldsymbol{V}(T), \, \boldsymbol{\delta}_{\text{COI}}(T), \, \boldsymbol{\omega}_{\text{COI}}(T), \, KE_{\text{COI}}^{n} \} $$
(3)

where \( \boldsymbol{V}(T) \), \( \boldsymbol{\delta}_{\text{COI}}(T) \) and \( \boldsymbol{\omega}_{\text{COI}}(T) \) represent the time series of voltage amplitude, rotor angle and rotor speed during the time window T, respectively. Let \( t_{c} \) denote the instant at which fault clearing is completed; the window T then spans from \( t_{c} \) to \( t_{c} + \Delta T \). The symbol \( KE_{\text{COI}}^{n} \) refers to the kinetic energy one cycle after fault clearing. Except for the voltage trajectory array, the time-dependent states δ and ω are mapped to the COI coordinate so as to account for the mechanical effects of all generators. The COI coordinate transformation and kinetic energy calculation are detailed in [24]. The COI coordinate is a widely accepted way of capturing the global stability pattern, including the impacts of all the generators; applying it in both the training and prediction stages ensures that the ANN of a specific generator generalizes while accounting for the global stability pattern. In addition, the shape of the transient voltage recovery, which reflects the disturbance severity of a generator, is closely related to the stability evaluation of that generator, and the disturbed voltage trajectory has been reported as a well-performing attribute for machine-learning-based transient stability prediction of power systems [25].
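A minimal NumPy sketch of how the feature vector in (3) could be assembled for one generator is given below. It assumes that the post-fault arrays start at \( t_{c} \) and hold one sample per cycle, that rotor speeds are given as per-unit deviations, and that the kinetic energy in the COI frame is \( \tfrac{1}{2} M_{n} \tilde{\omega}_{n}^{2} \) evaluated one cycle after clearing; this last formula is an assumption standing in for the exact calculation in [24]. The function names are hypothetical. The resulting vector has three trajectory segments of num(∆T) samples each, plus one kinetic energy value.

```python
import numpy as np


def to_coi(x, inertia):
    """Map rotor angles or speeds (n_gen, n_cycles) to the COI frame by
    subtracting the inertia-weighted mean of all generators at each cycle."""
    m = np.asarray(inertia, dtype=float)
    x_coi = (m[:, None] * x).sum(axis=0) / m.sum()
    return x - x_coi


def feature_vector(n, v, delta, omega, inertia, n_cycles):
    """F_n(dT) from Eq. (3). Column 0 of v, delta and omega is t_c."""
    d_coi = to_coi(delta, inertia)
    w_coi = to_coi(omega, inertia)
    # Assumed kinetic energy in the COI frame, one cycle after clearing (column 1).
    ke = 0.5 * inertia[n] * w_coi[n, 1] ** 2
    return np.concatenate([v[n, :n_cycles],       # voltage amplitude (not COI-mapped)
                           d_coi[n, :n_cycles],   # rotor angle in COI frame
                           w_coi[n, :n_cycles],   # rotor speed in COI frame
                           [ke]])
```

With `n_cycles = 20` (∆T = 0.4 s at one sample per cycle), the returned vector has 61 entries.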

Feature dimension significantly affects ANN training performance as well as generalization ability. A high dimension is prone to over-fitting, whereas a low dimension may cause inadvertent information loss. However, the preferable dimension of the feature candidates is difficult to determine before the fitting performance has been validated. Therefore, feature vectors of different dimensions are tested by training a set of parallel ANNs to determine the optimal input dimension. According to (3), the length of the input feature vector, d(∆T), depends linearly on ∆T:

$$ d(\Delta T) = 3 \times \text{num} (\Delta T) + 1 $$
(4)

where num(∆T) is the number of cycles within the interval ∆T. The factor of 3 accounts for the three trajectory arrays \( \boldsymbol{V}(T) \), \( \boldsymbol{\delta}_{\text{COI}}(T) \) and \( \boldsymbol{\omega}_{\text{COI}}(T) \) in (3), each containing num(∆T) values. The kinetic energy indicator \( KE_{\text{COI}}^{n} \) is the last feature, adding one more element to the final attribute vector. Taking the voltage trajectory of the generator bus as an example, the principle of determining the preferable feature dimension is detailed as follows.

As shown in Fig. 2, the time series of voltage amplitude after fault clearing during ∆T is taken as a sub-array of features, i.e. \( \boldsymbol{V}(T) \). For example, if ∆T equals 0.4 s, the number of cycles during ∆T is 20 according to the PMU data acquisition frequency, and the total dimension given by (4) is 3 × 20 + 1 = 61. To determine the most effective feature dimension with regard to the trade-off between accuracy and computing cost, a number of ANNs fed with features of different dimensions, associated with different ∆T, are trained simultaneously. Specifically, ∆T ranges from 0.08 s to 0.56 s in steps of 0.04 s, corresponding to intervals of 4 to 28 cycles after fault clearing in steps of 2 cycles. That is, for a single generator n, the set of ANNs

$$ ANN_{1}^{n} \quad ANN_{2}^{n} \quad ANN_{3}^{n} \, \ldots \, ANN_{13}^{n} $$
(5)

is trained in the same scheme, which makes the machine learning task extremely computing-intensive. Nevertheless, tentatively training ANNs with all the candidate input dimensions is a straightforward way to find the best-generalizing structure of the ANN trained for each generator.

Fig. 2 Illustration of confirming features dimension
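The dimension sweep can be checked with a few lines of Python. The sketch assumes one PMU sample per cycle at 50 Hz (0.02 s per cycle), which is consistent with 0.4 s corresponding to 20 cycles; iterating over integer cycle counts avoids floating-point drift in the 0.04 s step. The names `CYCLE_S`, `feature_dim` and `sweep` are illustrative.

```python
CYCLE_S = 0.02  # assumed 50 Hz system: one PMU sample per 0.02 s cycle


def feature_dim(n_cycles):
    """Eq. (4): three trajectory arrays of n_cycles samples plus one KE value."""
    return 3 * n_cycles + 1


# dT = 0.08 s ... 0.56 s in 0.04 s steps, i.e. 4 ... 28 cycles in 2-cycle steps
sweep = [(round(c * CYCLE_S, 2), c, feature_dim(c)) for c in range(4, 29, 2)]
for i, (dt, cycles, dim) in enumerate(sweep, start=1):
    print(f"ANN_{i}: dT = {dt:.2f} s, {cycles} cycles, d = {dim}")
```

The thirteen lines printed correspond to \( ANN_{1}^{n} \) through \( ANN_{13}^{n} \) in (5), with input dimensions rising from 13 to 85.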

On the output side of the ANN, the learning target of each data sample is a binary variable reflecting the status of an individual generator: if the generator is identified as critically unstable by the clusterwise method, the target is tagged as 1; otherwise, it equals 0.

2.4 Data samples production

A program based on time-domain simulation is designed to automatically produce massive data samples with the selected features. The data generation procedure consists of the following steps.

Step 1: Scale the initial load level of the base case by multiplying the active power \( P_{Li0} \) on each bus by the random stress factor (1 + α), and update the reactive power \( Q_{Li} \) to keep the power factor constant:

$$ P_{Li} = P_{Li0} (1 + \alpha )\quad i \in \{ {\text{PQ}}\} $$
(6)
$$ Q_{Li} = P_{Li} \cdot \tan \left\{ \cos^{-1} \left( \frac{P_{Li0}}{\sqrt{P_{Li0}^{2} + Q_{Li0}^{2}}} \right) \right\} $$
(7)

where α ranges from −0.25 to 0.6. All generators except the reference one compensate for the load variation in proportion to their base generation. The updated output of generator n can be calculated using:

$$ P_{Gn} = P_{Gn0} \left( 1 + \frac{\sum_{i \in \{ \text{PQ} \}} \alpha P_{Li0}}{\sum_{m \ne \text{ref}} P_{Gm0}} \right), \quad n \in \{ \text{PV} \} $$
(8)
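The load scaling of (6) and (7) and the proportional redistribution among non-reference generators can be sketched as below. This is an illustrative NumPy version, not the authors' program; the function name is hypothetical, and it assumes positive active loads and non-negative reactive loads so that the arccos/tan form of (7) preserves the sign of \( Q_{Li0} \). Note that \( \tan(\cos^{-1}(\cdot)) \) in (7) simply reproduces the ratio \( Q_{Li0}/P_{Li0} \).

```python
import numpy as np


def scale_operating_point(p_load0, q_load0, p_gen0, ref, alpha):
    """Scale PQ loads by (1 + alpha) at constant power factor and share the
    load change among non-reference generators pro rata to base output."""
    p_load0 = np.asarray(p_load0, dtype=float)
    q_load0 = np.asarray(q_load0, dtype=float)
    alpha = np.broadcast_to(np.asarray(alpha, dtype=float), p_load0.shape)

    p_load = p_load0 * (1.0 + alpha)                                # Eq. (6)
    power_factor = p_load0 / np.sqrt(p_load0**2 + q_load0**2)
    q_load = p_load * np.tan(np.arccos(power_factor))               # Eq. (7)

    p_gen = np.asarray(p_gen0, dtype=float).copy()
    others = np.arange(len(p_gen)) != ref                           # exclude reference unit
    load_change = (alpha * p_load0).sum()
    p_gen[others] += load_change * p_gen[others] / p_gen[others].sum()
    return p_load, q_load, p_gen
```

For example, with loads of 100 and 50 MW, α = 0.2 and two equal non-reference units, the 30 MW load increase is split evenly between them, while the reference unit is left unchanged and would absorb losses in the subsequent power flow.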

Step 2: Execute a time-domain simulation for a severe fault randomly selected from the predefined contingency list. The fault clearing time is a random number between 0.15 s and 0.35 s; a longer clearing time implies a higher likelihood of instability.

Step 3: Organize the feature samples of each generator from the simulated trajectories according to (3). For any generator, a set of data samples with different feature dimensions, corresponding to the different ∆T selections (see Fig. 2), is produced at the same time for subsequent parallel training, which provides a way to determine the ANN structure with the highest generalization ability for each generator.

Step 4: Identify CUGs from the simulation results using the proposed clusterwise method, which provides the learning target of each data sample.

Step 5: Save all the samples of the current round and repeat the procedure from Step 1 until the given total number of iterations is reached.

To improve the generalization of the data samples, multiple faults are also simulated in the data production procedure to cover N − k (k ≤ 3) scenarios.
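The five-step loop can be outlined as a small driver that draws the random quantities of each round. The sketch below is hypothetical: `simulate` stands in for the time-domain simulation, feature extraction and clusterwise labeling of Steps 2 to 4, the contingency list is a placeholder, and drawing k uniformly from {1, 2, 3} to form N − k fault sets is an assumption about how the multiple faults are selected. The ranges for α and the clearing time are those given above.

```python
import random
from dataclasses import dataclass


@dataclass
class Scenario:
    alpha: float            # load stress factor of Eq. (6)
    faults: list            # k contingencies applied in this round
    clearing_time_s: float  # fault clearing time


def sample_scenario(contingencies, rng, max_k=3):
    """Draw one random operating point and fault set (Steps 1 and 2)."""
    k = rng.randint(1, max_k)                   # N - k with k <= 3 (assumed uniform)
    return Scenario(alpha=rng.uniform(-0.25, 0.6),
                    faults=rng.sample(contingencies, k),
                    clearing_time_s=rng.uniform(0.15, 0.35))


def produce_samples(contingencies, n_rounds, simulate, seed=0):
    """Repeat until the given number of rounds is reached (Step 5)."""
    rng = random.Random(seed)
    database = []
    for _ in range(n_rounds):
        scenario = sample_scenario(contingencies, rng)
        # simulate() is a placeholder for Steps 2-4: simulation, Eq. (3) features, CUG labels.
        database.append((scenario, simulate(scenario)))
    return database
```

Seeding a dedicated `random.Random` instance keeps each sample database reproducible, which helps when comparing BPNNs trained on different ∆T settings.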

3 Methodology of MapReduce based parallel BPNN

3.1 Issue of data volume

As discussed previously, BPNN has been widely used in many fields owing to its stable performance and high classification accuracy [26]. Therefore, this type of ANN is also employed here to build online predictors that rapidly classify CUGs from PMU measurements.

However, in this study the time-series statuses of all generators produce an extremely large volume of data for ANN learning, which differs substantially from traditional applications. Traditionally, BPNN is applied to hundreds or a few thousand instances to identify global instability, so the algorithm incurs relatively little overhead. With the sharp increase in data volume here, BPNN has to deal with massive-scale data.

Data volume generally refers to the computable size and scale of a massive amount of data. Fig. 3 illustrates how the volume of the input instances fed to BPNN training for CUG prediction is quantified.

Fig. 3 Illustration of the volume of data

In Fig. 3, N represents the total length of the selected features, M is the number of available samples, and K is the total number of generators. The cubic data volume can therefore be estimated by

$$ {\text{Data Volume}} = Byte(M \times N \times K) $$
(9)
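As a worked illustration of (9), the sketch below assumes that every feature value is stored as an 8-byte double; M = 10 000 samples is a hypothetical figure, while K = 48 matches the NPCC system and N = 61 corresponds to ∆T = 0.4 s. The second computation sums the dimensions 3c + 1 over the thirteen ∆T settings (c = 4, 6, ..., 28 cycles) that are trained in parallel, showing how the dimension sweep multiplies the volume. These numbers are illustrative and are not those of Table 5.

```python
def data_volume_bytes(m_samples, n_features, k_generators, bytes_per_value=8):
    """Eq. (9): M x N x K values, each assumed to occupy bytes_per_value bytes."""
    return m_samples * n_features * k_generators * bytes_per_value


single = data_volume_bytes(10_000, 61, 48)              # one dT setting
total_dims = sum(3 * c + 1 for c in range(4, 29, 2))    # all 13 dT settings
sweep = data_volume_bytes(10_000, total_dims, 48)
print(single, sweep)  # 234240000 2446080000 (about 0.23 GB and 2.4 GB)
```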

The volume of the fed instances strongly affects training efficiency. Processing each instance incurs overhead in both the training and classification phases due to the large number of summation and sigmoid calculations in the network. In the training phase, BPNN also has to execute back propagation to tune all the parameters, which adds further overhead. Finally, each instance is trained not once but many times, and these training loops also generate overhead. Therefore, a standalone BPNN meets a critical efficiency bottleneck when processing large volumes of data, and our data-intensive task would degrade the performance of CUG prediction. This motivates us to parallelize BPNN using MapReduce [27].

3.2 Parallelization of BPNN

3.2.1 Standalone BPNN

BPNN is a widely used machine learning technique for classification owing to its remarkable function approximation ability. In the classification phase, it uses only the feed-forward pass to output the final classification result for each input instance, based on the weights and biases obtained in the training phase. The variables used in the feed-forward stage are defined in Table 1.

Table 1 Definition of variables used in feed forward stage

The number of inputs in the input layer is determined by n, and the number of outputs in the output layer is determined by the length of the encoded classes. Therefore, \( I_{j} \) can be expressed as:

$$ I_{j} = \sum_{i} w_{ij} o_{li} + \theta_{j} $$
(10)

In a typical BPNN, the sigmoid function is used as the nonlinear activation, so the output of the jth neuron passed from the current layer to the next layer can be expressed as:

$$ o_{cj} = \frac{1}{{1 + e^{{ - I_{j} }} }} $$
(11)

The output layer finally outputs its \( o_{cj} \), which indicates the classification result, and the feed-forward pass is complete. Back propagation then starts; the related variables are defined in Table 2.

Table 2 Definition of variables used in back propagation

Therefore, \( Err_{j} \) in the output layer is expressed by:

$$ Err_{j} = o_{j} (1 - o_{j} )(t_{j} - o_{j} ) $$
(12)

while \( Err_{j} \) in the hidden layers can be expressed as:

$$ Err_{j} = o_{j} (1 - o_{j}) \sum_{k} Err_{k} w_{jk} $$
(13)

As a result, the weight \( w_{ij} \) and bias \( \theta_{j} \) can be tuned using:

$$ w_{ij} = w_{ij} + Err_{j} o_{i} $$
(14)
$$ \theta_{j} = \theta_{j} + Err_{j} $$
(15)

Training terminates when either of two conditions is met: the number of training loops reaches a preset limit, or the error falls below a given threshold, measured according to (16) for a single output and (17) for multiple outputs:

$$ \min \left( E\left[ e^{2} \right] \right) = \min \left( E\left[ (t - o)^{2} \right] \right) $$
(16)
$$ \min \left( E\left[ e^{\text{T}} e \right] \right) = \min \left( E\left[ (t - o)^{\text{T}} (t - o) \right] \right) $$
(17)
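The feed-forward pass, the error terms and the two stopping conditions can be combined into a minimal NumPy sketch of a standalone BPNN. It is illustrative only: the class name, the single hidden layer, the learning rate `lr` (not specified in the text) and the use of the mean squared error over all outputs as the stopping measure are assumptions. The weight update uses the output of the sending neuron in the preceding layer, as in standard back propagation, and weights and biases are initialized uniformly in [−1, 1], as described for the mappers in Section 3.2.4.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))                      # Eq. (11)


class BPNN:
    """One-hidden-layer BPNN trained instance by instance."""

    def __init__(self, n_in, n_hidden, n_out, lr=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.uniform(-1, 1, (n_in, n_hidden))   # input -> hidden
        self.b1 = rng.uniform(-1, 1, n_hidden)
        self.w2 = rng.uniform(-1, 1, (n_hidden, n_out))  # hidden -> output
        self.b2 = rng.uniform(-1, 1, n_out)
        self.lr = lr

    def forward(self, x):
        h = sigmoid(x @ self.w1 + self.b1)               # Eq. (10) then (11)
        return h, sigmoid(h @ self.w2 + self.b2)

    def train_one(self, x, t):
        h, o = self.forward(x)
        err_out = o * (1 - o) * (t - o)                  # Eq. (12)
        err_hid = h * (1 - h) * (self.w2 @ err_out)      # Eq. (13), before w2 changes
        self.w2 += self.lr * np.outer(h, err_out)        # Eq. (14)
        self.b2 += self.lr * err_out                     # Eq. (15)
        self.w1 += self.lr * np.outer(x, err_hid)
        self.b1 += self.lr * err_hid

    def fit(self, X, T, max_epochs=500, mse_tol=1e-3):
        for epoch in range(1, max_epochs + 1):           # condition 1: loop limit
            for x, t in zip(X, T):
                self.train_one(x, t)
            mse = np.mean((T - self.forward(X)[1]) ** 2) # cf. Eqs. (16)-(17)
            if mse < mse_tol:                            # condition 2: error threshold
                break
        return epoch, mse

    def predict(self, X):
        return (self.forward(X)[1] >= 0.5).astype(int)
```

Each epoch touches every instance once and performs a full forward and backward pass for it, which is exactly the per-instance overhead that grows with the data volume in (9).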

3.2.2 MapReduce and Hadoop framework

MapReduce is a distributed computing model offering two main operations, Map and Reduce. The Map function is responsible for data processing and computation, whereas the Reduce function collects and outputs the results. The inputs and outputs of Map and Reduce are key-value pairs: Map processes each input pair {K1, V1} and emits intermediate pairs {K2, V2}; the framework then shuffles and groups these pairs by key, and Reduce merges the values that share the same key and outputs the final results.
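The key-value data flow can be mimicked in a few lines of plain Python to make the roles of Map, shuffle and Reduce concrete. This is an in-process illustration only, not Hadoop code; the function `run_mapreduce` and the example of counting how often each generator appears as a CUG across scenarios are hypothetical. Hadoop additionally sorts the keys and distributes the work across nodes, which is omitted here.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """records: iterable of (K1, V1); mapper yields (K2, V2); reducer merges a key's values."""
    groups = defaultdict(list)
    for k1, v1 in records:                 # Map phase
        for k2, v2 in mapper(k1, v1):
            groups[k2].append(v2)          # shuffle: group intermediate values by key
    return {k2: reducer(k2, values) for k2, values in groups.items()}  # Reduce phase


# Hypothetical example: how often each generator is labeled a CUG across scenarios.
scenarios = [("fault_1", ["G3", "G7"]), ("fault_2", ["G3"]), ("fault_3", [])]
counts = run_mapreduce(scenarios,
                       mapper=lambda sid, cugs: ((g, 1) for g in cugs),
                       reducer=lambda g, ones: sum(ones))
print(counts)  # {'G3': 2, 'G7': 1}
```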

Among the MapReduce implementations [28, 29], the Hadoop framework [30] is the most widely used. The nodes of a Hadoop cluster contribute their resources, including processors, memory, hard disks and network adapters, to form the Hadoop distributed file system (HDFS), which not only stores data but also serves as the basic infrastructure of Hadoop. The nodes are categorized into one NameNode, which manages the cluster metadata, and several DataNodes, which execute the computations. The implementations of the Map and Reduce functions in Hadoop are called mappers and reducers, and they run on the DataNodes. Built on HDFS, Hadoop provides scalability, fault tolerance, load balancing, support for heterogeneous environments, and efficiency in dealing with large volumes of data.

3.2.3 Ensemble technique

The presented parallelization of BPNN is based on data separation. The main idea is to divide the training data set into a number of subsets, each of which is fed into a sub-BPNN maintained by one mapper in the Hadoop cluster. Each sub-BPNN is thus trained on only part of the original data set, which improves training efficiency. However, simple data separation alone leaves each sub-BPNN insufficiently trained on partial data, which may reduce classification accuracy. Therefore, this paper employs an ensemble technique consisting of bootstrapping and majority voting. How bootstrapping and majority voting [23] resolve the insufficient training of the sub-BPNNs is further detailed in Appendix B.
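Bootstrapping itself is simple: each subset is drawn from the training set with replacement. The sketch below returns index arrays, one per mapper; the function name, the subset size and the seed handling are assumptions, since the exact settings are given in Appendix B. When a subset is as large as the original set, it contains on average about 1 − 1/e ≈ 63.2% distinct instances, which is one reason why the sub-BPNNs differ from one another.

```python
import numpy as np


def bootstrap_subsets(n_instances, n_subsets, subset_size=None, seed=0):
    """Draw n_subsets index arrays with replacement from range(n_instances)."""
    rng = np.random.default_rng(seed)
    size = n_instances if subset_size is None else subset_size
    return [rng.integers(0, n_instances, size=size) for _ in range(n_subsets)]


subsets = bootstrap_subsets(n_instances=1000, n_subsets=9)   # e.g. nine mappers (hypothetical)
print([len(np.unique(s)) for s in subsets])  # each roughly 632 distinct instances
```

Passing a smaller `subset_size` gives each mapper only part of the data, which is the efficiency gain described above.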

3.2.4 Algorithm design

Once the bootstrapped subsets are generated, each instance in a subset is stored in the format <instance_k, target_k, type>, where

instance_k is the bootstrapped instance, which is the input of the neural network; target_k is the desired output if instance_k is a training instance; and the type field takes one of two values, “train” or “test”, labeling the type of instance_k. The sub-BPNN in a mapper is therefore aware of the instance type and executes training or classification accordingly.

Afterwards, each mapper in the Hadoop cluster constructs one BPNN and initializes the weights and biases of its neurons with random values between −1 and 1. The mapper then reads one instance in the form <instance_k, target_k, type> from its input subset.

The mapper parses the record and retrieves the type of the instance. If the type is “train”, the instance is fed into the input layer, and each neuron in successive layers computes its output until the output layer generates an output, which completes the feed-forward process. The mapper then runs back propagation and updates the weights and biases of the neurons. If an input instance is labeled “test”, all the mappers classify it, and each mapper generates an intermediate output in the form <instance_k, o_jm>, where instance_k is the key and o_jm is the classification result of the mth mapper.

Finally, a reducer collects the intermediate outputs of all the mappers. As all these outputs share the same key instance_k, the reducer merges them into one set, performs majority voting, and writes the voted result of instance_k into HDFS in the form <instance_k, r_k>, where r_k is the voted classification result of instance_k. Fig. 4 and Fig. 5 illustrate the algorithm architectures of the training and classification procedures of the MapReduce-based BPNN, respectively.
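The mapper and reducer logic of Figs. 4 and 5 can be simulated in a single process as follows. This is a simplified sketch, not the authors' Hadoop job: to stay short, each sub-BPNN is replaced by a single sigmoid neuron trained with the output-layer rules (12), (14) and (15), i.e. a BPNN with no hidden layer; records follow the <instance_k, target_k, type> format with instance_k stored as a tuple so that it can serve as a key; and the helper names, epoch count and learning rate are hypothetical. An odd number of mappers avoids ties in the vote.

```python
import numpy as np
from collections import Counter, defaultdict


class SubNet:
    """Stand-in for a sub-BPNN: one sigmoid neuron, weights/biases in [-1, 1]."""

    def __init__(self, n_in, lr=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.w, self.b, self.lr = rng.uniform(-1, 1, n_in), rng.uniform(-1, 1), lr

    def output(self, x):
        return 1.0 / (1.0 + np.exp(-(x @ self.w + self.b)))

    def train_one(self, x, t):
        o = self.output(x)
        err = o * (1 - o) * (t - o)          # Eq. (12)
        self.w += self.lr * err * x          # Eq. (14)
        self.b += self.lr * err              # Eq. (15)


def mapper(m, records, n_in, epochs=50):
    """Train on 'train' records, then emit <instance_k, o_jm> for 'test' records."""
    net = SubNet(n_in, seed=m)
    train = [(np.array(x), t) for x, t, kind in records if kind == "train"]
    for _ in range(epochs):
        for x, t in train:
            net.train_one(x, t)
    for x, _, kind in records:
        if kind == "test":
            yield x, int(net.output(np.array(x)) >= 0.5)


def vote_reducer(instance_k, votes):
    """Majority vote over the mappers' outputs -> <instance_k, r_k>."""
    return instance_k, Counter(votes).most_common(1)[0][0]


def run_job(subsets, n_in):
    groups = defaultdict(list)
    for m, records in enumerate(subsets):     # one mapper per bootstrapped subset
        for key, vote in mapper(m, records, n_in):
            groups[key].append(vote)          # shuffle by instance_k
    return dict(vote_reducer(k, v) for k, v in groups.items())
```

In a real Hadoop job each `mapper` call would run on a different DataNode and the shuffle would be performed by the framework; here both are sequential loops.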

Fig. 4 Algorithm architecture of parallelized BPNN training

Fig. 5 Algorithm architecture of parallelized BPNN classification

Based on majority voting, a number of sub-BPNNs (weak classifiers) can form a strong classifier. Therefore, although each sub-BPNN is trained on only a portion of the original training data and may misclassify some instances, the final voted result of a number of sub-BPNNs has a higher chance of being correct, while large volumes of data are handled more efficiently.
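The intuition can be quantified with a standard worked calculation (the Condorcet jury argument), under the idealized assumption that the m sub-BPNNs err independently, each being correct with probability p > 0.5. The majority is then correct with probability \( \sum_{k=\lfloor m/2 \rfloor + 1}^{m} \binom{m}{k} p^{k} (1-p)^{m-k} \). Bootstrapped sub-BPNNs are correlated in practice, so the real gain is smaller, and voting would hurt if p < 0.5; the function name and the numbers below are illustrative only, not results from this study.

```python
from math import comb


def majority_vote_accuracy(m, p):
    """Probability that more than half of m independent voters are correct (m odd)."""
    return sum(comb(m, k) * p**k * (1 - p) ** (m - k) for k in range(m // 2 + 1, m + 1))


for m in (1, 5, 11):
    print(m, round(majority_vote_accuracy(m, 0.7), 4))   # 1 -> 0.7, 5 -> 0.8369
```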

3.3 Implementation framework

The presented technique can be implemented on a WAMS application platform, enabling online prediction of the transient stability of each generator after fault clearing. Compared with conventional methods that predict only global stability [16], the proposed approach provides more in-depth information for potential emergency control aimed at resynchronizing the disturbed power system as fast as possible. The implementation framework is outlined in Fig. 6.

Fig. 6 Implementation framework built on the presented technique

As illustrated in Fig. 6, a distributed simulator of power system fault responses from a previous study [31] generates massive data samples for the given operating point in a parallel computing environment. Although sample generation is traditionally an off-line procedure, the operating point varies quickly and the topology may change unexpectedly, so the training samples must be updated to ensure generalization. Therefore, the simulated fault scenarios, which reflect various stability patterns of all the generators, are accumulated to update the sample database in the proposed framework, resulting in an intensive computational burden for standalone ANN training.

In previous literature, a well-trained ANN is used as a stability predictor that normally does not need to be updated. However, the generalization of the ANN can be enhanced by retraining it on the updated transient samples. The presented technique, shown in the dotted frame, provides a high-performance computing framework and algorithm for large-scale parallelized BPNN training. The training efficiency depends strongly on the number of DataNodes, in a similar way to distributed sample generation [31].

In principle, to enable industrial application, the time required for sample updates and for retraining thousands of BPNNs can be reduced to an acceptable level, e.g. a few minutes, by configuring sufficient computing resources in the scalable Hadoop framework. The trained parallel BPNNs of each generator are stored in the distributed environment. When a fault occurs in the power system, the PMUs installed on generator busbars capture the disturbance signal within a few cycles and provide the trajectory-based features, which are fed into the well-trained parallel CUG predictors, avoiding time-consuming sequential ANN prediction. The CUG information, i.e. the most leading or lagging generators, is then sent back to the WAMS application to support further emergency or corrective control.

4 Case study

4.1 Test systems and training data

The NPCC 48-machine test system and a provincial power system in southwest China [31] are used to validate the proposed technique. Details of the test systems are shown in Table 3.

Table 3 Details of test system

To obtain BPNN-based CUG predictors with high generalization ability, rich data samples are required, preferably covering all potential operating conditions and fault modes. Therefore, massive samples are generated for both test systems. In this work, a distributed random fault simulator has been developed to generate massive samples [31]. A random fault is a stochastic three-phase short circuit on any transmission line, and the fault clearing time is randomly set between 0.1 s and 0.35 s. The samples are generated in the following steps.

Step 1: Load the base case; if an initial outage exists, trip the corresponding component and calculate the power flow.

Step 2: Change P and Q on each bus by multiplying them by a random number in the range [0.8, 1.4] to simulate the load level, and allocate the unbalanced power to all generators in proportion to their base generation.

Step 3: Apply a three-phase fault on a randomly selected component at time \( T_{f} \) and clear it at \( T_{f} + \mu \), where μ is a random number in [0.1, 0.35] s.

Step 4: Perform a time-domain simulation of the randomly configured operating and fault scenario, and collect the output trajectories to calculate the features.

The data samples produced by the fault simulator are detailed in Table 4. In addition, the volume of data fed to train and validate the parallel BPNNs in this study, computed according to (9), is shown in Table 5.

Table 4 Details of simulated samples
Table 5 Volume of data

In Table 5, \( V_{\max} \) and \( V_{\min} \) represent the largest and smallest data blocks of an individual generator in the test systems. Table 4 and Table 5 indicate that the massive sample production, together with the ∆T-dependent feature definition of (5), results in a huge volume of data. A standalone BPNN, however, inputs instances one by one, leading to both sizable I/O overhead and considerable computational overhead. As a result, the most reasonable way to identify CUGs in such a large volume of data is to apply the presented parallel BPNN, which parallelizes both the training and classification phases. Furthermore, it is the most feasible approach to integrating CUG prediction into a WAMS application according to the architecture in Fig. 6.

4.2 Computing cluster configuration

To evaluate the performance of the MapReduce-based parallel BPNN in predicting CUGs after fault clearing, a real Hadoop cluster was built. The cluster contains ten nodes: nine DataNodes and one NameNode. Table 6 shows the configuration of the cluster.

Table 6 Configuration of computing cluster

4.3 Evaluation of MapReduce based parallel BPNN

4.3.1 Precision of CUGs prediction

In this evaluation, we tested the precision of the algorithm in predicting generator status. When the number of training instances is large, the presented MapReduce-based parallel BPNN achieves the same precision as a standalone BPNN. Therefore, only the precision of the parallel BPNN is listed below, without comparison with a standalone BPNN. Fig. 7 illustrates the precision of CUG identification for the two test systems.

Fig. 7 Precision of predicted CUGs

Fig. 7 indicates that the parallel BPNN achieves satisfactorily high precision in identifying the transient status of generators during the post-fault period. The average precisions over all generators of the two test systems are 99.18% and 98.57%, respectively.

However, it is worth noting that the feature dimension, which depends on the choice of ∆T, affects the CUGs prediction precision. In this study, we choose the BPNNs with the highest average precision as the CUGs predictor, even though the corresponding ∆T is not the best choice for every generator. Table 7 gives the details.

Table 7 Precision details of test systems

Table 7 indicates that the two test systems obtain the BPNN predictors with the highest average precision when ∆T equals 5 cycles and 8 cycles, respectively. However, particular machines, for example the 3rd generator in System I and the 16th generator in System II, obtain their most precise NN-based predictors only with a longer ∆T.
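The selection rule described above, one common ∆T that maximizes the average precision over all generators even if some generators would prefer another value, can be sketched as follows. The precision values are made up for illustration and are not taken from Table 7; `select_dt` and `cycles_to_seconds` are hypothetical helpers. The conversion assumes a 50 Hz nominal frequency, which is consistent with 5 and 8 cycles corresponding to the 0.1 s and 0.16 s quoted in Section 4.3.3.

```python
def select_dt(precision):
    """precision[generator][dt_cycles] -> (common ∆T, its average precision, per-generator best ∆T)."""
    dts = sorted(next(iter(precision.values())))
    # Average precision over all generators for each candidate ∆T.
    avg = {dt: sum(p[dt] for p in precision.values()) / len(precision) for dt in dts}
    common_dt = max(dts, key=avg.get)
    # The ∆T each generator would pick if it were tuned on its own.
    per_gen_best = {g: max(dts, key=p.get) for g, p in precision.items()}
    return common_dt, avg[common_dt], per_gen_best

def cycles_to_seconds(cycles, f_nominal=50.0):
    """Window length in seconds, assuming a 50 Hz nominal frequency."""
    return cycles / f_nominal

# Illustrative precision table (not the paper's data).
precision = {
    "G1": {5: 0.99, 8: 0.97},
    "G2": {5: 0.98, 8: 0.97},
    "G3": {5: 0.96, 8: 0.98},
}
common_dt, avg_prec, best = select_dt(precision)
print(common_dt, cycles_to_seconds(common_dt), best)  # 5 0.1 {'G1': 5, 'G2': 5, 'G3': 8}
```

Here G3 would prefer the longer window, yet the common choice of 5 cycles still gives the higher average precision, which mirrors the trade-off reported in Table 7.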

4.3.2 New samples validation

In order to test the generalization ability of the parallelized ANN-based CUGs predictor, thousands of new samples of the two test systems are simulated and fed into the trained ANNs to assess their prediction precision. Information on the additional samples is listed in Table 8.

Table 8 Details of new cases for prediction precision test

To validate the overall performance of the MapReduce based CUGs predictor on new cases, the BPNNs are retrained using the initial data samples introduced in Table 4 with different ∆T settings. The trained BPNNs are then applied to predict CUGs from the data of the new cases, with features computed under the corresponding ∆T setting. The resulting precision statistics are provided in Table 9.

Table 9 Results of new sample cases test

The validation results in Table 9 indicate that the parallelized BPNNs based on the presented technique generalize well to new cases that are not included in the training data set.

4.3.3 Features dimension impact analysis

As discussed above, the feature dimension determined by ∆T according to (3) is closely related to the prediction precision. However, it is hard to know in advance which ∆T yields the highest precision. Owing to MapReduce and HDFS, parallelized BPNNs with thirteen different feature dimensions for every single generator are trained and validated simultaneously.
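Because each (generator, ∆T) combination is an independent training and validation problem, the thirteen feature dimensions per generator can be expanded into a flat task list and spread over the available mappers. The Python sketch below illustrates this under simplifying assumptions: the generator count and the thirteen ∆T values are hypothetical placeholders, and the round-robin assignment stands in for Hadoop's own scheduling, which actually assigns map tasks according to input splits.

```python
from itertools import product

def build_tasks(generators, dt_choices):
    """One independent training/validation task per (generator, ∆T) pair."""
    return list(product(generators, dt_choices))

def assign_round_robin(tasks, n_mappers):
    """Distribute tasks over mappers so that all configurations are trained concurrently."""
    buckets = [[] for _ in range(n_mappers)]
    for i, task in enumerate(tasks):
        buckets[i % n_mappers].append(task)
    return buckets

generators = range(1, 11)   # hypothetical: 10 generators
dt_choices = range(1, 14)   # hypothetical: thirteen ∆T values, in cycles
tasks = build_tasks(generators, dt_choices)
buckets = assign_round_robin(tasks, n_mappers=16)
print(len(tasks), max(map(len, buckets)), min(map(len, buckets)))  # 130 9 8
```

The point of the sketch is that the ∆T search adds tasks rather than sequential passes, so its cost is absorbed by the cluster instead of multiplying the wall-clock time.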

In this test, we focus on how the algorithm precision changes as the instance dimension increases. Fig. 8 shows the variation of CUGs prediction precision against the change of ∆T.

Fig. 8 Variation of prediction precision against the change of ∆T

It can be observed that, as the number of elements in an instance increases, the final precision of the BPNN trained for a single generator changes nonlinearly. We conjecture that over-fitting largely affects the training procedure. Moreover, because most generators obtain their most accurate BPNNs when ∆T equals 0.1 s in System I and 0.16 s in System II, selecting these ∆T values for all generators yields the highest average precision. Fig. 9 illustrates the statistical distribution of the BPNNs with the highest precision in terms of ∆T.

Fig. 9 Distribution of BPNNs with highest precision

4.3.4 Validation of ensemble training

Fig. 10 indicates that the presented ensemble based neural network algorithm achieves higher precision than the standalone neural network.

Fig. 10 Precision with a small number of training instances

The figure shows that, when the number of training instances is small, the ensemble training strategy in the presented parallel BPNN achieves higher precision than a standalone BPNN. It also shows that the precision of the ensemble training strategy increases steadily without fluctuations.
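The ensemble idea can be illustrated with bootstrap aggregating (bagging): several small back-propagation networks are each trained on a resampled copy of the data, which in a MapReduce setting would happen in separate mappers, and their binary predictions are combined by majority vote, which would happen in the reducer. This Python sketch is an assumption-based stand-in, not the authors' implementation: the bagging scheme, network size, learning rate, epoch count and the helper names `TinyBPNN`, `train_ensemble` and `vote` are all hypothetical choices.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TinyBPNN:
    """One hidden layer, sigmoid units, online back-propagation on squared error."""

    def __init__(self, n_in, n_hidden, rng):
        # The extra weight in each row multiplies a constant bias input of 1.0.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def _forward(self, x):
        xb = list(x) + [1.0]
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        hb = h + [1.0]
        y = sigmoid(sum(w * v for w, v in zip(self.w2, hb)))
        return xb, hb, y

    def train(self, data, epochs=200, lr=0.5):
        for _ in range(epochs):
            for x, t in data:
                xb, hb, y = self._forward(x)
                delta_out = (y - t) * y * (1.0 - y)
                # Hidden deltas use the output weights before they are updated.
                delta_h = [delta_out * self.w2[j] * hb[j] * (1.0 - hb[j])
                           for j in range(len(self.w1))]
                for j in range(len(self.w2)):
                    self.w2[j] -= lr * delta_out * hb[j]
                for j, row in enumerate(self.w1):
                    for i in range(len(row)):
                        row[i] -= lr * delta_h[j] * xb[i]
        return self

    def predict(self, x):
        return 1 if self._forward(x)[2] >= 0.5 else 0

def train_ensemble(data, n_models=5, n_hidden=4, seed=0):
    """'Map' side: each sub-network is trained on its own bootstrap sample."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    models = []
    for _ in range(n_models):
        sample = [rng.choice(data) for _ in data]
        models.append(TinyBPNN(n_in, n_hidden, rng).train(sample))
    return models

def vote(models, x):
    """'Reduce' side: majority vote of the sub-networks (1 = unstable, 0 = stable)."""
    ones = sum(m.predict(x) for m in models)
    return 1 if 2 * ones > len(models) else 0

# Toy, linearly separable data standing in for CUG features: label 1 if x0 + x1 > 1.
data = [((i / 5, j / 5), 1 if i + j > 5 else 0) for i in range(6) for j in range(6)]
models = train_ensemble(data)
print(vote(models, (0.0, 0.0)), vote(models, (1.0, 1.0)))
```

Majority voting over independently trained members smooths out the variance of any single small network, which is one plausible reading of the steadier precision curve in Fig. 10.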

4.3.5 Algorithm efficiency

In this test, we primarily focused on the evaluation of algorithm efficiency. The training data were duplicated to sizes ranging from 1 MB to 1024 MB, with 16 mappers employed. The experimental result is shown in Fig. 11.

Fig. 11 Distribution of BPNNs with highest precision

It can be observed that when the data size is small, the standalone BPNN outperforms the parallel BPNN owing to the overhead of the Hadoop framework. However, as the data size keeps increasing, the parallel BPNN still executes efficiently, whereas the standalone BPNN needs considerably more time to process the data.

According to the test result, training on 1 GB of data with the standalone BPNN takes around 9000 s. This is clearly not acceptable for applications with continually updated sample production, such as the architecture presented in Fig. 6. To further investigate the efficiency of the presented CUGs prediction algorithm, a comprehensive comparison of training time consumption is given in Table 10.

Table 10 Efficiency comparison

The efficiency comparison indicates that, as computing resources increase, the processing time of the algorithm is greatly reduced without loss of accuracy. If more Datanodes are configured, the efficiency is expected to improve further, so that the training procedure could be accomplished in a few minutes or even a few seconds. This would enable on-line training on the huge bulk of updated samples.
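The crossover observed in Fig. 11 and the scaling trend in Table 10 can be reasoned about with a deliberately simple cost model: standalone time grows linearly with data size, while the parallel run pays a fixed framework overhead and divides the work over n mappers. Only the figure of about 9000 s for 1 GB comes from the test above; the 60 s overhead, the perfectly even split and the helper names are assumptions for illustration, and real Hadoop jobs also incur shuffle and I/O costs that this model ignores.

```python
def standalone_time(size_mb, sec_per_mb):
    """Serial BPNN training time, assumed linear in data size."""
    return size_mb * sec_per_mb

def parallel_time(size_mb, sec_per_mb, n_mappers, overhead_s):
    """Fixed framework overhead plus the work split evenly over n mappers (assumed)."""
    return overhead_s + size_mb * sec_per_mb / n_mappers

def break_even_mb(sec_per_mb, n_mappers, overhead_s):
    """Data size at which both runs take equal time: overhead = size * rate * (1 - 1/n)."""
    return overhead_s / (sec_per_mb * (1.0 - 1.0 / n_mappers))

rate = 9000 / 1024   # s per MB, from "around 9000 s for 1 GB" with the standalone BPNN
overhead = 60.0      # hypothetical Hadoop start-up and scheduling overhead, in seconds
print(parallel_time(1024, rate, 16, overhead))           # 622.5
print(round(break_even_mb(rate, 16, overhead), 2))       # 7.28
```

Under this model, adding mappers or Datanodes shrinks only the second term of `parallel_time`, so the fixed overhead sets a floor on the achievable training time.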

5 Conclusion

This paper presents a high-performance CUGs prediction approach using a MapReduce based parallel BPNN. First, time-domain simulation is employed to generate massive disturbed scenarios using the published fault simulator. Second, we propose feature selection principles to produce feature vectors that represent the system status with a reasonable data dimension. Third, to overcome the disadvantages of standalone application, a MapReduce based BPNN technique is developed to facilitate simultaneous training for every single generator. The presented methodology employs an ensemble technique and data separation to enhance training efficiency; data separation also enables scalable integration of computing resources and a large improvement in classification efficiency. The experimental results show that the presented technique is able to predict CUGs with high accuracy and efficiency, especially for large-scale data, providing a way toward in-depth stability awareness based on the WAMS architecture.