1 Introduction

The penetration of renewable energy-based power plants in smart grids brings about specific instability issues because of their fast dynamics. For this reason, a reliable control and monitoring system is required to increase grid reliability in the presence of such sources. The control system is strengthened when an accurate fault diagnosis technique is used in the power grid relays, so that the circuit breakers are operated within an acceptable time period and instability is prevented. Machine learning algorithms, in turn, enable engineers to classify large-scale data into separate categories quickly. The presence of various nonlinear components in power grids complicates the preparation of a precise mathematical model. Researchers have therefore recently tried to use machine learning methods as a high-precision technique to recognize fault types in power grids.

The appearance of intelligent algorithms has divided fault detection techniques into two categories. In the conventional detection approach, engineers use the mathematical model of the power grid to determine the type of fault and disconnect the related breakers from the network [1,2,3,4]. Fault clearance using conventional relaying systems is not sufficiently safe: in some cases, relays command unnecessary breakers to disconnect a healthy transmission line from the network. Such disconnections are not optimal and lead to the shutdown of a larger area of the system. The phase difference technique [5], symmetrical sequence components [6], and fault detection and classification using conventional signal analysis methods (e.g. wavelets, FFT, IFFT) [7] are the most commonly used techniques in conventional relays. Because the mathematical model of a power grid is formulated with some simplifications, the accuracy of these techniques is limited. For instance, the method proposed in [6] fails when it is used in a nonsymmetrical system. Also, FFT-based algorithms are computationally complex, which slows the system response, especially in large systems.

On the other hand, intelligent algorithms and learning techniques have demonstrated their reliability in comparison to conventional systems. Fuzzy logic (FL) and the genetic algorithm (GA) are two widely used methods in fault detection and diagnosis of power grids [8]. Fuzzification with different membership functions plays a decisive role in increasing the precision of the FL controller. However, an increase in the number of membership functions and, consequently, in the number of fuzzy rules slows the system response. For this reason, a suitable FL controller must not include a large number of membership functions. In such a system, it is possible that the controller converges to a wrong answer. Moreover, although GA is robust enough in noisy systems [9], it is hard to apply in large-scale systems. Hence, the introduction of supervised machine learning methods, which are simple to implement and fast in detection and classification, helps engineers to design more reliable protection systems. The artificial neural network (ANN) [10] and the decision tree [11] are two algorithms previously used for fault detection in smart grids. Although these algorithms detect faults well, they suffer from a lack of classification ability. Also, their large number of features complicates the implementation of such algorithms on commercial microcontrollers. The stacked autoencoder (SAE) proposed in [12] tried to address the implementation difficulties of the conventional NN through dimension reduction and noise cancellation. The Bayesian network, which relies on very large conditional probability tables, is another machine learning method employed in fault detection; it has not become popular among engineers because it needs a large amount of computer memory. K-nearest neighbor (KNN) is able to detect and classify faults with an accuracy of 99% without requiring preprocessing [13]. However, KNN cannot reach the right answer with a large dataset or high-dimensional features. Exponentially weighted moving average (EWMA) and anomaly detection approaches [14] have been used in association with the conventional KNN to obtain precise results on large datasets.

In this paper, the KNN technique augmented with principal component analysis (PCA) and linear discriminant analysis (LDA) is used to detect and classify different faults in a smart grid. In the first stage of the proposed classification approach, PCA, which uses simple matrix operations and statistics to project the original data onto fewer dimensions, is applied to the input datasets. In the second stage, LDA is applied to find a linear combination of features that further reduces the features of the events. Finally, KNN assigns the received data to its related class. Although all parts of the proposed approach have been introduced in previous literature, their combined use has not been investigated for classifying the fault signals of a power grid. The results of this paper demonstrate the robustness and effectiveness of the proposed method in the classification of a wide range of faults occurring in smart grids. The different parts of the smart grid under study, used as a sample network, and the considered faults are discussed in the next section. The third section explains the proposed algorithm in detail. In the last section, a comprehensive study is carried out on different faults of the smart grid to demonstrate the acceptable performance of the system.

2 Smart grid

According to Fig. 1, the smart grid under study consists of a 100 kW solar power plant, a 5 kW energy storage unit, a 20 kW fuel cell package, a linear load of 800 kW + j 200 kVAR, and a nonlinear load, all connected to a power system. The nonlinear load power varies between 100 and 500 kW with different power factors. Such a smart grid is large enough to test all required faults and create the dataset needed to thoroughly study a fault detection system. In fact, the power system loading depends on a large number of variables such as the ambient temperature, solar irradiation, energy stored in the batteries, the nonlinear load, and the operation of the fuel cell. After datasets have been prepared for the different fault categories, a large number of test samples can be provided to examine the detection technique.

Fig. 1 The smart grid under study, considered for testing the proposed technique

In this study, the three-phase short circuit (LLL), line-to-line short circuit (LL), three-phase-to-ground short circuit (LLLG), line-to-line-to-ground short circuit (LLG), and single line-to-ground short circuit (LG) constitute the five fault classes that can occur on the AC side of the smart grid. Moreover, the DC-links of all distributed generators (DGs) are exposed to short circuit or open circuit faults, which are studied as the DC-side faults. The DC-link short circuit and open circuit faults of the solar power plant are denoted by SDPV and ODPV. Because a fault can also occur at the low voltage side of every DG, six different faults are considered for these locations. The open circuit and short circuit faults at the low voltage side of the solar power plant are denoted by OLPV and SLPV, and those of the fuel cell by OLFC and SLFC. The open circuit and short circuit faults at the low voltage side of the battery package are denoted by OLB and SLB, respectively. In brief, all of the aforementioned faults are treated as separate classes to scrutinize the ability of the proposed detection technique.

Figure 2 shows the simulated smart grid used to study the proposed detection technique. The aforementioned LLL, LL, LLLG, and LLG faults can occur at any point on the AC side of the network. In this study, considering the transmission line length, the studied AC faults occur at a distance of 100 km from the photovoltaic power plant. The DC-side faults introduced above occur on the DC bus of each distributed generator.

Fig. 2 The simulated power system with details

3 Artificial intelligence

There are many different methods to project features into lower dimensions, such as factor analysis, PCA, LDA, locally linear embedding (LLE), multi-dimensional scaling (MDS), and isometric feature mapping (Isomap).

3.1 Dimension reduction

3.1.1 PCA

PCA is a statistical feature extraction algorithm. It is well suited to large datasets of variables in which a small subset of the data contains the decisive information [15,16,17,18]. It can also be applied where the number of training samples is much smaller than the number of features. PCA reduces the dimension of the dataset with the smallest projection error relative to the original dimensions. The dimension reduction is required to remove redundancy from the dataset (i.e. to create orthogonal components), reduce complexity, and eliminate the effect of noise [19, 20]. The covariance matrix (∑) is computed by (1) so that its eigenvectors can be obtained.

$$\Sigma = \frac{1}{l} \sum_{i = 1}^{l} \left( X^{(i)} - \bar{X} \right) \left( X^{(i)} - \bar{X} \right)^{T}$$
(1)

where \(\bar{X}\) represents the mean of the sample vectors \(X^{\left( i \right)}\) and l represents the number of samples. In principle, the eigenvector associated with the largest eigenvalue \(\lambda_{i}\) gives the direction of the maximum spread of the data [21]. So, the eigenvectors corresponding to the k largest eigenvalues of the covariance matrix (the principal components) are chosen to create the matrix Ureduce. Let U be the matrix whose columns are the eigenvectors of ∑, sorted by decreasing eigenvalue; the first k columns of U form Ureduce, which has n rows and k columns. The following condition is used to determine k.

$$\frac{{\mathop \sum \nolimits_{i = 1}^{k} \lambda_{i} }}{{\mathop \sum \nolimits_{i = 1}^{m} \lambda_{i} }} \ge threshold$$
(2)

where m is the total number of eigenvalues of ∑ and the threshold is chosen arbitrarily in the range 0–1. Note that a higher threshold retains more of the variance of the data and leads to a larger k. The new dataset in a lower dimension is then computed by (3).

$$Z = U_{reduce}^{T} X$$
(3)

PCA thus yields a reconstructable representation of the data in k dimensions. The LDA method can now be applied to this dataset.
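As an illustration, the PCA steps in (1)–(3) can be sketched in Python with NumPy. This is a minimal sketch rather than the implementation used in this study: the function names `pca_fit` and `pca_project`, the column-per-sample layout of X, the mean-centring before projection, the use of the transpose of Ureduce in the projection, the threshold of 0.95, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def pca_fit(X, threshold=0.95):
    """X has shape (n_features, n_samples): one column per sample vector."""
    mean = X.mean(axis=1, keepdims=True)
    Xc = X - mean
    # Eq. (1): covariance as the average outer product of centred samples.
    sigma = (Xc @ Xc.T) / X.shape[1]
    # eigh is meant for symmetric matrices and returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(sigma)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negative round-off
    # Eq. (2): smallest k whose cumulative eigenvalue ratio reaches the threshold.
    ratio = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(ratio, threshold) + 1), len(eigvals))
    U_reduce = eigvecs[:, :k]  # n rows, k columns
    return U_reduce, mean, k

def pca_project(X, U_reduce, mean):
    # Eq. (3), applied here to mean-centred data (an assumption of this sketch).
    return U_reduce.T @ (X - mean)

# Hypothetical data: three features, almost all variance along one direction.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.vstack([t, 2.0 * t, 0.01 * rng.normal(size=200)])

U_reduce, mean, k = pca_fit(X, threshold=0.95)
Z = pca_project(X, U_reduce, mean)
print(k, Z.shape)
```

With this synthetic input a single principal component already exceeds the 0.95 threshold, so the three features are reduced to one. A threshold closer to 1 would keep more components.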

3.1.2 LDA

LDA is used for feature reduction and discrimination between the categories of dependent variables [22,23,24,25]. LDA is applied to the dataset to find a new subspace in which the mapped data have minimum scatter within each class and maximum distance from the data of other classes. Equation (4) is applied to Z to find Zob, which maximizes the ratio of the between-class scatter SB to the within-class scatter SW (Fisher’s criterion).

$$Z_{ob} = \arg \max_{Z} \frac{\left| Z^{T} S_{B} Z \right|}{\left| Z^{T} S_{W} Z \right|} = \left[ z_{1} \quad z_{2} \quad \ldots \quad z_{m} \right]$$
(4)

Assuming that SW is invertible, the solution of the above equation is obtained as follows:

$$S_{B} z_{i} = \lambda_{i} S_{W} z_{i}$$
(5)
$$S_{W}^{-1} S_{B} z_{i} = \lambda_{i} z_{i} \quad i = 1, \ldots ,m$$
(6)

SW is invertible because the preceding PCA step removes the singularity of the data matrix X. The between-class scatter SB and the within-class scatter SW are calculated by (7) and (8), respectively [26, 27].

$$S_{B} = \phi_{B} \phi_{B}^{T}$$
(7)
$$S_{W} = \phi_{W} \phi_{W}^{T}$$
(8)

where \(\phi_{B}\) and \(\phi_{W}\) are as follows:

$$\phi_{B} = \left[ \sqrt{n_{1}} \left( \boldsymbol{\mu}_{1} - \boldsymbol{\mu} \right), \; \sqrt{n_{2}} \left( \boldsymbol{\mu}_{2} - \boldsymbol{\mu} \right), \; \ldots \right] \quad \left( n \times j \right)$$
(9)
$$\phi_{W} = \left[ \left( \boldsymbol{X}_{1} - \boldsymbol{\mu}_{k1} \right), \; \left( \boldsymbol{X}_{2} - \boldsymbol{\mu}_{k2} \right), \; \ldots \right] \quad \left( n \times n_{t} \right)$$
(10)

where j is the number of classes, \(n_{i}\) is the number of training examples in class i, and \(n_{t}\) is the total number of training examples. The new dataset is then obtained as follows:

$$W = Z_{ob}^{T} Z$$
(11)
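The LDA steps in (4)–(11) can be illustrated with a short NumPy sketch. It is not the implementation used in this study: the function name `lda_fit`, the column-per-sample layout, the use of `numpy.linalg.solve` instead of an explicit inverse of SW, the transposed projection, and the two-class synthetic data are assumptions made for the example.

```python
import numpy as np

def lda_fit(Z, labels, m):
    """Z: (n_features, n_samples) reduced data; labels: one class id per column."""
    labels = np.asarray(labels)
    n = Z.shape[0]
    mu = Z.mean(axis=1, keepdims=True)          # overall mean
    S_B = np.zeros((n, n))
    S_W = np.zeros((n, n))
    for c in np.unique(labels):
        Zc = Z[:, labels == c]
        mu_c = Zc.mean(axis=1, keepdims=True)   # class mean
        d = mu_c - mu
        S_B += Zc.shape[1] * (d @ d.T)          # Eq. (7) with the columns of Eq. (9)
        D = Zc - mu_c
        S_W += D @ D.T                          # Eq. (8) with the columns of Eq. (10)
    # Eq. (6): eigenvectors of S_W^{-1} S_B, without forming the inverse explicitly.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    order = np.argsort(eigvals.real)[::-1][:m]
    return eigvecs[:, order].real               # columns z_1 ... z_m

# Hypothetical data: two classes separated along the first feature only.
rng = np.random.default_rng(1)
a = rng.normal(loc=[0.0, 0.0], scale=[0.3, 3.0], size=(100, 2)).T
b = rng.normal(loc=[2.0, 0.0], scale=[0.3, 3.0], size=(100, 2)).T
Z = np.hstack([a, b])
labels = np.array([0] * 100 + [1] * 100)

Z_ob = lda_fit(Z, labels, m=1)
W = Z_ob.T @ Z                                  # projected dataset
print(Z_ob.ravel(), W.shape)
```

In this example the discriminant direction aligns almost entirely with the first feature, although the second feature has the larger variance. This is the difference between Fisher's criterion and the variance-based selection of PCA.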

3.2 Classification

3.2.1 K nearest neighbor

KNN is an example-based algorithm with a wide range of applications [28,29,30,31]. The parameter K determines how many of the nearest training samples are considered for a test sample. The label of the test sample is then chosen based on the labels of these K samples. The algorithm computes the distance between the test sample and every sample of the dataset using the Euclidean distance as follows:

$$D\left( {x,y} \right) = \sqrt {\mathop \sum \limits_{i = 1}^{n} \left( {x_{i} - y_{i} } \right)^{2} }$$
(12)

For example, as shown in Fig. 3, the KNN method labels a new data point as true when k = 3 and as false when k = 7. Figure 4 shows the flowchart of the implemented detection and classification technique explained in this section.
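The dependence of the KNN label on k can be reproduced with a few lines of standard-library Python. The sketch below is illustrative only: the function name `knn_predict` and the eight training points are hypothetical and are not the data of Fig. 3, and ties are resolved by whichever label was counted first, which is an assumption of this example.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    # Eq. (12): Euclidean distance from the query to every training sample.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    # Majority vote among the k nearest samples.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training set around the query point (0, 0).
points = [(0.1, 0.0), (0.0, 0.2), (3.0, 3.0),            # labelled "true"
          (0.3, 0.0), (1.0, 0.0), (0.0, 1.0),
          (-1.0, 0.0), (0.0, -1.0)]                      # labelled "false"
labels = ["true"] * 3 + ["false"] * 5

label_k3 = knn_predict(points, labels, (0.0, 0.0), k=3)  # 2 "true" vs 1 "false"
label_k7 = knn_predict(points, labels, (0.0, 0.0), k=7)  # 2 "true" vs 5 "false"
print(label_k3, label_k7)
```

With k = 3 the two closest "true" samples win the vote, whereas with k = 7 the five surrounding "false" samples outnumber them, so the predicted label changes.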

Fig. 3 The data labeling by the KNN

Fig. 4 Flowchart of the classification technique used in this paper

The proposed method uses PCA to reduce the signal features in the first step. After that, LDA is applied to the output data of the PCA to obtain the optimum features of the signals. These reductions improve the performance of KNN in the classification stage. Moreover, the signal sampling (i.e. 5000 samples per second), which is discussed in the next section, is a useful preprocessing step that decreases the size of the available data. Such a sampling strategy speeds up data analysis and processing, which eliminates the need for expensive commercial processors.

4 Simulation result and discussion

The ability of the introduced KNN technique in fault detection is examined in thirteen different scenarios. First, a comprehensive dataset is provided by applying all types of faults introduced in the second section. This dataset is required to construct the training classes for classification. The threshold for PCA and K for KNN are both set to 1 for the acquired datasets.

In the first scenario, it is assumed that the LLL fault occurs on the three-phase transmission line where all DGs are at least 100 km from the fault location. According to the instability of the network shown in Fig. 5a, the fault occurs at t = 1 s. Figure 5b–d show the power variation, the DC-link voltage variations of the DGs, and the DC-current fluctuation during the fault. These fluctuations are expected to contain informative data for classification by the proposed detection technique. In the second scenario, the LLL fault of the first scenario is replaced with an LG fault. According to Fig. 6a, phase a is grounded and its voltage drops to zero at t = 1 s. The data presented in Fig. 6b–d show the power, DC-link voltage, and DC-link current fluctuations during the fault. In the third scenario, an LLG fault occurs on phases a and b, whose voltages drop to zero at t = 1 s as shown in Fig. 7a. Because the power of bus 1, the DC-link voltage, and the DC-link currents are the decisive data for generating the dataset required by KNN, these fluctuations are shown in Fig. 7b, c. The short circuit and open circuit faults of the DC-link are two other events, studied in the fourth and fifth scenarios, respectively. According to Figs. 8a and 9a, these two faults occur at t = 1 s and are cleared at t = 1.2 s. As in the previous scenarios, the power oscillation of the AC bus, the DC-link voltage variation, and the DC-link current variation shown in Figs. 8 and 9 provide the required datasets for the training process.

Fig. 5 The LLL fault occurs on the three-phase transmission

Fig. 6 The LG fault occurs on phase a of the three-phase transmission

Fig. 7 The LL fault occurs on phase a and b of the three-phase transmission

Fig. 8 The DC-link bus open circuit fault occurs on DC side of the network

Fig. 9 The DC-link short circuit fault occurs on DC side of the network

These five scenarios are shown graphically in the above-mentioned figures. The results of all other scenarios and the precision of the proposed detection technique are tabulated in Table 1. A total of 2500 samples from the different introduced faults are provided to examine the performance of the classification technique. According to the results, the proposed algorithm is able to detect and categorize different fault types with 95.9% accuracy. To highlight the performance of the proposed technique, the same dataset is also classified by KNN when PCA and LDA are each used alone for feature reduction. The results show that KNN has an accuracy of 33.3% when PCA is the only technique used for feature reduction. If LDA alone is used instead of PCA, the classification accuracy on the available dataset decreases to 22.6%. Tables 2 and 3 show the performance of the classification technique when PCA and LDA, respectively, are used alone for feature reduction. In other words, neither PCA nor LDA alone is able to support KNN data classification. The consecutive use of PCA and LDA to reduce the signal features is therefore the significant advantage of the proposed method. It must be highlighted that the dataset is acquired at 5000 samples per second. Signal sampling at such a rate does not require a high-speed, high-cost microprocessor. For this reason, the proposed detection technique could be easily implemented on a microprocessor. Robustness in fault detection and classification and simple implementation on commercial microchips are two outstanding features of the proposed technique.
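Accuracy figures of this kind are obtained by comparing predicted and true fault labels. The following standard-library sketch shows one way to compute the overall and per-class accuracy; the function name `accuracy_report` and the six label pairs are hypothetical placeholders, not results from this study.

```python
from collections import Counter

def accuracy_report(true_labels, predicted_labels):
    # Count samples per class and correctly classified samples per class.
    total = Counter(true_labels)
    correct = Counter(
        t for t, p in zip(true_labels, predicted_labels) if t == p
    )
    overall = sum(correct.values()) / len(true_labels)
    per_class = {label: correct[label] / count for label, count in total.items()}
    return overall, per_class

# Hypothetical labels for illustration only (fault class names from Section 2).
true_labels = ["LLL", "LLL", "LG", "LG", "LLG", "SDPV"]
predicted = ["LLL", "LLL", "LG", "LLG", "LLG", "SDPV"]

overall, per_class = accuracy_report(true_labels, predicted)
print(f"overall accuracy: {overall:.3f}")
for label, acc in per_class.items():
    print(f"{label}: {acc:.2f}")
```

Reporting the per-class values next to the overall figure makes it visible when a single fault type, such as LG in this toy example, accounts for most of the misclassifications.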

Table 1 Performance of the proposed classification algorithm
Table 2 PCA base-line: performance of the KNN classification algorithm when PCA is the only instrument for feature reduction
Table 3 LDA base-line: performance of the KNN classification algorithm when LDA is the only instrument for feature reduction

5 Conclusion

A classification technique based on the conventional K-NN algorithm is proposed to detect and classify different types of fault in a smart grid. In the proposed technique, the PCA method is used to decrease the dataset size, while LDA provides online classification before applying the K-NN. Simulation results demonstrate the effectiveness and robustness of such an augmented K-NN technique in fault detection and classification. Because the proposed method achieves acceptable detection accuracy at a low sampling rate in the presence of different fault types, it could be easily applied on commercial microprocessors.

High impedance faults (HIFs) can occur when transmission lines are grounded or connected to each other through a high-impedance path. In this condition, the available power system relays have difficulty detecting the fault clearly. In future work, the authors will try to introduce a simple detection technique to increase the precision of HIF detection systems by means of machine learning algorithms.