# Fault detection and classification in smart grids using augmented K-NN algorithm

## Abstract

The ability of artificial intelligence and machine learning techniques to detect and classify data in large datasets has led to their popularity among scientists and researchers. Because different loads are present at different times in power systems, it is hard to provide an accurate mathematical model for such systems. On the other hand, most of the available protection devices in power grids operate on estimated mathematical models of the grid. For this reason, power system operators usually suffer from the low accuracy of the available protection systems in fault detection and diagnosis. In this paper, a reliable machine learning technique is proposed to detect and classify different faults in smart grids. The proposed technique benefits from principal component analysis (PCA) and linear discriminant analysis (LDA). PCA is used to reduce the size of the dataset matrices and to eliminate their possible singularity. LDA is then applied to the output data of the PCA to minimize the within-class distance of the dataset and maximize the distance between classes. Finally, the well-known K-nearest neighbor technique is applied to detect the fault and determine its class. The results demonstrate the effectiveness and robustness of the proposed algorithm in determining the fault class in smart grids.

## Keywords

Classification, Fault detection, K-NN, LDA, PCA, Smart grid

## 1 Introduction

The penetration of renewable energy-based power plants in smart grids brings about specific instability issues because of their fast dynamics. For this reason, a reliable control and monitoring system is required to increase grid reliability in the presence of such sources. The control system is strengthened when an accurate fault diagnosis technique is used in the power grid relays to operate the circuit breakers within an acceptable time and prevent instability. Machine learning algorithms, in turn, enable engineers to classify large-scale data into separate categories quickly. The presence of various non-linear components in power grids complicates the preparation of a precise mathematical model. Researchers have therefore recently tried to use machine learning methods as a high-precision technique to recognize fault types in power grids.

The appearance of intelligent algorithms has divided fault detection techniques into two categories. In the conventional detection approach, engineers use the mathematical model of the power grid to determine the type of fault and disconnect the related breakers from the network [1, 2, 3, 4]. Fault clearance using conventional relaying systems is not safe enough and, on some occasions, relays command unnecessary breakers to disconnect a healthy transmission line from the network. Such disconnections are not optimal and lead to the shutdown of a larger area of the system. The phase difference technique [5], symmetrical sequence components [6], and fault detection and classification using conventional signal analysis methods (e.g. wavelets, FFT, IFFT) [7] are the most commonly used techniques in conventional relays. Because the mathematical model of a power grid is formulated with some simplifications, the accuracy of these techniques is not acceptable. For instance, the method proposed in [6] fails when it is used in a nonsymmetrical system. Also, FFT-based algorithms are computationally complex, which slows the system response, especially in large systems.

On the other side of the coin, intelligent algorithms and learning techniques have demonstrated their reliability in comparison to conventional systems. Fuzzy logic (FL) and the genetic algorithm (GA) are two widely used methods in fault detection and diagnosis of power grids [8]. Fuzzification with different membership functions plays a decisive role in increasing the precision of the FL controller. However, increasing the number of membership functions and, consequently, the number of fuzzy rules slows the system response. For this reason, a suitable FL controller must not include a large number of membership functions. In such a system, it is possible that the controller converges to a wrong answer. Moreover, although the GA is robust enough in noisy systems [9], it is hard to apply in large-scale systems. Hence, the introduction of supervised machine learning methods, which are simple to implement and fast in detection and classification, helps engineers to build more reliable protection systems. The artificial neural network (ANN) [10] and the decision tree [11] are two algorithms previously used for fault detection in smart grids. Although these algorithms detect faults well, they suffer from a lack of classification ability. Also, their many features complicate their implementation in commercial microcontrollers. The stacked autoencoder (SAE) proposed in [12] tried to address the implementation difficulties of the conventional NN through dimension reduction and noise cancellation. The Bayesian network, which works on a very large conditional probability table, is another machine learning method employed in fault detection; it has not found popularity among engineers because it needs a large amount of computer memory. K-nearest neighbor (KNN) is able to detect and classify faults with an accuracy of 99% without requiring preprocessing [13]. However, KNN cannot reach the right answer for large datasets or high-dimensional features. The exponentially weighted moving average (EWMA) and anomaly detection approaches [14] have been used in association with conventional KNN to obtain precise results in large datasets.

In this paper, the KNN technique augmented with principal component analysis (PCA) and linear discriminant analysis (LDA) is used to detect and classify different faults in a smart grid. In the first stage of the proposed classification approach, PCA, which uses simple matrix operations and statistics to project the original data into fewer dimensions, is applied to the input datasets. In the second stage, LDA is applied to find a linear combination of features that further reduces the event features. Finally, KNN assigns the received data to its related class. Although all parts of the proposed approach have been introduced in previous literature, their combined use has not been investigated for classifying the fault signals of a power grid. The results of this paper demonstrate the robustness and effectiveness of the proposed method in classifying a wide range of faults occurring in smart grids. The next section discusses the parts of the smart grid under study, used as a sample network, and the considered faults. The third section explains the proposed algorithm in detail. The last section presents a comprehensive study of different smart grid faults to demonstrate the acceptable performance of the system.

## 2 Smart grid

In this study, the three-phase short circuit (LLL), line-to-line short circuit (LL), three-phase-to-ground short circuit (LLLG), line-line-to-ground short circuit (LLG), and single line-to-ground short circuit (LG) form five fault classes that can occur on the AC side of the smart grid. Moreover, the DC-links of all distributed generators (DGs) are exposed to short circuit or open circuit faults, which are studied as the DC side faults. The DC-link short circuit and open circuit faults of the solar power plant are denoted by SDPV and ODPV. Because a fault can also occur at the low voltage side of each DG, six different faults are considered for these locations. The open circuit and short circuit faults at the low voltage side of the solar power plant are denoted by OLPV and SLPV, and those of the fuel cell by OLFC and SLFC. The open circuit and short circuit faults at the low voltage side of the battery package are denoted by OLB and SLB, respectively. In brief, all of the aforementioned faults are treated as separate classes to scrutinize the ability of the proposed detection technique.
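The thirteen fault classes listed above can be collected into a small lookup structure, which is convenient when labelling training data. The sketch below is only an illustration: the labels come from the text, while the grouping keys (`ac_side`, `dc_link`, `low_voltage_side`) and the helper `all_labels` are hypothetical names introduced here.

```python
# Fault labels as named in the text; the grouping keys are illustrative only.
FAULT_CLASSES = {
    "ac_side": {
        "LLL": "three-phase short circuit",
        "LL": "line-to-line short circuit",
        "LLLG": "three-phase-to-ground short circuit",
        "LLG": "line-line-to-ground short circuit",
        "LG": "single line-to-ground short circuit",
    },
    "dc_link": {
        "SDPV": "DC-link short circuit, solar power plant",
        "ODPV": "DC-link open circuit, solar power plant",
    },
    "low_voltage_side": {
        "SLPV": "short circuit, solar power plant",
        "OLPV": "open circuit, solar power plant",
        "SLFC": "short circuit, fuel cell",
        "OLFC": "open circuit, fuel cell",
        "SLB": "short circuit, battery package",
        "OLB": "open circuit, battery package",
    },
}


def all_labels(classes=FAULT_CLASSES):
    """Flatten the grouped dictionary into a list of class labels."""
    return [label for group in classes.values() for label in group]


# Integer index per label, e.g. for building a training target vector.
LABEL_TO_INDEX = {label: i for i, label in enumerate(all_labels())}
```

Flattening the groups gives one label per class, so a classifier trained on these labels distinguishes all AC-side, DC-link, and low-voltage-side faults at once.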

## 3 Artificial intelligence

There are many different methods to project features into lower dimensions, such as factor analysis, PCA, LDA, locally linear embedding (LLE), multi-dimensional scaling (MDS), and isometric feature mapping (Isomap).

### 3.1 Dimension reduction

#### 3.1.1 PCA

Here, *l* represents the number of features. In principle, the eigenvectors associated with the largest eigenvalues (\(\lambda_{i}\)) of the covariance matrix give the directions of maximum spread of the data [21]. So, the eigenvectors belonging to the *k* largest eigenvalues (the principal components) of the covariance matrix are chosen to create the matrix \(U_{reduce}\). Let *U* be a matrix whose columns are the eigenvectors of the matrix ∑; the first *k* columns of *U* are then chosen to form the second matrix \(U_{reduce}\), which has *n* rows and *k* columns. The following condition must be satisfied when determining *k*.

Here, *m* represents the number of samples, and the threshold is chosen arbitrarily in the range 0–1. Note that a higher threshold provides a better projection and leads to a larger *k*. The new dataset in the lower dimension is then computed by (3).

A reconstructable matrix in *k* dimensions has now been prepared using the PCA method, and the LDA method can be applied to this dataset.
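The selection of *k* described in this subsection can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation. It assumes that the condition on *k* is the usual retained-variance ratio (the sum of the *k* largest eigenvalues divided by the sum of all eigenvalues must reach the threshold), that samples are stored as rows, and that the data are mean-centered first. The function name `pca_reduce` and its default threshold are hypothetical.

```python
import numpy as np


def pca_reduce(X, threshold=0.95):
    """Project X (m samples x n features) onto its first k principal components.

    k is the smallest number of components whose eigenvalues account for at
    least `threshold` of the total variance (an assumed selection rule).
    Returns (Z, U_reduce, k) with Z of shape (m, k) and U_reduce of shape (n, k).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                    # center each feature
    m = Xc.shape[0]
    sigma = (Xc.T @ Xc) / m                    # covariance matrix, shape (n, n)

    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(sigma)
    order = np.argsort(eigvals)[::-1]          # sort descending
    eigvals = np.clip(eigvals[order], 0.0, None)  # remove tiny negative round-off
    U = eigvecs[:, order]

    total = eigvals.sum()
    if total == 0.0:
        raise ValueError("data have zero variance; nothing to project")

    ratio = np.cumsum(eigvals) / total         # retained variance for k = 1..n
    k = int(np.searchsorted(ratio, threshold - 1e-12)) + 1
    k = min(k, eigvals.size)

    U_reduce = U[:, :k]                        # n rows, k columns
    Z = Xc @ U_reduce                          # lower-dimensional dataset
    return Z, U_reduce, k
```

Because the columns of `U_reduce` are orthonormal, an approximation of the original data can be recovered with `Z @ U_reduce.T + X.mean(axis=0)`, which is what makes the reduced matrix reconstructable.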

#### 3.1.2 LDA

LDA is applied to the PCA output *Z* to find \(Z_{ob}\), which maximizes the ratio of the between-class scatter \(S_{B}\) to the within-class scatter \(S_{W}\) (Fisher's criterion).

\(S_{W}\) is obtained as below:

\(S_{W}\) is invertible because PCA eliminates the singularity of the matrix *X*. The required \(S_{B}\) and \(S_{W}\), the between-class and within-class scatters, are calculated by (7) and (8), respectively [26, 27].

Here, *j* is the number of classes and \(n_{i}\) is the number of training examples. The new dataset is then obtained as below:
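The Fisher criterion described in this subsection can be illustrated with a short NumPy sketch. It follows the textbook formulation of multi-class LDA and is not taken from the paper: the within-class scatter is assumed to be the sum of the per-class scatter matrices, the between-class scatter is assumed to be weighted by the class sizes, and the projection is built from the leading eigenvectors of \(S_{W}^{-1} S_{B}\). The function name `lda_project` and its arguments are hypothetical.

```python
import numpy as np


def lda_project(Z, y, n_components=None):
    """Project Z (samples x features) onto the Fisher discriminant directions.

    y holds one class label per row of Z. Returns (Z_ob, W), where the columns
    of W are the chosen discriminant directions and Z_ob = Z @ W.
    """
    Z = np.asarray(Z, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    d = Z.shape[1]
    mu = Z.mean(axis=0)                        # overall mean

    S_W = np.zeros((d, d))                     # within-class scatter
    S_B = np.zeros((d, d))                     # between-class scatter
    for c in classes:
        Zc = Z[y == c]
        mu_c = Zc.mean(axis=0)
        diff = Zc - mu_c
        S_W += diff.T @ diff
        gap = (mu_c - mu).reshape(-1, 1)
        S_B += Zc.shape[0] * (gap @ gap.T)

    # Eigenvectors of S_W^{-1} S_B; S_W must be invertible (hence PCA first).
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    order = np.argsort(eigvals.real)[::-1]

    # At most (number of classes - 1) directions carry discriminative power.
    if n_components is None:
        n_components = min(d, len(classes) - 1)
    W = eigvecs[:, order[:n_components]].real
    return Z @ W, W
```

Using `np.linalg.solve` instead of forming the inverse explicitly is a numerical convenience; it raises an error when \(S_{W}\) is singular, which is the situation the preceding PCA stage is meant to prevent.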

### 3.2 Classification

#### 3.2.1 K nearest neighbor

The proposed method uses PCA to reduce the signal features in the first step. After that, LDA is applied to the output data of the PCA to obtain the optimum features of the signals. These reductions improve the performance of KNN in the classification stage. Moreover, the signal sampling (i.e. 5000 samples per second), which is discussed in the next section, is a useful preprocessing step to decrease the size of the available big data. Such a sampling strategy speeds up data analysis and processing, which eliminates the need for expensive commercial processors.
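For reference, the classification stage can be sketched as a plain nearest-neighbor vote over the reduced features. This is a minimal sketch under stated assumptions, not the authors' code: the text does not specify the distance metric, so Euclidean distance is assumed, and ties are resolved in favor of the label met first among the nearest neighbors. The function name `knn_predict` is hypothetical.

```python
from collections import Counter

import numpy as np


def knn_predict(train_X, train_y, query_X, k=1):
    """Label each query row by majority vote among its k nearest training rows.

    Euclidean distance is an assumption; the source does not name a metric.
    """
    train_X = np.asarray(train_X, dtype=float)
    query_X = np.atleast_2d(np.asarray(query_X, dtype=float))
    train_y = list(train_y)

    predictions = []
    for q in query_X:
        dists = np.linalg.norm(train_X - q, axis=1)      # distance to every sample
        nearest = np.argsort(dists, kind="stable")[:k]   # indices of k closest
        votes = Counter(train_y[i] for i in nearest)
        predictions.append(votes.most_common(1)[0][0])
    return predictions
```

With `k=1` each query simply takes the label of its closest training sample. In the full scheme, `train_X` and `query_X` would be the features produced by the PCA and LDA stages rather than the raw signals.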

## 4 Simulation result and discussion

The ability of the introduced KNN technique in fault detection is examined in thirteen different scenarios. First, a comprehensive dataset is built by applying all types of faults introduced in the second section. This dataset is required to construct the training classes for classification. The threshold for the PCA and the K for the KNN implementation are both set to 1 for the acquired datasets.

Performance of the proposed classification algorithm

Fault type | Number of tests | Number of true predictions | Accuracy (%) |
---|---|---|---|
LLL | 45 | 44 | 97.78 |
LL | 45 | 43 | 95.56 |
LLLG | 45 | 45 | 100 |
LLG | 45 | 42 | 93.33 |
LG | 45 | 43 | 95.56 |
SLPV | 45 | 45 | 100 |
OLPV | 45 | 43 | 95.56 |
SLFC | 45 | 44 | 97.78 |
OLFC | 45 | 41 | 91.11 |
OLB | 45 | 41 | 91.11 |
SLB | 45 | 45 | 100 |
SDPV | 45 | 43 | 95.56 |
ODPV | 45 | 42 | 93.33 |
Total accuracy (%) | | | 95.9 |

PCA base-line: performance of the KNN classification algorithm when PCA is the only instrument for feature reduction

Fault type | Number of tests | Number of true predictions | Accuracy (%) |
---|---|---|---|
LLL | 45 | 16 | 35.56 |
LL | 45 | 14 | 31.11 |
LLLG | 45 | 25 | 55.56 |
LLG | 45 | 18 | 40 |
LG | 45 | 11 | 24.44 |
SLPV | 45 | 12 | 26.67 |
OLPV | 45 | 14 | 31.11 |
SLFC | 45 | 17 | 37.78 |
OLFC | 45 | 13 | 28.89 |
OLB | 45 | 8 | 17.78 |
SLB | 45 | 11 | 24.44 |
SDPV | 45 | 19 | 42.22 |
ODPV | 45 | 17 | 37.78 |
Total accuracy (%) | | | 33.3 |

LDA base-line: performance of the KNN classification algorithm when LDA is the only instrument for feature reduction

Fault type | Number of tests | Number of true predictions | Accuracy (%) |
---|---|---|---|
LLL | 45 | 14 | 31.11 |
LL | 45 | 9 | 20 |
LLLG | 45 | 12 | 26.67 |
LLG | 45 | 6 | 13.33 |
LG | 45 | 12 | 26.67 |
SLPV | 45 | 11 | 24.44 |
OLPV | 45 | 9 | 20 |
SLFC | 45 | 7 | 15.56 |
OLFC | 45 | 11 | 24.44 |
OLB | 45 | 9 | 20 |
SLB | 45 | 14 | 31.11 |
SDPV | 45 | 5 | 11.11 |
ODPV | 45 | 13 | 28.89 |
Total accuracy (%) | | | 22.56 |
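The total accuracies in the three tables follow directly from the per-class counts: each is the sum of true predictions divided by the 585 tests (13 classes with 45 tests each). The short check below reproduces that arithmetic; the function and variable names are illustrative, and the counts are copied from the tables.

```python
def accuracy_percent(true_predictions, tests_per_class=45):
    """Overall accuracy in percent from per-class counts of true predictions."""
    total_tests = tests_per_class * len(true_predictions)
    return 100.0 * sum(true_predictions) / total_tests


# True predictions per class, in table order (LLL, LL, LLLG, ..., SDPV, ODPV).
proposed = [44, 43, 45, 42, 43, 45, 43, 44, 41, 41, 45, 43, 42]
pca_only = [16, 14, 25, 18, 11, 12, 14, 17, 13, 8, 11, 19, 17]
lda_only = [14, 9, 12, 6, 12, 11, 9, 7, 11, 9, 14, 5, 13]

for name, counts in [("PCA + LDA + KNN", proposed),
                     ("PCA + KNN", pca_only),
                     ("LDA + KNN", lda_only)]:
    print(f"{name}: {accuracy_percent(counts):.2f}%")
# PCA + LDA + KNN: 95.90%
# PCA + KNN: 33.33%
# LDA + KNN: 22.56%
```

The same function applied to a single class, for example `accuracy_percent([44])`, gives the per-row value of 97.78%.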

## 5 Conclusion

A classification technique based on the conventional K-NN algorithm is proposed to detect and classify different types of faults in a smart grid. In the proposed technique, the PCA method is used to decrease the dataset size, while LDA provides online classification before applying the K-NN. Simulation results demonstrate the effectiveness and robustness of such an augmented K-NN technique in fault detection and classification. Because the proposed method has an acceptable detection accuracy with a low sample rate in the presence of different fault types, it could be easily applied to commercial microprocessors.

High impedance faults (HIFs) can occur when transmission lines are grounded or connected to each other through a high impedance path. In this condition, the available power system relays have difficulty detecting the fault clearly. In future work, the authors will try to introduce a simple detection technique to increase the precision of HIF detection systems by means of machine learning algorithms.

## Notes

### Compliance with ethical standards

### Conflict of interest

The authors declare that they have no conflict of interest.

### Ethical approval

This article does not contain any studies with human or animal subjects.

## References

- 1. Alhelou HH, Golshan MH, Askari-Marnani J (2018) Robust sensor fault detection and isolation scheme for interconnected smart power systems in presence of RER and EVs using unknown input observer. Int J Electr Power Energy Syst 99:682–694
- 2. Alhelou HH (2019) Fault detection and isolation in power systems using unknown input observer. In: Advanced condition monitoring and fault diagnosis of electric machines. IGI Global, pp 38–58
- 3. Triki-Lahiani A, Abdelghani ABB, Slama-Belkhodja I (2018) Fault detection and monitoring systems for photovoltaic installations: a review. Renew Sustain Energy Rev 82:2680–2692
- 4. Taheri B, Razavi F (2018) Power swing detection using rms current measurements. J Electr Eng Technol 13(5):1831–1840
- 5. Patel TK, Mohanty SK, Mohapatra S (2017) Fault detection during power swing by phase difference technique. In: 2017 innovations in power and advanced computing technologies (i-PACT). IEEE, pp 1–6
- 6. Abdel-Akher M, Nor KM (2010) Fault analysis of multiphase distribution systems using symmetrical components. IEEE Trans Power Deliv 25(4):2931–2939
- 7. Xu X, Peters JF (2002) Rough set methods in power system fault classification. In: IEEE CCECE2002. Canadian conference on electrical and computer engineering. Conference proceedings (Cat. No. 02CH37373), vol 1. IEEE, pp 100–105
- 8. Cho HJ, Park JK (1997) An expert system for fault section diagnosis of power systems using fuzzy relations. IEEE Trans Power Syst 12(1):342–348
- 9. Wen FS, Chang CS (1997) Probabilistic approach for fault-section estimation in power systems based on a refined genetic algorithm. IEE Proc Gener Transm Distrib 144(2):160–168
- 10. Fernandez AO, Ghonaim NKI (2002) A novel approach using a FIRANN for fault detection and direction estimation for high-voltage transmission lines. IEEE Trans Power Deliv 17(4):894–900
- 11. Sheng Y, Rovnyak SM (2004) Decision tree-based methodology for high impedance fault detection. IEEE Trans Power Deliv 19(2):533–536
- 12. Wang Y, Liu M, Bao Z (2016) Deep learning neural network for power system fault diagnosis. In: 2016 35th Chinese control conference (CCC). IEEE, pp 6678–6683
- 13. Yadav A, Swetapadma A (2014) Fault analysis in three phase transmission lines using k-nearest neighbor algorithm. In: 2014 international conference on advances in electronics computers and communications. IEEE, pp 1–5
- 14. Harrou F, Taghezouit B, Sun Y (2019) Improved *k*NN-based monitoring schemes for detecting faults in PV systems. IEEE J Photovolt 9(3):811–821
- 15. Kang X, Xiang X, Li S, Benediktsson JA (2017) PCA-based edge-preserving features for hyperspectral image classification. IEEE Trans Geosci Remote Sens 55(12):7140–7151
- 16. Jing C, Hou J (2015) SVM and PCA based fault classification approaches for complicated industrial process. Neurocomputing 167:636–642
- 17. Khosravi MR, Sharif-Yazd M, Moghimi MK, Keshavarz A, Rostami H, Mansouri S (2015) MRF-based multispectral image fusion using an adaptive approach based on edge-guided interpolation. arXiv:1512.08475
- 18. Lazzari E, Schena T, Marcelo MCA, Primaz CT, Silva AN, Ferrão MF, Bjerk T, Caramão EB (2018) Classification of biomass through their pyrolytic bio-oil composition using FTIR and PCA analysis. Ind Crops Prod 111:856–864
- 19. Phillips PJ, Flynn PJ, Scruggs T, Bowyer KW, Chang J, Hoffman K, Marques J, Min J, Worek W (2005) Overview of the face recognition grand challenge. In: 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR’05), vol 1. IEEE, pp 947–954
- 20. Asadi S, Rao CDVS, Saikrishna V (2010) A comparative study of face recognition with principal component analysis and cross-correlation technique. Int J Comput Appl 10(8):17–21
- 21. Hagar AA, Alshewimy MA, Saidahmed MTF (2016) A new object recognition framework based on PCA, LDA, and K-NN. In: 2016 11th international conference on computer engineering and systems (ICCES). IEEE, pp 141–146
- 22. Zhang X, Peng F, Long M (2018) Robust coverless image steganography based on DCT and LDA topic classification. IEEE Trans Multimed 20(12):3223–3238
- 23. Chen Q, Yao L, Yang J (2016) Short text classification based on LDA topic model. In: 2016 international conference on audio, language and image processing (ICALIP). IEEE, pp 749–753
- 24. Khosravi MR, Akbarzadeh O, Salari SR, Samadi S, Rostami H (2017) An introduction to ENVI tools for synthetic aperture radar (SAR) image despeckling and quantitative comparison of denoising filters. In: 2017 IEEE international conference on power, control, signals and instrumentation engineering (ICPCSI). IEEE, pp 212–215
- 25. Varatharajan R, Manogaran G, Priyan MK (2018) A big data classification approach using LDA with an enhanced SVM method for ECG signals in cloud computing. Multimed Tools Appl 77(8):10195–10215
- 26. Menhour I, Fergani B (2018) A new framework using PCA, LDA and KNN-SVM to activity recognition based smartphone’s sensors. In: 2018 6th international conference on multimedia computing and systems (ICMCS). IEEE, pp 1–5
- 27. Yu H, Yang J (2001) A direct LDA algorithm for high-dimensional data—with application to face recognition. Pattern Recognit 34(10):2067–2070
- 28. Zhang S, Li X, Zong M, Zhu X, Cheng D (2017) Learning k for knn classification. ACM Trans Intell Syst Technol (TIST) 8(3):43
- 29. Vinoj PG, Jacob S, Menon VG, Rajesh S, Khosravi MR (2019) Brain-controlled adaptive lower limb exoskeleton for rehabilitation of post-stroke paralyzed. IEEE Access 7:132628–132648
- 30. Khosravi MR, Bahri-Aliabadi B, Salari R, Samadi S, Rostami H, Karimi V (2018) A tutorial and performance analysis on ENVI tools for SAR image despeckling. Curr Signal Transduct Ther 13:1–8
- 31. Tomašev N, Buza K (2015) Hubness-aware kNN classification of high-dimensional data in presence of label noise. Neurocomputing 160:157–172