Abstract
The ability of artificial intelligence and machine learning techniques in classification and detection of the types of data in large datasets lead to their popularity among scientists and researchers. Because of the presence of different load at different times in power systems, it is hard to provide an accurate mathematical model for such systems. On the other hand, most of the available protection devices in power grids work based on the estimated mathematical models of the grid. For this reason, power system utilizers usually suffer from the low accuracy of the available protection systems in fault detection and diagnosis. In this paper, a reliable machine learning technique is proposed to detect and classify different faults of smart grids. The proposed technique benefits from the principal component analysis (PCA) and linear discriminant analysis (LDA). The PCA is used to reduce the size of the dataset matrixes. The applied PCA reduces the dataset sizes and eliminates the possible singularity of the datasets. The LDA method is applied to the outputs data of the PCA to minimize the with-in class distance of the dataset and maximize the distance between classes. Finally, the well-known K-nearest neighbor technique is applied to detect the fault and determine its classes. The paper results demonstrate the effectiveness and robustness of the proposed algorithm in the determination of the fault class in smart grids.
Similar content being viewed by others
Avoid common mistakes on your manuscript.
1 Introduction
Penetration of renewable energy-based power plants in smart grids bring about some specific instability issues because of their fast dynamics. For this reason, a reliable control and monitoring system are required to increase grid reliability in the presence of such sources. The employed control system is augmented when an accurate fault diagnosis technique is used in the power grids relays to properly control the power system circuit breakers in an acceptable time period to prevent instability. On the other hand, machine learning algorithms enable engineers to classify large scale data in separated categories as fast as possible. Availability of various non-linear power components in power grids intricate preparation of a precise mathematical model. So, researchers have, recently, tried to utilize the machine learning methods as a high-precision technique to recognize the fault types in power grids.
The appearance of intelligent algorithms has divided fault detection techniques into two different categories. In the conventional detection approach, engineers use the mathematical model of the power-grids to determine the type of fault to disconnect the related breakers from the network [1,2,3,4]. Fault clearance, using conventional relaying systems, is not safe enough and, in different occurrences, relays command some unnecessary breakers to disconnect a healthy transmission line from the network. Such disconnections are not optimum and lead to the shot-down of a greater area of the system. Phase difference technique [5], symmetrical sequence component [6], and fault detection and classification using conventional signal analysis method (e.g. wavelets, FFT, IFFT) [7] are the most commonly used techniques in the conventional relays. Because of the fact that the mathematical model of a power grid is formulized based on some simplifications, the accuracy of the aforementioned techniques is not acceptable. For instance, the proposed method in [6] fails when it is utilized in a nonsymmetrical system. Also, FFT based algorithms have a complex nature which decelerates the system response, especially in large systems.
On the other side of the coin, intelligent algorithms and learning techniques demonstrate their reliability in comparison to conventional systems. Fuzzy logic (FL) and genetic algorithm (GA) are two widely used methods used in fault detection and diagnosis of the power grids [8]. Fuzzification in different membership function has a determinative role to increase the precision of the FL controller. However, the increase in the number of memberships functions and, consequently, the increment of the fuzzy rules slows the system response. For this reason, a suitable FL controller must not include a large number of membership functions. In such a system, it is possible that the controller converges to a wrong answer. Moreover, although GA is robust enough in noisy systems [9], it is hard to apply the GA in large scale systems. Hence, the introduction of supervised machine learning methods which are simple to implement and fast for detection and classification help engineers to present more reliable protection systems. Artificial neural network (ANN) [10] and decision tree [11] are two previously used algorithms in fault detection application of smart-grids. In spite of the fact that these algorithms have a good ability in the detection of a fault, they suffer from a lack of classification ability. Also, their various features intricate the implementation of such an algorithm in commercial microcontrollers. The stacked autoencoder (SAE) proposed in [12] tried to address the implementation difficulties of the conventional NN by a dimension reduction technique and noise cancellation. The Bayesian network which is worked based on very large conditional probability table is another category of the employed machine-learning method in fault detection which cannot find popularity among engineers because of the need for huge computer memory. K-nearest neighbor (KNN) is able to detect and classify faults with the accuracy of 99% while it does not require a preprocessing [13]. The KNN cannot reach the right answer in a large dataset or high dimensional features. Exponentially weighted moving average (EWMA) and anomaly detection approaches [14] has been used in associated with the conventional KNN to obtain precise results in large datasets.
In this paper, the KNN technique augmented with principal component analysis (PCA) and linear discriminant analysis (LDA) is used to detect and classify different faults in a smart grid. In the first stage of the proposed classification approach, PCA method which uses simple matrix operations and statistics to calculate a projection of the original data into fewer dimensions is applied to the input datasets. In the second stage, the LDA is applied to find a linear combination of features to reduce the events features. Finally, the KNN detects the received data to categorize the input data into its related class. Although all parts of the proposed approach have been introduced in previous literature, the utilization of all of them has not been investigated to classify the fault signals of a power grid. The result of this paper demonstrates the robustness and effectiveness of the proposed method in the classification of a wide range of faults occurred in the smart grids. The different parts of the understudy smart grid as a sample network and the considered fault is discussed in the next section. The third section of the paper explains the proposed algorithm with details. And at the last section, a comprehensive study is done on different faults of the smart grid to prove the acceptable performance of the system.
2 Smart grid
According to Fig. 1, a 100 kW solar power plant, a 5 kW energy storage, a fuel cell package with the power of 20 kW, and a linear load with the capacity of 800 kW + j 200 kVAR and a nonlinear load connected to a power system are all components of the understudy smart grid. The nonlinear load power varies between the 100 and 500 kW with different power factors. Such a smart grid is big enough to test all required faults and create the needed dataset to thoroughly study a fault detection system. In fact, the power system loading depends on a large number of variables such as the environment temperature, sun irradiation, stored energy in batteries, nonlinear load, and also operation of the fuel-cell. After the preparation of some data sets for different fault categories, a large number of sample test can be provided to examine the detection technique.
In this study, three-phase short circuit (LLL), line to line short circuit (LL), three-phase to ground (LLLG), line–line short circuit connected to ground (LLG), single line to ground short circuit (LG) engender five types of fault classes which can be occurred on AC side of the smart grid. Moreover, DC-links of all distributed generators (DGs) are endangered of short circuit or open circuit faults which are studied as the DC side faults. The DC-link short circuit and open circuit faults of the solar power plant are shown by SDPV and ODPV. Because of the fact that a fault can occur at the low voltage side of all DGs six different fault for these locations are considered. The open-circuit fault and short circuit fault at the low voltage side of the solar power plant, are shown by OLPV and SLPV. Also, these faults for fuel cell are shown by OLFC and SLFC. The battery package low voltage side faults shown by OLB and SLB are the open circuit and short circuit faults, respectively. In brief, all of the aforementioned categories are considered as different categories to scrutinize the ability of the proposed detection technique.
Figure 2 shows the simulated smart grid to study the proposed detection technique. The aforementioned LLL, LL, LLLG, LLG faults can occur at any point in the AC side of the network. In this study, considering the transmission line length the studied AC faults occur 100 km after the photovoltaic power plant. Also, the DC side introduced faults occur on the DC bus of each distributed generators.
3 Artificial intelligence
There are many different methods to project features into the lower dimensions such as factor analysis, PCA, LDA, locally linear embedding (LLE), or Multi-dimensional scaling (MDS), and isometric feature mapping (Isomap).
3.1 Dimension reduction
3.1.1 PCA
PCA is a statistical feature extraction algorithm. It seems to be logical to use PCA in the presence of large datasets of variables where a small set of data contains the determinative information [15,16,17,18]. In addition, it is possible to apply PCA in some applications where the training samples are much smaller than the number of features. The dataset dimension reduction with the smallest projection error from the main dataset dimensions is carried out by PCA. The dimension reduction is required to remove the dataset redundancy (i.e. creation of orthogonal components), reduced complexity, and elimination of the noise effect [19, 20]. The covariance (∑) matrix should be computed to get eigenvectors of the covariance matrix by (1).
where \(\bar{X}\) represents the mean value of sample vector \(X^{\left( i \right)}\) and l represent the number of features. As a principal, the first eigenvalues (\(\lambda_{i}\)) of the eigenvector matrix gives the direction of the maximum spread of the data [21]. So, the largest k eigenvalues (principal components) of the covariance matrix are chosen to create the matrix Ureduce. Let U represent a matrix that every column of matrix U is eigenvector of the matrix ∑ then the first k column of the U is chosen to create the second matrix named Ureduce which inherently have n rows and k columns. The following condition must be considered to determine the number of k.
where m represents the number of samples and threshold is considered in the range of 0–1, arbitrary. It must be noticed that the higher threshold provides the best projection to maximize k. Therefore, the new dataset in a lower dimension is computed by (3).
A reconstructable matrix in k dimension has been prepared using the PCA method. Now, the LDA method can be applied to the dataset.
3.1.2 LDA
LDA is used for the feature reduction and discrimination between categories of dependent variables [22,23,24,25]. LDA is applied to the dataset in order to find an updated subspace while the mapped data in the created subspace have the minimum scatteration in a same class and maximum distance with the data available in other classes. Now, (4) is applied to Z to find Zob, that maximizes the ratio of between-class scatter SB against within-class scatter SW (Fisher’s criterion)
The optimized answer of the above equation by the assumption of invertibility of SW is obtained as below:
And SW is invertible because PCA eliminate the singularity of matrix X. The required SB and SW as the between class and within class scatters are calculated by (7) and (8), respectively [26, 27].
where \(\phi_{B}\) and \(\phi_{W}\) are as follow:
where j is number of classes and \(n_{i}\) is number of training examples. So, the new dataset is achieved as below:
3.2 Classification
3.2.1 K nearest neighbor
KNN is an example-based algorithm with a wide range of applications [28,29,30,31]. The number of K in the fundamental structure of the KNN is required to determine the number of Ks nearest samples of a test. The test label is elected based on the labels of these samples. The algorithm computes the distance between every samples of a dataset and updates data by Euclidean distance as follow:
For example, as it is shown in Fig. 3, the KNN method labeled the data as true data for a new data when k = 3 and labeled the data as the false for k = 7. Figure 4 shows the flowchart of the implemented detection and classification technique explained in this section.
The benefits method uses PCA to reduce the signal features in the first step. After that, the LDA is applied to the output data of the LDA to achieve the optimum features of the signals. These reductions improve the performance of KNN in the classifications stage. Moreover, the signal sampling (i.e. 5000 samples per second), which is discussed in the next section, is a useful preprocessing step to decrease the data sizes of the available big data. Such a sampling strategy speeds up the data analyzing and processing which eliminates the need for use of the expensive commercial processors.
4 Simulation result and discussion
The ability of the introduced KNN technique in fault detection is examined in thirteen different scenarios. First, a comprehensive dataset is provided by applying all types of faults which previously introduced in the second section. The provided dataset is required to construct the training classes for classification. The considered threshold and K for the PCA and KNN implementation for the acquired datasets are equal to 1.
In the first scenario, it is assumed that the LLL fault occurs on the three-phase transmission line where all DGs have at least 100 km distance with the fault location. According to the instability of the network shown in Fig. 5a, the occurrence happens at t = 1 s. Figure 5b–d show the power variation, DC-link voltage variations of the DGs, and DC-current fluctuation during the fault. It is expected that these fluctuations contain informative datasets for the proposed detection technique for classification. In the second scenario, the LLL fault studied in the first scenario fault is substituted with an LG fault. According to Fig. 6a, phase a is grounded and its voltage drops to zero at t = 1 s. The achieved data presented in Fig. 6b–d demonstrate the power, DC-link voltage, and DC-link fluctuations during faults. Pursuant to Fig. 7a which shows the zero voltage of A and B phases at t = 1 s, an LLG fault must occur on these phases in the third scenario. Due to the fact that the power of the bus 1, DC-link voltage and DC-link currents are the determinative data to generate the KNN required dataset, these fluctuations are respectively shown in Fig. 7b, c. The short circuit and open circuit faults of the DC-link are two other occurrences which are studied in fourth and fifth scenarios, respectively. According to Figs. 8a and 9a, these two faults are occurred at t = 1 s and cleared at t = 1.2 s. The power oscillation of the AC bus, DC-link voltage variation, and DC-link current variation like formerly explained scenarios shown in Figs. 8 and 9 provide the required datasets for the training process.
These five scenarios are graphically shown in above mentioned figures. All other scenarios’ results and precision of the proposed detection technique are tabulated in Table 1. 2500 samples from different introduced faults are provided to examine the performance of the classification technique. According to the results, the proposed algorithm is able to detect and categorize different fault types with 95.9% accuracy. To highlight the performance of the proposed technique, the provided dataset is classified by means of KNN method when the PCA and LDA are used for feature reduction separately. The results of the conducted classification demonstrate that the KNN will have the accuracy of 33.3% when PCA is the only technique used in the feature reduction section. If the LDA is used for the feature reduction alone instead of the PCA the accuracy of the classification on the available data set is decreased to 22.6%. Tables 2 and 3 show the performance of the classification technique when the PCA and LDA are lonely used for feature reduction. In actuality, the PCA and LDA are not able to help KNN data classification alone. So, the consecutive utilization of PCA and LDA to reduce the signal features is the significant advantageous of the proposed method. It must be highlighted that the dataset is provided with 5000 samples per second for the accomplishment of the detection technique. Signal sampling with such scan rate does not require a high-speed high cost microprocessor. For this reason, the proposed detection technique could be easily implemented on a microprocessor. Robustness in fault detection and classification and simple implementation on commercial microchips are two outstanding features of the proposed technique.
5 Conclusion
A classification technique based-on the conventional K-NN algorithm is proposed to detect and classify different types of fault in a smart grid. In the proposed technique, the PCA method is used to decrease the dataset size while LDA provides online classification before applying the K-NN. Simulation results demonstrate the effectiveness and robustness of the such an augmented K-NN technique in fault detection and classification. Because of the fact that the proposed method has an acceptable detection accuracy with a low sample rate in presence of different fault types, it could be easily applied to the commercial microprocessors.
High impedance faults (HIFs) can be occurred when the transmission lines are grounded or connected to each other through a high impedance connection way. In this condition, the available power system relays encounter a problem to detect the fault clearly. In the feature work, authors will try to introduce a simple detection technique to increase the precision of the HIF detection systems by means of the machine learning algorithms.
References
Alhelou HH, Golshan MH, Askari-Marnani J (2018) Robust sensor fault detection and isolation scheme for interconnected smart power systems in presence of RER and EVs using unknown input observer. Int J Electr Power Energy Syst 99:682–694
Alhelou HH (2019) Fault detection and isolation in power systems using unknown input observer. In: Advanced condition monitoring and fault diagnosis of electric machines. IGI Global, pp 38–58
Triki-Lahiani A, Abdelghani ABB, Slama-Belkhodja I (2018) Fault detection and monitoring systems for photovoltaic installations: a review. Renew Sustain Energy Rev 82:2680–2692
Taheri B, Razavi F (2018) Power swing detection using rms current measurements. J Electr Eng Technol 13(5):1831–1840
Patel TK, Mohanty SK, Mohapatra S (2017). Fault detection during power swing by phase difference technique. In: 2017 innovations in power and advanced computing technologies (i-PACT). IEEE, pp 1–6
Abdel-Akher M, Nor KM (2010) Fault analysis of multiphase distribution systems using symmetrical components. IEEE Trans Power Deliv 25(4):2931–2939
Xu X, Peters JF (2002) Rough set methods in power system fault classification. In: IEEE CCECE2002. Canadian conference on electrical and computer engineering. Conference proceedings (Cat. No. 02CH37373), vol 1. IEEE, pp 100–105
Cho HJ, Park JK (1997) An expert system for fault section diagnosis of power systems using fuzzy relations. IEEE Trans Power Syst 12(1):342–348
Wen FS, Chang CS (1997) Probabilistic approach for fault-section estimation in power systems based on a refined genetic algorithm. IEE Proc Gener Transm Distrib 144(2):160–168
Fernandez AO, Ghonaim NKI (2002) A novel approach using a FIRANN for fault detection and direction estimation for high-voltage transmission lines. IEEE Trans Power Deliv 17(4):894–900
Sheng Y, Rovnyak SM (2004) Decision tree-based methodology for high impedance fault detection. IEEE Trans Power Deliv 19(2):533–536
Wang Y, Liu M, Bao Z (2016) Deep learning neural network for power system fault diagnosis. In: 2016 35th Chinese control conference (CCC). IEEE, pp 6678–6683
Yadav A, Swetapadma A (2014) Fault analysis in three phase transmission lines using k-nearest neighbor algorithm. In: 2014 international conference on advances in electronics computers and communications. IEEE, pp 1–5
Harrou F, Taghezouit B, Sun Y (2019) Improved kNN-based monitoring schemes for detecting faults in PV systems. IEEE J Photovolt 9(3):811–821
Kang X, Xiang X, Li S, Benediktsson JA (2017) PCA-based edge-preserving features for hyperspectral image classification. IEEE Trans Geosci Remote Sens 55(12):7140–7151
Jing C, Hou J (2015) SVM and PCA based fault classification approaches for complicated industrial process. Neurocomputing 167:636–642
Khosravi MR, Sharif-Yazd M, Moghimi MK, Keshavarz A, Rostami H, Mansouri S (2015) MRF-based multispectral image fusion using an adaptive approach based on edge-guided interpolation. arXiv:1512.08475
Lazzari E, Schena T, Marcelo MCA, Primaz CT, Silva AN, Ferrão MF, Bjerk T, Caramão EB (2018) Classification of biomass through their pyrolytic bio-oil composition using FTIR and PCA analysis. Ind Crops Prod 111:856–864
Phillips PJ, Flynn PJ, Scruggs T, Bowyer KW, Chang J, Hoffman K, Marques J, Min J, Worek W (2005) Overview of the face recognition grand challenge. In: 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR’05), vol 1. IEEE, pp 947–954
Asadi S, Rao CDVS, Saikrishna V (2010) A comparative study of face recognition with principal component analysis and cross-correlation technique. Int J Comput Appl 10(8):17–21
Hagar AA, Alshewimy MA, Saidahmed MTF (2016) A new object recognition framework based on PCA, LDA, and K-NN. In: 2016 11th international conference on computer engineering and systems (ICCES). IEEE, pp 141–146
Zhang X, Peng F, Long M (2018) Robust coverless image steganography based on DCT and LDA topic classification. IEEE Trans Multimed 20(12):3223–3238
Chen Q, Yao L, Yang J (2016). Short text classification based on LDA topic model. In: 2016 international conference on audio, language and image processing (ICALIP). IEEE, pp 749–753
Khosravi MR, Akbarzadeh O, Salari SR, Samadi S, Rostami H (2017) An introduction to ENVI tools for synthetic aperture radar (SAR) image despeckling and quantitative comparison of denoising filters. In: 2017 IEEE international conference on power, control, signals and instrumentation engineering (ICPCSI). IEEE, pp 212–215
Varatharajan R, Manogaran G, Priyan MK (2018) A big data classification approach using LDA with an enhanced SVM method for ECG signals in cloud computing. Multimed Tools Appl 77(8):10195–10215
Menhour I, Fergani B (2018). A new framework using PCA, LDA and KNN-SVM to activity recognition based smartphone’s sensors. In: 2018 6th international conference on multimedia computing and systems (ICMCS). IEEE, pp 1–5
Yu H, Yang J (2001) A direct LDA algorithm for high-dimensional data—with application to face recognition. Pattern Recognit 34(10):2067–2070
Zhang S, Li X, Zong M, Zhu X, Cheng D (2017) Learning k for knn classification. ACM Trans Intell Syst Technol (TIST) 8(3):43
Vinoj PG, Jacob S, Menon VG, Rajesh S, Khosravi MR (2019) Brain-controlled adaptive lower limb exoskeleton for rehabilitation of post-stroke paralyzed. IEEE Access 7:132628–132648
Khosravi MR, Bahri-Aliabadi B, Salari R, Samadi S, Rostami H, Karimi V (2018) A tutorial and performance analysis on ENVI tools for SAR image despeckling. Curr Signa Transduct Ther 13:1–8
Tomašev N, Buza K (2015) Hubness-aware kNN classification of high-dimensional data in presence of label noise. Neurocomputing 160:157–172
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human or animal subjects.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Hosseinzadeh, J., Masoodzadeh, F. & Roshandel, E. Fault detection and classification in smart grids using augmented K-NN algorithm. SN Appl. Sci. 1, 1627 (2019). https://doi.org/10.1007/s42452-019-1672-0
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s42452-019-1672-0