1 Introduction

Power quality (PQ) has received increasing attention worldwide over the past decade. Power quality is a set of electrical boundaries that allows a piece of equipment to function in its intended manner without significant loss of performance or life expectancy. Power quality has become very important because of the deregulation of the power industry and the proliferation of sensitive loads that require clean and uninterrupted power, such as power electronic drives, microprocessor-based controllers, computers, processing plants, hospitals and bank security systems. Problems such as voltage sag, swell, unbalance, interruption, flicker and harmonics degrade power quality. The existence of PQ problems greatly affects the safe, reliable and economical operation of electric power systems. When the supply voltage is distorted, electrical devices draw non-sinusoidal current from the supply, which causes many technical problems such as extra losses, extra heating, misoperation and early aging of the devices. Even a short power outage has a great economic impact on industrial consumers, and a longer interruption harms practically all operations of a modern society [1]. PQ problems cannot be completely eliminated, but they can be reduced to within limits by equipment such as custom power devices, power factor corrector circuits and filters [2, 3].

To identify the sources of power quality problems and make appropriate decisions for improving power quality, electric utilities should provide real-time monitoring systems that are capable of identifying different power quality problems. For this, instruments must collect large amounts of data, such as measured currents, voltages and occurrence times. The collected data must then be analyzed, online or offline, to classify the disturbances [4,5,6,7].

The vast and increasing volumes of data obtained from power quality monitoring systems require the use of data mining techniques for analysis. Data mining technology is an effective tool for dealing with massive data and detecting the useful patterns in those data. In power systems, data can be raw waveforms (voltages and currents) sampled at relatively high sampling frequencies, pre-processed waveforms (e.g., RMS values) or status variables (e.g., whether a relay is open or closed), which are typically sampled at low sampling frequencies [8]. Classification of data is an important task in the data mining process: it extracts models that describe classes and predicts the target class of data instances.

Today, several standard classifiers are available, among which decision trees are some of the most powerful and popular for both classification and prediction. Decision trees are flexible enough to handle items with a mixture of real-valued and categorical features, as well as items with some missing features. They are more interpretable than other classifiers such as Artificial Neural Networks (ANN) and Support Vector Machines (SVM) because they combine simple questions about the data in an understandable way [9]. They are expressive enough to model many partitions of the data that are not as easily achieved with classifiers that rely on a single decision boundary, such as logistic regression or SVM. Decision trees naturally support classification problems with more than two classes and can be modified to handle regression problems. Finally, once constructed, they classify new items quickly [10].

In [11], SVM, ANN, logistic regression, Naïve Bayes, classification and regression trees, the C5.0 algorithm, Quick, Unbiased and Efficient Statistical Tree (QUEST), CHi-square Automatic Interaction Detector (CHAID) and discriminant analysis were implemented for classification on nine datasets. According to the experimental results, the C5.0 model had the best performance. The performance of the J48 decision tree, Multi-Layer Perceptron (MLP) and Naïve Bayes classification algorithms was studied with respect to training time and prediction accuracy [12]. MLP was observed to take a longer training time than the J48 decision tree and Naïve Bayes classifiers for every data size. The accuracy of Naïve Bayes decreased as the data size increased, whereas J48 and MLP showed high accuracy for both small and large data sizes. The performance of ANN and SVM was evaluated for the classification of sag, swell, interruption, harmonics and flicker [13]. SVM was found to outperform ANN in terms of classification accuracy and computation time. Ten different types of disturbances, such as sag, swell and interruption with and without harmonics, were classified using SVM and a decision tree [14]. The decision tree was observed to be faster and to provide better classification accuracy in every case, with and without noise. It is also easier to implement than SVM. Moreover, the decision tree worked satisfactorily with both synthesized and real signals.

Decision trees such as J48, Logistic Model Tree (LMT), Reduced Error Pruning (REP) Tree, Random Tree, Simple Cart and Random Forest are used for classification [15,16,17]. Random Forest is used for the classification of PQ disturbances [18] and for fault record detection in the data center of a large power grid [19]. When J48 is compared with Random Forest in the classification of power quality disturbances, Random Forest is found to be more accurate than J48 [20]. The performance of Random Tree is observed to be better than that of REP Tree and Simple Cart [21], and of Logical Analysis of Data (LAD) Tree and Random Forest [22], for classification. It has been found that whenever the correct attributes are selected before classification, the accuracy of data mining algorithms improves significantly [23, 24]. This paper focuses on how the J48, Random Tree and Random Forest decision tree algorithms are applied to classify the power quality problems of voltage sag, swell, interruption and unbalance. The effect of the data attributes on the classification accuracy and on the time taken to train the decision trees is also discussed.

The paper is organized as follows. Section 2 gives the definitions and causes of power quality problems such as voltage sag, swell, interruption and unbalance, along with typical waveforms. Section 3 covers the basics of data mining and explains the J48, Random Tree and Random Forest algorithms. This section also briefly describes the WEKA software used to implement data mining for classification. Section 4 presents the MATLAB simulation circuit used to generate the data for the various power quality problems. The testing and results of the data mining algorithms obtained from WEKA are discussed in Section 5. Finally, Section 6 draws conclusions from the observed results.

2 Power quality problems

A power quality problem is defined as any power problem manifested in voltage, current, or frequency deviations that results in failure or misoperation of customer equipment. Some of the commonly occurring power quality problems in a power system are voltage sag, swell, interruption and unbalance [25].

2.1 Voltage sag

Voltage sag is defined as a decrease in RMS voltage to between 0.1 p.u. and 0.9 p.u. at the power frequency for durations from 0.5 cycles to 1 min, reported as the remaining voltage. Voltage sags can occur due to short circuits, overloads and the starting of large motors. Figure 1 shows a typical waveform of a voltage sag.

Fig. 1

Voltage waveform during Sag

2.2 Voltage swell

Voltage swell is defined as an increase in RMS voltage to between 1.1 p.u. and 1.8 p.u. at the power frequency for durations from 0.5 cycles to 1 min. The causes of swell are the switching off of a large load, the energizing of a large capacitor bank and the temporary voltage rise on the unfaulted phases during a single line-to-ground fault. The voltage waveform of a swell is shown in Fig. 2.

Fig. 2

Voltage waveform during Swell

2.3 Interruption

An interruption occurs when the supply voltage or load current decreases to less than 0.1 p.u. for a period of time not exceeding 1 min. Interruptions can be the result of power system faults, lightning, equipment failures and control malfunctions. Interruption is illustrated in Fig. 3.

Fig. 3

Voltage waveform during Interruption
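The magnitude bands quoted in Sections 2.1 to 2.3 can be written down as a simple threshold rule. The sketch below is only an illustration of those definitions, not the classification method used in this paper: it ignores the duration criteria (0.5 cycles to 1 min), it borrows the class names used later in Section 5, and its handling of values outside the quoted bands is an assumption.

```python
def classify_rms(v_pu: float) -> str:
    """Label a per-unit RMS voltage using the magnitude bands from Section 2."""
    if v_pu < 0.1:
        return "Intr"      # interruption: below 0.1 p.u.
    if v_pu <= 0.9:
        return "Sag"       # sag: 0.1 p.u. to 0.9 p.u.
    if 1.1 <= v_pu <= 1.8:
        return "Swell"     # swell: 1.1 p.u. to 1.8 p.u.
    return "NoProb"        # assumption: anything else is treated as normal


for v in (0.05, 0.5, 1.0, 1.3):
    print(v, classify_rms(v))
```

A fixed rule of this kind looks at one phase at a time and cannot express voltage unbalance, which is one motivation for learning the decision boundaries from data instead.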

2.4 Voltage unbalance

In a 3-phase system, voltage unbalance takes place when the magnitudes of the phase or line voltages are different, or the phase angles differ from the balanced conditions, or both. The sources of voltage unbalance are unbalanced faults, single-phase loads on a three-phase circuit and blown fuses in one phase of a 3-phase capacitor bank. The three phase voltages during an unbalanced fault are shown in Fig. 4.

Fig. 4

Three phase voltages during Unbalance condition
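One common way to put a number on magnitude unbalance is the maximum deviation of the three voltages from their mean, expressed as a percentage of the mean. The sketch below uses that measure purely as an illustration; the paper does not state which unbalance index, if any, was used, the 2% limit is a hypothetical value, and a magnitude-only measure does not capture the phase-angle unbalance mentioned above.

```python
def unbalance_percent(va: float, vb: float, vc: float) -> float:
    """Maximum deviation from the mean magnitude, as a percentage of the mean."""
    mean = (va + vb + vc) / 3.0
    if mean == 0.0:
        return 0.0  # all phases at zero: treated as no unbalance in this sketch
    return 100.0 * max(abs(v - mean) for v in (va, vb, vc)) / mean


def is_unbalanced(va: float, vb: float, vc: float, limit_percent: float = 2.0) -> bool:
    """Flag unbalance when the index exceeds a (hypothetical) limit."""
    return unbalance_percent(va, vb, vc) > limit_percent


print(unbalance_percent(1.0, 0.9, 1.1))
print(is_unbalanced(1.0, 1.0, 1.0))
```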

3 Data mining

Data mining is a process that uses a variety of data analysis tools to identify hidden patterns and relationships within the data. These tools are a mixture of machine learning, statistics and database utilities. Data mining has recently gained popularity over classical techniques in many research fields for the purpose of analyzing data, owing to (i) a vast increase in the size and number of databases, (ii) the decrease in storage device costs, (iii) an ability to handle data that contains distortion (noise, missing values, etc.), (iv) continuous progress in the implementation of automatic learning techniques and (v) the rapid advance of computer technology [26]. The ultimate goal of data mining is to discover useful information from large amounts of data in many different ways using rules, patterns and classification [27]. Data mining can be used to identify anomalies that occur as a result of network or load operation and that may not be flagged by standard reporting techniques. It is proposed that data mining can provide answers to end-users about PQ problems by converting raw data into useful knowledge [28, 29].

Many people treat data mining as a synonym for another popularly used term, Knowledge Discovery from Data (KDD), while others view data mining as merely an essential step in the process of knowledge discovery. The knowledge discovery process is an iterative sequence of the following steps: (i) Data cleaning, (ii) Data integration, (iii) Data selection, (iv) Data transformation, (v) Data mining, (vi) Pattern evaluation and (vii) Knowledge presentation. Steps (i) through (iv) are different forms of data pre-processing, where data are prepared for mining. The data mining step may interact with the user or a knowledge base. The interesting patterns are presented to the user and may be stored as new knowledge in the knowledge base. The preceding view shows data mining as one step in the knowledge discovery process, albeit an essential one because it uncovers hidden patterns for evaluation. However, in industry, in media, and in the research milieu, the term data mining is often used to refer to the entire knowledge discovery process [30].

The data mining process differs from classical statistical methods in that statistical methods focus only on model estimation, while data mining techniques focus on both model formation and its performance. Another significant difference is that statistical methods fail to analyze data with missing values, or data that contains a mixture of numeric and qualitative forms. Data mining techniques, instead, can analyze and cope intelligently with records containing missing values, as well as a mixture of qualitative and quantitative data, without tedious manual manipulation [31, 32].

Data mining starts with real data collected from real equipment. In fact, the more diverse the data, the more accurate the result. So, if hundreds of parameters are recorded and available for analysis, data mining can consider and use all of the collected data. Data mining methods are well equipped to handle large amounts of data and to detect the useful patterns in these data that allow performance to be improved. Data mining methodologies and algorithms have their origins in many different disciplines. For example, researchers in artificial intelligence have proposed various methods and techniques that can efficiently “mimic” how real people (“experts”) detect difficult hidden patterns in large amounts of complex data.

3.1 Methods: Data mining algorithms

Many data mining algorithms are available, among which J48, Random Tree and Random Forest are widely used for classification. These are decision tree algorithms, which use a divide-and-conquer strategy as a form of learning by induction. They use a tree representation, hierarchically structured as a set of interconnected nodes, which helps in classifying patterns in data sets. Each internal node tests an input attribute (feature) against a decision constant and in this way determines which descending node is visited next. The leaf nodes classify the instances that reach them according to their associated label [33].

3.1.1 J48

J48 is an open source Java implementation of the C4.5 algorithm in the WEKA data mining tool. It creates a binary tree. It is one of the most useful decision tree approaches for classification problems. It employs a top-down, greedy search through all possible branches to construct a decision tree that models the classification process. To classify a new item, it first needs to create a decision tree based on the attribute values of the available training data. Given a set of items (the training set), it identifies the attribute that discriminates between the various instances most clearly. This feature, which tells us the most about the data instances so that we can classify them best, is said to have the highest information gain. Among the possible values of this feature, if there is any value for which there is no ambiguity, i.e., for which the data instances falling within its category all have the same value for the target variable, then that branch is terminated and the target value is assigned to it. For the other cases, another attribute is selected which gives the highest information gain. The process continues in this manner until a clear decision is obtained about what combination of attributes gives a particular target value, or until all the attributes have been used. If all the attributes have been used, or if an unambiguous result cannot be obtained from the available information, the branch is assigned the target value possessed by the majority of the items under it. Once the decision tree is built, a new instance is classified by following the order of attribute selection in the tree: checking its attribute values against those in the decision tree model yields the predicted target value. J48 classification is based on the decision trees or rules generated from them [34]. The simple tree structure of J48 is shown in Fig. 5.
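The information gain criterion described above can be illustrated for a single numeric attribute and a single candidate threshold. This is a minimal sketch and not WEKA's J48 code: the function names and the sample readings are hypothetical, and C4.5 is usually described as normalizing this quantity into a gain ratio, a refinement that is omitted here.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting a numeric attribute at `threshold`."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty separates nothing
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted


# Hypothetical per-unit RMS readings of one phase and their class labels.
va = [0.45, 0.60, 0.98, 1.00, 1.02, 1.01]
cls = ["Sag", "Sag", "NoProb", "NoProb", "NoProb", "NoProb"]

print(information_gain(va, cls, 0.9))   # split that isolates the sag samples
print(information_gain(va, cls, 1.0))   # a less informative split
```

In this toy data the threshold at 0.9 leaves both branches pure, so it yields the larger gain and would be preferred by a greedy tree builder.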

Fig. 5

J48 decision tree

3.1.2 Random tree

A Random Tree is a decision tree that is formed by a stochastic process. In a standard tree, each node is split using the best split among all attributes. In a Random Tree, each node is split using the best split among a subset of attributes randomly chosen at that node. The Random Tree algorithm has an option to estimate the class probabilities for classification. Random Trees were introduced by Leo Breiman and Adele Cutler. This algorithm can deal with both classification and regression problems [21, 35]. The structure of a Random Tree is shown in Fig. 6.
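The difference from a standard tree lies only in which attributes are considered at each node. The sketch below shows that one step under stated assumptions: the helper name `choose_split_attribute`, the subset size `k` and the split-quality scores are all hypothetical and are not taken from WEKA's RandomTree implementation.

```python
import random


def choose_split_attribute(attributes, score, k, rng):
    """Pick the best-scoring attribute from k randomly drawn candidates."""
    candidates = rng.sample(attributes, k)   # random subset considered at this node
    return max(candidates, key=score)


# Hypothetical split-quality scores (for example, information gain) per attribute.
scores = {"Va": 0.62, "Vb": 0.35, "Vc": 0.48}

rng = random.Random(0)  # fixed seed so the sketch is repeatable
picked = choose_split_attribute(list(scores), scores.get, k=2, rng=rng)
print(picked)
```

With `k` equal to the number of attributes the function reduces to the standard "best split among all attributes" rule; smaller values of `k` inject the randomness that gives the Random Tree its name.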

Fig. 6

Structure of Random Tree


3.1.3 Random forest

This algorithm uses a set of classifiers based on decision trees. Random Forest fits many classification trees to a data set and then combines the predictions from all the trees. Each tree depends on the value of a separately sampled random vector. Random Forest corresponds to a collection of combined decision trees {hk(x, Tk)}, for k = 1, 2, ..., n, where n is the number of trees, x is the input instance, and the Tk are training sets drawn at random, independently and with identical distribution; hk represents the tree created from Tk and is responsible for producing an output for x.

The trees that make up the Random Forest are built by randomly selecting ‘m’ attributes (a value fixed for all nodes) at each node of the tree, from which the best attribute is chosen to divide the node. The vector used for training each tree is obtained by a random selection of the instances. To determine the class of an instance, every tree produces an output and the class with the most votes is selected as the final result. The classification error therefore depends on the strength of the individual trees of the forest and on the correlation between any two trees in the forest [20]. Figure 7 shows the tree diagram of a Random Forest. The various differences between the three data mining algorithms are presented in Table 1.
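The voting step can be sketched in a few lines. The three "trees" below are hypothetical stand-ins, each reduced to a single threshold test, and are meant only to show how individual outputs are combined; a real forest would hold trained trees.

```python
from collections import Counter


def forest_predict(trees, x):
    """Return the class that receives the most votes from the individual trees."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]


# Three hypothetical "trees", each reduced to one threshold test on one phase.
trees = [
    lambda x: "Sag" if x["Va"] < 0.9 else "NoProb",
    lambda x: "Sag" if x["Vb"] < 0.9 else "NoProb",
    lambda x: "Sag" if x["Vc"] < 0.9 else "NoProb",
]

sample = {"Va": 0.55, "Vb": 0.60, "Vc": 0.95}
print(forest_predict(trees, sample))  # two of the three trees vote "Sag"
```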

Fig. 7

Structure of Random Forest

Table 1 Differences between Data Mining Algorithms

3.2 Data mining tool: WEKA software

WEKA, formally called the Waikato Environment for Knowledge Analysis, is a computer program that was developed at the University of Waikato in New Zealand for the purpose of identifying information from raw data gathered from agricultural domains. WEKA is a state-of-the-art facility for developing machine learning techniques and applying them to real-world data mining problems. It is a collection of machine learning algorithms for data mining tasks. The algorithms are applied directly to a dataset. WEKA supports many different standard data mining tasks such as data pre-processing, classification, clustering, regression, visualization and feature selection. The basic premise is that a computer application can be trained to perform machine learning tasks and derive useful information in the form of trends and patterns. WEKA is an open source application that is freely available under the GNU General Public License. It is user friendly, with a graphical interface that allows quick setup and operation. WEKA assumes that the user data is available as a flat file or relation, meaning that each data object is described by a fixed number of attributes that are usually of a specific type, either alphanumeric or numeric values. WEKA gives novice users a tool for identifying hidden information in databases and file systems, with simple-to-use options and visual interfaces [36].

4 Circuit for generating data for classification

The circuit shown in Fig. 8 is modelled in MATLAB Simulink. The circuit consists of a 33/11 kV distribution substation connected to a 2 km distribution line with an 11/0.433 kV distribution transformer supplying a load of 190 kW, 140 kVAr [37]. It is simulated to obtain data for various voltage sag, swell, interruption and unbalance problems. Voltage sags are created by balanced 3-phase-to-ground faults with varied fault impedance and duration, giving different categories of sags. Voltage swells are created by switching capacitors of different capacitances onto the line for varied durations, giving different categories of swells. Interruptions are introduced by opening circuit breaker 1 (CB 1) for different time durations, thereby disconnecting the supply. The voltage unbalance is created by a 3-phase unbalanced fault. The 3-phase RMS voltages calculated at the Point of Common Coupling (PCC) are used as the main data for classification of the power quality problems. The data is sampled at a frequency of 2 kHz. From the simulation, 400,001 data samples are obtained, of which 31,438 samples contain sag, 22,506 samples contain swell, 5,441 samples contain interruption, 14,268 samples contain unbalance and the remaining 326,348 samples have no power quality problem. This data is used for classification by the data mining algorithms.

Fig. 8

Simulation circuit diagram of the system
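The text does not say how the RMS voltages at the PCC are computed. One common approach is a sliding one-cycle RMS over the sampled waveform, sketched below as an assumption rather than as the Simulink block actually used. The 2 kHz sampling frequency is from the text; the 50 Hz power frequency, and therefore the 40-sample window, is an assumption.

```python
import math


def sliding_rms(samples, window):
    """RMS over the most recent `window` samples, one value per full window."""
    out = []
    acc = 0.0
    for i, s in enumerate(samples):
        acc += s * s
        if i >= window:
            acc -= samples[i - window] ** 2   # drop the sample leaving the window
        if i >= window - 1:
            out.append(math.sqrt(max(acc, 0.0) / window))
    return out


FS = 2000            # sampling frequency from the text, in Hz
F0 = 50              # assumed power frequency in Hz (not stated in the text)
N = FS // F0         # 40 samples per cycle under that assumption

# One second of a sine wave with 1 p.u. RMS as hypothetical input.
wave = [math.sqrt(2) * math.sin(2 * math.pi * F0 * n / FS) for n in range(FS)]
rms = sliding_rms(wave, N)
print(round(rms[0], 3), len(rms))
```

At 2 kHz, the 400,001 samples reported above would span 200 s of simulated time if the first and last instants are both included.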

5 Results and discussion

The data samples obtained from simulations carried out on the system shown in Fig. 8 are stored in a datasheet. Using this data, a class attribute is formulated which is used to differentiate sag, swell, interruption and unbalance. With this information, an ARFF (Attribute-Relation File Format) file is written. An ARFF file is an ASCII text file that describes a list of instances sharing a set of attributes. ARFF files were developed by the machine learning project at the Department of Computer Science of the University of Waikato for use with the WEKA machine learning software. The file has a header section followed by a data section. The header section contains the relation declaration, which gives the name of the relation, and the attribute declarations, which list the attributes (the columns in the data) with their types [38].

The ARFF file is used to load the data into WEKA software for the classification of the power quality problems. Figure 9 shows the pre-processing stage of data mining in WEKA indicating total number of instances, the number of attributes and number of samples under each class of power quality problems along with a bar graph. The attributes used in this case are the numeric values of three phase RMS voltages, namely Va, Vb and Vc along with the class attribute. The class attribute value is “NoProb” for samples containing no power quality problem, “Sag” for samples with voltage sag, “Swell” when samples contain voltage swell, “Intr” for samples containing interruption and “Unbal” for samples with voltage unbalance condition.
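The layout of such a file can be sketched as follows. The attribute names and class values are those given above; the relation name `pq_problems`, the helper `to_arff` and the two data rows are hypothetical and do not reproduce the paper's data.

```python
def to_arff(rows, relation="pq_problems"):
    """Render (Va, Vb, Vc, label) tuples as the text of an ARFF file."""
    lines = [
        f"@relation {relation}",
        "",
        "@attribute Va numeric",
        "@attribute Vb numeric",
        "@attribute Vc numeric",
        "@attribute class {NoProb,Sag,Swell,Intr,Unbal}",
        "",
        "@data",
    ]
    lines += [f"{va},{vb},{vc},{label}" for va, vb, vc, label in rows]
    return "\n".join(lines) + "\n"


# Hypothetical per-unit readings used only to show the file layout.
rows = [(1.00, 1.00, 1.00, "NoProb"), (0.52, 0.51, 0.53, "Sag")]
print(to_arff(rows))
```

The header declares three numeric attributes and one nominal class attribute, which together make up the four attributes of the first test case.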

Fig. 9

Pre-process stage of data mining in WEKA with 4 attributes

The data loaded into WEKA is used to train the data mining algorithms J48, Random Tree and Random Forest for classification. After training, the algorithms are tested both on the given training set and using stratified 10-fold cross validation [39]. The results obtained after testing the algorithms on the training set are given in Table 2. The overall accuracy of the J48 algorithm is 99.9973%, whereas the Random Tree and Random Forest algorithms have an accuracy of 100% in the classification of the power quality problems. The training time taken by the Random Tree is only 1.91 s, which is much lower than that of J48 and Random Forest.
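In stratified k-fold cross validation, the data is divided into k folds that each preserve the class proportions; each fold serves once as the test set while the remaining folds are used for training. The sketch below shows one simple way to form such folds. It is not WEKA's own procedure, and the function name, seed and toy class counts are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign each instance a fold index so every fold keeps the class mix."""
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)

    rng = random.Random(seed)
    fold_of = [0] * len(labels)
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            fold_of[idx] = pos % k   # deal each class round-robin across folds
    return fold_of


# Hypothetical small data set using the five class names from the paper.
labels = ["NoProb"] * 60 + ["Sag"] * 20 + ["Swell"] * 10 + ["Intr"] * 10 + ["Unbal"] * 10
folds = stratified_folds(labels)

test_fold_0 = Counter(lab for lab, f in zip(labels, folds) if f == 0)
print(dict(test_fold_0))
```

Because each class is dealt out separately, a rare class such as interruption still appears in every test fold, which would not be guaranteed by a purely random split.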

Table 2 Comparison of Data Mining Algorithms with 4 attributes for evaluation on training set

The results obtained after testing the algorithms using stratified 10-fold cross validation are shown in Table 3. The Random Tree has a higher overall accuracy (99.9943%) and takes less training time (1.86 s) than the J48 and Random Forest algorithms. From Tables 2 and 3, it is clear that with only four attributes in the data, Random Tree is the best of the three algorithms, as it is more accurate and takes much less time to train.

Table 3 Comparison of Data Mining Algorithms with 4 attributes for stratified 10-fold cross-validation

In the next case, three additional numeric attributes are included along with Va, Vb, Vc and the class attribute. They are the average (Vavg), minimum (Vmin) and maximum (Vmax) values of the three phase voltages. Figure 10 shows the pre-processing stage of data mining for seven attributes in WEKA. It indicates the total number of instances, the number of attributes and the number of samples under each class of power quality problems, along with a bar graph. The information is the same as that shown in Fig. 9, except for the number of attributes.
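The three additional attributes are simple functions of the phase voltages of each sample. A minimal sketch, assuming they are computed sample by sample from Va, Vb and Vc (the helper name `extend_attributes` is hypothetical):

```python
def extend_attributes(va, vb, vc):
    """Return the six numeric attributes for one sample (class label excluded)."""
    phases = (va, vb, vc)
    return {
        "Va": va,
        "Vb": vb,
        "Vc": vc,
        "Vavg": sum(phases) / 3.0,   # average of the three phase voltages
        "Vmin": min(phases),         # lowest phase voltage
        "Vmax": max(phases),         # highest phase voltage
    }


# Hypothetical per-unit values for one sample.
row = extend_attributes(1.0, 0.5, 0.75)
print(row)
```

Together with the class attribute, these six numeric values give the seven attributes of the second test case.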

Fig. 10

Pre-process stage of data mining in WEKA with 7 attributes

Using the seven-attribute data loaded into WEKA, the data mining algorithms are trained and tested. The results obtained after testing the algorithms on the training set are given in Table 4. The overall accuracy of the J48 algorithm is 99.9983%, whereas the Random Tree and Random Forest algorithms have an accuracy of 100% in the classification of the power quality problems. Again, the training time taken by the Random Tree (1.88 s) is much lower than that of the J48 and Random Forest algorithms. Comparing the results of Tables 2 and 4, the classification accuracy of the J48 algorithm improves in the seven-attribute case, while Random Tree and Random Forest have 100% accuracy in both cases. The training time taken by all the algorithms is also reduced in the seven-attribute case.

Table 4 Comparison of Data Mining Algorithms with 7 attributes for evaluation on training set

Table 5 shows the results obtained after testing the algorithms using stratified 10-fold cross validation. The Random Forest has the highest overall accuracy (99.9973%), whereas the Random Tree has the lowest training time (1.75 s) of the three algorithms. Comparing the results of Tables 3 and 5, it is clear that for all the algorithms the classification accuracy is improved and the training time is reduced by using seven attributes. This indicates that the generalization capabilities of the algorithms are enhanced by including the extra attributes in the second case.

Table 5 Comparison of Data Mining Algorithms with 7 attributes for stratified 10-fold cross-validation

Based on all the results obtained by testing the algorithms for classification of power quality problems, the overall performance of the algorithms is summarized in Table 6. Random Forest gives the most accurate results but takes more time to train, whereas Random Tree takes very little time to train and gives satisfactorily accurate results.

Table 6 Summary of comparison of overall performance of the Data Mining Algorithms

6 Conclusion

This paper presents the implementation of the data mining algorithms J48, Random Tree and Random Forest for classification of the power quality problems of voltage sag, swell, interruption and unbalance using WEKA. The algorithms are trained and tested with data consisting of the numeric attributes of the three phase voltages, and also with the minimum, maximum and average voltage included as additional numeric attributes. In both cases, testing is performed on the given training set and by using stratified 10-fold cross validation. The results show that the J48 algorithm is less accurate than the other algorithms and takes a moderate training time. The Random Tree algorithm takes the least training time of the three algorithms, and its accuracy is good. The Random Forest algorithm is more accurate, but it takes a much longer training time than the other decision trees. Thus, Random Tree can be used if a short training time is required, and Random Forest can be used where very high accuracy is required. The tests also show that, for all the algorithms, the classification accuracy is increased and the training time is reduced by including three extra attributes, namely the minimum, maximum and average voltage values, in the data used for training and classification. With the inclusion of these three simple attributes, the data mining algorithms train better and their generalization capabilities are enhanced, leading to more accurate results.