Abstract
Electricity theft is one of the most significant sources of non-technical losses. Because of electricity theft, genuine users have to pay more, supply quality decreases, and generation load increases. Advances in Internet of Things (IoT)-based sensors have changed how consumers' electricity consumption patterns are monitored. These consumption data are processed by classification algorithms to identify electricity theft, but the data are imbalanced in nature. The objective of this study is to design a machine learning model in which the data imbalance issue is resolved using six data balancing techniques, namely Synthetic Minority Over Sampling (SMOTE), Adaptive Synthetic Sampling (ADASYN), Random Over Sampler, Support Vector Machine-Synthetic Minority Over Sampling (SVM-SMOTE), SMOTEENN (SMOTE with Edited Nearest Neighbor), and SMOTE Tomek Links. The designed model consists of two stages. In the first stage, twelve classification algorithms (Decision Tree, AdaBoost, Extra Tree, Logistic Regression, XGBoost, LightGBM, Multi-Layer Perceptron, Bagging, Random Forest, Support Vector Machine, Naïve Bayes, and K-Nearest Neighbor) are applied to the data balanced by each of the six techniques. In the second stage, two ensemble techniques, namely maximum voting and stacking, are applied to the five best-performing algorithms. The model is implemented in Python. A dataset from the State Grid Corporation of China (SGCC) is used, and the algorithms are compared on accuracy, MCC (Matthews Correlation Coefficient), F1-score, and log-loss. We observed that SMOTEENN combined with the stacking ensemble gives the highest performance among all the experiments performed: accuracy of 97.67%, MCC of 0.9434, log-loss of 1.01, and F1-score of 97.88%. The proposed model reported around 3% higher performance than the results presented in the literature. Moreover, the effectiveness of the model is validated by applying the one-way ANOVA (analysis of variance) statistical test.
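As an illustration of the ideas summarized above, the following is a minimal sketch of random oversampling (the simplest of the six balancing techniques) and of the four comparison metrics for a binary classifier. It is not the study's implementation, which is not shown here: all function names are hypothetical, the label convention (1 for theft, 0 for normal) is an assumption, and only the Python standard library is used.

```python
import math
import random


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_class.values())
    X_bal, y_bal = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        X_bal.extend(rows + extra)
        y_bal.extend([label] * target)
    return X_bal, y_bal


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn), assuming 1 = theft (positive) and 0 = normal."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    tp, _, fp, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(y_true, y_pred):
    """Matthews Correlation Coefficient; returns 0.0 when the denominator is zero."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; y_prob holds predicted probabilities of class 1."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)
```

For example, with `y_true = [1, 1, 1, 0, 0, 0, 0, 0]` and `y_pred = [1, 1, 0, 0, 0, 0, 0, 1]` (one missed theft and one false alarm), these functions give an accuracy of 0.75, an F1-score of about 0.667, and an MCC of about 0.467. The gap between accuracy and MCC on this small imbalanced example suggests why a study on imbalanced data would report MCC alongside accuracy.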
Abbreviations
- AdaBoost: Adaptive Boosting
- ANOVA: Analysis of Variance
- ADASYN: Adaptive Synthetic
- ANN: Artificial Neural Network
- AUC: Area Under Curve
- CNN: Convolutional Neural Network
- DANN: Deep Artificial Neural Network
- DT: Decision Tree
- EMD: Empirical Mode Decomposition
- ET: Extra Tree
- ETD: Electricity Theft Detection
- FA: Firefly Algorithm
- FIS: Fuzzy Inference System
- GA: Genetic Algorithm
- KNN: K-Nearest Neighbor
- LightGBM: Light Gradient Boosting Machine
- LR: Logistic Regression
- LSTM: Long Short-Term Memory
- MAP: Mean Average Precision
- MIN–MAX: Minimum–Maximum
- MCC: Matthews Correlation Coefficient
- ML: Machine Learning
- MLP: Multi-Layer Perceptron
- NB: Naïve Bayes
- OPF: Optimum Path Forest
- PSO: Particle Swarm Optimization
- RF: Random Forest
- ROC: Receiver Operating Characteristic
- RUSBoost: Random Undersampling Boosting
- SGCC: State Grid Corporation of China
- SVM: Support Vector Machine
- SMOTE: Synthetic Minority Over Sampling
- SMOTEENN: Synthetic Minority Over Sampling Edited Nearest Neighbor
- SVM-SMOTE: SVM-Synthetic Minority Over Sampling
- VGG: Visual Geometry Group
- XGBoost: Extreme Gradient Boosting
Cite this article
Banga, A., Ahuja, R. & Sharma, S.C. Accurate Detection of Electricity Theft Using Classification Algorithms and Internet of Things in Smart Grid. Arab J Sci Eng 47, 9583–9599 (2022). https://doi.org/10.1007/s13369-021-06313-z
DOI: https://doi.org/10.1007/s13369-021-06313-z