Abstract
The main objective of this paper is to apply big data analytics to a real case in the field of smart electric networks. Smart meters not only measure consumption; they also constitute a network of millions of sensors across the electricity network. These sensors provide a huge amount of data that, once analyzed, can lead to significant advances for society. Tools are therefore being developed to reach goals such as obtaining better consumption estimates (which would imply better production planning), finding better rates based on time discrimination or contracted power, or minimizing non-technical losses in the network, whose costs are eventually paid by end consumers. In this work, real data from Spanish consumers were analyzed to detect fraud in consumption. First, 1 TB of raw data was preprocessed in an HDFS-Spark infrastructure. Second, duplicate records and outliers were removed, and missing values were handled with specific big data algorithms. Third, customers were characterized by means of clustering techniques in different scenarios. Finally, several key factors in fraudulent consumption were found. Very promising results were achieved, with accuracy verging on 80%.
Keywords
- Big data
- Sensors
- Classification
- Fraud detection
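The cleaning and characterization steps named in the abstract (duplicate removal, missing-value handling, outlier removal, clustering) can be illustrated with a small in-memory sketch. This is not the authors' HDFS-Spark implementation, and the abstract does not say which algorithms were used: median imputation, a median-absolute-deviation outlier rule, and one-dimensional k-means are assumptions chosen for brevity, and every function name and data value below is hypothetical.

```python
from collections import defaultdict
from statistics import mean, median


def deduplicate(readings):
    """Drop exact duplicate (customer, hour, kwh) records, keeping the first."""
    return list(dict.fromkeys(readings))


def fill_missing(readings):
    """Replace missing kWh values (None) with that customer's median reading.

    Assumes every customer has at least one known reading.
    """
    known = defaultdict(list)
    for cust, _, kwh in readings:
        if kwh is not None:
            known[cust].append(kwh)
    return [(c, h, median(known[c]) if k is None else k) for c, h, k in readings]


def drop_outliers(readings, factor=5.0):
    """Drop readings farther than factor * MAD from the customer's median."""
    by_cust = defaultdict(list)
    for c, _, k in readings:
        by_cust[c].append(k)
    kept = []
    for c, h, k in readings:
        med = median(by_cust[c])
        mad = median(abs(v - med) for v in by_cust[c])
        if mad == 0 or abs(k - med) <= factor * mad:
            kept.append((c, h, k))
    return kept


def kmeans_1d(values, k=2, iterations=20):
    """Plain Lloyd's k-means on scalars; returns one cluster label per value."""
    ordered = sorted(values)
    # Deterministic initialisation: spread centroids across the sorted values.
    centroids = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centroids[j] = mean(members)
    return labels


# Hypothetical hourly readings: (customer_id, hour, kWh).
raw = [
    ("A", 0, 1.0), ("A", 0, 1.0),   # exact duplicate
    ("A", 1, None),                 # missing value
    ("A", 2, 1.2), ("A", 3, 50.0),  # 50.0 is an implausible spike
    ("B", 0, 5.0), ("B", 1, 5.2), ("B", 2, 4.8),
    ("C", 0, 1.1), ("C", 1, 0.9), ("C", 2, 1.0),
]

clean = drop_outliers(fill_missing(deduplicate(raw)))

# Characterise each customer by mean consumption, then cluster the profiles.
per_customer = defaultdict(list)
for cust, _, kwh in clean:
    per_customer[cust].append(kwh)
customers = sorted(per_customer)
profile = {c: mean(per_customer[c]) for c in customers}
clusters = dict(zip(customers, kmeans_1d([profile[c] for c in customers], k=2)))
```

With this toy data the two low-consumption customers fall into one cluster and the high-consumption customer into the other. At the 1 TB scale reported in the abstract, the same steps would be expressed as distributed operations rather than Python lists.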
Acknowledgments
The authors would like to thank the Spanish Ministry of Economy and Competitiveness for the support under projects TIN2014-55894-C2-R and TIN2017-88209-C2-R.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Gutiérrez-Avilés, D. et al. (2018). SmartFD: A Real Big Data Application for Electrical Fraud Detection. In: Hybrid Artificial Intelligent Systems. HAIS 2018. Lecture Notes in Computer Science, vol 10870. Springer, Cham. https://doi.org/10.1007/978-3-319-92639-1_11
DOI: https://doi.org/10.1007/978-3-319-92639-1_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-92638-4
Online ISBN: 978-3-319-92639-1
eBook Packages: Computer Science, Computer Science (R0)