SmartFD: A Real Big Data Application for Electrical Fraud Detection

Part of the Lecture Notes in Computer Science book series (LNAI, volume 10870)


The main objective of this paper is to apply big data analytics to a real case in the field of smart electric networks. Smart meters are not only devices for measuring consumption; together they also form a network of millions of sensors across the electricity grid. These sensors provide a huge amount of data that, once analyzed, can lead to significant advances for society. To this end, tools are being developed to reach goals such as obtaining better consumption estimates (which would enable better production planning), finding better rates based on time discrimination or contracted power, and minimizing non-technical losses in the network, whose costs are ultimately paid by end consumers. In this work, real data from Spanish consumers have been analyzed to detect fraudulent consumption. First, 1 TB of raw data was preprocessed on an HDFS-Spark infrastructure. Second, duplicate records and outliers were removed, and missing values were handled with specific big data algorithms. Third, customers were characterized by means of clustering techniques in different scenarios. Finally, several key factors in fraudulent consumption were identified. Very promising results were achieved, with accuracy approaching 80%.
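As a rough illustration of the second preprocessing step (removing duplicates and outliers, then handling missing values), here is a minimal in-memory sketch. It is not the authors' pipeline: the abstract does not name the algorithms used, so this example assumes that duplicates are keyed on (meter, timestamp), that outliers are flagged per meter with Tukey's fences (1.5 times the interquartile range), and that missing readings are filled with that meter's median. The names `Reading`, `deduplicate`, `tukey_fences` and `clean` are hypothetical; in the system described, the equivalent work ran on HDFS and Spark over 1 TB of data rather than on a Python list.

```python
from collections import defaultdict
from statistics import median, quantiles
from typing import Optional

# Hypothetical record layout: (meter_id, timestamp, kwh); kwh is None when missing.
Reading = tuple[str, int, Optional[float]]


def deduplicate(readings: list[Reading]) -> list[Reading]:
    """Keep only the first reading seen for each (meter_id, timestamp) pair."""
    seen: set[tuple[str, int]] = set()
    unique: list[Reading] = []
    for meter_id, ts, kwh in readings:
        if (meter_id, ts) not in seen:
            seen.add((meter_id, ts))
            unique.append((meter_id, ts, kwh))
    return unique


def tukey_fences(values: list[float], k: float = 1.5) -> tuple[float, float]:
    """Return the (low, high) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clean(readings: list[Reading]) -> list[Reading]:
    """Deduplicate, drop per-meter outliers, then impute missing values
    with the median of the meter's remaining valid readings."""
    by_meter: dict[str, list[Reading]] = defaultdict(list)
    for row in deduplicate(readings):
        by_meter[row[0]].append(row)

    cleaned: list[Reading] = []
    for rows in by_meter.values():
        valid = [kwh for _, _, kwh in rows if kwh is not None]
        if len(valid) < 2:
            continue  # too few readings to estimate fences or a median
        low, high = tukey_fences(valid)
        # Missing values pass through here; out-of-fence values are dropped.
        kept = [(m, ts, kwh) for m, ts, kwh in rows
                if kwh is None or low <= kwh <= high]
        kept_valid = [kwh for _, _, kwh in kept if kwh is not None]
        fill = median(kept_valid) if kept_valid else median(valid)
        for m, ts, kwh in sorted(kept, key=lambda r: r[1]):
            cleaned.append((m, ts, fill if kwh is None else kwh))
    return cleaned
```

Doing outlier removal before imputation keeps an extreme reading (for example a single 50 kWh spike) from dragging the fill value upward. A median is used for the same reason, since it is less sensitive to any outliers that survive the fences.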


  • Big data
  • Sensors
  • Classification
  • Fraud detection
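The abstract's third step, characterizing customers by clustering, can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm or features were used, so this example assumes Lloyd's k-means over per-customer load profiles (consumption per time bin) normalized to unit sum, seeded deterministically with farthest-first initialization instead of random or k-means++ seeding. All names here (`normalize`, `sq_dist`, `farthest_first`, `kmeans`) are hypothetical.

```python
Profile = list[float]


def normalize(profile: Profile) -> Profile:
    """Scale a load profile to unit sum so customers are compared by the
    shape of their consumption rather than its volume."""
    total = sum(profile)
    return [v / total for v in profile] if total > 0 else list(profile)


def sq_dist(a: Profile, b: Profile) -> float:
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points: list[Profile], k: int) -> list[Profile]:
    """Deterministic seeding (assumes k <= number of distinct points): start
    from the first point, then repeatedly add the point farthest from every
    centroid chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points: list[Profile], k: int,
           iters: int = 50) -> tuple[list[Profile], list[int]]:
    """Lloyd's k-means; returns (centroids, cluster label of each point)."""
    centroids = farthest_first(points, k)
    labels: list[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels
```

For example, with four hypothetical time bins, daytime-heavy profiles such as `[1, 5, 5, 1]` and night-heavy profiles such as `[5, 1, 1, 5]` end up in separate clusters after normalization. At the scale described in the abstract, the same iterations would have to run distributed; Spark's MLlib, for instance, provides a `KMeans` estimator (`pyspark.ml.clustering.KMeans`) for that purpose.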

  • DOI: 10.1007/978-3-319-92639-1_11
  • Chapter length: 11 pages
Fig. 1.



The authors would like to thank the Spanish Ministry of Economy and Competitiveness for the support under projects TIN2014-55894-C2-R and TIN2017-88209-C2-R.

Author information

Corresponding authors

Correspondence to D. Gutiérrez-Avilés, J. A. Fábregas or J. Tejedor.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Gutiérrez-Avilés, D. et al. (2018). SmartFD: A Real Big Data Application for Electrical Fraud Detection. In: , et al. Hybrid Artificial Intelligent Systems. HAIS 2018. Lecture Notes in Computer Science, vol 10870. Springer, Cham.

  • DOI: 10.1007/978-3-319-92639-1_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-92638-4

  • Online ISBN: 978-3-319-92639-1

  • eBook Packages: Computer Science, Computer Science (R0)