Cluster Computing, Volume 22, Supplement 4, pp 10163–10173

Short-term load forecasting with clustering–regression model in distributed cluster

  • Jingsheng Lei
  • Ting Jin
  • Jiawei Hao
  • Fengyong Li


This paper tackles a new challenge in power big data: how to improve the precision of short-term load forecasting on large-scale data sets. The proposed load forecasting method is based on the Spark platform and a “clustering–regression” model, implemented with the Apache Spark machine learning library (MLlib). The scheme first clusters users by their electrical attributes and then obtains the “load characteristic curve of each cluster”, which represents the features of each type of user and is treated as a property of the regional total load. The “clustering–regression” model is then used to forecast the power load of a given region. Extensive experiments show that the proposed scheme predicts the short-term power load reasonably well and is robust. Compared with a stand-alone model, the proposed method handles large-scale data sets more efficiently and can be applied effectively to power load forecasting.
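The two-stage idea described above can be sketched on a single machine as follows. This is only an illustrative sketch in plain Python, not the paper's Spark MLlib implementation: the abstract does not name the clustering algorithm or the regressor, so k-means and ordinary least squares are assumed here as stand-ins, and the toy load profiles and all function names are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_curve(curves: List[Vector]) -> Vector:
    """Element-wise mean of several load profiles."""
    return [sum(values) / len(curves) for values in zip(*curves)]


def kmeans(profiles: List[Vector], k: int, iterations: int = 20) -> Tuple[List[int], List[Vector]]:
    """Plain Lloyd's k-means (assumed stand-in); returns (labels, centroids)."""
    # Deterministic farthest-point initialisation, chosen only for reproducibility.
    centroids = [profiles[0]]
    while len(centroids) < k:
        centroids.append(
            max(profiles, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    labels = [0] * len(profiles)
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in profiles
        ]
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = mean_curve(members)
    return labels, centroids


def solve_linear_system(a: List[Vector], b: Vector) -> Vector:
    """Gauss-Jordan elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_weights(curves: List[Vector], total: Vector) -> Vector:
    """Least-squares weights w so that sum_j w[j] * curves[j][t] approximates total[t]."""
    k = len(curves)
    gram = [[sum(x * y for x, y in zip(curves[i], curves[j])) for j in range(k)]
            for i in range(k)]
    rhs = [sum(x * y for x, y in zip(curves[i], total)) for i in range(k)]
    return solve_linear_system(gram, rhs)


def forecast(curves: List[Vector], weights: Vector) -> Vector:
    """Regional load estimate as a weighted sum of the cluster curves."""
    return [sum(w * c[t] for w, c in zip(weights, curves)) for t in range(len(curves[0]))]


# Hypothetical toy data: five users, four time slots per day.
profiles = [
    [1.0, 1.0, 2.0, 4.0],  # evening-peaking users
    [1.0, 2.0, 2.0, 5.0],
    [1.0, 1.0, 3.0, 4.0],
    [5.0, 6.0, 6.0, 2.0],  # daytime-peaking users
    [6.0, 6.0, 5.0, 2.0],
]
total_load = [sum(values) for values in zip(*profiles)]

# Stage 1: cluster users; each centroid plays the role of a load characteristic curve.
labels, curves = kmeans(profiles, k=2)
# Stage 2: regress the regional total load on the cluster curves.
weights = fit_weights(curves, total_load)
fitted = forecast(curves, weights)
```

On this toy data the fitted weights come out close to the cluster sizes, because the regional total is the size-weighted sum of the cluster curves. A real forecast would apply a regressor trained on historical days to features for the target day, which this sketch does not attempt.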


Keywords: Distributed cluster; Short-term load forecasting; Clustering–regression model; Load characteristic curve



This work was supported by the National Natural Science Foundation of China (Grant Nos. 61472236, 61672337, 61602295, and 61562020), the Natural Science Foundation of Shanghai (No. 16ZR1413100), and the Excellent University Young Teachers Training Program of Shanghai Municipal Education Commission (No. ZZsdl15105).



Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  1. College of Computer Science and Technology, Shanghai University of Electric Power, Shanghai, People’s Republic of China
  2. School of Information Science and Technology, Hainan University, Haikou, People’s Republic of China
