A Scalable Smart Meter Data Generator Using Spark

  • Nadeem Iftikhar
  • Xiufeng Liu
  • Sergiu Danalachi
  • Finn Ebertsen Nordbjerg
  • Jens Henrik Vollesen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10573)


Today, smart meters are used worldwide, and they produce large volumes of data. Smart meter data management and analytics systems must therefore be able to process petabytes of data. Benchmarking and testing these systems requires large, scalable data sets; however, such data sets can be difficult to obtain because of privacy and data protection regulations. This paper presents a scalable smart meter data generator, built on Spark, that can generate realistic data sets. The proposed generator is based on a supervised machine learning method and can generate data of any size using small data sets as a seed. Moreover, the generator can preserve the characteristics of the data with respect to consumption patterns and user groups. The paper evaluates the proposed generator in a cluster-based environment to validate its effectiveness and scalability.
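The abstract does not specify which supervised model the generator learns from the seed data, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes hourly readings and swaps in a simple decomposition (least-squares linear trend, mean hour-of-day seasonal profile, Gaussian residual noise), which is not necessarily the authors' method. All names (`fit_seed_model`, `generate_series`, `make_seed`) and the per-household scale factor are hypothetical. Each synthetic meter is drawn from a randomly chosen seed model, so the mix of consumption patterns in the seed is carried over, and each meter depends only on its own ID.

```python
import math
import random
from statistics import pstdev

HOURS_PER_DAY = 24


def fit_seed_model(seed):
    """Fit trend + hour-of-day seasonality + noise to one hourly seed series.

    Hypothetical model for illustration; needs at least one full day of data.
    Returns (intercept, slope, seasonal profile, residual standard deviation).
    """
    n = len(seed)
    if n < HOURS_PER_DAY:
        raise ValueError("seed must cover at least one full day")
    t_mean = (n - 1) / 2
    y_mean = sum(seed) / n
    # Ordinary least-squares trend of consumption against the hour index.
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(seed))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    detrended = [y - (intercept + slope * t) for t, y in enumerate(seed)]
    # Seasonal component: mean detrended value for each hour of the day.
    seasonal = []
    for h in range(HOURS_PER_DAY):
        values = detrended[h::HOURS_PER_DAY]
        seasonal.append(sum(values) / len(values))
    residuals = [d - seasonal[t % HOURS_PER_DAY] for t, d in enumerate(detrended)]
    return intercept, slope, seasonal, pstdev(residuals)


def generate_series(models, hours, meter_id, base_seed=42):
    """Generate one synthetic meter from a randomly chosen seed model.

    Seeding the RNG per meter_id makes every meter reproducible and
    independent of the others, so meters can be generated in parallel.
    """
    rng = random.Random(base_seed * 1_000_003 + meter_id)
    intercept, slope, seasonal, noise_sd = rng.choice(models)
    scale = rng.uniform(0.8, 1.2)  # hypothetical per-household variation
    series = []
    for t in range(hours):
        value = intercept + slope * t + seasonal[t % HOURS_PER_DAY]
        value = scale * value + rng.gauss(0.0, noise_sd)
        series.append(max(0.0, value))  # consumption cannot be negative
    return series


def make_seed(days, rng):
    """Toy hourly seed: low night load, daytime bump, small noise."""
    return [
        0.3 + max(0.0, math.sin(math.pi * (t % HOURS_PER_DAY - 6) / 16))
        + rng.gauss(0.0, 0.05)
        for t in range(days * HOURS_PER_DAY)
    ]


seed_rng = random.Random(0)
models = [fit_seed_model(make_seed(7, seed_rng)) for _ in range(3)]
synthetic = {m: generate_series(models, 48, m) for m in range(5)}
print(len(synthetic), len(synthetic[0]))  # 5 48
```

Because each meter depends only on its ID and the small list of fitted models, the same per-meter function could in principle be distributed with Spark, for example with `sc.parallelize(range(n)).map(lambda m: (m, generate_series(models, hours, m)))`. This is a design suggestion, not the paper's documented implementation.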


Smart meter · Scalable · Synthetic data generator · Time series



This research is supported by UCN-FOU funding (Project-6/2016-17) and by the CITIES project, funded by the Danish Innovation Fund (1035-00027B).

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Nadeem Iftikhar (1), corresponding author
  • Xiufeng Liu (2)
  • Sergiu Danalachi (1)
  • Finn Ebertsen Nordbjerg (1)
  • Jens Henrik Vollesen (1)

  1. University College of Northern Denmark, Aalborg, Denmark
  2. Technical University of Denmark, Kongens Lyngby, Denmark