The VLDB Journal, Volume 24, Issue 2, pp 193–218

A time-series compression technique and its application to the smart grid

  • Frank Eichinger
  • Pavel Efros
  • Stamatis Karnouskos
  • Klemens Böhm
Regular Paper

Abstract

Time-series data is increasingly collected in many domains. One example is the smart electricity infrastructure, which generates huge volumes of such data from sources such as smart electricity meters. Although today this data is used for visualization and billing mostly at 15-min resolution, its original temporal resolution is frequently more fine-grained, e.g., seconds. Fine-grained data is useful for various analytical applications such as short-term forecasting, disaggregation and visualization. However, transmitting and storing huge amounts of such fine-grained data is prohibitively expensive in terms of storage space in many cases. In this article, we present a compression technique based on piecewise regression and two methods that describe the performance of the compression. Although our technique is a general approach for time-series compression, smart grids serve as our running example and as our evaluation scenario. Depending on the data and the use-case scenario, the technique compresses data by a factor of up to 5,000 while maintaining its usefulness for analytics. The proposed technique has outperformed related work and has been applied to three real-world energy datasets in different scenarios. Finally, we show that the proposed compression technique can be implemented in a state-of-the-art database management system.
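To make the idea of regression-based compression with a bounded error concrete, the following is a minimal sketch and not the technique of the article itself, whose details are not given in this abstract. It assumes the simplest possible model, a constant function per segment, and a user-chosen maximum deviation `max_error`. The function names `compress`, `decompress` and `compression_ratio` are hypothetical and chosen only for illustration.

```python
def compress(values, max_error):
    """Greedy piecewise-constant approximation (illustrative assumption).

    A segment is extended as long as the spread of its values is at most
    2 * max_error, so the midrange value deviates from every point in
    the segment by at most max_error.
    """
    segments = []  # each entry: (run_length, constant_value)
    start = 0
    n = len(values)
    while start < n:
        lo = hi = values[start]
        end = start + 1
        while end < n:
            new_lo = min(lo, values[end])
            new_hi = max(hi, values[end])
            if new_hi - new_lo > 2 * max_error:
                break  # the next point would violate the error bound
            lo, hi = new_lo, new_hi
            end += 1
        segments.append((end - start, (lo + hi) / 2))
        start = end
    return segments


def decompress(segments):
    """Rebuild an approximate series from (run_length, value) pairs."""
    out = []
    for length, value in segments:
        out.extend([value] * length)
    return out


def compression_ratio(values, segments):
    """Original point count divided by stored numbers (two per segment)."""
    return len(values) / (2 * len(segments))
```

In this sketch, a larger `max_error` yields fewer and longer segments and therefore a higher ratio, which illustrates the trade-off between compression and usefulness for analytics that the abstract describes. Higher-degree models per segment (for example, straight lines) follow the same pattern but need a fitting step instead of the midrange.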

Keywords

Data compression, Time series, Piecewise regression, Smart grid

Supplementary material

Supplementary material 1: 778_2014_368_MOESM1_ESM.pdf (PDF, 36 KB)

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Frank Eichinger (1)
  • Pavel Efros (1)
  • Stamatis Karnouskos (2)
  • Klemens Böhm (1)

  1. Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
  2. SAP AG, Karlsruhe, Germany