Abstract
Time-series data is increasingly collected in many domains. One example is the smart electricity infrastructure, which generates huge volumes of such data from sources such as smart electricity meters. Although these data are today used for visualization and billing, mostly at 15-min resolution, their original temporal resolution is frequently more fine-grained, e.g., seconds. Such fine-grained data is useful for various analytical applications such as short-term forecasting, disaggregation and visualization. However, transmitting and storing huge amounts of fine-grained data is in many cases prohibitively expensive in terms of storage space. In this article, we present a compression technique based on piecewise regression and two methods which describe the performance of the compression. Although our technique is a general approach for time-series compression, smart grids serve as our running example and as our evaluation scenario. Depending on the data and the use-case scenario, the technique compresses data by up to a factor of 5,000 while maintaining its usefulness for analytics. The proposed technique has outperformed related work and has been applied to three real-world energy datasets in different scenarios. Finally, we show that the proposed compression technique can be implemented in a state-of-the-art database management system.
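To make the idea of piecewise compression concrete, the following is a minimal sketch and not the technique presented in this article. It assumes the simplest possible model, a constant per segment, and a user-chosen bound on the maximum deviation (the parameter `max_dev`, which is an assumption of this illustration). The function names `compress_piecewise_constant` and `decompress` are hypothetical. Each run of consecutive values is replaced by one constant (the midrange of the run) as long as no value deviates from it by more than `max_dev`.

```python
from typing import List, Tuple

# One segment: (start index, end index inclusive, constant value).
Segment = Tuple[int, int, float]


def compress_piecewise_constant(values: List[float], max_dev: float) -> List[Segment]:
    """Greedily cover `values` with constant segments.

    Illustrative sketch only: a segment is extended while the range
    (max - min) of its values stays within 2 * max_dev, so the midrange
    approximates every value in the segment with error <= max_dev.
    """
    segments: List[Segment] = []
    if not values:
        return segments

    start = 0
    lo = hi = values[0]
    for i in range(1, len(values)):
        new_lo = min(lo, values[i])
        new_hi = max(hi, values[i])
        if new_hi - new_lo > 2 * max_dev:
            # The new value does not fit: close the current segment.
            segments.append((start, i - 1, (lo + hi) / 2))
            start = i
            lo = hi = values[i]
        else:
            lo, hi = new_lo, new_hi

    # Close the last open segment.
    segments.append((start, len(values) - 1, (lo + hi) / 2))
    return segments


def decompress(segments: List[Segment]) -> List[float]:
    """Rebuild an approximate series by repeating each segment's constant."""
    out: List[float] = []
    for start, end, value in segments:
        out.extend([value] * (end - start + 1))
    return out
```

In this sketch the compression ratio depends on how many values each segment absorbs: flat stretches of a load curve collapse into a single triple, while volatile stretches produce many short segments. A regression-based approach would additionally consider non-constant functions per segment.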
Notes
SQLScript is a procedural programming language in SAP HANA.
L is a programming language similar to C used in SAP HANA.
Acknowledgments
We thank L. Neumann and D. Kurfiss, who have helped us with the database implementation and respective experiments.
Additional information
Work partly done while F. Eichinger and P. Efros were with SAP AG.
Cite this article
Eichinger, F., Efros, P., Karnouskos, S. et al. A time-series compression technique and its application to the smart grid. The VLDB Journal 24, 193–218 (2015). https://doi.org/10.1007/s00778-014-0368-8
DOI: https://doi.org/10.1007/s00778-014-0368-8