Aggregation-Aware Compression of Probabilistic Streaming Time Series

  • Reza Akbarinia
  • Florent Masseglia
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9166)


In recent years, there has been growing interest in probabilistic data management. We focus on probabilistic time series, whose main characteristic is their high data volume, which calls for efficient compression techniques. To date, most work on probabilistic data reduction has provided synopses that minimize the representation error w.r.t. the original data. However, in most cases, the compressed data are meaningless for common queries involving aggregation operators such as SUM or AVG. We propose PHA (Probabilistic Histogram Aggregation), a compression technique whose objective is to minimize the error of such queries over compressed probabilistic data. We incorporate the aggregation operator given by the end-user directly into the compression technique, and obtain much lower error in the long term. We also adopt a global error-aware strategy to manage large sets of probabilistic time series, in which the available memory is carefully balanced between the series according to their individual variability.
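
The gap between representation error and aggregation error can be illustrated with a small sketch. It is not the authors' PHA algorithm: it assumes each timestamp holds an independent discrete distribution (a dict from value to probability), and the function names and the two-timestamp window are hypothetical. It compares the exact distribution of SUM over a window with the SUM rebuilt from a synopsis that simply averages the per-timestamp histograms.

```python
from itertools import product


def sum_distribution(dists):
    """Exact distribution of the SUM of independent discrete random variables.

    Each element of `dists` is a dict mapping value -> probability.
    Independence across timestamps is an assumption of this sketch.
    """
    result = {0: 1.0}
    for dist in dists:
        nxt = {}
        # Convolve the running sum with the next timestamp's distribution.
        for (s, ps), (v, pv) in product(result.items(), dist.items()):
            nxt[s + v] = nxt.get(s + v, 0.0) + ps * pv
        result = nxt
    return result


def average_distribution(dists):
    """Value-wise average of the histograms (a representation-oriented synopsis)."""
    avg = {}
    for dist in dists:
        for v, p in dist.items():
            avg[v] = avg.get(v, 0.0) + p / len(dists)
    return avg


# Hypothetical window: two timestamps with certain readings 0 and 2.
window = [{0: 1.0}, {2: 1.0}]

# Aggregation-aware view: keep the distribution of SUM itself.
exact_sum = sum_distribution(window)

# Representation-oriented view: keep one averaged histogram, then answer
# SUM by treating every timestamp as an independent draw from it.
synopsis = average_distribution(window)
approx_sum = sum_distribution([synopsis] * len(window))

print(exact_sum)   # {2: 1.0}
print(approx_sum)  # {0: 0.25, 2: 0.5, 4: 0.25}
```

In this toy window the averaged histogram is a reasonable summary of the individual readings, yet the SUM answered from it spreads probability over 0, 2, and 4, whereas the true SUM is 2 with certainty. This is the kind of query error that an aggregation-aware synopsis aims to avoid.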


Keywords: Compression Ratio, Synthetic Dataset, Probabilistic Data, Compression Technique, Aggregation Operator



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Inria & LIRMM, Zenith Team - Université Montpellier, Montpellier cedex 5, France
