A new quantile tracking algorithm using a generalized exponentially weighted average of observations

  • Hugo Lewi Hammer
  • Anis Yazidi
  • Håvard Rue
Article

Abstract

The Exponentially Weighted Average (EWA) of observations is known to be a state-of-the-art estimator for tracking expectations of dynamically varying data stream distributions. However, it is not obvious how to devise an EWA estimator that tracks quantiles of data stream distributions. In this paper, we present a lightweight quantile estimator using a generalized form of the EWA. To the best of our knowledge, this work represents the first reported quantile estimator of this form in the literature. An appealing property of the estimator is that the update step size is adjusted online in proportion to the difference between the current observation and the current quantile estimate. Thus, if the estimator is off-track compared to the data stream, large steps are taken to promptly bring it back on track. The convergence of the estimator to the true quantile is proven using the theory of stochastic learning. Extensive experimental results using both synthetic and real-life data show that our estimator clearly outperforms legacy state-of-the-art quantile tracking estimators and achieves faster adaptivity in dynamic environments. The quantile estimator was further tested on real-life data where the objective is efficient online control of indoor climate. We show that the estimator can be incorporated into a concept drift detector to efficiently decide when a machine learning model used to predict future indoor temperature should be retrained or updated.
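The abstract does not state the update rule itself, so the following is only a minimal Python sketch of how an estimator of this general kind might look, not the authors' published algorithm. It assumes an asymmetric EWA update whose weight is normalized by running estimates of the mean deviation above and below the current estimate. The class name `GeneralizedEwaQuantile`, the parameters `lam` and `gamma`, and all default values are hypothetical.

```python
import random


class GeneralizedEwaQuantile:
    """Illustrative quantile tracker built on an asymmetric EWA update.

    Assumption: this is a sketch written for illustration, not the
    published update rule. All names and defaults are hypothetical.
    """

    def __init__(self, q, lam=0.01, gamma=0.01, init=0.0, init_dev=1.0):
        self.q = q                 # target quantile level, 0 < q < 1
        self.lam = lam             # base step size, in the units of the data
        self.gamma = gamma         # smoothing weight for the deviation estimates
        self.estimate = init       # current quantile estimate Q
        self.dev_above = init_dev  # running mean of (x - Q) over x > Q
        self.dev_below = init_dev  # running mean of (Q - x) over x < Q

    def update(self, x):
        diff = x - self.estimate
        if diff > 0:
            # Weight uses the previous deviation estimate, then refreshes it.
            weight = min(1.0, self.lam * self.q / max(self.dev_above, 1e-12))
            self.dev_above += self.gamma * (diff - self.dev_above)
        elif diff < 0:
            weight = min(1.0, self.lam * (1.0 - self.q) / max(self.dev_below, 1e-12))
            self.dev_below += self.gamma * (-diff - self.dev_below)
        else:
            return self.estimate
        # EWA form: Q <- (1 - weight) * Q + weight * x,
        # so the step is proportional to the difference x - Q.
        self.estimate += weight * diff
        return self.estimate


if __name__ == "__main__":
    rng = random.Random(0)
    tracker = GeneralizedEwaQuantile(q=0.9)
    for _ in range(20000):
        tracker.update(rng.gauss(0.0, 1.0))
    print("after N(0, 1) phase:", round(tracker.estimate, 3))
    # Abrupt shift in the stream: the estimate should follow it.
    for _ in range(20000):
        tracker.update(rng.gauss(5.0, 1.0))
    print("after N(5, 1) phase:", round(tracker.estimate, 3))
```

In this sketch, dividing by the deviation estimates is what targets a quantile: when those estimates are accurate and the weight is not clipped, the expected step is `lam * (q - F(Q))`, where `F` is the distribution function of the stream, and this vanishes exactly when `Q` is the `q`-quantile. Without the normalization, the same asymmetric update would settle at an expectile instead.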

Keywords

  • Concept drift detection
  • Data stream
  • Generalized exponentially weighted average
  • Quantile tracking

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. OsloMet – Oslo Metropolitan University, Oslo, Norway
  2. King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
