Abstract
A technique for dynamic tracking of quantiles of data streams, without storage and sorting of past data samples, is presented. It updates the quantile estimate recursively by applying an increment, selected as a fraction of the range, such that the estimated quantile approaches the sample quantile. The range is dynamically estimated using first-order recursive relations for peak and valley detection. The technique does not require initial estimates and the computation steps involved are the same for all the samples. It has low memory and computational requirements and is suitable for signal processing and other applications involving online tracking of single or multiple quantiles of data streams. It has been tested using synthetic and real data with different distributions.
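The recursion summarized above can be illustrated with a minimal Python sketch. It is not the authors' exact formulation: the update equations are not reproduced in this excerpt. The peak and valley detectors follow the first-order form implied by the appendix relations (rise with coefficient α toward the sample, decay with coefficient β toward the other detector's output). The increment rule (a step of λ·p·R upward or λ·(1−p)·R downward) is an assumption, chosen so that the estimate settles where a fraction p of the samples lies below it. The class name `QuantileTracker`, the parameter `lam`, the zero initial state, and all default values are hypothetical.

```python
import random


class QuantileTracker:
    """Illustrative range-scaled recursive quantile tracker (assumed form)."""

    def __init__(self, p, lam=0.01, alpha=0.9, beta=0.999):
        self.p = p          # target quantile, 0 < p < 1
        self.lam = lam      # increment as a fraction of the range (assumed)
        self.alpha = alpha  # rise coefficient of the peak/valley detectors
        self.beta = beta    # decay coefficient of the peak/valley detectors
        self.peak = 0.0     # no data-dependent initial estimate is used
        self.valley = 0.0
        self.q = 0.0

    def update(self, x):
        prev_peak, prev_valley = self.peak, self.valley
        # Peak detector: rise toward the sample, else decay toward the valley.
        if x >= prev_peak:
            self.peak = self.alpha * prev_peak + (1 - self.alpha) * x
        else:
            self.peak = self.beta * prev_peak + (1 - self.beta) * prev_valley
        # Valley detector: fall toward the sample, else decay toward the peak.
        if x <= prev_valley:
            self.valley = self.alpha * prev_valley + (1 - self.alpha) * x
        else:
            self.valley = self.beta * prev_valley + (1 - self.beta) * prev_peak
        rng = self.peak - self.valley
        # Asymmetric increments balance when a fraction p of samples is below q.
        if x >= self.q:
            self.q += self.lam * self.p * rng
        else:
            self.q -= self.lam * (1 - self.p) * rng
        return self.q


if __name__ == "__main__":
    random.seed(0)
    tracker = QuantileTracker(p=0.5)
    for _ in range(20000):
        estimate = tracker.update(random.random())
    print("median estimate for uniform(0, 1) data:", estimate)
```

Each sample costs a fixed number of comparisons and multiply-adds, and the state is three numbers per tracked quantile (or one per quantile plus a shared peak and valley), which is in line with the low memory and computation claimed in the abstract.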
Acknowledgments
The research is supported by “National Programme on Perception Engineering,” sponsored by the Department of Electronics & Information Technology, Government of India.
Appendix
Ripple in the estimated range: Consider the data to be stationary over sequences longer than L samples, with successive peaks (and successive valleys) separated by at most L samples. Let the peak and valley values be P and V, respectively. Further, let the peak detector output be \( {\overset{\frown }{P}}_1 \) at the input peak and \( {\overset{\frown }{P}}_2 \) just before it. These values can be obtained using the recursive relation in Eq. 22, as \( {\overset{\frown }{P}}_1=\upalpha {\overset{\frown }{P}}_2+\left(1-\upalpha \right)P \) and \( {\overset{\frown }{P}}_2\approx {\upbeta}^L{\overset{\frown }{P}}_1+\left(1-{\upbeta}^L\right)V \). Solving these two relations for \( {\overset{\frown }{P}}_1 \) and \( {\overset{\frown }{P}}_2 \), the peak-to-peak ripple in the peak estimation is given as
\[ {\overset{\frown }{P}}_1-{\overset{\frown }{P}}_2=\frac{\left(1-\upalpha \right)\left(1-{\upbeta}^L\right)}{1-\upalpha {\upbeta}^L}\left(P-V\right) \]
With the valley detector output as \( {\overset{\frown }{V}}_1 \) at the input valley and as \( {\overset{\frown }{V}}_2 \) just before it, the peak-to-peak ripple in the valley estimation \( {\overset{\frown }{V}}_1-{\overset{\frown }{V}}_2 \) can be shown to be the same. Therefore, the peak-to-peak ripple in the range estimation is given as
\[ \frac{2\left(1-\upalpha \right)\left(1-{\upbeta}^L\right)}{1-\upalpha {\upbeta}^L}\left(P-V\right) \]
With the range R = P − V, the peak-to-peak ripple as a fraction of R is given as \( r=2\left(1-\upalpha \right)\left(1-{\upbeta}^L\right)/\left(1-\upalpha {\upbeta}^L\right) \).
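As a numerical cross-check of the ripple expression, the following Python sketch evaluates r and compares it with the fixed point obtained by iterating the two peak-detector relations of this appendix. The function names and the values α = 0.9, β = 0.999, L = 100 are hypothetical and chosen only for illustration.

```python
def ripple_fraction(alpha, beta, L):
    """Peak-to-peak ripple of the range estimate as a fraction of R = P - V."""
    decay = beta ** L
    return 2 * (1 - alpha) * (1 - decay) / (1 - alpha * decay)


def peak_ripple_by_iteration(alpha, beta, L, P=1.0, V=0.0, cycles=500):
    """Iterate the two appendix relations to steady state; return P1 - P2."""
    decay = beta ** L
    p1 = P  # arbitrary starting value; the iteration is a contraction
    p2 = p1
    for _ in range(cycles):
        p2 = decay * p1 + (1 - decay) * V   # detector output just before a peak
        p1 = alpha * p2 + (1 - alpha) * P   # detector output at the peak
    return p1 - p2


if __name__ == "__main__":
    alpha, beta, L = 0.9, 0.999, 100  # illustrative values only
    r = ripple_fraction(alpha, beta, L)
    # The valley ripple equals the peak ripple, so the range ripple is twice it.
    r_iter = 2 * peak_ripple_by_iteration(alpha, beta, L)
    print("closed form:", r)
    print("by iteration:", r_iter)
```

With P = 1 and V = 0 the range is R = 1, so twice the iterated peak ripple can be compared directly with r. The expression also shows the trade-off in choosing the coefficients: r shrinks as α approaches 1 (slower rise) or as β^L approaches 1 (slower decay).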
Cite this article
Tiwari, N., Pandey, P.C. A Technique with Low Memory and Computational Requirements for Dynamic Tracking of Quantiles. J Sign Process Syst 91, 411–422 (2019). https://doi.org/10.1007/s11265-017-1327-6
DOI: https://doi.org/10.1007/s11265-017-1327-6