Synonyms
Histogram; Median; Order statistics; Selection
Definition
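Quantiles are order statistics of data: the φ-quantile (0 ≤ φ ≤ 1) of a set S is an element x such that φ|S| elements of S are less than or equal to x and the remaining (1 − φ)|S| are greater than x (with φ|S| rounded to an integer when it is fractional). An ε-approximate φ-quantile relaxes this requirement: its rank in the sorted order of S need only lie within ε|S| of φ|S|. This entry describes data stream (single-pass) algorithms for computing such approximate quantiles.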
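As a concrete illustration of the definition, the following is a minimal Python sketch under stated assumptions. It is an offline reference, not a streaming algorithm: it sorts the whole input, which is exactly what single-pass algorithms avoid. The function names `exact_quantile`, `rank_range`, and `is_eps_approximate` are hypothetical. The rounding rule (rank ⌈φn⌉, with φ = 0 mapped to the minimum) and the ε-approximation test (some rank r occupied by x satisfies |r − φn| ≤ εn) are assumed conventions from the streaming literature, not necessarily the exact formulation used in this entry.

```python
import math
from typing import Sequence, Tuple


def exact_quantile(data: Sequence[float], phi: float) -> float:
    """Offline reference: return the element of rank ceil(phi * n) in sorted order.

    Ranks are 1-based (rank 1 is the smallest element). Assumed convention:
    phi = 0 returns the minimum, since rank 0 does not exist.
    """
    if not 0.0 <= phi <= 1.0:
        raise ValueError("phi must lie in [0, 1]")
    if len(data) == 0:
        raise ValueError("data must be non-empty")
    ordered = sorted(data)  # O(n log n) time, O(n) space: not a streaming method
    rank = max(1, math.ceil(phi * len(ordered)))
    return ordered[rank - 1]


def rank_range(data: Sequence[float], x: float) -> Tuple[int, int]:
    """Return the (lowest, highest) 1-based rank that x occupies in sorted(data).

    With duplicates, x spans several ranks; without them both values coincide.
    """
    if x not in data:
        raise ValueError("x must be an element of data")
    smaller = sum(1 for v in data if v < x)
    at_most = sum(1 for v in data if v <= x)
    return smaller + 1, at_most


def is_eps_approximate(data: Sequence[float], phi: float, eps: float, x: float) -> bool:
    """Check whether x is an eps-approximate phi-quantile of data.

    Assumed criterion: some rank r occupied by x satisfies |r - phi*n| <= eps*n.
    """
    n = len(data)
    lowest, highest = rank_range(data, x)
    target, slack = phi * n, eps * n
    # The rank interval [lowest, highest] must overlap [target - slack, target + slack].
    return lowest <= target + slack and highest >= target - slack


if __name__ == "__main__":
    data = [5, 1, 4, 2, 3, 8, 7, 6]
    print(exact_quantile(data, 0.5))               # 4: four elements <= 4, four > 4
    print(is_eps_approximate(data, 0.5, 0.125, 5))  # True: rank 5 is within 1 of 4
    print(is_eps_approximate(data, 0.5, 0.125, 6))  # False: rank 6 is 2 away from 4
```

The ε-approximation check treats the rank of x as an interval so that duplicate values are handled: in [1, 2, 2, 2, 3], the value 2 occupies ranks 2 through 4. A streaming algorithm only has to return some element that passes this check, which is what lets it use far less memory than the sort above.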
Historical Background
Since the earliest days of data processing, there has been a need to summarize data. Large volumes of raw, unstructured data easily overwhelm the human ability to comprehend or digest them. Tools that help identify the major underlying trends or patterns in data have enormous value. Quantiles characterize distributions of real-world data sets in ways that are less sensitive to outliers than simpler alternatives such as the mean and the variance. Consequently, quantiles are of interest to both database implementers and users: for instance, they are a fundamental tool for query optimization, splitting...
Copyright information
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this entry
Cite this entry
Buragohain, C., Suri, S. (2018). Quantiles on Streams. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_290
DOI: https://doi.org/10.1007/978-1-4614-8265-9_290
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science, Reference Module Computer Science and Engineering