Years and Authors of Summarized Original Work
1996; Alon, Matias, Szegedy
Streaming algorithms condense a large volume of data into a compact summary by maintaining a data structure that can be modified incrementally as updates are observed. The summary then supports the approximation of particular quantities of interest. The AMS sketch approximates the sum of squared entries of a vector defined by a stream of updates. This quantity is the square of the vector's Euclidean norm, and so has many applications in high-dimensional geometry and in data mining and machine learning settings that use vector representations of data.
The data structure maintains linear projections of the stream (modeled as a vector) onto a number of randomly chosen vectors. These random vectors are defined implicitly by simple hash functions, and so do not have to be stored explicitly. Varying the size of the sketch changes the accuracy guarantees on the resulting estimate. The fact...
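The description above can be illustrated with a minimal Python sketch of the basic "tug-of-war" form of the idea. It is an illustration under stated assumptions, not the exact construction of the summarized work: the class name `AMSSketch`, the parameters `groups`, `per_group`, and `seed`, the constant `PRIME`, and the choice of a random degree-3 polynomial as the sign hash are all hypothetical choices made here. Items are assumed to be non-negative integers smaller than `PRIME`. Each counter stores the inner product of the stream vector with an implicit random ±1 vector, and the estimate combines squared counters by a median of means.

```python
import random
import statistics

# Assumed field size for the polynomial hash (a Mersenne prime).
PRIME = (1 << 61) - 1


class AMSSketch:
    """Illustrative tug-of-war sketch for the sum of squared entries."""

    def __init__(self, groups=5, per_group=64, seed=0):
        rng = random.Random(seed)
        self.groups = groups
        self.per_group = per_group
        total = groups * per_group
        # Four random coefficients per counter define a degree-3 polynomial,
        # which gives (approximately) four-wise independent signs.
        self.coeffs = [
            [rng.randrange(PRIME) for _ in range(4)] for _ in range(total)
        ]
        self.counters = [0] * total

    def _sign(self, j, item):
        # Evaluate the j-th polynomial at `item` (Horner's rule) and map the
        # low bit to +1 or -1. The random vector is never stored explicitly.
        a, b, c, d = self.coeffs[j]
        h = (((a * item + b) * item + c) * item + d) % PRIME
        return 1 if h & 1 else -1

    def update(self, item, delta=1):
        # Linear update: add delta * sign_j(item) to every counter.
        for j in range(len(self.counters)):
            self.counters[j] += delta * self._sign(j, item)

    def estimate(self):
        # Average the squared counters within each group, then take the
        # median of the group averages.
        means = []
        for g in range(self.groups):
            block = self.counters[g * self.per_group:(g + 1) * self.per_group]
            means.append(sum(z * z for z in block) / self.per_group)
        return statistics.median(means)


if __name__ == "__main__":
    sketch = AMSSketch(seed=1)
    exact = {}
    stream_rng = random.Random(2)
    for _ in range(2000):
        item = stream_rng.randrange(100)
        sketch.update(item)
        exact[item] = exact.get(item, 0) + 1
    print("exact sum of squares:", sum(v * v for v in exact.values()))
    print("sketch estimate:     ", sketch.estimate())
```

In this sketch, raising `per_group` tightens the spread of each group average, while raising `groups` lowers the chance that the median is far off; both increase the size of the summary. Because every update is linear, two sketches built with the same `seed` can be combined by adding their counters entry by entry.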
Keywords
Streaming algorithms; Second-moment estimation; Euclidean norm; Sketch
1. Alon N, Matias Y, Szegedy M (1996) The space complexity of approximating the frequency moments. In: ACM symposium on theory of computing, Philadelphia, pp 20–29
2. Alon N, Gibbons P, Matias Y, Szegedy M (1999) Tracking join and self-join sizes in limited storage. In: ACM principles of database systems, New York, pp 10–20
3. Cormode G, Garofalakis M (2005) Sketching streams through the net: distributed approximate query tracking. In: International conference on very large data bases, Trondheim
4. Thorup M, Zhang Y (2004) Tabulation based 4-universal hashing with applications to second moment estimation. In: ACM-SIAM symposium on discrete algorithms, New Orleans