Hierarchical Heavy Hitter Mining on Streams
Synonyms
HHH
Definition
Given a multiset S of N elements from a hierarchical domain D and a count threshold φ ∈ (0,1), Hierarchical Heavy Hitters (HHH) summarize the distribution of S projected along the hierarchy of D as a set of prefixes P ⊆ D. They are defined inductively as the nodes in the hierarchy whose "HHH count" exceeds φN, where the HHH count of a node is the sum of the counts of all descendant nodes having no HHH ancestors. The approximate HHH problem over a data stream of elements e is defined with an additional error parameter ε ∈ (0,φ). Its output is a set of prefixes P ⊆ D together with estimates of their frequencies, given as accuracy bounds f_min(p) and f_max(p) for each p ∈ P, such that f_min(p) ≤ f∗(p) ≤ f_max(p) and f_max(p) − f_min(p) ≤ εN, where f∗(p) is the true frequency of p in S (i.e., f∗(p) = ∑_{e ⪯ p} f(e)). Additionally, there is a coverage guarantee that, for all prefixes q ∉ P, φN > ∑ f(e): (e ⪯ q) ∧ (e ⋠ P), with ⪯ denoting prefix containment and (e ⪯ P) denoting (∃p ∈ P: e ⪯ p)....
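The inductive definition of the exact HHHs can be illustrated with a small offline sketch. The code below is not one of the streaming algorithms from the literature; it assumes the whole multiset fits in memory, that every element is a tuple of labels of the same length (a one-dimensional hierarchy whose root is the empty tuple), and that a node qualifies when its discounted count strictly exceeds φN. The function name `exact_hhh` and the sample data are hypothetical.

```python
from collections import Counter


def exact_hhh(items, phi):
    """Return {prefix: HHH count} for the exact HHHs of `items`.

    Assumptions (illustrative only): each item is a tuple of labels, all of
    the same length; the ancestors of an item are its shorter tuple prefixes,
    ending at the root (). A node is an HHH when its discounted count
    strictly exceeds phi * N.
    """
    n = len(items)
    threshold = phi * n
    depth = len(items[0]) if items else 0

    # Discounted counts at the deepest level are the raw element counts.
    level = Counter(items)
    hhh = {}

    # Walk the hierarchy bottom-up, one level at a time.
    for d in range(depth, -1, -1):
        parents = Counter()
        for prefix, count in level.items():
            if count > threshold:
                # Reported as an HHH: its count is not passed to ancestors.
                hhh[prefix] = count
            elif d > 0:
                # Not an HHH: its count rolls up into its parent.
                parents[prefix[:-1]] += count
        level = parents
    return hhh


# Hypothetical stream of N = 20 elements over a three-level hierarchy.
stream = (
    [("a", "x", "1")] * 6
    + [("a", "x", "2")] * 3
    + [("a", "x", "3")] * 2
    + [("a", "y", "1")] * 3
    + [("b", "z", "1")] * 2
    + [("b", "w", "1")] * 2
    + [("c", "v", "1")] * 2
)

print(exact_hhh(stream, phi=0.2))
```

With φ = 0.2 the threshold is φN = 4. The leaf ("a", "x", "1") is an HHH with count 6. Its parent ("a", "x") is an HHH with count 5, because only the 3 + 2 occurrences under it that are not already covered by an HHH are counted. The root () is an HHH with count 9, collecting everything that no deeper HHH accounts for.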
Recommended Reading
- 1. Cheung-Mon-Chan P, Clerot F. Finding hierarchical heavy hitters with the count min sketch. In: Proceedings of the International Workshop on Internet Performance, Simulation, Monitoring, Measurement; 2006.
- 2. Cormode G, Korn F, Muthukrishnan S, Srivastava D. Finding hierarchical heavy hitters in data streams. In: Proceedings of the 29th International Conference on Very Large Data Bases; 2003. p. 464–75.
- 3. Cormode G, Korn F, Muthukrishnan S, Srivastava D. Diamond in the rough: finding hierarchical heavy hitters in multi-dimensional data. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2004. p. 155–66.
- 4. Cormode G, Korn F, Muthukrishnan S, Johnson T, Spatscheck O, Srivastava D. Holistic UDAFs at streaming speeds. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2004. p. 35–46.
- 5. Cormode G, Korn F, Muthukrishnan S, Srivastava D. Finding hierarchical heavy hitters in streaming data. ACM Trans Knowl Discov Data. 2008;1(4):1–48.
- 6. Demaine E, López-Ortiz A, Munro JI. Frequency estimation of internet packet streams with limited space. In: Proceedings of the 10th Annual European Symposium on Algorithms; 2002. p. 348–60.
- 7. Estan C, Savage S, Varghese G. Automatically inferring patterns of resource consumption in network traffic. In: Proceedings of the ACM International Conference on Data Communication; 2003. p. 137–48.
- 8. Estan C, Magin G. Interactive traffic analysis and visualization with Wisconsin netpy. In: Proceedings of the International Conference on Large Installation System Administration; 2005. p. 177–84.
- 9. Hershberger J, Shrivastava N, Suri S, Toth C. Space complexity of hierarchical heavy hitters in multi-dimensional data streams. In: Proceedings of the ACM SIGACT-SIGMOD Symposium on Principles of Database Systems; 2005. p. 338–47.
- 10. Manku GS, Motwani R. Approximate frequency counts over data streams. In: Proceedings of the 28th International Conference on Very Large Data Bases; 2002. p. 346–57.
- 11. Misra J, Gries D. Finding repeated elements. Sci Comput Program. 1982;2(2):143–52.
- 12. Sekar V, Duffield N, Spatscheck O, van der Merwe J, Zhang H. LADS: large-scale automated DDoS detection system. In: Proceedings of the USENIX 2006 Annual Technical Conference, General Track; 2006. p. 171–84.
- 13. Zhang Y, Singh S, Sen S, Duffield N, Lund C. Online identification of hierarchical heavy hitters: algorithms, evaluation and applications. In: Proceedings of the 4th ACM SIGCOMM Conference on Internet Measurement; 2004. p. 135–48.