Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Hierarchical Heavy Hitter Mining on Streams

  • Flip R. Korn
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_190

Synonyms

HHH

Definition

Given a multiset S of N elements from a hierarchical domain D and a count threshold φ ∈ (0,1), Hierarchical Heavy Hitters (HHH) summarize the distribution of S projected along the hierarchy of D as a set of prefixes P ⊆ D. They are defined inductively as the nodes in the hierarchy whose "HHH count" exceeds φN, where the HHH count of a node is the sum of the counts of all its descendant nodes that have no HHH ancestors.

The approximate HHH problem over a data stream of elements e is defined with an additional error parameter ε ∈ (0,φ). The output is a set of prefixes P ⊆ D together with estimates of their associated frequencies, given as accuracy bounds f_min and f_max on the frequency of each p ∈ P, with f_min(p) ≤ f(p) ≤ f_max(p), where f(p) is the true frequency of p in S (i.e., f(p) = ∑_{e ⪯ p} f(e)), and f_max(p) − f_min(p) ≤ εN. Additionally, there is a coverage guarantee that, for all prefixes q ∉ P, φN > ∑ f(e): (e ⪯ q) ∧ (e ⋠ P), with ⪯ denoting prefix containment and (e ⪯ P) denoting (∃p ∈ P: e ⪯ p). ...
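The two parts of the definition can be illustrated with a small offline sketch. It is not one of the streaming algorithms from the literature: it stores every count exactly, and it assumes (for illustration only) that each element is a fixed-length tuple of labels whose prefixes are the shorter leading tuples, with the empty tuple as the root. The function names exact_hhh, is_prefix, and satisfies_guarantees are hypothetical. The first function applies the inductive definition bottom-up; the second checks the accuracy and coverage conditions of the approximate problem for a candidate output.

```python
from collections import Counter


def exact_hhh(stream, phi):
    """Exact HHHs by the inductive definition (offline, not a streaming algorithm).

    Assumption: every element is a tuple of the same length; a prefix is any
    leading sub-tuple, and () is the root of the hierarchy.
    Returns {prefix: HHH count}.
    """
    stream = list(stream)
    threshold = phi * len(stream)
    depth = len(stream[0]) if stream else 0
    residual = Counter(stream)  # counts not yet claimed by a deeper HHH
    hhh = {}
    for level in range(depth, -1, -1):  # deepest level first, root last
        passed_up = Counter()
        for prefix, count in residual.items():
            if count > threshold:       # "exceeds phi * N"
                hhh[prefix] = count     # claimed here, not passed to ancestors
            elif level > 0:
                passed_up[prefix[:-1]] += count
        residual = passed_up
    return hhh


def is_prefix(p, e):
    """True if p is an ancestor of (or equal to) e, i.e. e is contained in p."""
    return e[:len(p)] == p


def satisfies_guarantees(stream, phi, eps, output):
    """Check accuracy and coverage for output = {prefix: (f_min, f_max)}."""
    stream = list(stream)
    n = len(stream)
    freq = Counter(stream)

    def f(p):  # true frequency of prefix p: sum over elements below p
        return sum(c for e, c in freq.items() if is_prefix(p, e))

    # Accuracy: f_min(p) <= f(p) <= f_max(p) and f_max(p) - f_min(p) <= eps * N.
    for p, (f_min, f_max) in output.items():
        if not (f_min <= f(p) <= f_max and f_max - f_min <= eps * n):
            return False

    # Coverage: for every prefix q not in the output, the elements below q
    # that are not below any output prefix must sum to less than phi * N.
    uncovered = {e: c for e, c in freq.items()
                 if not any(is_prefix(p, e) for p in output)}
    candidates = {e[:k] for e in freq for k in range(len(e) + 1)} - set(output)
    return all(
        phi * n > sum(c for e, c in uncovered.items() if is_prefix(q, e))
        for q in candidates
    )


# Toy two-level data, N = 20, phi = 0.25, so the threshold is 5.
stream = ([("a", "x")] * 6 + [("a", "y")] * 3 + [("a", "z")] * 3
          + [("b", "x")] * 4 + [("b", "y")] * 4)
print(exact_hhh(stream, 0.25))
```

In this toy input the prefix ("a",) has true frequency 12 but HHH count 6, because the 6 occurrences of ("a", "x") are already claimed by a deeper HHH. This is the sense in which the HHH count excludes descendants that have HHH ancestors.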


Recommended Reading

  1. Cheung-Mon-Chan P, Clerot F. Finding hierarchical heavy hitters with the count min sketch. In: Proceedings of the International Workshop on Internet Rent, Simulation, Monitoring, Measurement; 2006.
  2. Cormode G, Korn F, Muthukrishnan S, Srivastava D. Finding hierarchical heavy hitters in data streams. In: Proceedings of the 29th International Conference on Very Large Data Bases; 2003. p. 464–75.
  3. Cormode G, Korn F, Muthukrishnan S, Srivastava D. Diamond in the rough: finding hierarchical heavy hitters in multi-dimensional data. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2004. p. 155–66.
  4. Cormode G, Korn F, Muthukrishnan S, Johnson T, Spatscheck O, Srivastava D. Holistic UDAFs at streaming speeds. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2004. p. 35–46.
  5. Cormode G, Korn F, Muthukrishnan S, Srivastava D. Finding hierarchical heavy hitters in streaming data. ACM Trans Knowl Discov Data. 2008;1(4):1–48.
  6. Demaine E, López-Ortiz A, Munro JI. Frequency estimation of internet packet streams with limited space. In: Proceedings of the 10th Annual European Symposium on Algorithms; 2002. p. 348–60.
  7. Estan C, Savage S, Varghese G. Automatically inferring patterns of resource consumption in network traffic. In: Proceedings of the ACM International Conference on Data Communication; 2003. p. 137–48.
  8. Estan C, Magin G. Interactive traffic analysis and visualization with Wisconsin netpy. In: Proceedings of the International Conference on Large Installation System Administration; 2005. p. 177–84.
  9. Hershberger J, Shrivastava N, Suri S, Toth C. Space complexity of hierarchical heavy hitters in multi-dimensional data streams. In: Proceedings of the ACM SIGACT-SIGMOD Symposium on Principles of Database Systems; 2005. p. 338–47.
  10. Manku GS, Motwani R. Approximate frequency counts over data streams. In: Proceedings of the 28th International Conference on Very Large Data Bases; 2002. p. 346–57.
  11. Misra J, Gries D. Finding repeated elements. Sci Comput Program. 1982;2(2):143–52.
  12. Sekar V, Duffield N, Spatscheck O, van der Merwe J, Zhang H. LADS: large-scale automated DDoS detection system. In: Proceedings of the USENIX 2006 Annual Technical Conference, General Track; 2006. p. 171–84.
  13. Zhang Y, Singh S, Sen S, Duffield N, Lund C. Online identification of hierarchical heavy hitters: algorithms, evaluation and applications. In: Proceedings of the 4th ACM SIGCOMM Conference on Internet Measurement; 2004. p. 135–48.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. AT&T Labs–Research, Florham Park, USA

Section editors and affiliations

  • Divesh Srivastava (1)
  1. AT&T Labs - Research, AT&T, Bedminster, USA