FIDS: Monitoring Frequent Items over Distributed Data Streams

  • Robert Fuller
  • Mehmed Kantardzic
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4571)

Abstract

Many applications require the discovery of items that occur frequently within multiple distributed data streams. Past solutions to this problem either require a high degree of error tolerance or can provide results only periodically. In this paper we introduce a new algorithm designed for continuously tracking frequent items over distributed data streams, providing either exact or approximate answers. We tested the efficiency of our method using two real-world data sets. The results indicated a significant reduction in communication cost when compared to naïve approaches and to an existing efficient algorithm called Top-K Monitoring. Since our method does not rely upon approximations to reduce communication overhead and is explicitly designed for tracking frequent items, it also shows increased quality in its tracking results.
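To make the problem setting concrete, the following is a minimal sketch of one plausible naïve baseline, in which every site forwards every observed item to a central coordinator. It is not the FIDS algorithm and is not taken from the paper; the class name `NaiveCoordinator`, the `support` parameter, and the one-message-per-item cost model are all assumptions made for illustration.

```python
from collections import Counter


class NaiveCoordinator:
    """Hypothetical baseline: sites forward every item they observe."""

    def __init__(self, support):
        # support: assumed fraction of all items seen so far that an
        # item must reach to be reported as frequent (0 < support <= 1).
        self.support = support
        self.counts = Counter()   # exact global count per item
        self.total = 0            # total items seen across all sites
        self.messages = 0         # assumed cost model: one message per item

    def receive(self, site_id, item):
        # site_id is unused here; it only marks where the item came from.
        self.counts[item] += 1
        self.total += 1
        self.messages += 1

    def frequent_items(self):
        # An item is frequent if its count is at least support * total.
        threshold = self.support * self.total
        return {item for item, c in self.counts.items() if c >= threshold}


def run(streams, support):
    """Feed each site's stream to the coordinator and return it."""
    coord = NaiveCoordinator(support)
    for site_id, stream in enumerate(streams):
        for item in stream:
            coord.receive(site_id, item)
    return coord


if __name__ == "__main__":
    # Three hypothetical sites with tiny streams.
    coord = run([["a", "b", "a"], ["a", "c"], ["b", "a"]], support=0.25)
    print(sorted(coord.frequent_items()), coord.messages)
```

Under this cost model the answer is exact and always current, but the number of messages equals the total number of items observed. That is the communication overhead a distributed tracking method aims to reduce.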

Keywords

Data Stream, Communication Cost, Frequency Count, Adjustment Factor, Frequent Item
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Robert Fuller (1)
  • Mehmed Kantardzic (1)
  1. Computer Engineering and Computer Science Department, University of Louisville, Louisville, KY 40292
