Streaming Algorithms for Data in Motion

  • M. Hoffmann
  • S. Muthukrishnan
  • Rajeev Raman
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4614)

Abstract

We propose two new data stream models, the reset model and the delta model, motivated by applications in databases and in tracking the locations of spatial points.

We present algorithms for several problems that fit within the stream constraint of polylogarithmic space and time. These include tracking the “extent” of the points and Lp sampling.

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • M. Hoffmann (1)
  • S. Muthukrishnan (2)
  • Rajeev Raman (1)
  1. Department of Computer Science, University of Leicester, Leicester LE1 7RH, UK
  2. Division of Computer and Information Sciences, Rutgers University, Piscataway, NJ 08854-8019, USA
