Streaming Analytics

Stream processing applications must process and analyze continuously generated, multimodal, and distributed data streams. Doing so requires a combination of features that distinguishes streaming analytics from traditional data analysis paradigms, which are often batch and offline. These features can be summarized as follows:

  • In-Motion Analysis: Streaming analytics must process data on the fly, as it continues to flow, to support real-time, low-latency analysis and to match the computation to the naturally streaming properties of the data. This limits the amount of prior data that can be accessed and necessitates one-pass, online algorithms. Several streaming algorithms are described in [1, 7].

  • Distributed Analysis: Data streams are often distributed and/or high volume, and their high rates make centralized solutions infeasible. Hence, the applications and the analytic algorithms themselves need to be distributed.

  • High-Performance Analysis:...
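
The one-pass and distributed requirements listed above can be illustrated with a small sketch. It is not taken from the entry or from [1, 7]; it assumes a simple numeric stream and uses Welford's running mean and variance update, together with the standard pairwise formula for combining partial summaries. The class name `RunningStats` and its methods are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """One-pass mean/variance summary (Welford's method), O(1) memory."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        # Fold one new item into the summary; the item itself is not stored.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "RunningStats") -> "RunningStats":
        # Combine two partial summaries, e.g. computed on different nodes.
        n = self.n + other.n
        if n == 0:
            return RunningStats()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return RunningStats(n, mean, m2)

    @property
    def variance(self) -> float:
        # Population variance of everything seen so far.
        return self.m2 / self.n if self.n else 0.0


def summarize(stream) -> RunningStats:
    # Single pass over an iterable; works for generators of unbounded length.
    stats = RunningStats()
    for x in stream:
        stats.update(x)
    return stats
```

In this sketch each node would call `summarize` on its local substream and ship only the three-number summary, which a coordinator combines with `merge`; no raw data needs to be retained or centralized.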

Recommended Reading

  1. Aggarwal C, editor. Data streams: models and algorithms. Boston: Springer; 2007.

  2. Aggarwal CC, Han J, Wang J, Yu PS. A framework for clustering evolving data streams. In: Proceedings of the 29th International Conference on Very Large Data Bases; 2003. p. 81–92.

  3. Aggarwal CC, Han J, Wang J, Yu PS. A framework for high dimensional projected clustering of data streams. In: Proceedings of the 30th International Conference on Very Large Data Bases; 2004. p. 852–63.

  4. Aggarwal CC, Han J, Wang J, Yu PS. On demand classification of data streams. In: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2004. p. 503–8.

  5. Aggarwal CC, Yu PS. A framework for clustering uncertain data streams. In: Proceedings of the 24th International Conference on Data Engineering; 2008. p. 150–59.

  6. Alon N, Matias Y, Szegedy M. The space complexity of approximating the frequency moments. In: Proceedings of the 28th Annual ACM Symposium on Theory of Computing; 1996. p. 20–9.

  7. Andrade H, Gedik B, Turaga D. Fundamentals of stream processing: application design, systems, and analytics. Cambridge: Cambridge University Press; 2013.

  8. Arasu A, Manku G. Approximate counts and quantiles over sliding windows. In: Proceedings of the 23rd ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems; 2004. p. 286–96.

  9. Ardilly P, Tillé Y. Sampling methods. Springer; 2006.

  10. Babcock B, Datar M, Motwani R, O’Callaghan L. Maintaining variance and k-medians over data stream windows. In: Proceedings of the 22nd ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems; 2003. p. 234–43.

  11. Bagnall AJ, Ratanamahatana CA, Keogh EJ, Lonardi S, Janacek GJ. A bit level representation for time series data mining with shape based similarity. Data Min Knowl Disc. 2006;13(1):11–40.

  12. Boufounos P. Universal rate-efficient scalar quantization. IEEE Trans Inf Theory. 2012;58(3):1861–72.

  13. Chandola V, Banerjee A, Kumar V. Anomaly detection: a survey. ACM Comput Surv. 2009;41(3).

  14. Chang JH, Lee WS. Finding recent frequent itemsets adaptively over online data streams. In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2003. p. 487–92.

  15. Cheng J, Ke Y, Ng W. A survey on algorithms for mining frequent itemsets over data streams. Knowl Inf Syst. 2008;16(1):1–27.

  16. Chi Y, Wang H, Yu PS, Muntz RR. Moment: maintaining closed frequent itemsets over a stream sliding window. In: Proceedings of the 4th IEEE International Conference on Data Mining; 2004. p. 59–66.

  17. Cormode G, Muthukrishnan S. An improved data stream summary: the count-min sketch and its applications. J Algorithms. 2005;55(1):58–75.

  18. Cormode G, Garofalakis M, Haas P, Jermaine C. Synopses for massive data: samples, histograms, wavelets, sketches. Foundations and trends in databases series. Boston: Now Publishing; 2011.

  19. Datar M, Gionis A, Indyk P, Motwani R. Maintaining stream statistics over sliding windows. SIAM J Comput. 2002;31(6):1794–813.

  20. Delp E, Saenz M, Salama P. Block truncation coding. In: Al Bovik, editor. The handbook of image and video processing. Amsterdam/Boston: Academic Press; 2005. p. 661–72.

  21. Domingos P, Hulten G. Mining high-speed data streams. In: Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2000. p. 71–80.

  22. Duchi J, Hazan E, Singer Y. Adaptive subgradient methods for online learning and stochastic optimization. J Mach Learn Res. 2011;12:2121–59.

  23. Fan W, Stolfo SJ, Zhang J. The application of AdaBoost for distributed, scalable and on-line learning. In: Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 1999. p. 362–66.

  24. Fang J, Li H. Optimal/near-optimal dimensionality reduction for distributed estimation in homogeneous and certain inhomogeneous scenarios. IEEE Trans Signal Process (TSP). 2010;58(8):4339–53.

  25. Fox J, editor. Applied regression analysis, linear models, and related methods. Thousand Oaks: SAGE Publications; 1997.

  26. Ganguly S, Majumder A. CR-precis: a deterministic summary structure for update data streams. In: Proceedings of the International Symposium on Combinatorics; 2007. p. 48–59.

  27. Gardner WA. Learning characteristics of stochastic-gradient-descent algorithms: a general study, analysis, and critique. Signal Process. 1984;6(2):113–33.

  28. Gersho A, Gray RM. Vector quantization and signal compression. Boston: Kluwer Academic Publishers; 1991.

  29. Giannella C, Han J, Pei J, Yan X, Yu P. Mining frequent patterns in data streams at multiple time granularities. In: Kargupta H, Joshi A, Sivakumar K, Yesha Y, editors. Data mining: next generation challenges and future directions. MIT Press; 2002. p. 105–24.

  30. Gilbert A, Guha S, Indyk P, Kotidis Y, Muthukrishnan S, Strauss M. Fast, small-space algorithms for approximate histogram maintenance. In: Proceedings of the 34th Annual ACM Symposium on Theory of Computing; 2002. p. 389–98.

  31. Gilbert A, Kotidis Y, Muthukrishnan S, Strauss M. Surfing wavelets on streams: one-pass summaries for approximate aggregate queries. In: Proceedings of the 27th International Conference on Very Large Data Bases; 2001. p. 79–88.

  32. Goethals B. Survey on frequent pattern mining. Technical report. Helsinki Institute for Information Technology, Basic Research Unit; 2003.

  33. Guha S, Mishra N, Motwani R, O’Callaghan L. Clustering data streams. In: Proceedings of the 41st Annual Symposium on Foundations of Computer Science; 2000. p. 359–66.

  34. Haipeng Z, Kulkarni SR, Poor HV. Attribute-distributed learning: models, limits, and algorithms. 2011;59(1):386–98.

  35. Hansen LK, Salamon P. Neural network ensembles. 1990;12(10):993–1001.

  36. Hulten G, Spencer L, Domingos P. Mining time changing data streams. In: Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2001. p. 97–106.

  37. Jin R, Agrawal G. An algorithm for in-core frequent itemset mining on streaming data. In: Proceedings of the 5th IEEE International Conference on Data Mining; 2005. p. 201–17.

  38. Kamilov U, Goyal VK, Rangan S. Optimal quantization for compressive sensing under message passing reconstruction. In: Proceedings of the IEEE International Symposium on Information Theory; 2011. p. 459–63.

  39. Karampatziakis N, Langford J. Online importance weight aware updates. In: Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence; 2011. p. 392–99.

  40. Kira K, Rendell L. A practical approach to feature selection. In: Proceedings of the 9th International Conference on Machine Learning; 1992. p. 249–56.

  41. Lin J, Vlachos M, Keogh E, Gunopulos D. Iterative incremental clustering of data streams. In: Advances in Database Technology, Proceedings of the 9th International Conference on Extending Database Technology; 2004. p. 106–22.

  42. Lughofer E. Extensions of vector quantization for incremental clustering. Pattern Recogn. 2008;41(3):995–1011.

  43. Mallat S. A wavelet tour of signal processing, the sparse way. Amsterdam: Academic Press; 2009.

  44. Manku GS, Motwani R. Approximate frequency counts over data streams. In: Proceedings of the 28th International Conference on Very Large Data Bases; 2002. p. 346–57.

  45. Masud MM, Gao J, Khan L, Han J, Thuraisingham B. Integrating novel class detection with classification for concept-drifting data streams. In: Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases; 2009. p. 79–94.

  46. Mateos G, Bazerque JA, Giannakis GB. Distributed sparse linear regression. 2010;58(10):5262–76.

  47. Matias Y, Gibbons P, Poosala V. Fast incremental maintenance of approximate histograms. In: Proceedings of the 23rd International Conference on Very Large Data Bases; 1997. p. 466–75.

  48. McMahan B, Streeter M. Adaptive bound optimization for online convex optimization. In: Proceedings of the International Conference on Learning Theory; 2010. p. 244–56.

  49. Monemizadeh M, Woodruff DP. 1-pass relative-error lp-sampling with applications. In: Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms; 2010. p. 1143–60.

  50. Motwani R, Chaudhuri S, Narasayya V. Random sampling for histogram construction. How much is enough? In: Proceedings of the ACM SIGMOD Workshop on the Web and Databases; 1998. p. 436–47.

  51. Papadimitriou S, Sun J, Faloutsos C. Streaming pattern discovery in multiple time-series. In: Proceedings of the 31st International Conference on Very Large Data Bases; 2005. p. 697–708.

  52. Percival D, Walden A. Spectral analysis for physical applications. Cambridge: Cambridge University Press; 1993.

  53. Pharr M, Humphreys G. Physically based rendering: from theory to implementation. Burlington: Morgan Kaufmann; 2010.

  54. Polikar R. Ensemble based systems in decision making. IEEE Circuits Syst Mag. 2006;6(3):21–45.

  55. Russell S, Norvig P. Artificial intelligence: a modern approach. Upper Saddle River: Prentice Hall; 2010.

  56. Sayood K. Introduction to data compression. Morgan Kaufmann; 2005.

  57. Schapire RE, Singer Y. Improved boosting algorithms using confidence-rated predictors. Mach Learn. 1999;37(3):297–336.

  58. Shinozaki T, Kubota Y, Furui S. Unsupervised acoustic model adaptation based on ensemble methods. 2010;4(6):1007–15.

  59. Sugiyama M, Kawanabe M, Chui PL. Dimensionality reduction for density ratio estimation in high-dimensional spaces. Neural Netw. 2010;23(1):44–59.

  60. Takezawa K, editor. Introduction to nonparametric regression. Wiley; 2005.

  61. Towfic ZJ, Chen J, Sayed AH. On distributed online classification in the midst of concept drifts. Neurocomputing. 2013;112(Jul):139–52.

  62. Vapnik V. Statistical learning theory. New York: Wiley; 1998.

  63. Wang H, Fan W, Yu PS, Han J. Mining concept-drifting data streams using ensemble classifiers. In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2003. p. 226–35.

  64. Witten IH, Frank E, Hall MA, editors. Data mining: practical machine learning tools and techniques. 3rd ed. Amsterdam: Morgan Kaufmann; 2011.

  65. Yi B-K, Sidiropoulos N, Johnson T, Jagadish HV, Faloutsos C, Biliris A. Online data mining for co-evolving time sequences. In: Proceedings of the 16th International Conference on Data Engineering; 2000. p. 13–22.

  66. Zhang T, Ramakrishnan R, Livny M. BIRCH: an efficient data clustering method for very large databases. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 1996. p. 103–14.

  67. Zhu Y, Shasha D. Statstream: statistical monitoring of thousands of data streams in real-time. In: Proceedings of the 28th International Conference on Very Large Data Bases; 2002. p. 358–69.

Corresponding author

Correspondence to Deepak Turaga.

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

Cite this entry

Turaga, D. (2018). Streaming Analytics. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_80673