A Survey of Real-Time Big Data Processing Algorithms

  • Conference paper

Part of the book series: Lecture Notes in Mechanical Engineering (LNME)

Abstract

Collecting and processing data in real time is one of the most challenging problems in big data. The sustained proliferation of unbounded streaming data makes data collection, pre-processing, and optimization arduous. Data collection from real-time streams can be performed effectively with a windowing mechanism. In this paper, we discuss various windowing mechanisms: sliding window, tumbling window, landmark window, index-based window, adaptive-size tumbling window, and partition-based window. We also discuss the reliability measure, which depends on selecting an appropriate windowing mechanism. These window-based algorithms are compared on the basis of CPU utilization, memory consumption, time efficiency, and operation compatibility. We further survey aggregation algorithms, namely the reactive aggregator, FlatFAT, FlatFIT, B-Int, DABA, and the two-stacks aggregator, and compare them by time complexity. Finally, we introduce a hybrid window mechanism that uses a sliding window to handle the most recent data and a tumbling window to handle a variable-rate data stream.
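
To make two of the mechanisms named in the abstract concrete, the following is a minimal Python sketch of count-based tumbling and sliding windows and of a two-stacks aggregator. It is not taken from the paper: the names `tumbling_windows`, `sliding_windows`, and `TwoStacksAggregator`, the count-based (rather than time-based) window definition, and the choice to emit a final partial tumbling window are all assumptions made for illustration.

```python
from collections import deque
from typing import Any, Callable, Iterable, Iterator, List


def tumbling_windows(stream: Iterable[Any], size: int) -> Iterator[List[Any]]:
    """Yield consecutive, non-overlapping windows of `size` items."""
    window: List[Any] = []
    for item in stream:
        window.append(item)
        if len(window) == size:
            yield window
            window = []
    if window:  # assumption of this sketch: emit the final partial window
        yield window


def sliding_windows(stream: Iterable[Any], size: int, slide: int = 1) -> Iterator[List[Any]]:
    """Yield the `size` most recent items, once every `slide` arrivals."""
    window: deque = deque(maxlen=size)  # oldest item is dropped automatically
    for count, item in enumerate(stream, start=1):
        window.append(item)
        if count >= size and (count - size) % slide == 0:
            yield list(window)


class TwoStacksAggregator:
    """FIFO window aggregation using two stacks.

    Requires only an associative `combine` with an `identity` element;
    no inverse operation is needed. `insert` and `query` take O(1) time,
    and `evict` takes amortized O(1) time.
    """

    def __init__(self, combine: Callable[[Any, Any], Any], identity: Any) -> None:
        self.combine = combine
        self.identity = identity
        self.front: List[tuple] = []  # (value, aggregate of value and newer front items)
        self.back: List[tuple] = []   # (value, aggregate of older back items and value)

    def insert(self, value: Any) -> None:
        """Add the newest item to the window."""
        prev = self.back[-1][1] if self.back else self.identity
        self.back.append((value, self.combine(prev, value)))

    def evict(self) -> None:
        """Remove the oldest item from the window."""
        if not self.front:
            # Flip: move everything from back to front, rebuilding aggregates
            # so that the oldest item ends up on top of the front stack.
            while self.back:
                value, _ = self.back.pop()
                prev = self.front[-1][1] if self.front else self.identity
                self.front.append((value, self.combine(value, prev)))
        if not self.front:
            raise IndexError("evict from an empty window")
        self.front.pop()

    def query(self) -> Any:
        """Return the aggregate of the whole window, oldest to newest."""
        front_agg = self.front[-1][1] if self.front else self.identity
        back_agg = self.back[-1][1] if self.back else self.identity
        return self.combine(front_agg, back_agg)


def sliding_max(values: Iterable[float], size: int) -> List[float]:
    """Sliding-window maximum built on the two-stacks aggregator."""
    agg = TwoStacksAggregator(max, float("-inf"))
    result = []
    for count, value in enumerate(values, start=1):
        agg.insert(value)
        if count > size:
            agg.evict()
        if count >= size:
            result.append(agg.query())
    return result
```

With these definitions, `list(sliding_windows(range(5), 3))` gives `[[0, 1, 2], [1, 2, 3], [2, 3, 4]]`, while `list(tumbling_windows(range(7), 3))` gives `[[0, 1, 2], [3, 4, 5], [6]]`. The aggregator preserves item order, so it also works for non-commutative operations such as string concatenation.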

Acknowledgements

I offer my most sincere gratitude to the Council of Scientific and Industrial Research (CSIR), Government of India, for financial support in the form of Junior Research Fellowships.

Author information

Corresponding author

Correspondence to Devesh Kumar Lal.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Lal, D.K., Suman, U. (2020). A Survey of Real-Time Big Data Processing Algorithms. In: Gupta, V., Varde, P., Kankar, P., Joshi, N. (eds) Reliability and Risk Assessment in Engineering. Lecture Notes in Mechanical Engineering. Springer, Singapore. https://doi.org/10.1007/978-981-15-3746-2_1

  • DOI: https://doi.org/10.1007/978-981-15-3746-2_1

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-3745-5

  • Online ISBN: 978-981-15-3746-2

  • eBook Packages: Engineering, Engineering (R0)
