

Real-time intelligent big data processing: technology, platform, and applications

Abstract

Humans have long explored the physical world by informational means. Only recently, with the rapid development of information technologies and the growing accumulation of data, have humans been able to learn more about the unknown world through data-driven methods. Because the value of data depends on its timeliness, there is a growing awareness of the importance of real-time data. Data processing technologies fall into two categories, batch big data processing and stream processing, and the two have not been well integrated. We therefore propose an innovative incremental processing technology, named Stream Cube, that processes both big data and stream data. We also implement a real-time intelligent data processing system built on real-time acquisition, real-time processing, real-time analysis, and real-time decision-making. The system is equipped with a batch big data platform, data analysis tools, and machine learning models. Based on our applications and analysis, the real-time intelligent data processing system is a crucial solution to problems facing the national society and economy.
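The abstract does not describe how Stream Cube works internally. As a hedged illustration of the general idea of incremental computation serving both batch and stream data, the sketch below maintains mergeable per-key aggregates: a historical batch is folded into the state once, and each arriving event then updates that state in constant time, so a real-time query never rescans the full history. The class `IncrementalAggregate` and its methods are hypothetical names chosen for illustration; this is not the paper's API and not the Stream Cube algorithm.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple


class IncrementalAggregate:
    """Hypothetical sketch (not the paper's Stream Cube): per-key count and sum
    maintained incrementally so batch and stream inputs share one state."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = defaultdict(int)
        self.sums: Dict[str, float] = defaultdict(float)

    def load_batch(self, records: Iterable[Tuple[str, float]]) -> None:
        # Batch path: fold a historical dataset into the state once.
        for key, value in records:
            self.update(key, value)

    def update(self, key: str, value: float) -> None:
        # Stream path: constant-time update per arriving event.
        self.counts[key] += 1
        self.sums[key] += value

    def merge(self, other: "IncrementalAggregate") -> None:
        # Combine partial states, e.g. from separate partitions or shards.
        for key, n in other.counts.items():
            self.counts[key] += n
            self.sums[key] += other.sums[key]

    def mean(self, key: str) -> float:
        # Real-time query answered from maintained state; NaN if key unseen.
        n = self.counts.get(key, 0)
        return self.sums.get(key, 0.0) / n if n else math.nan


if __name__ == "__main__":
    agg = IncrementalAggregate()
    agg.load_batch([("sensor-a", 10.0), ("sensor-a", 20.0), ("sensor-b", 5.0)])
    agg.update("sensor-a", 30.0)  # a new streaming event arrives
    print(agg.mean("sensor-a"))  # 20.0
```

Count and sum were chosen because they are mergeable: partial states computed on separate batch partitions or stream shards can be combined with `merge` without revisiting raw data. Statistics that are not mergeable in this way, such as exact medians, would need a different design.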




Author information

Correspondence to Xingen Wang.


About this article


Cite this article

Zheng, T., Chen, G., Wang, X. et al. Real-time intelligent big data processing: technology, platform, and applications. Sci. China Inf. Sci. 62, 82101 (2019). https://doi.org/10.1007/s11432-018-9834-8


Keywords

  • batching big data
  • streaming processing technology
  • real-time data processing
  • incremental computation
  • intelligent data processing system