Real-time intelligent big data processing: technology, platform, and applications

Abstract

Humans have long explored the physical world through information technology. Only recently, with the rapid development of information technologies and the growing accumulation of data, has it become possible to learn more about the unknown world with data-driven methods. Because the value of data depends on its timeliness, awareness of the importance of real-time data is growing. Two categories of technology dominate data processing: batch processing of big data and stream processing, and the two have not been well integrated. We therefore propose an incremental processing technology, named Stream Cube, that handles both big data and stream data. We also implement a real-time intelligent data processing system built on real-time acquisition, real-time processing, real-time analysis, and real-time decision-making. The system is equipped with a batch big data platform, data analysis tools, and machine learning models. Based on our applications and analysis, we consider the real-time intelligent data processing system an important solution to problems facing society and the national economy.
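The abstract does not describe how Stream Cube works internally, so the following is only a minimal sketch of the general idea of incremental computation, under our own assumptions. It keeps a small piece of state (a count and a sum) that is first built from historical batch data and then updated one stream event at a time, so the full data set never has to be rescanned. The names `RunningStats` and `batch_stats` and the sample values are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Hypothetical incremental state: a count and a sum (not the paper's API)."""
    count: int = 0
    total: float = 0.0

    def update(self, value: float) -> None:
        # Fold one new event into the state in constant time.
        self.count += 1
        self.total += value

    @property
    def mean(self) -> float:
        # Mean derived from the state; 0.0 when no events have been seen.
        return self.total / self.count if self.count else 0.0


def batch_stats(values):
    """Build the state from a complete data set (the batch path)."""
    stats = RunningStats()
    for v in values:
        stats.update(v)
    return stats


# Illustrative data only.
historical = [12.0, 7.5, 3.0]   # data already stored in the batch platform
stream = [4.5, 9.0]             # events arriving afterwards

# Incremental path: start from the batch result, then apply each new event.
state = batch_stats(historical)
for event in stream:
    state.update(event)

# Batch path: recompute everything from scratch for comparison.
full = batch_stats(historical + stream)
```

In this sketch both paths end in the same state, which is the property that lets a single incremental structure serve stored data and live events alike. A real system would keep many such aggregates, keyed by dimension and time window.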

Author information

Corresponding author

Correspondence to Xingen Wang.

About this article

Cite this article

Zheng, T., Chen, G., Wang, X. et al. Real-time intelligent big data processing: technology, platform, and applications. Sci. China Inf. Sci. 62, 82101 (2019). https://doi.org/10.1007/s11432-018-9834-8

Keywords

  • batching big data
  • streaming processing technology
  • real-time data processing
  • incremental computation
  • intelligent data processing system