The VLDB Journal, Volume 27, Issue 6, pp 847–872

A survey of state management in big data processing systems

  • Quoc-Cuong To
  • Juan Soto
  • Volker Markl
Regular Paper

Abstract

The concept of state and its applications vary widely across big data processing systems. This is evident in both the research literature and existing systems, such as Apache Flink, Apache Heron, Apache Samza, Apache Spark, and Apache Storm. Given the pivotal role that state management plays, particularly in iterative batch and stream processing, this survey presents examples of state as an enabler, discusses the alternative approaches used to handle and implement state, captures the many facets of state management, and highlights new research directions. Our aim is to provide insight into disparate state management techniques, motivate others to pursue research in this area, and draw attention to open problems.

Keywords

  • Big data processing systems
  • State management
  • Survey

Notes

Acknowledgements

This work was funded by the H2020 STREAMLINE Project under Grant Agreement No. 688191 and by the Berlin Big Data Center (BBDC), which is funded by the German Federal Ministry for Education and Research (BMBF) under funding mark 01IS14013A.

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. German Research Center for Artificial Intelligence (DFKI), Berlin, Germany
  2. FG DIMA, Technische Universität Berlin, Berlin, Germany
