Consistency Maintenance in Distributed Analytical Stream Processing

Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 909)


State-of-the-art industrial and research projects in distributed stream processing mostly consider only a limited set of delivery-level consistency models, which do not guarantee consistency with respect to business requirements. Such guarantees, however, would make stream analytics more reliable. In this paper, we pose the problem of designing mechanisms that can detect and, where possible, fix semantic inconsistencies. We discuss the results obtained so far and a detailed plan for further research.


Keywords (machine-generated, not supplied by the authors): Stream Analysis, Stream Processing Systems, Consistent High Level, Exactly-once Semantics, User-defined Consistency



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. JetBrains Research, St. Petersburg, Russia
  2. Saint Petersburg State University, St. Petersburg, Russia
