
Consistency Maintenance in Distributed Analytical Stream Processing

  • Artem Trofimov
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 909)

Abstract

State-of-the-art industrial and research projects in distributed stream processing mainly consider a limited set of delivery-level consistency models, which do not guarantee consistency with respect to business requirements. Such guarantees, however, could make stream analytics more reliable. In this paper, we pose the problem of designing mechanisms that can detect and, where possible, fix semantic-based inconsistencies. We discuss the results obtained so far and a detailed plan for further research.

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. JetBrains Research, St. Petersburg, Russia
  2. Saint Petersburg State University, St. Petersburg, Russia
