DIsCO: DynamIc Data COmpression in Distributed Stream Processing Systems

  • Nikos Zacheilas
  • Vana Kalogeraki
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10320)


Supporting high throughput in Distributed Stream Processing Systems (DSPSs) has been an important goal in recent years. Existing work either automatically increases system resources whenever the current setup is inadequate, or applies load shedding techniques that discard some of the incoming data. Both approaches have significant shortcomings: the former requires on-the-fly application reconfiguration, in which the application must be stopped and re-uploaded to the cluster with the new configuration, and the latter can lead to significant information loss. One approach that has not yet been considered for improving the throughput of DSPSs is exploiting compression algorithms to minimize the communication overhead between components, especially when the data items are large, as with live CCTV camera reports. This work is the first to provide a framework, built on top of Apache Storm, that enables dynamic compression of incoming streaming data. Our approach uses a profiling algorithm to automatically determine which compression algorithm should be applied, and supports both lossless and lossy compression techniques. Furthermore, we propose a novel algorithm for determining when profiling should be applied. Finally, our detailed experimental evaluation with commonly used stream processing applications indicates a clear improvement in the applications’ throughput when our proposed techniques are applied.
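To make the profiling idea concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes a simple cost model (measured compression plus decompression time, plus the time to transfer the resulting bytes over a link of known bandwidth) and considers only lossless codecs from the Python standard library. The names `CANDIDATES`, `profile`, `choose` and the `bandwidth_bytes_per_s` parameter are hypothetical and introduced only for illustration.

```python
import bz2
import lzma
import time
import zlib

# Hypothetical candidate set: lossless codecs from the standard library only.
CANDIDATES = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def profile(sample: bytes, bandwidth_bytes_per_s: float) -> dict:
    """Estimate the per-sample cost in seconds of each option.

    Assumed cost model (not taken from the paper):
    compression time + decompression time + transfer time of the output.
    The "none" option pays transfer time for the raw sample only.
    """
    costs = {"none": len(sample) / bandwidth_bytes_per_s}
    for name, (compress, decompress) in CANDIDATES.items():
        start = time.perf_counter()
        packed = compress(sample)
        restored = decompress(packed)
        elapsed = time.perf_counter() - start
        if restored != sample:
            raise ValueError(f"{name} did not round-trip the sample")
        costs[name] = elapsed + len(packed) / bandwidth_bytes_per_s
    return costs


def choose(sample: bytes, bandwidth_bytes_per_s: float) -> str:
    """Pick the option with the lowest estimated cost for this sample."""
    costs = profile(sample, bandwidth_bytes_per_s)
    return min(costs, key=costs.get)
```

Under this assumed model, a slow link favours compressing highly redundant data, while a very fast link favours sending the data uncompressed, because the CPU time spent compressing outweighs the transfer savings. Deciding when to re-run such a profiling step, and how to handle lossy codecs, is outside the scope of this sketch.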


Utility Score · Compression Algorithm · Compression Technique · JPEG Compression · Special Thread
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



This research has been financed by the European Union through the FP7 ERC IDEAS 308019 NGHCS project and the Horizon2020 688380 VaVeL project.



Copyright information

© IFIP International Federation for Information Processing 2017

Authors and Affiliations

  1. Athens University of Economics and Business, Athens, Greece
