Multi-engine Analytics with IReS

  • Katerina Doka
  • Ioannis Mytilinis
  • Nikolaos Papailiou
  • Victor Giannakouris
  • Dimitrios Tsoumakos
  • Nectarios Koziris
Conference paper
Part of the Lecture Notes in Business Information Processing book series (LNBIP, volume 337)

Abstract

We present IReS, the Intelligent Resource Scheduler, which can abstractly describe, optimize and execute any batch analytics workflow with respect to a multi-objective policy. Relying on cost and performance models of the required tasks over the available platforms, IReS assigns each part of a workflow to the most advantageous execution and/or storage engine among those available and decides on the exact amount of resources to provision. Moreover, IReS adapts to the current cluster and engine conditions and recovers from failures by monitoring workflow execution in real time. Our current prototype has been tested on a wide range of business-driven and synthetic workflows, demonstrating its potential to yield significant gains in cost and performance compared to statically scheduled, single-engine executions. IReS incurs only marginal overhead on workflow execution performance and discovers an approximate Pareto-optimal set of execution plans within a few seconds.
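
To make the notion of a Pareto-optimal set of execution plans concrete, the following is a minimal sketch and not the IReS planner itself. It assumes a two-step workflow, hypothetical engine names (engineA, engineB, engineC), and invented per-step time and cost estimates standing in for the cost and performance models mentioned above. It enumerates every engine assignment exhaustively and ignores data movement between engines, which a real multi-engine planner would have to account for.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Plan:
    engines: tuple   # engine chosen for each workflow step, in step order
    time_s: float    # estimated total execution time in seconds
    cost: float      # estimated total monetary/resource cost


# Hypothetical model estimates: step -> engine -> (time_s, cost).
# These numbers are invented for illustration only.
ESTIMATES = {
    "join": {"engineA": (120.0, 4.0), "engineB": (60.0, 9.0)},
    "cluster": {
        "engineA": (300.0, 6.0),
        "engineB": (90.0, 15.0),
        "engineC": (400.0, 7.0),
    },
}


def enumerate_plans(estimates):
    """Build one plan per combination of engines across all steps."""
    plans = []
    per_step_options = [estimates[step].items() for step in estimates]
    for choice in product(*per_step_options):
        engines = tuple(name for name, _ in choice)
        time_s = sum(metrics[0] for _, metrics in choice)
        cost = sum(metrics[1] for _, metrics in choice)
        plans.append(Plan(engines, time_s, cost))
    return plans


def dominates(a, b):
    """True if plan a is no worse than b on both objectives and better on one."""
    no_worse = a.time_s <= b.time_s and a.cost <= b.cost
    strictly_better = a.time_s < b.time_s or a.cost < b.cost
    return no_worse and strictly_better


def pareto_front(plans):
    """Keep only the plans that no other plan dominates."""
    return [p for p in plans if not any(dominates(q, p) for q in plans)]


if __name__ == "__main__":
    for plan in sorted(pareto_front(enumerate_plans(ESTIMATES)),
                       key=lambda p: p.time_s):
        print(plan.engines, plan.time_s, plan.cost)
```

With these made-up estimates, every plan that runs the clustering step on engineC is dominated, so a user policy (for example, fastest plan under a cost cap) would choose among the remaining trade-offs. Exhaustive enumeration grows exponentially with the number of steps, which is one reason an approximate Pareto-optimal set is a practical target for larger workflows.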

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Katerina Doka (1), email author
  • Ioannis Mytilinis (1)
  • Nikolaos Papailiou (1)
  • Victor Giannakouris (1)
  • Dimitrios Tsoumakos (2)
  • Nectarios Koziris (1)

  1. Computing Systems Laboratory, National Technical University of Athens, Athens, Greece
  2. Department of Informatics, Ionian University, Corfu, Greece
