Distributed Data Processing on Microcomputers with Ascheduler and Apache Spark

  • Vladimir Korkhov
  • Ivan Gankevich
  • Oleg Iakushkin
  • Dmitry Gushchanskiy
  • Dmitry Khmel
  • Andrey Ivashchenko
  • Alexander Pyayt
  • Sergey Zobnin
  • Alexander Loginov
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10408)

Abstract

Modern data acquisition and processing architectures often involve low-cost, low-power devices that can be combined into a distributed infrastructure. In this paper we survey ways to organize a distributed computing testbed based on microcomputers similar to Raspberry Pi and Intel Edison. The goal of the research is to investigate and develop a scheduler for orchestrating distributed data processing and general-purpose computations on such unreliable and resource-constrained hardware. We also consider integrating the scheduler with the well-known distributed data processing framework Apache Spark. We outline a project, carried out in collaboration with Siemens LLC, that compares different hardware and software deployment configurations and evaluates the performance and applicability of the tools on the testbed.

Keywords

Microcomputers, Scheduling, Apache Spark, Raspberry Pi, Fault tolerance, High availability

Notes

Acknowledgments

The research was supported by Siemens LLC.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Vladimir Korkhov (1)
  • Ivan Gankevich (1)
  • Oleg Iakushkin (1)
  • Dmitry Gushchanskiy (1)
  • Dmitry Khmel (1)
  • Andrey Ivashchenko (1)
  • Alexander Pyayt (2)
  • Sergey Zobnin (2)
  • Alexander Loginov (2)

  1. Saint Petersburg State University, St. Petersburg, Russia
  2. Siemens LLC, St. Petersburg, Russia
