Distributed Data Processing on Microcomputers with Ascheduler and Apache Spark

  • Conference paper
Computational Science and Its Applications – ICCSA 2017 (ICCSA 2017)

Abstract

Modern data acquisition and processing architectures often rely on low-cost, low-power devices that can be combined into a distributed infrastructure. In this paper we survey ways to organize a distributed computing testbed built from microcomputers similar to the Raspberry Pi and Intel Edison. The goal of the research is to investigate and develop a scheduler that orchestrates distributed data processing and general-purpose computations on such unreliable, resource-constrained hardware. We also consider integrating the scheduler with Apache Spark, a well-known distributed data processing framework. We outline a project, carried out in collaboration with Siemens LLC, that compares different hardware and software deployment configurations and evaluates the performance and applicability of the tools on the testbed.

Acknowledgments

The research was supported by Siemens LLC.

Author information

Corresponding author

Correspondence to Vladimir Korkhov.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Korkhov, V. et al. (2017). Distributed Data Processing on Microcomputers with Ascheduler and Apache Spark. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2017. ICCSA 2017. Lecture Notes in Computer Science, vol 10408. Springer, Cham. https://doi.org/10.1007/978-3-319-62404-4_28

  • DOI: https://doi.org/10.1007/978-3-319-62404-4_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-62403-7

  • Online ISBN: 978-3-319-62404-4

  • eBook Packages: Computer Science, Computer Science (R0)