Automated Spark Clusters Deployment for Big Data with Standalone Applications Integration

Part of the Lecture Notes in Computer Science book series (LNAI, volume 9868)

Abstract

The huge amount of data stored nowadays has turned big data analytics into a very active research field. Spark has emerged as a powerful and widely used paradigm for cluster deployment and big data management. However, getting started is still a tough task because of the many requirements that every node must fulfil. This work therefore introduces a web service specifically designed for easy and efficient Spark cluster management. In particular, a service with a friendly graphical user interface has been developed to automate cluster deployment. Another relevant feature is the possibility of integrating any algorithm into the web service: the user only needs to provide the executable file and the number of required inputs for proper parametrization. Finally, an illustrative case study shows how an ad hoc algorithm (in this case, the MLlib implementation of k-means) is run across the nodes of the configured cluster.
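
The integration step described in the abstract (an executable file plus a declared number of inputs) can be pictured with a minimal sketch. This preview does not show how the web service actually launches applications, so everything below is an assumption made for illustration: the helper name build_submit_command, the master URL, the file name kmeans_app.py and the three example arguments are all hypothetical. Only the spark-submit command and its --master option are standard Spark.

```python
import shlex
from typing import List, Sequence


def build_submit_command(master_url: str, executable: str,
                         n_inputs: int, inputs: Sequence[str]) -> List[str]:
    """Assemble a spark-submit invocation for a user-supplied application.

    Hypothetical helper: it only checks that the caller supplied exactly
    the number of inputs declared for the executable, then builds the
    argument list. It does not launch anything.
    """
    if len(inputs) != n_inputs:
        raise ValueError(f"expected {n_inputs} inputs, got {len(inputs)}")
    # spark-submit passes everything after the application file to the app.
    return ["spark-submit", "--master", master_url, executable, *inputs]


if __name__ == "__main__":
    # Assumed example: a k-means application taking a data path,
    # a number of clusters and a number of iterations.
    cmd = build_submit_command(
        "spark://master-host:7077",   # hypothetical standalone master URL
        "kmeans_app.py",              # hypothetical user executable
        3,
        ["hdfs:///data/points.txt", "4", "20"],
    )
    print(shlex.join(cmd))
```

Returning the command as a list rather than a single string keeps each argument separate, so it could be handed to subprocess.run without shell quoting issues.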

Keywords

  • Big data
  • Spark
  • Algorithms
  • Automated deployment

Acknowledgements

The authors would like to thank the Spanish Ministry of Economy and Competitiveness and the Junta de Andalucía for their support under projects TIN2014-55894-C2-R, and P12-TIC-1728 and PRY153/14, respectively.

Author information

Corresponding author

Correspondence to F. Martínez-Álvarez.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Fernández, A.M., Torres, J.F., Troncoso, A., Martínez-Álvarez, F. (2016). Automated Spark Clusters Deployment for Big Data with Standalone Applications Integration. In: , et al. Advances in Artificial Intelligence. CAEPIA 2016. Lecture Notes in Computer Science (LNAI), vol 9868. Springer, Cham. https://doi.org/10.1007/978-3-319-44636-3_14

  • DOI: https://doi.org/10.1007/978-3-319-44636-3_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-44635-6

  • Online ISBN: 978-3-319-44636-3

  • eBook Packages: Computer Science, Computer Science (R0)