Using High Performance Computing for Conquering Big Data

Chapter in: Conquering Big Data with High Performance Computing

Abstract

The journey of Big Data begins at its collection stage, continues through analysis, culminates in valuable insights, and may finally end in dark archives. The management and analysis of Big Data through these stages of its life cycle present challenges that can be addressed using High Performance Computing (HPC) resources and techniques. In this chapter, we present an overview of the HPC resources available at open-science data centers that can be used to develop end-to-end solutions for the management and analysis of Big Data. We also present techniques from the HPC domain that can be used to solve Big Data problems in a scalable, performance-oriented manner. Using a case study, we demonstrate the impact of HPC systems on the management and analysis of Big Data throughout its life cycle.

Author information

Corresponding author

Correspondence to Antonio Gómez-Iglesias.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Gómez-Iglesias, A., Arora, R. (2016). Using High Performance Computing for Conquering Big Data. In: Arora, R. (ed.) Conquering Big Data with High Performance Computing. Springer, Cham. https://doi.org/10.1007/978-3-319-33742-5_2

  • DOI: https://doi.org/10.1007/978-3-319-33742-5_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-33740-1

  • Online ISBN: 978-3-319-33742-5

  • eBook Packages: Computer Science, Computer Science (R0)
