Abstract
The journey of Big Data begins at its collection stage, continues through analysis, culminates in valuable insights, and may finally end in dark archives. Managing and analyzing Big Data through these stages of its life cycle present challenges that can be addressed using High Performance Computing (HPC) resources and techniques. In this chapter, we present an overview of the HPC resources available at open-science data centers that can be used to develop end-to-end solutions for the management and analysis of Big Data. We also present techniques from the HPC domain that can be used to solve Big Data problems in a scalable and performance-oriented manner. Using a case study, we demonstrate the impact of HPC systems on the management and analysis of Big Data throughout its life cycle.
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Gómez-Iglesias, A., Arora, R. (2016). Using High Performance Computing for Conquering Big Data. In: Arora, R. (eds) Conquering Big Data with High Performance Computing. Springer, Cham. https://doi.org/10.1007/978-3-319-33742-5_2
DOI: https://doi.org/10.1007/978-3-319-33742-5_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-33740-1
Online ISBN: 978-3-319-33742-5
eBook Packages: Computer Science (R0)