I/O and File Systems for Data-Intensive Applications
Large [...] other knowledge discoveries. As parallel computing has evolved, it has split into two major camps: high-performance computing (HPC, or supercomputing) and cloud computing. HPC is computation-oriented; its typical applications are scientific simulation and numerical computation. These applications rely on low-latency networks for message passing and use parallel programming paradigms such as MPI to enable parallelism. Cloud computing is usually data-processing-oriented, and its typical framework is designed for large-scale batch data processing.
Keywords: Cloud Computing; Data Access; File System; Hadoop Distributed File System; Chunk Size