I/O and File Systems for Data-Intensive Applications

  • Yanlong Yin
  • Hui Jin
  • Xian-He Sun
Chapter

Abstract

Largecany other knowledge discoveries. As parallel computing has evolved, it has split into two major camps: high-performance computing (HPC, or supercomputing) and cloud computing. HPC is computation-oriented; its typical applications are scientific simulation and numerical computation. These applications rely on low-latency networks for message passing and use parallel programming paradigms such as MPI to enable parallelism [1]. Cloud computing is usually data-processing-oriented, and its typical frameworks are designed for large-scale batch data processing.

Keywords

Cloud Computing; Data Access; File System; Hadoop Distributed File System; Chunk Size
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. “The Message Passing Interface (MPI) standard,” [Online]. Available: http://www.mcs.anl.gov/research/projects/mpi/.
  2. F. Schmuck and R. Haskin, “GPFS: A Shared-disk FileSystem for Large Computing Clusters,” in Proceedings of the 1st USENIX Conference on File and, 2002.
  3. “Lustre File Systems Website,” [Online]. Available: http://wiki.lustre.org/index.php/Main_Page.
  4. P. J. Braam, “The Lustre Storage Architecture,” [Online]. Available: http://www.lustre.org/documentation.html.
  5. “OrangeFS Website,” [Online]. Available: orangefs.org.
  6. Carns, P.H., Ligon, W.B. III, and Ross, R.B., “PVFS: A Parallel File System for Linux Clusters,” in Proceedings of the 4th Annual Linux Showcase and Conference, 2000.
  7. “MPI-2: Extensions to the Message-Passing Interface,” [Online]. Available: http://www.mpi-forum.org/docs/mpi-20-html/mpi2-report.html.
  8. R. Thakur, W. Gropp, and E. Lusk, “Data Sieving and Collective I/O in ROMIO,” in FRONTIERS ’99: Proceedings of the 7th Symposium on the Frontiers of Massively Parallel Computation, 1999.
  9. Dean, Jeffrey, and Ghemawat, Sanjay, “MapReduce: Simplified Data Processing on Large Clusters,” in Sixth Symposium on Operating System Design and Implementation, 2004.
  10. Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, “The Google File System,” in 19th ACM Symposium on Operating Systems Principles, 2003.
  11. “Hadoop Distributed Filesystem Website,” [Online]. Available: http://hadoop.apache.org/hdfs/.
  12. “Kosmos Distributed Filesystem,” [Online]. Available: http://code.google.com/p/kosmosfs/.
  13. “libHDFS Source Code,” [Online]. Available: http://github.com/apache/hadoop-hdfs/blob/trunk/src/c++/libhdfs/hdfs.h.
  14. Brewer, E., “PODC Keynote Presentation,” 2000. [Online]. Available: http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf.
  15. H. Song, Y. Yin, Y. Chen, and X.-H. Sun, “A Cost-Intelligent Application-Specific Data Layout Scheme for Parallel File Systems,” in Proc. of the 20th International ACM Symposium on High Performance Distributed Computing, 2011.
  16. Prost, J.-P.; Treumann, R.; Hedges, R.; Jia, B.; Koniges, A., “MPI-IO/GPFS, an Optimized Implementation of MPI-IO on top of GPFS,” in Proc. of the International Conference for High Performance Computing, Networks, Storage and Analysis (Supercomputing), 2001.
  17. Liao, Wei-keng, and Choudhary, Alok, “Dynamically Adapting File Domain Partitioning Methods for Collective I/O Based On Underlying Parallel File System Locking Protocols,” in International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2008, 2008.
  18. H. Jin, J. Ji, X.-H. Sun, Y. Chen, and R. Thakur, “CHAIO: Enabling HPC Applications on Data-Intensive File Systems,” in 41st International Conference on Parallel Processing, 2012.
  19. “TOP500 Supercomputer Sites,” [Online]. Available: http://www.top500.org/.
  20. “Magellan Project: A Cloud for Science,” [Online]. Available: http://magellan.alcf.anl.gov/.
  21. Walker, E., “Benchmarking Amazon EC2 for High-Performance Scientific Computing,” Usenix Login, 2008.
  22. He, Q.; Zhou, S.; Kobler, B.; Duffy, D.; McGlynn, T., “Case Study for Running HPC Applications in Public Clouds,” in Proc. of 1st Workshop on Scientific Cloud Computing (ScienceCloud), 2010.
  23. “HPC in the Cloud,” [Online]. Available: http://www.hpcinthecloud.com/.
  24. Moody, A.; Bronevetsky, G.; Mohror, K.; Supinski, B. R., “Design, Modeling and Evaluation of a Scalable Multi-Level Checkpointing System,” in Proc. of the International Conference for High Performance Computing, Networks, Storage and Analysis (Supercomputing), 2010.
  25. Oldfield, R.; Ward, L.; Riesen, R.; Riesen, A.; Widener, P.; Widener, T., “Lightweight I/O for Scientific Applications,” in Proc. of IEEE Cluster Computing (Cluster), 2006.
  26. C. Mitchell, J. Ahrens, and J. Wang, “VisIO: Enabling Interactive Visualization of Ultra-Scale, Time Series Data via High-Bandwidth Distributed I/O Systems,” in IEEE International Parallel & Distributed Processing Symposium, 2011.
  27. J. Bent, G. Gibson, G. Grider, B. McClelland, P. Nowoczynski, J. Nunez, M. Polte, and M. Wingate, “PLFS: A Checkpoint Filesystem for Parallel Applications,” in Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 2009.
  28. S. Sehrish, G. Mackey, J. Wang, and J. Bent, “MRAP: a Novel Mapreduce-based Framework to Support HPC Analytics Applications with Access Patterns,” in Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, 2010.
  29. Al-Kiswany, S.; Ripeanu, M.; Vazhkudai, S. S.; Gharaibeh, A., “stdchk: A Checkpoint Storage System for Desktop Grid Computing,” in Proc. of The 28th International Conference on Distributed Computing Systems (ICDCS), 2008.
  30. “IOR HPC Benchmark,” [Online]. Available: http://sourceforge.net/projects/ior-sio/.
  31. B. Nicolae, G. Antoniu, L. Bougé, D. Moise, and A. Carpen-Amarie, “BlobSeer: Next-Generation Data Management for Large Scale Infrastructures,” Journal of Parallel and Distributed Computing, vol. 2, pp. 169–184, 2011.
  32. M.-E. Esteban, G. Maya, M. Carlos, J. Bent, and S. Brandt, “Mixing Hadoop and HPC Workloads on Parallel,” in the 2009 ACM Petascale Data Storage Workshop (PDSW 09), 2009.
  33. W. Tantisiriroj, S. Patil, G. Gibson, S. W. Son, S. J. Lang, and R. B. Ross, “On the Duality of Data-Intensive File System Design: Reconciling HDFS and PVFS,” in International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2011.
  34. “Hamster: Hadoop And Mpi on the same cluSTER,” [Online]. Available: http://issues.apache.org/jira/browse/MAPREDUCE-2911.
  35. “Apache Mesos,” [Online]. Available: http://mesos.apache.org/.
  36. B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A. D. Joseph, R. Katz, S. Shenker, and I. Stoica, “Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center,” in the 8th USENIX conference on Networked systems design and implementation, 2011.
  37. “MapR Direct Access NFS,” [Online]. Available: http://www.mapr.com/products/only-with-mapr/direct-access-nfs.

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. Department of Computer Science, Illinois Institute of Technology, Chicago, USA
  2. Parallel Execution Group, Oracle Corporation, Redwood City, USA