A Semantics-Aware I/O Interface for High Performance Computing
File systems and I/O libraries both offer interfaces for interacting with them, albeit at different levels of abstraction. While an interface's syntax simply describes the available operations, its semantics determine how these operations behave and which assumptions developers can make about them. Several interface standards exist, some of which date back decades and were designed for local file systems. Examples include the POSIX standard for file system interfaces and the MPI-I/O standard for MPI-based I/O.
Most file systems implement a POSIX-compliant interface to improve portability. While the syntactic part of the interface is usually left unchanged, the semantics are often relaxed to maximize performance. However, this can lead to subtly different behavior on different file systems, which in turn can cause application misbehavior that is hard to track down.
On the other hand, providing only fixed semantics makes it very hard to achieve optimal performance across different use cases. An additional problem is that the underlying file system has no information about the semantics offered at higher levels of the I/O stack. While currently available interfaces do not allow application developers to influence the I/O semantics, applications could benefit greatly from being able to adapt the I/O semantics at runtime.
The work we present in this paper includes the design of our semantics-aware I/O interface and a prototype file system developed to support the interface's features. Using the proposed I/O interface, application developers can specify their applications' I/O behavior by providing semantic information. The general goal is an interface where developers specify what operations should do and how they should behave, leaving the actual realization and possible optimizations to the underlying file system. Due to the unique requirements of the proposed I/O interface, the file system prototype was designed from scratch. However, it uses suitable existing technologies to keep the implementation overhead low.
The new I/O interface and file system prototype are evaluated using parallel metadata benchmarks. Using a single metadata server, they deliver a sustained performance of up to 50,000 lookup and 20,000 create operations per second, which is comparable to – and in some cases, better than – other well-established parallel distributed file systems.
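For intuition, the reported rates can be converted into the average time the single metadata server spends per operation. This is simply the reciprocal of the sustained throughput; with many requests processed concurrently, the latency seen by an individual client may well be higher than these values.

$$
t_{\text{lookup}} = \frac{1}{50\,000\ \text{s}^{-1}} = 20\ \mu\text{s}, \qquad
t_{\text{create}} = \frac{1}{20\,000\ \text{s}^{-1}} = 50\ \mu\text{s}
$$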
Keywords: Distributed File Systems · I/O Interfaces · I/O Semantics