Abstract
Efficient I/O on large-scale spatiotemporal scientific data requires scrutiny of both the logical layout of the data (e.g., row-major vs. column-major) and the physical layout (e.g., distribution on parallel filesystems). For increasingly complex datasets, hand optimization is difficult, error-prone, and does not scale to the growing heterogeneity of analysis workloads. Given these factors, we present a partial data replication system called RADAR. We capture datatype- and collective-aware I/O access patterns (indicating logical access) via MPI-IO tracing and use a combination of coarse-grained and fine-grained performance modeling to evaluate and select optimized physical data distributions for the task at hand. Unlike conventional methods, we store all replica data and metadata, along with the original untouched data, under a single file container using the object abstraction in parallel filesystems. Our system yields manyfold improvements for some commonly used subvolume decomposition access patterns. Moreover, the modeling approach can determine whether such optimizations should be undertaken in the first place.
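To make the logical-layout concern concrete, the following is a minimal illustrative sketch and is not taken from RADAR or its performance model. It assumes a dense N-dimensional array stored without padding and counts how many contiguous extents a rectangular subvolume request touches under row-major ("C") and column-major ("F") ordering. The helper name `contiguous_runs` and the array sizes are hypothetical.

```python
def contiguous_runs(shape, count, order="C"):
    """Return (number_of_extents, elements_per_extent) for a subvolume.

    shape: full array extent per dimension.
    count: subvolume extent per dimension (the start offset does not
           change the number or length of extents, so it is omitted).
    order: "C" for row-major (last dimension varies fastest),
           "F" for column-major (first dimension varies fastest).
    Assumption: dense storage with no padding or chunking.
    """
    if len(shape) != len(count):
        raise ValueError("shape and count must have the same rank")
    if any(c < 1 or c > n for n, c in zip(shape, count)):
        raise ValueError("count must satisfy 1 <= count <= shape")
    dims = list(zip(shape, count))
    if order == "C":
        dims.reverse()  # put the fastest-varying dimension first
    elif order != "F":
        raise ValueError("order must be 'C' or 'F'")

    run_len = 1
    rest = []
    for i, (n, c) in enumerate(dims):
        run_len *= c
        if c != n:
            # A partially covered dimension ends the contiguous run;
            # every slower dimension multiplies the number of extents.
            rest = dims[i + 1:]
            break

    n_runs = 1
    for _, c in rest:
        n_runs *= c
    return n_runs, run_len


if __name__ == "__main__":
    shape = (4, 4, 4)
    plane = (4, 4, 1)  # one plane with the last index fixed
    print(contiguous_runs(shape, plane, "C"))  # (16, 1)
    print(contiguous_runs(shape, plane, "F"))  # (1, 16)
```

In this toy case the same logical request is 16 single-element extents in one ordering and a single 16-element extent in the other, which is the kind of mismatch between access pattern and layout that motivates choosing a layout per workload.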
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Jenkins, J., Zou, X., Tang, H., Kimpe, D., Ross, R., Samatova, N.F. (2014). RADAR: Runtime Asymmetric Data-Access Driven Scientific Data Replication. In: Kunkel, J.M., Ludwig, T., Meuer, H.W. (eds) Supercomputing. ISC 2014. Lecture Notes in Computer Science, vol 8488. Springer, Cham. https://doi.org/10.1007/978-3-319-07518-1_19
DOI: https://doi.org/10.1007/978-3-319-07518-1_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07517-4
Online ISBN: 978-3-319-07518-1
eBook Packages: Computer Science, Computer Science (R0)