Cluster Computing, Volume 9, Issue 1, pp 79–99

Improving Data Access for Computational Grid Applications

Abstract

High-performance computing increasingly occurs on “computational grids” composed of heterogeneous and geographically distributed systems of computers, networks, and storage devices that collectively act as a single “virtual” computer. A key challenge in this environment is to provide efficient access to data distributed across remote data servers. Our parallel I/O framework, called Armada, allows applications and dataset providers to flexibly compose graphs of processing modules that describe the distribution, application interfaces, and processing required of the dataset before computation. Although the framework provides a simple programming model for the application programmer and the dataset provider, the resulting graph may contain bottlenecks that prevent efficient data access. In this paper, we present an algorithm that restructures Armada graphs, distributing computation and data flow to improve performance in the context of a wide-area computational grid.
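
To make the restructuring idea concrete, the short sketch below illustrates one such rewrite under assumptions of our own: the Module class, push_down function, and read@server leaves are hypothetical names for illustration only, not the actual Armada API or algorithm. A record-wise module (here, a filter) placed above a merge of several remote servers is replicated onto each server-side branch, so data-reducing work runs near the data instead of at the client.

# Minimal sketch of graph restructuring in the spirit of Armada
# (hypothetical module and function names, not the real Armada API).

class Module:
    def __init__(self, name, children=(), record_wise=False):
        self.name = name
        self.children = list(children)
        # record_wise: the module processes records independently, so it
        # may be pushed below a merge point and replicated per branch.
        self.record_wise = record_wise

    def __repr__(self):
        if not self.children:
            return self.name
        return "%s(%s)" % (self.name, ", ".join(map(repr, self.children)))

def push_down(module):
    # Recursively push record-wise modules below merge nodes so they run
    # once per server-side branch instead of once at the client.
    module.children = [push_down(c) for c in module.children]
    if (module.record_wise and len(module.children) == 1
            and module.children[0].name == "merge"):
        merge = module.children[0]
        merge.children = [Module(module.name, [branch], True)
                          for branch in merge.children]
        return merge
    return module

servers = [Module("read@server%d" % i) for i in range(3)]
graph = Module("filter", [Module("merge", servers)], record_wise=True)
print("before:", graph)             # filter(merge(read@server0, read@server1, read@server2))
print("after: ", push_down(graph))  # merge(filter(read@server0), filter(read@server1), filter(read@server2))

In this toy form, the rewrite moves the data-reducing filter from the client side of the wide-area link onto each data server, which is the kind of bottleneck removal the paper's restructuring algorithm targets.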

Copyright information

© Springer Science + Business Media, Inc. 2006

Authors and Affiliations

  1. Scalable Computing Systems, Sandia National Laboratories, Albuquerque
  2. Department of Computer Science, Dartmouth College, 6211 Sudikoff Laboratory, Hanover
