Improving Data Access for Computational Grid Applications

Cluster Computing

Abstract

High-performance computing increasingly occurs on “computational grids” composed of heterogeneous and geographically distributed systems of computers, networks, and storage devices that collectively act as a single “virtual” computer. A key challenge in this environment is to provide efficient access to data distributed across remote data servers. Our parallel I/O framework, called Armada, allows application and data-set providers to flexibly compose graphs of processing modules that describe the distribution, application interfaces, and processing required of the dataset before computation. Although the framework provides a simple programming model for the application programmer and the data-set provider, the resulting graph may contain bottlenecks that prevent efficient data access. In this paper, we present an algorithm used to restructure Armada graphs that distributes computation and data flow to improve performance in the context of a wide-area computational grid.

Author information

Corresponding author

Correspondence to Ron Oldfield.

Additional information

This work was supported by Sandia National Laboratories under DOE contract DOE-AV6184.

Ron A. Oldfield is a senior member of the technical staff at Sandia National Laboratories in Albuquerque, NM. He received his B.Sc. in computer science from the University of New Mexico in 1993. From 1993 to 1997, he worked in the computational sciences department of Sandia National Laboratories, where he specialized in seismic research and parallel I/O. He was the primary developer for the GONII-SSD (Gas and Oil National Information Infrastructure–Synthetic Seismic Dataset) project and a co-developer of “Salvo”, an R&D 100 award-winning project to develop a 3D finite-difference prestack depth migration algorithm for massively parallel architectures. From 1997 to 2003 he attended graduate school at Dartmouth College, receiving his Ph.D. in June 2003. In September 2003, he returned to Sandia to work in the Scalable Computing Systems department. His research interests include parallel and distributed computing, parallel I/O, and mobile computing.

David Kotz is a Professor of Computer Science at Dartmouth College in Hanover, NH. After receiving his A.B. in Computer Science and Physics from Dartmouth in 1986, he completed his Ph.D. in Computer Science at Duke University in 1991. He returned to Dartmouth in 1991 to join the faculty, where he is now Professor of Computer Science, Director of the Center for Mobile Computing, and Executive Director of the Institute for Security Technology Studies. His research interests include context-aware mobile computing, pervasive computing, wireless networks, and intrusion detection. He is a member of the ACM, the IEEE Computer Society, and the USENIX Association, and of Computer Professionals for Social Responsibility. For more information see http://www.cs.dartmouth.edu/dfk/.

About this article

Cite this article

Oldfield, R., Kotz, D. Improving Data Access for Computational Grid Applications. Cluster Comput 9, 79–99 (2006). https://doi.org/10.1007/s10586-006-4899-7

  • DOI: https://doi.org/10.1007/s10586-006-4899-7