
Non-Strict Execution in Parallel and Distributed Computing

  • Alfredo Cristobal-Salas
  • Andrei Tchernykh
  • Jean-Luc Gaudiot
  • Wen-Yen Lin

Abstract

This paper surveys and demonstrates the power of non-strict evaluation in applications executed on distributed architectures. We present the design, implementation, and experimental evaluation of single-assignment, incomplete data structures on a distributed memory architecture and the Abstract Network Machine (ANM). Incremental Structures (IS), the Incremental Structure Software Cache (ISSC), and Dynamic Incremental Structures (DIS) provide non-strict data access and fully asynchronous operations, which make them well suited to exploiting fine-grain parallelism in distributed memory systems. We focus on split-phase memory operations and non-strict information processing under a distributed address space to improve overall system performance. A novel optimization technique at the communication level is proposed and described. We use partial evaluation of local and remote memory accesses not only to remove much of the excess overhead of message passing, but also to reduce the number of messages when some information about the input, or part of the input, is known in advance. We show that the split-phase transactions of IS, together with the ability to defer reads, allow partial evaluation of distributed programs without losing determinacy. Our experimental evaluation indicates that commodity PC clusters equipped with both IS and the ISSC caching mechanism are more robust, and that the system can deliver speedup for both regular and irregular applications. We also show that partial evaluation of memory accesses decreases traffic in the interconnection network and improves the performance of MPI IS and MPI ISSC applications.

Keywords: Incremental structures; software cache; message passing; partial evaluation; non-strict information processing
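
To make the non-strict access pattern concrete, here is a minimal C++ sketch (not the authors' runtime; the ISCell name and its read/write interface are illustrative assumptions) of a single-assignment IS cell with split-phase, deferred reads: a read issued before the value exists is queued rather than blocking, and the one permitted write satisfies every pending read, which is what preserves determinacy.

```cpp
// Minimal sketch of an Incremental Structure (IS) cell with non-strict,
// split-phase access: reads on an empty slot are deferred and satisfied
// later by the single assignment, so producers and consumers never block
// each other. Names (ISCell, read, write) are illustrative, not the paper's API.
#include <functional>
#include <iostream>
#include <optional>
#include <stdexcept>
#include <vector>

template <typename T>
class ISCell {
public:
    using Reader = std::function<void(const T&)>;

    // Split-phase read: if the value is present, the continuation runs
    // immediately; otherwise it is deferred until the write arrives.
    void read(Reader k) {
        if (value_) k(*value_);
        else deferred_.push_back(std::move(k));
    }

    // Single assignment: a second write is an error, which keeps execution
    // deterministic regardless of the order of reads and the write.
    void write(T v) {
        if (value_) throw std::runtime_error("IS cell written twice");
        value_ = std::move(v);
        for (auto& k : deferred_) k(*value_);
        deferred_.clear();
    }

private:
    std::optional<T> value_;
    std::vector<Reader> deferred_;
};

int main() {
    ISCell<int> cell;
    // Consumer issues a read before the producer has written:
    // the request is queued instead of blocking or failing.
    cell.read([](const int& x) { std::cout << "got " << x << "\n"; });
    // Producer later supplies the value; the deferred read fires now.
    cell.write(42);
}
```

In the same spirit, the following hedged sketch illustrates partial evaluation of memory accesses at the communication level, assuming a simple block distribution and illustrative helper names (owner, local_offset, remote_get). When an element's index is known statically, its owner and local offset become compile-time constants, so the residual access is either a plain local load, with no message at all, or a single request whose destination and offset are precomputed.

```cpp
// Hedged sketch of partial evaluation of a distributed-array access.
// The block distribution and helper names are assumptions for illustration.
#include <cstdio>

constexpr int kNumProcs = 4;
constexpr int kBlock    = 8;                       // elements per process (assumed)
constexpr int owner(int i)        { return i / kBlock; }
constexpr int local_offset(int i) { return i % kBlock; }

int local_mem[kBlock];                             // this process's partition

// Stand-in for a split-phase remote fetch (would be a message pair in MPI).
int remote_get(int rank, int offset) {
    std::printf("message to rank %d for offset %d\n", rank, offset);
    return 0;
}

// General (dynamic) access: owner test and possible message at run time.
int get(int my_rank, int i) {
    return owner(i) == my_rank ? local_mem[local_offset(i)]
                               : remote_get(owner(i), local_offset(i));
}

// Residual access specialized for a statically known index I: owner(I) and
// local_offset(I) are compile-time constants, so only a cheap rank comparison
// remains, and locally owned data is read without generating any message.
template <int I>
int get_static(int my_rank) {
    if (owner(I) == my_rank)
        return local_mem[local_offset(I)];
    return remote_get(owner(I), local_offset(I));
}

int main() {
    int my_rank = 0;                               // pretend we are process 0
    std::printf("%d\n", get_static<3>(my_rank));   // local: no message generated
    std::printf("%d\n", get_static<20>(my_rank));  // remote: one precomputed request
}
```

Combined with the deferred reads of the first sketch, such specialized accesses can be issued early without waiting for producers, which is the sense in which split-phase IS transactions enable partial evaluation of distributed programs without losing determinacy.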



Copyright information

© Plenum Publishing Corporation 2003

Authors and Affiliations

  • Alfredo Cristobal-Salas (1)
  • Andrei Tchernykh (1)
  • Jean-Luc Gaudiot (2)
  • Wen-Yen Lin (3)
  1. CICESE Research Center, Ensenada, B.C., Mexico
  2. UCI Parallel Systems & Computer Architectures Lab, Department of Electrical and Computer Engineering, University of California, Irvine
  3. TIA Mobile, Inc., Los Angeles
