Making a packet: Cost-effective communication for a parallel graph reducer

  • Hans-Wolfgang Loidl
  • Kevin Hammond
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1268)


This paper studies critical runtime-system issues encountered when packing data for transmission in a lazy, parallel graph reduction system. In particular, we aim to answer two questions:
  • How much graph should go into a packet?

  • How aggressively should a processor look for work after requesting remote data?

In order to answer the first question, we compare various packing schemes, of which one extreme packs just the node that is demanded (“incremental fetching”), and the other packs all the graph that is reachable from that node (“bulk fetching”). The second question is addressed by considering various mechanisms for latency hiding during communication, ranging from fully synchronous communication with no attempt to mask latency, to full thread migration during asynchronous communication. In order to make our results as general as possible, we have used the GranSim simulator to study a wide variety of parallel machine configurations. Based on these measurements we propose concrete improvements for parallel graph reducers such as the GUM implementation of Glasgow Parallel Haskell.
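The two extremes of packing, and the range between them, can be illustrated with a small sketch. It is not the GranSim or GUM packing code: the `pack` function, the `max_nodes` parameter, the dictionary representation of the heap, and the breadth-first traversal order are all assumptions made for illustration only.

```python
from collections import deque

# Hypothetical heap: node id -> list of child node ids (illustrative only).
heap = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}

def pack(graph, root, max_nodes=None):
    """Collect nodes breadth-first from `root`, up to `max_nodes` nodes.

    max_nodes=1    -> only the demanded node ("incremental fetching")
    max_nodes=None -> all graph reachable from it ("bulk fetching")
    """
    packet, seen, queue = [], {root}, deque([root])
    while queue and (max_nodes is None or len(packet) < max_nodes):
        node = queue.popleft()
        packet.append(node)
        for child in graph.get(node, []):
            if child not in seen:  # a shared subgraph is packed only once
                seen.add(child)
                queue.append(child)
    return packet

incremental = pack(heap, "a", max_nodes=1)  # just the demanded node
bulk = pack(heap, "a")                      # everything reachable
bounded = pack(heap, "a", max_nodes=3)      # an intermediate packet size
```

Varying `max_nodes` between 1 and unbounded corresponds to moving between the two extremes; in this sketch, nodes left out of a packet would simply have to be fetched later if they are demanded.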


Keywords: Parallel Machine, Packet Size, Functional Programming, Asynchronous Communication, Remote Data





Copyright information

© Springer-Verlag Berlin Heidelberg 1997

Authors and Affiliations

  • Hans-Wolfgang Loidl, Department of Computing Science, University of Glasgow, Scotland, UK
  • Kevin Hammond, Division of Computer Science, University of St. Andrews, Scotland, UK
