The Nornir run-time system for parallel programs using Kahn process networks on multi-core machines—a flexible alternative to MapReduce

Vrba, Željko; Halvorsen, Pål; Griwodz, Carsten; Beskow, Paul; Espeland, Håvard; Johansen, Dag

doi:10.1007/s11227-010-0503-2

Open access
Published: 13 November 2010

Volume 63, pages 191–217, (2013)

The Journal of Supercomputing

Željko Vrba^1,2,
Pål Halvorsen^1,2,
Carsten Griwodz^1,2,
Paul Beskow^1,2,
Håvard Espeland^1,2 &
Dag Johansen^3

Abstract

Even though shared-memory concurrency is a paradigm frequently used for developing parallel applications on small- and middle-sized machines, experience has shown that it is hard to use. This is largely caused by synchronization primitives which are low-level, inherently non-deterministic, and, consequently, non-intuitive to use. In this paper, we present the Nornir run-time system. Nornir is comparable to well-known frameworks such as MapReduce and Dryad that are recognized for their efficiency and simplicity. Unlike these frameworks, Nornir also supports process structures containing branches and cycles. Nornir is based on the formalism of Kahn process networks, which is a shared-nothing, message-passing model of concurrency. We deem this model a simple and deterministic alternative to shared-memory concurrency. Experiments with real and synthetic benchmarks on up to 8 CPUs show that performance in most cases scales almost linearly with the number of CPUs, when not limited by data dependencies. We also show that the modeling flexibility allows Nornir to outperform its MapReduce counterparts using well-known benchmarks.
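
To make the shared-nothing, message-passing model concrete, the following is a minimal sketch of a three-stage Kahn process network written with Python threads and FIFO queues. It is not Nornir's API: the process names (`producer`, `square`, `running_sum`), the `_END` sentinel, and the `run_network` helper are all hypothetical and chosen only for illustration. Each process communicates solely through blocking reads and writes on its channels, which is what makes the output independent of thread scheduling.

```python
import queue
import threading

# Hypothetical end-of-stream marker; an assumption of this sketch, not part of Nornir.
_END = object()


def producer(out, n):
    """Source process: writes the integers 1..n to its output channel."""
    for i in range(1, n + 1):
        out.put(i)
    out.put(_END)


def square(inp, out):
    """Transforms each incoming value; input is observed only via blocking reads."""
    while True:
        item = inp.get()  # blocks until a message is available
        if item is _END:
            out.put(_END)
            return
        out.put(item * item)


def running_sum(inp, out):
    """Stateful process: emits the running total of everything read so far."""
    total = 0
    while True:
        item = inp.get()
        if item is _END:
            out.put(_END)
            return
        total += item
        out.put(total)


def run_network(n):
    """Wire producer -> square -> running_sum and collect the final stream."""
    a, b, c = queue.Queue(), queue.Queue(), queue.Queue()  # unbounded FIFO channels
    threads = [
        threading.Thread(target=producer, args=(a, n)),
        threading.Thread(target=square, args=(a, b)),
        threading.Thread(target=running_sum, args=(b, c)),
    ]
    for t in threads:
        t.start()
    results = []
    while True:
        item = c.get()
        if item is _END:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_network(4))
```

Because no process shares state with another and every read blocks, repeated runs of `run_network(4)` yield the same stream regardless of how the threads are interleaved. A real run-time system would add scheduling, bounded channels, and deadlock handling, none of which this sketch attempts.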

Author information

Authors and Affiliations

Simula Research Laboratory, Oslo, Norway
Željko Vrba, Pål Halvorsen, Carsten Griwodz, Paul Beskow & Håvard Espeland
Department of Informatics, University of Oslo, Oslo, Norway
Željko Vrba, Pål Halvorsen, Carsten Griwodz, Paul Beskow & Håvard Espeland
Department of Computer Science, University of Tromsø, Tromsø, Norway
Dag Johansen

Corresponding author

Correspondence to Željko Vrba.

Rights and permissions

Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

About this article

Cite this article

Vrba, Ž., Halvorsen, P., Griwodz, C. et al. The Nornir run-time system for parallel programs using Kahn process networks on multi-core machines—a flexible alternative to MapReduce. J Supercomput 63, 191–217 (2013). https://doi.org/10.1007/s11227-010-0503-2

Published: 13 November 2010
Issue Date: January 2013
DOI: https://doi.org/10.1007/s11227-010-0503-2
