Abstract
The Fresh Breeze memory model and system architecture are proposed as an approach to achieving significant improvements in massively parallel computation by supporting fine-grain management of memory and processing resources and by using a global shared name space for all processors and computation tasks. Memory management and task scheduling are implemented in hardware, eliminating nearly all operating system execution cycles for data access, task scheduling and security. In particular, the Fresh Breeze memory model uses trees of fixed-size chunks of memory to represent all data objects, which eliminates data consistency issues and simplifies memory management. Low-cost reference-count garbage collection is used to support modular programming in type-safe programming languages.
The main contributions of this paper are: (1) a program execution model for massively parallel computing, presented as the Fresh Breeze application programming interface (API), comprising a radical memory model and a scheme for expressing concurrency; (2) an experimental implementation of the API through simulation using the FAST simulator of the IBM Cyclops 64 many-core chip; (3) simulation results that demonstrate that (a) fine-grain hardware-implemented resource management mechanisms can support massive parallelism and high processor utilization through the latency-hiding properties of multi-tasking; and (b) a hardware implementation of the work-stealing scheme incorporated in our simulation can effectively distribute tasks over the processors of a many-core parallel computer.
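To make the memory model concrete, the following is a minimal software sketch of a data object represented as a tree of fixed-size chunks with reference-count reclamation. It is an illustration only, not the Fresh Breeze API or its hardware mechanism: the names `Chunk`, `build_tree`, `read` and `release` are hypothetical, and the slot count `CHUNK_SLOTS = 16` is an assumption, since the abstract does not state a chunk size. The sketch also assumes chunks are never modified after creation, which is one way a tree of chunks can avoid consistency problems.

```python
from typing import List, Sequence, Tuple, Union

CHUNK_SLOTS = 16  # assumed slots per chunk; the abstract gives no size


class Chunk:
    """A fixed-size node whose slots hold data values or child chunks."""

    def __init__(self, slots: Sequence[Union[int, "Chunk"]]):
        if len(slots) > CHUNK_SLOTS:
            raise ValueError("a chunk holds at most CHUNK_SLOTS items")
        self.slots = tuple(slots)  # tuple: contents are fixed once created
        self.refcount = 0
        for item in self.slots:
            if isinstance(item, Chunk):
                item.refcount += 1  # the new parent now refers to the child


def build_tree(values: Sequence[int]) -> Tuple[Chunk, int]:
    """Pack values into leaf chunks, then group chunks until one root remains."""
    level: List[Chunk] = [Chunk(values[i:i + CHUNK_SLOTS])
                          for i in range(0, len(values), CHUNK_SLOTS)]
    if not level:
        level = [Chunk([])]
    depth = 1
    while len(level) > 1:
        level = [Chunk(level[i:i + CHUNK_SLOTS])
                 for i in range(0, len(level), CHUNK_SLOTS)]
        depth += 1
    root = level[0]
    root.refcount += 1  # the caller holds one reference to the root
    return root, depth


def read(root: Chunk, depth: int, index: int) -> int:
    """Walk from the root to the leaf slot that holds element `index`."""
    node = root
    for lvl in range(depth - 1, 0, -1):
        node = node.slots[(index // CHUNK_SLOTS ** lvl) % CHUNK_SLOTS]
    return node.slots[index % CHUNK_SLOTS]


def release(chunk: Chunk) -> int:
    """Drop one reference; return how many chunks became reclaimable."""
    chunk.refcount -= 1
    if chunk.refcount > 0:
        return 0  # still shared by another parent or holder
    freed = 1
    for item in chunk.slots:
        if isinstance(item, Chunk):
            freed += release(item)
    return freed
```

Because chunks in this sketch are immutable and trees contain no cycles, a subtree can be shared between two objects simply by storing its root in a second parent chunk, and reference counting alone is enough to reclaim it once the last holder releases it.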
Cite this article
Dennis, J.B., Gao, G.R. & Meng, X.X. Experiments with the Fresh Breeze tree-based memory model. Comput Sci Res Dev 26, 325–337 (2011). https://doi.org/10.1007/s00450-011-0165-1
DOI: https://doi.org/10.1007/s00450-011-0165-1