Abstract
In this chapter, after introducing the principles of invasive computing and the multi-processor system-on-a-chip (MPSoC) architecture considered, we go into deeper detail by introducing Tightly Coupled Processor Arrays (TCPAs), a class of coarse-grained reconfigurable processor arrays. After briefly explaining our loop mapping methodology for such architectures, we make the following contributions towards realising invasive computing concepts on TCPAs: (a) development of ultra-fast, distributed, hardware-based resource invasion strategies to acquire regions of Processing Elements (PEs) of different shapes and sizes; (b) proposal of two design variants for realising invasion strategies at the hardware level, and evaluation of their timing overheads as well as hardware costs; (c) investigation of different signalling concepts and data structures for collecting information about the number and location of invaded PEs; (d) development of the hardware/software interfaces for integrating TCPAs into a tiled architecture; and finally, (e) evaluation of the hardware costs and timing overheads based on FPGA prototype implementations.
Notes
- 1.
- 2.
By default, a claim may be used exclusively by the invading application. This is the implication of the term invasive computing. Through invasion, an application may isolate itself from other applications, which makes it possible to provide and enforce predictability in many non-functional aspects of program execution.
- 3.
This conception goes back to the notion of a “servlet”, which is a (Java) application program snippet targeted for execution within a web server.
- 4.
Safety integrity levels are defined based on the Probability of Failures per Hour (PFH), namely, SIL 1: PFH = \(10^{-5} \ldots 10^{-6}\); SIL 2: PFH = \(10^{-6} \ldots 10^{-7}\); SIL 3: PFH = \(10^{-7} \ldots 10^{-8}\); SIL 4: PFH = \(10^{-8} \ldots 10^{-9}\).
- 5.
- 6.
Note that \(\pi\) may often be chosen smaller than the latency \(L_l\) of one loop iteration. In that case, the execution of multiple iterations overlaps (also called modulo scheduling).
- 7.
There could be another flavour of implementation, in which the first column is invaded and the PEs in the first column send horizontal invades. However, without loss of generality, this book describes only the first flavour for the sake of simplicity.
- 8.
There might be other parameters, such as inequality operators over the number of invaded PEs, e.g., the minimum, maximum, or exact number of PEs to be invaded by invasion operands. The explanation of such parameters is, without loss of generality, excluded from this book for the sake of simplicity.
- 9.
Here, the order of the directions defines the priority with which invade signals are propagated. Without loss of generality and for the sake of simplicity, in the rest of this book we consider the horizontal direction to have higher priority, as may be observed in Fig. 2.6. Therefore, combinations such as north–west or south–west are not considered.
- 10.
The use of such turn-points and their effects on the power consumption of TCPAs are discussed in Chap. 3. However, in order to keep the size of invasion commands as small as possible, this feature is excluded from the explanations given in this chapter.
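Note 6 above states that iterations overlap when \(\pi\) is smaller than the iteration latency \(L_l\). The following Python sketch illustrates this under the assumption that \(\pi\) denotes the number of cycles between the starts of two successive iterations. The function names and the example numbers are hypothetical and are not taken from the chapter.

```python
def iteration_windows(num_iterations, pi, latency):
    """Return the (start, end) cycle of each iteration.

    Assumption: iteration i starts at cycle i * pi and occupies
    the half-open cycle interval [start, start + latency).
    """
    return [(i * pi, i * pi + latency) for i in range(num_iterations)]


def max_iterations_in_flight(pi, latency):
    """Upper bound on simultaneously active iterations: ceil(latency / pi)."""
    return -(-latency // pi)  # integer ceiling division


def total_latency(num_iterations, pi, latency):
    """Cycles until the last iteration finishes when starts are pi apart."""
    return (num_iterations - 1) * pi + latency


if __name__ == "__main__":
    # Illustrative values only: pi = 2 cycles, L_l = 5 cycles, 3 iterations.
    print(iteration_windows(3, 2, 5))
    print(max_iterations_in_flight(2, 5))
    print(total_latency(3, 2, 5))
```

With these illustrative values, the overlapped schedule finishes after 9 cycles, whereas executing the three iterations strictly one after another would take 3 × 5 = 15 cycles.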
Copyright information
© 2016 Springer Science+Business Media Singapore
About this chapter
Cite this chapter
Lari, V. (2016). Invasive Tightly Coupled Processor Arrays. In: Invasive Tightly Coupled Processor Arrays. Computer Architecture and Design Methodologies. Springer, Singapore. https://doi.org/10.1007/978-981-10-1058-3_2
DOI: https://doi.org/10.1007/978-981-10-1058-3_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-1057-6
Online ISBN: 978-981-10-1058-3
eBook Packages: Engineering, Engineering (R0)