Abstract
Future computers may adopt a dataflow program execution model (PXM) to gain both performance and energy advantages. One key element in providing a compilation toolchain for such machines is a framework for developing initial benchmarks. DRT (Dataflow Run-Time) is a tool that enables fast prototyping of such benchmarks for the Dataflow Threads (DF-Threads) PXM. In this work, we show how to use DRT to develop dataflow-based examples that a future compiler for the dataflow PXM can target.
DRT is written in portable C (tested with the GNU C compiler) and is open source; it can therefore be used on real machines based on architectures such as x86, AArch, and RISC-V.
Here, we discuss some didactic examples and show how to study and debug the data exchange, which flows through frames that are detached from the data stack. We compare DRT against similar dataflow runtime libraries such as DARTS and OCR. Even though our environment is not yet optimized, we found that DRT outperforms these runtime frameworks in terms of execution time. We also evaluate the time and complexity of developing DF-Threads examples in DRT compared with the approach of using a full-system simulator and FPGAs for more accurate modeling.
Notes
1. Check out the DRT repository with this command: svn co https://svn.code.sf.net/p/drt/code/
2. Source lines of code.
Acknowledgements
We would like to thank the anonymous reviewers for their insightful comments and suggestions. This work is partly funded by the European Commission through AXIOM H2020 (id. 645496), TERAFLUX (id. 249013), and HiPEAC (id. 871174).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Giorgi, R., Procaccini, M., Sahebi, A. (2021). DRT: A Lightweight Runtime for Developing Benchmarks for a Dataflow Execution Model. In: Hochberger, C., Bauer, L., Pionteck, T. (eds) Architecture of Computing Systems. ARCS 2021. Lecture Notes in Computer Science, vol 12800. Springer, Cham. https://doi.org/10.1007/978-3-030-81682-7_6
DOI: https://doi.org/10.1007/978-3-030-81682-7_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-81681-0
Online ISBN: 978-3-030-81682-7
eBook Packages: Computer Science (R0)