DRT: A Lightweight Runtime for Developing Benchmarks for a Dataflow Execution Model

Giorgi, Roberto; Procaccini, Marco; Sahebi, Amin

doi:10.1007/978-3-030-81682-7_6

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12800)

Included in the following conference series:

International Conference on Architecture of Computing Systems

Abstract

Future computers may adopt a dataflow program execution model (PXM) to gain both performance and energy efficiency. A key element in providing a compilation toolchain for such machines is a framework for developing initial benchmarks. DRT (Dataflow Run-Time) is a tool that enables fast prototyping of such benchmarks for the Dataflow Threads (DF-Threads) PXM. In this work, we show how to use DRT to develop dataflow-based examples that a future compiler for the dataflow PXM can target.

DRT is written in portable C (tested with the GNU C compiler) and is open source; it can therefore be used on real machines based on architectures such as x86, AArch, and RISC-V.

Here, we discuss some didactic examples and show how to study and debug the data exchange, which flows through frames that are detached from the data stack. We compare DRT against similar dataflow runtime libraries such as DARTS and OCR. Even though our environment is not yet optimized, we found that DRT outperforms these runtime frameworks in terms of execution time. We also evaluate the time and complexity of developing DF-Threads examples in DRT compared with the approach of using a full-system simulator and FPGAs for more accurate modeling.
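To make the idea of data flowing through frames detached from the stack more concrete, the following is a minimal sketch in C. It is not the DRT API: every name in it (`frame_t`, `toy_schedule`, `toy_write`, `toy_run`) is hypothetical, and the four-slot frame and single-threaded execution are simplifying assumptions. It only illustrates the general pattern in which a thread's inputs live in a heap-allocated frame and the thread body runs once its synchronization count reaches zero.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names throughout: this is NOT the DRT API. */
#define TOY_SLOTS 4

typedef struct frame frame_t;
typedef void (*body_fn)(frame_t *self);

struct frame {
    body_fn  body;            /* code to run once all inputs have arrived */
    int      sc;              /* synchronization count: inputs still missing */
    uint64_t slot[TOY_SLOTS]; /* inputs live on the heap, not on the call stack */
};

/* Allocate a frame for a thread that waits for `sc` inputs. */
static frame_t *toy_schedule(body_fn body, int sc) {
    frame_t *f = calloc(1, sizeof *f);
    if (f == NULL) abort();
    f->body = body;
    f->sc = sc;
    return f;
}

/* Store one input into a consumer frame; run the thread on the last one. */
static void toy_write(frame_t *f, int idx, uint64_t value) {
    assert(idx >= 0 && idx < TOY_SLOTS);
    f->slot[idx] = value;
    if (--f->sc == 0) {
        f->body(f);   /* all inputs present: the thread becomes runnable */
        free(f);      /* the frame is released after the body finishes */
    }
}

static uint64_t result;

/* Consumer thread body: reads its two inputs from its own frame. */
static void adder(frame_t *self) {
    result = self->slot[0] + self->slot[1];
}

/* Producer side: schedule the consumer, then send it two values. */
uint64_t toy_run(uint64_t a, uint64_t b) {
    frame_t *f = toy_schedule(adder, 2);
    toy_write(f, 0, a);
    toy_write(f, 1, b);  /* second write drops the count to zero */
    return result;
}
```

In this sketch, calling `toy_run(3, 4)` returns 7: the adder body does not run after the first write, only after the second. Because the inputs sit in an explicit frame rather than in stack variables, the frame contents can be inspected at any point between the writes, which is the kind of data-exchange debugging the abstract refers to.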

Notes

1. Check out the DRT repository with this command: svn co https://svn.code.sf.net/p/drt/code/.
2. Source lines of code.

Acknowledgements

We would like to thank the anonymous reviewers for their insightful comments and suggestions. This work is partly funded by the European Commission through the AXIOM H2020 (id. 645496), TERAFLUX (id. 249013), and HiPEAC (id. 871174) projects.

Author information

Authors and Affiliations

University of Siena, Siena, Italy
Roberto Giorgi, Marco Procaccini & Amin Sahebi
University of Florence, Florence, Italy
Amin Sahebi

Authors

Roberto Giorgi
Marco Procaccini
Amin Sahebi

Corresponding authors

Correspondence to Roberto Giorgi or Amin Sahebi.

Editor information

Editors and Affiliations

Technische Universität Darmstadt, Darmstadt, Germany
Christian Hochberger
Karlsruhe Institute of Technology, Karlsruhe, Germany
Lars Bauer
Otto-von-Guericke University Magdeburg, Magdeburg, Germany
Thilo Pionteck

About this paper

Cite this paper

Giorgi, R., Procaccini, M., Sahebi, A. (2021). DRT: A Lightweight Runtime for Developing Benchmarks for a Dataflow Execution Model. In: Hochberger, C., Bauer, L., Pionteck, T. (eds) Architecture of Computing Systems. ARCS 2021. Lecture Notes in Computer Science, vol 12800. Springer, Cham. https://doi.org/10.1007/978-3-030-81682-7_6

DOI: https://doi.org/10.1007/978-3-030-81682-7_6
Published: 15 July 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-81681-0
Online ISBN: 978-3-030-81682-7
eBook Packages: Computer Science, Computer Science (R0)
