Software Simultaneous Multi-Threading, a Technique to Exploit Task-Level Parallelism to Improve Instruction- and Data-Level Parallelism

Scarpazza, Daniele Paolo; Raghavan, Praveen; Novo, David; Catthoor, Francky; Verkest, Diederik

doi:10.1007/11847083_2

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4148)

Included in the following conference series:

International Workshop on Power and Timing Modeling, Optimization and Simulation

Abstract

The search for energy efficiency in the design of embedded systems is leading toward CPUs with higher instruction-level and data-level parallelism. Unfortunately, individual applications do not have sufficient parallelism to keep all these CPU resources busy. Since embedded systems often consist of multiple tasks, task-level parallelism can be used for this purpose. Simultaneous multi-threading (SMT) has proved a valuable technique for doing so in high-performance systems, but it cannot be afforded in systems with tight energy budgets. Moreover, it does not exploit data-level parallel hardware, nor does it exploit the information available about the threads.

We propose software-SMT (SW-SMT), a technique to exploit task-level parallelism to improve the utilization of both instruction-level and data-level parallel hardware, thereby improving performance. The technique performs simultaneous compilation of multiple threads at design-time, and it includes a run-time selection of the most efficient mixes.
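
The run-time selection step can be pictured with a minimal sketch. It assumes that design-time compilation has produced a table of thread mixes, each with an energy estimate, and that the run-time picks the cheapest set of mixes covering the threads that are ready. The table contents, the thread names, the energy numbers, and the function `select_mixes` are all hypothetical and serve only as illustration; the selection policy shown (exhaustive search over a small table) is an assumption, not necessarily the one used in the paper.

```python
from functools import lru_cache

# Hypothetical design-time table: each key is a mix of threads assumed to
# have been compiled together into one merged schedule; each value is an
# assumed energy estimate (arbitrary units) for running that mix once.
# Thread names and numbers are illustrative, not taken from the paper.
MIX_ENERGY = {
    frozenset({"fft"}): 10.0,
    frozenset({"viterbi"}): 12.0,
    frozenset({"filter"}): 6.0,
    frozenset({"fft", "viterbi"}): 15.0,
    frozenset({"fft", "filter"}): 13.0,
}


def select_mixes(ready):
    """Cover every ready thread exactly once with precompiled mixes,
    minimizing the total estimated energy (exhaustive search).

    Returns (total_energy, tuple_of_mixes), or (inf, None) when some
    ready thread appears in no usable mix.
    """

    @lru_cache(maxsize=None)
    def best(remaining):
        if not remaining:
            return 0.0, ()
        # Cover one fixed thread first so that each partition of the
        # ready set is explored in a single order only.
        anchor = min(remaining)
        best_cost, best_plan = float("inf"), None
        for mix, energy in MIX_ENERGY.items():
            if anchor in mix and mix <= remaining:
                cost, plan = best(remaining - mix)
                if plan is not None and energy + cost < best_cost:
                    best_cost, best_plan = energy + cost, (mix,) + plan
        return best_cost, best_plan

    return best(frozenset(ready))


cost, plan = select_mixes({"fft", "viterbi", "filter"})
```

With the illustrative table above, the three ready threads are covered by the merged fft/viterbi mix plus the standalone filter, for an estimated 21.0 units against 28.0 units when each thread runs on its own. A real system would hold far fewer candidate mixes than all subsets, which is why a small lookup table plus a cheap search is a plausible shape for the run-time part.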

We have applied the technique to two major blocks of an SDR (software-defined radio) application, achieving energy gains of up to 46% on different ILP and DLP architectures. We show that the potential of SW-SMT increases with SIMD datapath size and VLIW issue width.

Author information

Authors and Affiliations

IMEC vzw, Kapeldreef 75, Heverlee, 3001, Belgium
Daniele Paolo Scarpazza, Praveen Raghavan, David Novo, Francky Catthoor & Diederik Verkest
Dipartimento di Elettronica e Informazione, Politecnico di Milano, Italy
Daniele Paolo Scarpazza
ESAT, K. U. Leuven, Kasteelpark Arenberg 10, Heverlee, 3001, Belgium
Praveen Raghavan, David Novo, Francky Catthoor & Diederik Verkest
Electrical Engineering, Vrije Universiteit Brussel, Belgium
Diederik Verkest

Editor information

Editors and Affiliations

IMEC, Kapeldreef 75, 3001, Heverlee, Belgium
Johan Vounckx
LIRMM, UMR CNRS/Université de Montpellier II, (C5506), 161 rue Ada, 34392, Montpellier, France
Nadine Azemard
University of Montpellier / LIRMM, II, 161 rue Ada, 34392, Montpellier, France
Philippe Maurine

Cite this paper

Scarpazza, D.P., Raghavan, P., Novo, D., Catthoor, F., Verkest, D. (2006). Software Simultaneous Multi-Threading, a Technique to Exploit Task-Level Parallelism to Improve Instruction- and Data-Level Parallelism. In: Vounckx, J., Azemard, N., Maurine, P. (eds) Integrated Circuit and System Design. Power and Timing Modeling, Optimization and Simulation. PATMOS 2006. Lecture Notes in Computer Science, vol 4148. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11847083_2

DOI: https://doi.org/10.1007/11847083_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-39094-7
Online ISBN: 978-3-540-39097-8
eBook Packages: Computer Science, Computer Science (R0)