
Software Simultaneous Multi-Threading, a Technique to Exploit Task-Level Parallelism to Improve Instruction- and Data-Level Parallelism

  • Conference paper
Integrated Circuit and System Design. Power and Timing Modeling, Optimization and Simulation (PATMOS 2006)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4148)

Abstract

The search for energy efficiency in the design of embedded systems is leading toward CPUs with higher instruction-level and data-level parallelism. Unfortunately, individual applications do not have sufficient parallelism to keep all these CPU resources busy. Since embedded systems often consist of multiple tasks, task-level parallelism can be used for this purpose. Simultaneous multi-threading (SMT) has proved a valuable technique for doing so in high-performance systems, but it cannot be afforded in systems with tight energy budgets. Moreover, it does not exploit data-level parallel hardware, and it does not exploit the information available on threads.

We propose software-SMT (SW-SMT), a technique to exploit task-level parallelism to improve the utilization of both instruction-level and data-level parallel hardware, thereby improving performance. The technique performs simultaneous compilation of multiple threads at design-time, and it includes a run-time selection of the most efficient mixes.
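The run-time half of this idea can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not describe the selection algorithm, so the greedy policy, the thread names (`fft`, `demod`, `viterbi`), and the energy figures below are all hypothetical, chosen only to show how a table of mixes compiled at design time might be consulted at run time.

```python
# Hypothetical design-time table: estimated energy (arbitrary units) of each
# precompiled code version. Single-thread entries are the sequential baseline;
# multi-thread entries stand for merged ("mixed") versions of those threads.
# All names and numbers are made up for illustration.
MIX_ENERGY = {
    frozenset({"fft"}): 10.0,
    frozenset({"demod"}): 8.0,
    frozenset({"viterbi"}): 12.0,
    frozenset({"fft", "demod"}): 13.0,
    frozenset({"fft", "viterbi"}): 21.0,
}


def sequential_energy(threads):
    """Energy of running each thread on its own, one after another."""
    return sum(MIX_ENERGY[frozenset({t})] for t in threads)


def select_mixes(ready):
    """Greedy run-time selection (an assumed policy, not the paper's).

    Repeatedly picks the precompiled mix that saves the most energy over
    running its threads sequentially, until no beneficial mix remains.
    """
    remaining = set(ready)
    schedule = []
    while remaining:
        best, best_saving = None, 0.0
        for mix, energy in MIX_ENERGY.items():
            if len(mix) > 1 and mix <= remaining:
                saving = sequential_energy(mix) - energy
                if saving > best_saving:
                    best, best_saving = mix, saving
        if best is None:
            # No beneficial mix is available: run what is left unmixed.
            schedule.extend(frozenset({t}) for t in sorted(remaining))
            break
        schedule.append(best)
        remaining -= best
    return schedule


def total_energy(schedule):
    """Total estimated energy of a schedule of code versions."""
    return sum(MIX_ENERGY[mix] for mix in schedule)


if __name__ == "__main__":
    ready = {"fft", "demod", "viterbi"}
    plan = select_mixes(ready)
    print([sorted(mix) for mix in plan])
    print(total_energy(plan), "vs sequential", sequential_energy(ready))
```

With these made-up figures, the sketch pairs `fft` with `demod` and runs `viterbi` alone, because that mix offers the larger saving over the sequential baseline. A greedy choice is not guaranteed to be optimal in general; it is used here only because it keeps the run-time step cheap.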

We have applied the technique to two major blocks of an SDR (software-defined radio) application, achieving energy gains of up to 46% on different ILP and DLP architectures. We show that the potential of SW-SMT increases with SIMD datapath size and VLIW issue width.

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Scarpazza, D.P., Raghavan, P., Novo, D., Catthoor, F., Verkest, D. (2006). Software Simultaneous Multi-Threading, a Technique to Exploit Task-Level Parallelism to Improve Instruction- and Data-Level Parallelism. In: Vounckx, J., Azemard, N., Maurine, P. (eds) Integrated Circuit and System Design. Power and Timing Modeling, Optimization and Simulation. PATMOS 2006. Lecture Notes in Computer Science, vol 4148. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11847083_2

  • DOI: https://doi.org/10.1007/11847083_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-39094-7

  • Online ISBN: 978-3-540-39097-8

  • eBook Packages: Computer Science, Computer Science (R0)
