Multi-threading in Uni-threaded Processor

Abstract

This chapter introduces the concept of executing multiple incompatible loops in parallel, thereby enabling efficient multi-threading in a VLIW processor. The proposed multi-threading is made possible by a distributed instruction memory organization that requires only minimal hardware overhead; this forms one of the core contributions of this book. The chapter also shows how the proposed extension of the instruction memory hierarchy can both improve performance and reduce energy consumption compared to state-of-the-art simultaneous multi-threaded (SMT) architectures across various DSP benchmarks. Finally, it shows that code can be compiled for the proposed architecture.
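To make the idea of running incompatible loops concurrently more concrete, here is a minimal, hypothetical Python sketch. It is not the architecture described in the chapter. The names (`Loop`, `LoopBufferController`, `run_centralized`, `run_distributed`) and the cycle model are illustrative assumptions: each loop is mapped to its own cluster of functional units, each cluster issues one operation per cycle, there are no stalls, and "incompatible" is read loosely as loops with different body lengths and iteration counts. A single shared loop controller has to run such loops one after another, whereas one controller per distributed loop buffer lets them advance side by side.

```python
from dataclasses import dataclass, field


@dataclass
class Loop:
    """Hypothetical loop mapped onto one cluster of functional units."""
    name: str
    cluster: int                               # cluster (and loop buffer) index
    body: list = field(default_factory=list)   # one operation per body cycle
    iterations: int = 1


class LoopBufferController:
    """Hypothetical local controller with its own PC and iteration counter."""

    def __init__(self, loop):
        self.loop = loop
        self.pc = 0
        self.iter = 0

    def done(self):
        return self.iter >= self.loop.iterations

    def step(self):
        """Issue the operation at the local PC; wrap to 0 at the end of the body."""
        op = self.loop.body[self.pc]
        self.pc += 1
        if self.pc == len(self.loop.body):
            self.pc = 0
            self.iter += 1
        return op


def run_centralized(loops):
    """One shared controller: loops execute strictly one after another."""
    trace = []
    for loop in loops:
        ctrl = LoopBufferController(loop)
        while not ctrl.done():
            trace.append({loop.cluster: ctrl.step()})
    return trace


def run_distributed(loops):
    """One controller per cluster: every unfinished loop issues each cycle."""
    clusters = [loop.cluster for loop in loops]
    if len(set(clusters)) != len(clusters):
        raise ValueError("loops must be mapped to disjoint clusters")
    ctrls = [LoopBufferController(loop) for loop in loops]
    trace = []
    while not all(c.done() for c in ctrls):
        # Each cycle's bundle holds one operation per still-active cluster.
        trace.append({c.loop.cluster: c.step() for c in ctrls if not c.done()})
    return trace


# Two hypothetical loops with different body lengths and trip counts.
fir = Loop("fir", cluster=0, body=["mul", "add"], iterations=4)       # 8 cycles
fft = Loop("fft", cluster=1, body=["ld", "mul", "st"], iterations=2)  # 6 cycles
print(len(run_centralized([fir, fft])))  # 14
print(len(run_distributed([fir, fft])))  # 8
```

In this toy model the centralized schedule takes the sum of the loop lengths (8 + 6 = 14 cycles) while the distributed one takes their maximum (8 cycles). It deliberately ignores the energy side and any effects of shared data memory or synchronization between the loops.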

Author information

Correspondence to Francky Catthoor.

Copyright information

© 2010 Springer Science+Business Media B.V.

About this chapter

Cite this chapter

Catthoor, F., Raghavan, P., Lambrechts, A., Jayapala, M., Kritikakou, A., Absar, J. (2010). Multi-threading in Uni-threaded Processor. In: Ultra-Low Energy Domain-Specific Instruction-Set Processors. Springer, Dordrecht. https://doi.org/10.1007/978-90-481-9528-2_6

  • DOI: https://doi.org/10.1007/978-90-481-9528-2_6

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-90-481-9527-5

  • Online ISBN: 978-90-481-9528-2

  • eBook Packages: Engineering; Engineering (R0)
