Abstract
Application-specific processors aim to combine the efficiency of fixed-function application-specific integrated circuits with the flexibility of software implementations on programmable processors. The efficiency is achieved by tailoring the processor architecture to the requirements of the application, while the flexibility comes from programmability. In this chapter, we introduce a hardware/software codesign environment for developing application-specific processors. The environment uses processor templates based on the transport-triggering paradigm, hence the name transport-triggered architecture (TTA). The fast Fourier transform (FFT) serves as an example application to illustrate the customization. We discuss specific features of FFTs and show how they can be exploited in FFT implementations. We have customized a TTA processor for FFT and compare its energy efficiency against several other FFT implementations to demonstrate the potential of the concept.
Acknowledgements
The authors thank the Finnish Funding Agency for Innovation in the context of the FiDiPro project StreamPro (decision no. 40142/14).
Copyright information
© 2016 Springer Science+Business Media Dordrecht
About this entry
Cite this entry
Takala, J., Jääskeläinen, P., Pitkänen, T. (2016). Codesign Case Study on Transport-Triggered Architectures. In: Ha, S., Teich, J. (eds) Handbook of Hardware/Software Codesign. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-7358-4_39-1
DOI: https://doi.org/10.1007/978-94-017-7358-4_39-1
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-017-7358-4
Online ISBN: 978-94-017-7358-4
eBook Packages: Springer Reference Engineering, Reference Module Computer Science and Engineering