Abstract
As technology scaling slows and energy efficiency becomes an important design constraint, superscalar processor designs are reaching their performance limits due to area and power restrictions. As a result, new microarchitectural paradigms need to be developed. This work proposes a new organization for x86 processors, based on a traditional superscalar design coupled to a reconfigurable array. The system exploits the fact that a few basic blocks account for most of the instructions executed by the processor, and transforms these basic blocks into configurations for the reconfigurable array. Each configuration encodes the semantics and dependencies of all instructions in the block, so that blocks already mapped can execute while bypassing the fetch, decode, and dependency-check stages, which improves instruction throughput. Our study of the architecture's potential shows that performance gains of up to 2.5\(\times\) over a traditional superscalar can be achieved.
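To make the central idea concrete, the following is a minimal sketch of how one might estimate the share of dynamic instructions that could skip the front end once their basic blocks have been mapped. It is a toy model, not the authors' mechanism or simulator: the function name `bypass_fraction`, the fixed-capacity configuration store, and the "fill until full, never evict" policy are all assumptions made for illustration.

```python
def bypass_fraction(trace, block_sizes, capacity):
    """Fraction of dynamic instructions served from stored configurations.

    trace:       basic-block start addresses in execution order (hypothetical input)
    block_sizes: dict mapping start address -> number of instructions in the block
    capacity:    number of configurations the assumed store can hold
    """
    cached = set()            # blocks that already have a configuration
    total = bypassed = 0
    for addr in trace:
        n = block_sizes[addr]
        total += n
        if addr in cached:
            # Hit: the block is assumed to run on the array,
            # skipping fetch, decode, and dependency checks.
            bypassed += n
        elif len(cached) < capacity:
            # Miss: the block runs through the normal pipeline,
            # and a configuration is stored for later executions.
            cached.add(addr)
    return bypassed / total if total else 0.0


if __name__ == "__main__":
    # One hot 5-instruction block executed 8 times, plus two cold blocks.
    trace = [0x10] * 8 + [0x20, 0x30]
    sizes = {0x10: 5, 0x20: 3, 0x30: 2}
    print(round(bypass_fraction(trace, sizes, capacity=1), 3))  # prints 0.778
```

Even with room for a single configuration, the hot block dominates the instruction count in this example, which is the kind of skew the abstract relies on. A real evaluation would also need a replacement policy and timing information, neither of which this sketch models.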
Cite this article
Brandalero, M., Beck, A.C.S. Potential analysis of a superscalar core employing a reconfigurable array for improving instruction-level parallelism. Des Autom Embed Syst 20, 155–169 (2016). https://doi.org/10.1007/s10617-016-9174-4
DOI: https://doi.org/10.1007/s10617-016-9174-4