PaCT 2015: Parallel Computing Technologies, pp. 390-404
Toward a Core Design to Distribute an Execution on a Manycore Processor
Abstract
This paper presents a parallel execution model and a core design to run C programs in parallel. The model automatically builds parallel flows of machine instructions from the run trace. It parallelizes instruction fetch, renaming, execution and retirement. Predictor-based fetch is replaced by a fetch-decode-and-partly-execute stage that computes most control instructions in order. Tomasulo’s register renaming is extended to memory with a technique to match consumer/producer pairs. The Reorder Buffer is adapted to parallel retirement. A sum reduction code is used to illustrate the model and to give a short analytical evaluation of its performance potential.
Keywords
Microarchitecture, Parallelism, Manycore, Automatic parallelization