Toward a Core Design to Distribute an Execution on a Manycore Processor

  • Bernard Goossens
  • David Parello
  • Katarzyna Porada
  • Djallal Rahmoune
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9251)


This paper presents a parallel execution model and a core design to run C programs in parallel. The model automatically builds parallel flows of machine instructions from the run trace. It parallelizes instruction fetch, renaming, execution, and retirement. Predictor-based fetch is replaced by a fetch-decode-and-partly-execute stage able to compute most control instructions in order. Tomasulo's register renaming is extended to memory with a technique to match consumer/producer pairs. The Reorder Buffer is adapted to parallel retirement. A sum reduction code is used to illustrate the model and to give a short analytical evaluation of its performance potential.


Keywords: Microarchitecture · Parallelism · Manycore · Automatic parallelization



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Bernard Goossens¹,²
  • David Parello¹,²
  • Katarzyna Porada¹,²
  • Djallal Rahmoune¹,²

  1. DALI, UPVD, Perpignan Cedex 9, France
  2. LIRMM, CNRS: UMR 5506 - UM2, Montpellier Cedex 5, France
