Toward a Core Design to Distribute an Execution on a Manycore Processor

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9251)

Abstract

This paper presents a parallel execution model and a core design to run C programs in parallel. The model automatically builds parallel flows of machine instructions from the run trace. It parallelizes instruction fetch, renaming, execution and retirement. Predictor based fetch is replaced by a fetch-decode-and-partly-execute stage able to compute in-order most of the control instructions. Tomasulo’s register renaming is extended to memory with a technique to match consumer/producer pairs. The Reorder Buffer is adapted to parallel retirement. A sum reduction code is used to illustrate the model and to give a short analytical evaluation of its performance potential.
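
The paper's running example is a sum reduction; its exact code (Fig. 5) is not reproduced on this page. As a rough point of reference only, the following is a minimal C sketch of such a reduction: the array name t, its five elements, and the local variable temp are borrowed from the notes below, while the divide-and-conquer shape is an assumption about how the model's forked sections might split the work.

    #include <stdio.h>

    /* Hypothetical sum reduction sketch. Each recursive call gets its
       own stack frame and its own temp, which is what the memory
       renaming of note 5 exploits to let sections update their local
       variables in parallel. */
    static long t[5] = {1, 2, 3, 4, 5};    /* assumed values */

    static long sum(int lo, int hi) {
        long temp;                         /* local to each frame */
        if (lo == hi)
            return t[lo];
        int mid = (lo + hi) / 2;
        temp = sum(lo, mid);               /* a plausible fork point */
        return temp + sum(mid + 1, hi);
    }

    int main(void) {
        printf("%ld\n", sum(0, 4));        /* prints 15 */
        return 0;
    }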


Notes

  1. The stack in each section keeps its local variables, e.g., the variable temp in Fig. 5.

  2. The "good" model has a 2K-instruction window, issues 64 instructions per cycle, and uses 256 renaming registers, a branch predictor based on an infinite number of 2-bit counters, and perfect memory aliasing disambiguation.

  3. The "perfect" model extends the "good" model with infinite renaming registers and a perfect branch predictor.

  4. The choice of hosting core to optimize load balancing is outside the scope of this paper.

  5. Memory renaming duplicates stack frames that map to the same addresses. This lets multiple sections update the local variables in their own frames in parallel.

  6. In the sum example, all the conditional branches are computed in the fetch stage, which parallelizes fetch by reaching the fork instructions quickly.

  7. Stores update full lines. The loader sets a cleared line and loops to update it successively with t[0] up to t[4]; a C sketch of this loop is given after these notes. The full line, right-padded with zeros, is exported to its first consumer, i.e., section 1. Sections 2 and 3 get section 1's cached copy.

  8. The oldest section, i.e., the only one with no predecessor, dumps its renamings to the data memory hierarchy (DMH). When it receives a renaming request that misses, it loads the line from the DMH and exports it.

  9. The 15 cycles decompose as the fetch time of instructions 2, 3, and 8-10 of Fig. 5 (5 cycles), plus the creation time of the forked section (2 cycles), plus the fetch time of instructions 11-16 (5 cycles), plus the retirement time of instructions 17-19 (3 cycles): 5 + 2 + 5 + 3 = 15.
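
The loader of note 7 can be pictured as a short C loop. This is a sketch under assumptions: the array name t and the stored values are carried over from the sketch after the abstract, and the cleared-line and line-export behavior belongs to the hardware model, not to the C code itself.

    /* Hypothetical loader sketch for note 7: five successive stores
       that fall in one cache line. In the model the line starts
       cleared, each store updates it in place, and once t[4] is
       written the full line (right-padded with zeros) is exported to
       its first consumer, section 1. */
    long t[5];

    void load_t(void) {
        for (int i = 0; i < 5; i++)
            t[i] = i + 1;                  /* assumed values */
    }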

References

  1. Shun, J., Blelloch, G.E., Fineman, J.T., Gibbons, P.B., Kyrola, A., Simhadri, H.V., Tangwongsan, K.: Brief announcement: the problem based benchmark suite. In: Proceedings of the 24th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2012, pp. 68–70 (2012)

  2. Wall, D.W.: Limits of instruction-level parallelism. In: WRL Technical Note TN-15 (1990)

  3. Tomasulo, R.M.: An efficient algorithm for exploiting multiple arithmetic units. IBM J. Res. Dev. 11, 25–33 (1967)

  4. Tjaden, G.S., Flynn, M.J.: Detection and parallel execution of independent instructions. IEEE Trans. Comput. 19, 889–895 (1970)

  5. Nicolau, A., Fisher, J.: Measuring the parallelism available for very long instruction word architectures. IEEE Trans. Comput. C–33, 968–976 (1984)

  6. Austin, T.M., Sohi, G.S.: Dynamic dependency analysis of ordinary programs. In: Proceedings of the 19th Annual International Symposium on Computer Architecture, ISCA 1992, pp. 342–351 (1992)

  7. Lam, M.S., Wilson, R.P.: Limits of control flow on parallelism. In: Proceedings of the 19th Annual International Symposium on Computer Architecture, ISCA 1992, pp. 46–57 (1992)

  8. Moshovos, A., Breach, S.E., Vijaykumar, T.N., Sohi, G.S.: Dynamic speculation and synchronization of data dependences. In: Proceedings of the 24th Annual International Symposium on Computer Architecture, ISCA 1997, pp. 181–193 (1997)

  9. Postiff, M.A., Greene, D.A., Tyson, G.S., Mudge, T.N.: The limits of instruction level parallelism in SPEC95 applications. ACM SIGARCH Comput. Archit. News 27, 31–34 (1999)

  10. Cristal, A., Santana, O.J., Valero, M., Martínez, J.F.: Toward kilo-instruction processors. ACM Trans. Archit. Code Optim. 1, 389–417 (2004)

  11. Sharafeddine, M., Jothi, K., Akkary, H.: Disjoint out-of-order execution processor. ACM Trans. Archit. Code Optim. (TACO) 9, 19:1–19:32 (2012)

  12. Goossens, B., Parello, D.: Limits of instruction-level parallelism capture. Procedia Comput. Sci. 18, 1664–1673 (2013). 2013 International Conference on Computational Science


Author information

Correspondence to Bernard Goossens.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Goossens, B., Parello, D., Porada, K., Rahmoune, D. (2015). Toward a Core Design to Distribute an Execution on a Manycore Processor. In: Malyshkin, V. (ed.) Parallel Computing Technologies. PaCT 2015. Lecture Notes in Computer Science (LNTCS), vol. 9251. Springer, Cham. https://doi.org/10.1007/978-3-319-21909-7_38

  • DOI: https://doi.org/10.1007/978-3-319-21909-7_38

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-21908-0

  • Online ISBN: 978-3-319-21909-7

  • eBook Packages: Computer Science; Computer Science (R0)
