An Efficient and Flexible Task Management for Many Cores

  • Yuan Nan
  • Yu Lei
  • Fan Dong-rui
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6760)


This paper presents the design and implementation of a runtime system (named “GodRunner”) on the Godson-T many-core processor that supports task-level parallelism efficiently and flexibly. GodRunner abstracts the underlying hardware resources and provides an easy-to-use programming interface. A two-grade task management mechanism is proposed to support both coarse-grained and fine-grained multithreading efficiently. Two load-balanced scheduling policies are combined flexibly in GodRunner. Software-controlled task management makes GodRunner more configurable and extensible than hard-wired approaches. Experiments show that the tasking overhead in GodRunner is as small as hundreds of cycles, which is hundreds of times faster than conventional Pthread-based multithreading on an SMP machine. Furthermore, the approach scales well and supports fine-grained tasks as small as 20k cycles optimally.


many-core architecture, runtime system, task management





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Yuan Nan (1, 2)
  • Yu Lei (1, 2)
  • Fan Dong-rui (1)
  1. Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
  2. Graduate University of Chinese Academy of Sciences, Beijing, China
