A Minimal Average Accessing Time Scheduler for Multicore Processors

  • Thomas Canhao Xu
  • Pasi Liljeberg
  • Hannu Tenhunen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7017)

Abstract

In this paper, we study and analyze process scheduling for multicore processors. Hundreds of cores are expected to be integrated on a single chip, known as a Chip Multiprocessor (CMP). However, operating system process scheduling, one of the most important design issues for CMP systems, has not been well addressed. We define a model for future CMPs, based on which a minimal average accessing time scheduling algorithm is proposed to reduce on-chip communication latencies and improve performance. The impact of memory access and inter-process communication (IPC) on scheduling is analyzed. We explore six typical core allocation strategies. Results show that a strategy minimizing the average accessing time of both core-core and core-memory communication outperforms the other strategies: overall performance for three applications (FFT, LU and H.264) improves by 8.23%, 4.81% and 10.21%, respectively.
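The selection criterion described above can be sketched in code. The following is a minimal illustration, not the authors' algorithm: it assumes a 2D-mesh CMP where accessing time is modeled as Manhattan-distance hop counts, and it scores a candidate core allocation by the average core-core distance plus the average distance from each core to its nearest memory controller. The function names and the exhaustive search are illustrative assumptions only.

```python
from itertools import combinations

def avg_access_time(cores, mem_ctrls):
    """Average hop count among allocated cores, plus average hop count from
    each core to its nearest memory controller (Manhattan distance on a
    2D mesh; an illustrative cost model, not the paper's exact metric)."""
    def hops(a, b):
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    pairs = list(combinations(cores, 2))
    core_core = sum(hops(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    core_mem = sum(min(hops(c, m) for m in mem_ctrls) for c in cores) / len(cores)
    return core_core + core_mem

def best_allocation(free_cores, n, mem_ctrls):
    # Exhaustive search over all n-core subsets of the free cores; a real
    # scheduler would use a heuristic, but this makes the selection
    # criterion explicit: minimize combined core-core and core-memory cost.
    return min(combinations(free_cores, n),
               key=lambda alloc: avg_access_time(alloc, mem_ctrls))
```

On a small mesh this picks cores that are both clustered together and close to a memory controller, which is the intuition behind the strategy the abstract reports as best.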

Keywords

Fast Fourier Transform · Schedule Algorithm · Allocation Strategy · Multicore Processor · Memory Controller


Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Thomas Canhao Xu 1, 2
  • Pasi Liljeberg 1, 2
  • Hannu Tenhunen 1, 2
  1. Turku Center for Computer Science, Turku, Finland
  2. Department of Information Technology, University of Turku, Turku, Finland
