Can Manycores Support the Memory Requirements of Scientific Applications?

  • Milan Pavlovic
  • Yoav Etsion
  • Alex Ramirez
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6161)


Manycores are very effective in scaling parallel computational performance. However, it is not clear if current memory technologies can scale to support such highly parallel processors.

In this paper, we examine the memory bandwidth and memory footprint required by a number of high-performance scientific applications. We find that such applications require a per-core memory bandwidth of ~300 MB/s and have a memory footprint of some 300 MB per core.
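To make the per-core figures concrete, the short sketch below scales them to whole-chip demand. It is an illustration, not the authors' model. It assumes that demand grows linearly with core count and that 1 GB = 1000 MB. The function name `aggregate_demand` is hypothetical.

```python
# Approximate per-core figures quoted in the abstract.
PER_CORE_BW_MB_S = 300       # ~300 MB/s of memory bandwidth per core
PER_CORE_FOOTPRINT_MB = 300  # ~300 MB of memory footprint per core


def aggregate_demand(cores: int) -> tuple[float, float]:
    """Return (bandwidth in GB/s, footprint in GB) for a chip with `cores` cores.

    Assumption (not from the paper): demand scales linearly with core count,
    with no sharing of data between cores. Uses 1 GB = 1000 MB.
    """
    bw_gb_s = cores * PER_CORE_BW_MB_S / 1000
    footprint_gb = cores * PER_CORE_FOOTPRINT_MB / 1000
    return bw_gb_s, footprint_gb


if __name__ == "__main__":
    for n in (16, 100, 200, 500):
        bw, fp = aggregate_demand(n)
        print(f"{n:4d} cores: {bw:6.1f} GB/s, {fp:6.1f} GB")
```

Under these assumptions, a 100-core chip would need on the order of 30 GB/s and 30 GB, and a 200-core chip about twice that. This is the range in which the abstract places the transition from "scales well" to "possible bottleneck".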

When comparing these requirements with the limitations of state-of-the-art DRAM technology, we project that, in the scientific domain, current memory technologies will likely scale well enough to support more than ~100 cores on a single chip, but may become a performance bottleneck for manycores with more than 200 cores.
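This projection can be framed as a simple bound: the number of cores a memory system can feed is the smaller of (usable bandwidth / per-core bandwidth) and (capacity / per-core footprint). The sketch below encodes that bound. The bandwidth and capacity budgets are caller-supplied, hypothetical inputs, not DRAM figures from the paper. The `max_utilization` knob is an added assumption reflecting that sustained bandwidth is typically below peak.

```python
import math


def max_supported_cores(
    bw_budget_gb_s: float,
    capacity_budget_gb: float,
    per_core_bw_mb_s: float = 300,       # ~300 MB/s per core (from the abstract)
    per_core_footprint_mb: float = 300,  # ~300 MB per core (from the abstract)
    max_utilization: float = 1.0,        # hypothetical: fraction of peak bandwidth usable
) -> tuple[int, str]:
    """Return (core count, limiting resource) for hypothetical memory budgets.

    The core count is the minimum of the bandwidth-limited and the
    capacity-limited counts. On a tie, "bandwidth" is reported.
    """
    usable_bw_mb_s = bw_budget_gb_s * 1000 * max_utilization
    by_bandwidth = math.floor(usable_bw_mb_s / per_core_bw_mb_s)
    by_capacity = math.floor(capacity_budget_gb * 1000 / per_core_footprint_mb)
    if by_bandwidth <= by_capacity:
        return by_bandwidth, "bandwidth"
    return by_capacity, "capacity"


if __name__ == "__main__":
    # Illustrative budgets only, not measured DRAM limits.
    print(max_supported_cores(bw_budget_gb_s=60, capacity_budget_gb=64))
```

Taking the minimum of the two limits shows which resource saturates first for a given memory configuration. The abstract's conclusion concerns where that crossover falls for current DRAM.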


Keywords: Memory Access, Parallel Application, Memory Bandwidth, Single Chip, Spectral-Element Method





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Milan Pavlovic (1)
  • Yoav Etsion (1)
  • Alex Ramirez (1, 2)

  1. Barcelona Supercomputing Center (BSC-CNS), Spain
  2. Universitat Politècnica de Catalunya (UPC), Spain
