Double Buffering for MCDRAM on Second Generation Intel® Xeon Phi™ Processors with OpenMP

  • Stephen L. Olivier
  • Simon D. Hammond
  • Alejandro Duran
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10468)


Emerging novel architectures for shared memory parallel computing are incorporating increasingly creative innovations to deliver higher memory performance. A notable exemplar of this phenomenon is the Multi-Channel DRAM (MCDRAM) that is included in the Intel® Xeon Phi™ processors. In this paper, we examine techniques to use OpenMP to exploit the high bandwidth of MCDRAM by staging data. In particular, we implement double buffering using OpenMP sections and tasks to explicitly manage movement of data into MCDRAM. We compare our double-buffered approach to a non-buffered implementation and to Intel’s cache mode, in which the system manages the MCDRAM as a transparent cache. We also demonstrate the sensitivity of performance to parameters such as dataset size and the distribution of threads between compute and copy operations.
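The double-buffering idea described above can be illustrated with a minimal sketch using OpenMP tasks. This is not the authors' implementation: the function name `double_buffered_sum`, the helper `chunk_len`, the reduction kernel, and the chunking scheme are all hypothetical, and plain `malloc` stands in for an MCDRAM allocation (on a real system the two staging buffers would presumably be placed in high-bandwidth memory, for example through a heap manager such as memkind). While one task computes on the current buffer, a second task copies the next chunk into the other buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Length of chunk i when n elements are split into pieces of size `chunk`. */
static size_t chunk_len(size_t n, size_t chunk, size_t i)
{
    size_t off = i * chunk;
    return (n - off < chunk) ? (n - off) : chunk;
}

/* Hypothetical example kernel: sum n doubles from `src`, staging the data
 * through two small buffers. Returns 0 on success, -1 if allocation fails.
 * ASSUMPTION: malloc is a stand-in for a fast-memory (MCDRAM) allocator. */
int double_buffered_sum(const double *src, size_t n, size_t chunk, double *out)
{
    assert(chunk > 0 && out != NULL);

    double *buf[2];
    buf[0] = malloc(chunk * sizeof(double));
    buf[1] = malloc(chunk * sizeof(double));
    if (buf[0] == NULL || buf[1] == NULL) {
        free(buf[0]);
        free(buf[1]);
        return -1;
    }

    size_t nchunks = (n + chunk - 1) / chunk;
    double total = 0.0;

    /* Prime the pipeline: the first chunk is copied before any compute. */
    if (nchunks > 0)
        memcpy(buf[0], src, chunk_len(n, chunk, 0) * sizeof(double));

    #pragma omp parallel
    {
        #pragma omp single
        {
            for (size_t i = 0; i < nchunks; i++) {
                double partial = 0.0;

                /* Copy task: stage chunk i+1 into the idle buffer. */
                if (i + 1 < nchunks) {
                    double *next = buf[(i + 1) % 2];
                    const double *from = src + (i + 1) * chunk;
                    size_t next_len = chunk_len(n, chunk, i + 1);
                    #pragma omp task firstprivate(next, from, next_len)
                    memcpy(next, from, next_len * sizeof(double));
                }

                /* Compute task: work on the chunk that is already staged. */
                double *cur = buf[i % 2];
                size_t len = chunk_len(n, chunk, i);
                #pragma omp task firstprivate(cur, len) shared(partial)
                {
                    double s = 0.0;
                    for (size_t j = 0; j < len; j++)
                        s += cur[j];
                    partial = s;
                }

                /* Both tasks must finish before the buffers swap roles. */
                #pragma omp taskwait
                total += partial;
            }
        }
    }

    free(buf[0]);
    free(buf[1]);
    *out = total;
    return 0;
}
```

In this sketch each step uses exactly one copy task and one compute task, which keeps the overlap easy to follow. A realistic kernel would likely split both the copy and the compute across several tasks or threads, and the ratio between the two is the kind of parameter whose performance sensitivity the paper examines. Compiled without OpenMP support, the pragmas are ignored and the code runs serially with the same result.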



Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525. We wish to acknowledge our appreciation for the use of the Advanced Architecture Test Bed, Bowman, at Sandia National Laboratories. The test beds are provided by NNSA’s Advanced Simulation and Computing (ASC) program for research and development of advanced architectures for exascale computing.

Disclaimers: Intel, Xeon, and Xeon Phi are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

* Other brands and names are the property of their respective owners.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Stephen L. Olivier (1)
  • Simon D. Hammond (1)
  • Alejandro Duran (2)
  1. Center for Computing Research, Sandia National Laboratories, Albuquerque, USA
  2. Intel Corporation Iberia, Madrid, Spain
