Eliminating Dark Bandwidth: A Data-Centric View of Scalable, Efficient Performance, Post-Moore

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10524)


Most computing research has focused on the computing technologies themselves rather than on how full systems use them (e.g., the memory fabric, interconnect, software, and compute elements combined). Technologists have largely failed to look at the compute system as a whole, instead optimizing subsystems mostly in isolation. The result, for example, is that systems are built where applications can only request data in fixed-size multiples (e.g., 64 bytes from DRAM), even when far less is required. This is efficient from a hardware interface perspective; however, it consumes valuable bandwidth that the core never uses. This hidden bandwidth is effectively dark to the system. The causes of dark bandwidth are systemic, built into the very core of our virtual memory abstractions and memory interfaces. Continuing to focus on newer, revolutionary memory technologies that improve surface performance characteristics, without a systems focus on reducing data movement, will simply push this problem onto future systems. This paper examines the problem of dark bandwidth and offers a holistic approach to reducing overall data movement within future compute systems.
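To make the notion of dark bandwidth concrete, the following is a minimal sketch, not the authors' methodology, that estimates the dark fraction of an address trace. It assumes a 64-byte transfer granularity and an idealized memory system in which every distinct line touched is fetched exactly once (no evictions, no prefetching). The function name `dark_bandwidth` is hypothetical and introduced only for illustration.

```python
from typing import Iterable, Tuple

LINE_SIZE = 64  # assumed transfer granularity in bytes (e.g., one DRAM burst / cache line)


def dark_bandwidth(addresses: Iterable[int], access_size: int,
                   line_size: int = LINE_SIZE) -> Tuple[int, int, float]:
    """Hypothetical helper: return (bytes_fetched, bytes_used, dark_fraction).

    Idealized model (an assumption): each distinct line touched by the trace
    is fetched once, and a byte counts as "used" if any access reads it.
    """
    lines = set()  # distinct line indices that must be transferred
    used = set()   # distinct byte addresses the core actually consumes
    for addr in addresses:
        for byte in range(addr, addr + access_size):
            used.add(byte)
            lines.add(byte // line_size)
    fetched = len(lines) * line_size
    # Dark bandwidth: bytes moved across the interface but never consumed.
    dark = (fetched - len(used)) / fetched if fetched else 0.0
    return fetched, len(used), dark


if __name__ == "__main__":
    n = 1000
    sequential = [i * 8 for i in range(n)]  # 8-byte elements, unit stride
    strided = [i * 64 for i in range(n)]    # one 8-byte element per 64-byte line
    print(dark_bandwidth(sequential, 8))    # (8000, 8000, 0.0)
    print(dark_bandwidth(strided, 8))       # (64000, 8000, 0.875)
```

Under these assumptions, a unit-stride scan uses every byte it fetches, while a stride of one element per line leaves seven eighths of the transferred bytes dark, even though both traces read the same amount of useful data.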



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. ARM Research, Austin, USA
