The Memory-Bounded Speedup Model and Its Impacts in Computing

Review · Journal of Computer Science and Technology

Abstract

With the surge of big data applications and the worsening of the memory-wall problem, the memory system, rather than the computing unit, has become the commonly recognized major concern of computing. This “memory-centric” consensus, however, had a humble beginning. More than three decades ago, the memory-bounded speedup model was the first to recognize memory as a bound on computing, providing both a general bound on speedup and a formulation of the computing-memory trade-off. The model was well received even then: it was quickly introduced into several advanced computer architecture and parallel computing textbooks of the 1990s as essential knowledge for scalable computing. These include Prof. Kai Hwang’s book “Scalable Parallel Computing”, in which he presented the memory-bounded speedup model as Sun-Ni’s Law, alongside Amdahl’s Law and Gustafson’s Law. Over the years, the impact of this model has grown far beyond parallel processing, reaching into the fundamentals of computing. In this article, we revisit the memory-bounded speedup model and discuss its progress and impacts in depth, to make a unique contribution to this special issue, to stimulate new solutions for big data applications, and to promote data-centric thinking and rethinking.
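For readers new to these models, the three laws named above can be stated side by side. Below is a compact restatement, with notation following Hwang's presentation of Sun-Ni's Law [68]: $W_1$ is the sequential portion of the workload, $W_n$ the parallelizable portion measured on one processor, $n$ the number of processors, and $G(n)$ the factor by which the parallel workload grows when memory capacity is scaled up $n$ times:

$$S_{\mathrm{Amdahl}}(n) = \frac{W_1 + W_n}{W_1 + W_n/n}, \qquad S_{\mathrm{Gustafson}}(n) = \frac{W_1 + n\,W_n}{W_1 + W_n}, \qquad S_{\mathrm{MB}}(n) = \frac{W_1 + G(n)\,W_n}{W_1 + G(n)\,W_n/n}.$$

The memory-bounded form subsumes the other two: $G(n) = 1$ (fixed problem size) recovers Amdahl's Law, and $G(n) = n$ (workload growing linearly with memory) recovers Gustafson's Law. For dense matrix multiplication, for instance, memory grows as $N^2$ while work grows as $N^3$, so $G(n) = n^{3/2}$ and the memory-bounded speedup exceeds the fixed-time (Gustafson) speedup.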

References

  1. Wulf W A, McKee S A. Hitting the memory wall: Implications of the obvious. ACM SIGARCH Computer Architecture News, 1995, 23(1): 20-24. https://doi.org/10.1145/216585.216588.

  2. Sun X H, Ni L M. Scalable problems and memory-bounded speedup. Journal of Parallel and Distributed Computing, 1993, 19(1): 27-37. https://doi.org/10.1006/jpdc.1993.1087.

  3. Sun X H, Ni L M. Another view on parallel speedup. In Proc. the 1990 ACM/IEEE Conference on Supercomputing, Nov. 1990, pp.324–333. https://doi.org/10.1109/SUPERC.1990.130037.

  4. Amdahl G M. Validity of the single processor approach to achieving large scale computing capabilities. In Proc. the Spring Joint Computer Conference, Apr. 1967, pp.483–485. https://doi.org/10.1145/1465482.1465560.

  5. Gustafson J L. Reevaluating Amdahl’s law. Communications of the ACM, 1988, 31(5): 532-533. https://doi.org/10.1145/42411.42415.

  6. Bashe C J, Johnson L R, Palmer J H, Pugh E W. IBM’s Early Computers. MIT Press, 1986.

  7. Sun X H, Chen Y. Reevaluating Amdahl’s law in the multicore era. Journal of Parallel and Distributed Computing, 2010, 70(2): 183-188. https://doi.org/10.1016/j.jpdc.2009.05.002.

  8. Pan C Y, Naeemi A. System-level optimization and benchmarking of graphene PN junction logic system based on empirical CPI model. In Proc. the IEEE International Conference on IC Design & Technology, Jun. 2012. https://doi.org/10.1109/ICICDT.2012.6232850.

  9. Kogge P M. Hardware Evolution Trends of Extreme Scale Computing. Technical Report, University of Notre Dame, South Bend, 2011.

  10. Hennessy J L, Patterson D A. Computer Architecture: A Quantitative Approach (6th edition). Elsevier, 2017.

  11. Liu Y H, Sun X H. LPM: A systematic methodology for concurrent data access pattern optimization from a matching perspective. IEEE Trans. Parallel and Distributed Systems, 2019, 30(11): 2478-2493. https://doi.org/10.1109/TPDS.2019.2912573.

  12. Lo Y J, Williams S, Straalen B V, Ligocki T J, Cordery M J, Wright N J, Hall M W, Oliker L. Roofline model toolkit: A practical tool for architectural and program analysis. In Proc. the 5th International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, Nov. 2014, pp.129–148. https://doi.org/10.1007/978-3-319-17248-4_7.

  13. Saini S, Chang J, Jin H Q. Performance evaluation of the Intel Sandy Bridge based NASA Pleiades using scientific and engineering applications. In Proc. the 4th International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, Nov. 2013, pp.25–51. https://doi.org/10.1007/978-3-319-10214-6_2.

  14. Sun X H, Gustafson J L. Toward a better parallel performance metric. Parallel Computing, 1991, 17(10/11): 1093-1109. https://doi.org/10.1016/S0167-8191(05)80028-6.

  15. Kumar V, Singh V. Scalability of parallel algorithms for the all-pairs shortest-path problem. Journal of Parallel and Distributed Computing, 1991, 13(2): 124-138. https://doi.org/10.1016/0743-7315(91)90083-L.

  16. Kumar V, Grama A, Gupta A, Karypis G. Introduction to Parallel Computing: Design and Analysis of Algorithms. Benjamin-Cummings, 1994.

  17. Sun X H, Chen Y, Wu M. Scalability of heterogeneous computing. In Proc. the International Conference on Parallel Processing (ICPP’05), Jun. 2005, pp.557–564. https://doi.org/10.1109/ICPP.2005.69.

  18. Sun X H, Rover D T. Scalability of parallel algorithm-machine combinations. IEEE Trans. Parallel and Distributed Systems, 1994, 5(6): 599-613. https://doi.org/10.1109/71.285606.

  19. Sun X H, Pantano M, Fahringer T. Integrated range comparison for data-parallel compilation systems. IEEE Trans. Parallel and Distributed Systems, 1999, 10(5): 448-458. https://doi.org/10.1109/71.770134.

  20. Sun X H. Scalability versus execution time in scalable systems. Journal of Parallel and Distributed Computing, 2002, 62(2): 173-192. https://doi.org/10.1006/jpdc.2001.1773.

  21. Hill M D, Marty M R. Amdahl’s law in the multicore era. Computer, 2008, 41(7): 33-38. https://doi.org/10.1109/MC.2008.209.

  22. Sun X H, Chen Y, Byna S. Scalable computing in the multicore era. In Proc. the 2008 International Symposium on Parallel Architectures, Algorithms and Programming, Sept. 2008.

  23. Dwork C, Goldberg A, Naor M. On memory-bound functions for fighting spam. In Proc. the 23rd Annual International Cryptology Conference, Aug. 2003, pp.426–444. https://doi.org/10.1007/978-3-540-45146-4_25.

  24. Abadi M, Burrows M, Manasse M, Wobber T. Moderately hard, memory-bound functions. ACM Trans. Internet Technology, 2005, 5(2): 299-327. https://doi.org/10.1145/1064340.1064341.

  25. Hart P E, Nilsson N J, Raphael B. A formal basis for the heuristic determination of minimum cost paths. IEEE Trans. Systems Science and Cybernetics, 1968, 4(2): 100-107. https://doi.org/10.1109/TSSC.1968.300136.

  26. Korf R E. Depth-first iterative-deepening: An optimal admissible tree search. Artificial Intelligence, 1985, 27(1): 97-109. https://doi.org/10.1016/0004-3702(85)90084-0.

  27. Korf R E, Reid M, Edelkamp S. Time complexity of iterative-deepening-A*. Artificial Intelligence, 2001, 129(1/2): 199-218. https://doi.org/10.1016/S0004-3702(01)00094-7.

  28. Russell S. Efficient memory-bounded search methods. In Proc. the 10th European Conference on Artificial Intelligence, Aug. 1992.

  29. Lovinger J, Zhang X Q. Enhanced simplified memory-bounded A star (SMA*+). In Proc. the 3rd Global Conference on Artificial Intelligence, Oct. 2017, pp.202–212. https://doi.org/10.29007/v7zc.

  30. Seuken S, Zilberstein S. Memory-bounded dynamic programming for DEC-POMDPs. In Proc. the 20th International Joint Conference on Artificial Intelligence, Jan. 2007, pp.2009–2015.

  31. Seuken S, Zilberstein S. Improved memory-bounded dynamic programming for decentralized POMDPs. arXiv: 1206.5295, 2012. https://arxiv.org/abs/1206.5295, Dec. 2022.

  32. Chen Z Y, Zhang W X, Deng Y C, Chen D D, Li Q. RMB-DPOP: Refining MB-DPOP by reducing redundant inferences. arXiv: 2002.10641, 2020. https://doi.org/10.48550/arXiv.2002.10641, Dec. 2022.

  33. Brito I, Meseguer P. Improving DPOP with function filtering. In Proc. the 9th International Conference on Autonomous Agents and Multiagent Systems: Volume 1, May 2010, pp.141–148.

  34. Petcu A, Faltings B. ODPOP: An algorithm for open/distributed constraint optimization. In Proc. the 21st National Conference on Artificial Intelligence, Jul. 2006, pp.703–708.

  35. Petcu A, Faltings B. A hybrid of inference and local search for distributed combinatorial optimization. In Proc. the IEEE/WIC/ACM International Conference on Intelligent Agent Technology (IAT’07), Nov. 2007, pp.342–348. https://doi.org/10.1109/IAT.2007.12.

  36. Petcu A, Faltings B. MB-DPOP: A new memory-bounded algorithm for distributed optimization. In Proc. the 20th International Joint Conference on Artificial Intelligence, Jan. 2007, pp.1452–1457.

  37. Williams S W. Auto-tuning performance on multicore computers [Ph.D. Thesis]. University of California, Berkeley, 2008.

  38. Williams S, Waterman A, Patterson D. Roofline: An insightful visual performance model for multicore architectures. Communications of the ACM, 2009, 52(4): 65-76. https://doi.org/10.1145/1498765.1498785.

  39. Lu X Y, Wang R J, Sun X H. APAC: An accurate and adaptive prefetch framework with concurrent memory access analysis. In Proc. the 38th IEEE International Conference on Computer Design (ICCD), Oct. 2020, pp.222–229. https://doi.org/10.1109/ICCD50377.2020.00048.

  40. Lu X Y, Wang R J, Sun X H. Premier: A concurrencyaware pseudo-partitioning framework for shared last-level cache. In Proc. the 39th IEEE International Conference on Computer Design (ICCD), Oct. 2021, pp.391–394. https://doi.org/10.1109/ICCD53106.2021.00068.

  41. Liu J, Espina P, Sun X H. A study on modeling and optimization of memory systems. Journal of Computer Science and Technology, 2021, 36(1): 71-89. https://doi.org/10.1007/s11390-021-0771-8.

  42. Glew A. MLP yes! ILP no. In Proc. ASPLOS Wild and Crazy Idea Session, Oct. 1998.

  43. Qureshi M K, Lynch D N, Mutlu O, Patt Y N. A case for MLP-aware cache replacement. In Proc. the 33rd International Symposium on Computer Architecture (ISCA’06), Jun. 2006, pp.167–178. https://doi.org/10.1109/ISCA.2006.5.

  44. Sun X H, Wang D W. Concurrent average memory access time. Computer, 2014, 47(5): 74-80. https://doi.org/10.1109/MC.2013.227.

  45. Najafi H, Lu X, Liu J, Sun X H. A generalized model for modern hierarchical memory system. In Proc. Winter Simulation Conference (WSC), Dec. 2022.

  46. Lu X, Wang R, Sun X H. CARE: A concurrency-aware enhanced lightweight cache management framework. In Proc. the 29th IEEE International Symposium on High-Performance Computer Architecture (HPCA), Feb. 25–Mar. 1, 2023.

  47. Yan L, Zhang M Z, Wang R J, Chen X M, Zou X Q, Lu X Y, Han Y H, Sun X H. CoPIM: A concurrency-aware PIM workload offloading architecture for graph applications. In Proc. IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED), Jul. 2021. https://doi.org/10.1109/ISLPED52811.2021.9502483.

  48. Zhang N, Jiang C T, Sun X H, Song S L. Evaluating GPGPU memory performance through the C-AMAT model. In Proc. the Workshop on Memory Centric Programming for HPC, Nov. 2017, pp.35–39. https://doi.org/10.1145/3145617.3158214.

  49. Kannan S, Gavrilovska A, Schwan K, Milojicic D, Talwar V. Using active NVRAM for I/O staging. In Proc. the 2nd International Workshop on Petascale Data Analytics: Challenges and Opportunities, Nov. 2011, pp.15–22. https://doi.org/10.1145/2110205.2110209.

  50. Caulfield A M, Grupp L M, Swanson S. Gordon: Using flash memory to build fast, power-efficient clusters for data-intensive applications. ACM SIGPLAN Notices, 2009, 44(3): 217-228. https://doi.org/10.1145/1508284.1508270.

  51. Reed D A, Dongarra J. Exascale computing and big data. Communications of the ACM, 2015, 58(7): 56-68. https://doi.org/10.1145/2699414.

  52. Shalf J, Dosanjh S, Morrison J. Exascale computing technology challenges. In Proc. the 9th International Conference on High Performance Computing for Computational Science, Jun. 2010. https://doi.org/10.1007/978-3-642-19328-6_1.

  53. Kougkas A, Devarajan H, Sun X H. Hermes: A heterogeneous-aware multi-tiered distributed I/O buffering system. In Proc. the 27th International Symposium on High-Performance Parallel and Distributed Computing, Jun. 2018, pp.219–230. https://doi.org/10.1145/3208040.3208059.

  54. Kougkas A, Devarajan H, Sun X H. I/O acceleration via multi-tiered data buffering and prefetching. Journal of Computer Science and Technology, 2020, 35(1): 92-120. https://doi.org/10.1007/s11390-020-9781-1.

  55. Tissenbaum M, Sheldon J, Abelson H. From computational thinking to computational action. Communications of the ACM, 2019, 62(3): 34-36. https://doi.org/10.1145/3265747.

  56. Liu Y H, Sun X H, Wang Y, Bao Y G. HCDA: From computational thinking to a generalized thinking paradigm. Communications of the ACM, 2021, 64(5): 66-75. https://doi.org/10.1145/3418291.

  57. Owens J D, Houston M, Luebke D, Green S, Stone J E, Phillips J C. GPU computing. Proceedings of the IEEE, 2008, 96(5): 879-899. https://doi.org/10.1109/JPROC.2008.917757.

  58. Dean J, Ghemawat S. MapReduce: Simplified data processing on large clusters. Communications of the ACM, 2008, 51(1): 107-113. https://doi.org/10.1145/1327452.1327492.

  59. Momose H, Kaneko T, Asai T. Systems and circuits for AI chips and their trends. Japanese Journal of Applied Physics, 2020, 59(5): 050502. https://doi.org/10.35848/1347-4065/ab839f.

  60. Singh G, Alser M, Cali D S, Diamantopoulos D, Gómez-Luna J, Corporaal H, Mutlu O. FPGA-based near-memory acceleration of modern data-intensive applications. IEEE Micro, 2021, 41(4): 39-48. https://doi.org/10.1109/MM.2021.3088396.

  61. Choi Y K, Santillana C, Shen Y J, Darwiche A, Cong J. FPGA acceleration of probabilistic sentential decision diagrams with high-level synthesis. ACM Trans. Reconfigurable Technology and Systems, 2022. https://doi.org/10.1145/3561514.

  62. Ghose S, Boroumand A, Kim J S, Gómez-Luna J, Mutlu O. Processing-in-memory: A workload-driven perspective. IBM Journal of Research and Development, 2019, 63(6): Article No. 3. https://doi.org/10.1147/JRD.2019.2934048.

  63. Ghiasi N M, Park J, Mustafa H, Kim J, Olgun A, Gollwitzer A, Cali D S, Firtina C, Mao H Y, Alserr N A, Ausavarungnirun R, Vijaykumar N, Alser M, Mutlu O. GenStore: A high-performance in-storage processing system for genome sequence analysis. In Proc. the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Feb. 2022, pp.635–654. https://doi.org/10.1145/3503222.3507702.

  64. Mutlu O. Intelligent architectures for intelligent computing systems. In Proc. the 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE), Feb. 2021, pp.318–323. https://doi.org/10.23919/DATE51398.2021.9474073.

  65. Sun X H, Liu Y H. Utilizing concurrency: A new theory for memory wall. In Proc. the 29th International Workshop on Languages and Compilers for Parallel Computing, Sept. 2016, pp.18–23. https://doi.org/10.1007/978-3-319-52709-3_2.

  66. Kougkas A, Devarajan H, Lofstead J, Sun X H. LABIOS: A distributed label-based I/O system. In Proc. the 28th International Symposium on High-Performance Parallel and Distributed Computing, Jun. 2019, pp.13–24. https://doi.org/10.1145/3307681.3325405.

  67. Logan L, Garcia J C, Lofstead J, Sun X H, Kougkas A. LabStor: A modular and extensible platform for developing high-performance, customized I/O stacks in userspace. In Proc. the ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC’22), Nov. 2022, pp.309–323.

  68. Hwang K, Xu Z W. Scalable Parallel Computing: Technology, Architecture, Programming. McGraw-Hill, 1998.

  69. Hwang K. Advanced Computer Architecture: Parallelism, Scalability, Programmability. McGraw-Hill, 1993.

Author information

Correspondence to Xian-He Sun.

Supplementary Information

ESM 1 (PDF 447 kb)

About this article

Cite this article

Sun, XH., Lu, X. The Memory-Bounded Speedup Model and Its Impacts in Computing. J. Comput. Sci. Technol. 38, 64–79 (2023). https://doi.org/10.1007/s11390-022-2911-1
