The Memory-Bounded Speedup Model and Its Impacts in Computing

Review · Journal of Computer Science and Technology

Abstract

With the surge of big data applications and the worsening of the memory-wall problem, the memory system, rather than the computing unit, has become the commonly recognized major concern of computing. This “memory-centric” consensus, however, had a humble beginning. More than three decades ago, the memory-bounded speedup model was the first to recognize memory as a bound on computing, providing both a general bound on speedup and a formulation of the computing-memory trade-off. The model was well received even then: it was quickly introduced into several advanced computer architecture and parallel computing textbooks of the 1990s as essential knowledge for scalable computing. These include Prof. Kai Hwang’s book “Scalable Parallel Computing”, in which he presented the memory-bounded speedup model as Sun-Ni’s Law, alongside Amdahl’s Law and Gustafson’s Law. Over the years, the impact of this model has grown far beyond parallel processing, reaching into the fundamentals of computing. In this article, we revisit the memory-bounded speedup model and discuss its progress and impacts in depth, to make a unique contribution to this special issue, to stimulate new solutions for big data applications, and to promote data-centric thinking and rethinking.
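For readers new to these models, the three laws named above can be stated side by side. Below is a compact restatement, with notation following Hwang's presentation of Sun-Ni's Law [68]: $W_1$ is the sequential portion of the workload, $W_n$ the parallelizable portion measured on one processor, $n$ the number of processors, and $G(n)$ the factor by which the parallel workload grows when memory capacity is scaled up $n$ times:

$$S_{\mathrm{Amdahl}}(n) = \frac{W_1 + W_n}{W_1 + W_n/n}, \qquad S_{\mathrm{Gustafson}}(n) = \frac{W_1 + n\,W_n}{W_1 + W_n}, \qquad S_{\mathrm{MB}}(n) = \frac{W_1 + G(n)\,W_n}{W_1 + G(n)\,W_n/n}.$$

The memory-bounded form subsumes the other two: $G(n) = 1$ (fixed problem size) recovers Amdahl's Law, and $G(n) = n$ (workload growing linearly with memory) recovers Gustafson's Law. For dense matrix multiplication, for instance, memory grows as $N^2$ while work grows as $N^3$, so $G(n) = n^{3/2}$ and the memory-bounded speedup exceeds the fixed-time (Gustafson) speedup.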

References

  1. Wulf W A, McKee S A. Hitting the memory wall: Implications of the obvious. ACM SIGARCH Computer Architecture News, 1995, 23(1): 20-24. https://doi.org/10.1145/216585.216588.

  2. Sun X H, Ni L M. Scalable problems and memory-bounded speedup. Journal of Parallel and Distributed Computing, 1993, 19(1): 27-37. https://doi.org/10.1006/jpdc.1993.1087.

  3. Sun X H, Ni L M. Another view on parallel speedup. In Proc. the 1990 ACM/IEEE Conference on Supercomputing, Nov. 1990, pp.324–333. https://doi.org/10.1109/SUPERC.1990.130037.

  4. Amdahl G M. Validity of the single processor approach to achieving large scale computing capabilities. In Proc. the Spring Joint Computer Conference, Apr. 1967, pp.483–485. https://doi.org/10.1145/1465482.1465560.

  5. Gustafson J L. Reevaluating Amdahl’s law. Communications of the ACM, 1988, 31(5): 532-533. https://doi.org/10.1145/42411.42415.

  6. Bashe C J, Johnson L R, Palmer J H, Pugh E W. IBM’s Early Computers. MIT Press, 1986.

  7. Sun X H, Chen Y. Reevaluating Amdahl’s law in the multicore era. Journal of Parallel and Distributed Computing, 2010, 70(2): 183-188. https://doi.org/10.1016/j.jpdc.2009.05.002.

  8. Pan C Y, Naeemi A. System-level optimization and benchmarking of graphene PN junction logic system based on empirical CPI model. In Proc. the IEEE International Conference on IC Design & Technology, Jun. 2012. https://doi.org/10.1109/ICICDT.2012.6232850.

  9. Kogge P M. Hardware Evolution Trends of Extreme Scale Computing. Technical Report, University of Notre Dame, South Bend, 2011.

  10. Hennessy J L, Patterson D A. Computer Architecture: A Quantitative Approach (6th edition). Elsevier, 2017.

  11. Liu Y H, Sun X H. LPM: A systematic methodology for concurrent data access pattern optimization from a matching perspective. IEEE Trans. Parallel and Distributed Systems, 2019, 30(11): 2478-2493. https://doi.org/10.1109/TPDS.2019.2912573.

  12. Lo Y J, Williams S, Straalen B V, Ligocki T J, Cordery M J, Wright N J, Hall M W, Oliker L. Roofline model toolkit: A practical tool for architectural and program analysis. In Proc. the 5th International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, Nov. 2014, pp.129–148. https://doi.org/10.1007/978-3-319-17248-4_7.

  13. Saini S, Chang J, Jin H Q. Performance evaluation of the Intel Sandy Bridge based NASA Pleiades using scientific and engineering applications. In Proc. the 4th International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, Nov. 2013, pp.25–51. https://doi.org/10.1007/978-3-319-10214-6_2.

  14. Sun X H, Gustafson J L. Toward a better parallel performance metric. Parallel Computing, 1991, 17(10/11): 1093-1109. https://doi.org/10.1016/S0167-8191(05)80028-6.

  15. Kumar V, Singh V. Scalability of parallel algorithms for the all-pairs shortest-path problem. Journal of Parallel and Distributed Computing, 1991, 13(2): 124-138. https://doi.org/10.1016/0743-7315(91)90083-L.

  16. Kumar V, Grama A, Gupta A, Karypis G. Introduction to Parallel Computing: Design and Analysis of Algorithms. Benjamin-Cummings, 1994.

  17. Sun X H, Chen Y, Wu M. Scalability of heterogeneous computing. In Proc. the International Conference on Parallel Processing (ICPP’05), Jun. 2005, pp.557–564. https://doi.org/10.1109/ICPP.2005.69.

  18. Sun X H, Rover D T. Scalability of parallel algorithm-machine combinations. IEEE Trans. Parallel and Distributed Systems, 1994, 5(6): 599-613. https://doi.org/10.1109/71.285606.

  19. Sun X H, Pantano M, Fahringer T. Integrated range comparison for data-parallel compilation systems. IEEE Trans. Parallel and Distributed Systems, 1999, 10(5): 448-458. https://doi.org/10.1109/71.770134.

  20. Sun X H. Scalability versus execution time in scalable systems. Journal of Parallel and Distributed Computing, 2002, 62(2): 173-192. https://doi.org/10.1006/jpdc.2001.1773.

  21. Hill M D, Marty M R. Amdahl’s law in the multicore era. Computer, 2008, 41(7): 33-38. https://doi.org/10.1109/MC.2008.209.

  22. Sun X H, Chen Y, Byna S. Scalable computing in the multicore era. In Proc. the 2008 International Symposium on Parallel Architectures, Algorithms and Programming, Sept. 2008.

  23. Dwork C, Goldberg A, Naor M. On memory-bound functions for fighting spam. In Proc. the 23rd Annual International Cryptology Conference, Aug. 2003, pp.426–444. https://doi.org/10.1007/978-3-540-45146-4_25.

  24. Abadi M, Burrows M, Manasse M, Wobber T. Moderately hard, memory-bound functions. ACM Trans. Internet Technology, 2005, 5(2): 299-327. https://doi.org/10.1145/1064340.1064341.

  25. Hart P E, Nilsson N J, Raphael B. A formal basis for the heuristic determination of minimum cost paths. IEEE Trans. Systems Science and Cybernetics, 1968, 4(2): 100-107. https://doi.org/10.1109/TSSC.1968.300136.

  26. Korf R E. Depth-first iterative-deepening: An optimal admissible tree search. Artificial Intelligence, 1985, 27(1): 97-109. https://doi.org/10.1016/0004-3702(85)90084-0.

  27. Korf R E, Reid M, Edelkamp S. Time complexity of iterative-deepening-A*. Artificial Intelligence, 2001, 129(1/2): 199-218. https://doi.org/10.1016/S0004-3702(01)00094-7.

  28. Russell S. Efficient memory-bounded search methods. In Proc. the 10th European Conference on Artificial Intelligence, Aug. 1992.

  29. Lovinger J, Zhang X Q. Enhanced simplified memory-bounded A star (SMA*+). In Proc. the 3rd Global Conference on Artificial Intelligence, Oct. 2017, pp.202–212. https://doi.org/10.29007/v7zc.

  30. Seuken S, Zilberstein S. Memory-bounded dynamic programming for DEC-POMDPs. In Proc. the 20th International Joint Conference on Artificial Intelligence, Jan. 2007, pp.2009–2015.

  31. Seuken S, Zilberstein S. Improved memory-bounded dynamic programming for decentralized POMDPs. arXiv: 1206.5295, 2012. https://arxiv.org/abs/1206.5295, Dec. 2022.

  32. Chen Z Y, Zhang W X, Deng Y C, Chen D D, Li Q. RMB-DPOP: Refining MB-DPOP by reducing redundant inferences. arXiv: 2002.10641, 2020. https://doi.org/10.48550/arXiv.2002.10641, Dec. 2022.

  33. Brito I, Meseguer P. Improving DPOP with function filtering. In Proc. the 9th International Conference on Autonomous Agents and Multiagent Systems: Volume 1, May 2010, pp.141–148.

  34. Petcu A, Faltings B. ODPOP: An algorithm for open/distributed constraint optimization. In Proc. the 21st National Conference on Artificial Intelligence, Jul. 2006, pp.703–708.

  35. Petcu A, Faltings B. A hybrid of inference and local search for distributed combinatorial optimization. In Proc. the IEEE/WIC/ACM International Conference on Intelligent Agent Technology (IAT’07), Nov. 2007, pp.342–348. https://doi.org/10.1109/IAT.2007.12.

  36. Petcu A, Faltings B. MB-DPOP: A new memory-bounded algorithm for distributed optimization. In Proc. the 20th International Joint Conference on Artificial Intelligence, Jan. 2007, pp.1452–1457.

  37. Williams S W. Auto-tuning performance on multicore computers [Ph.D. Thesis]. University of California, Berkeley, 2008.

  38. Williams S, Waterman A, Patterson D. Roofline: An insightful visual performance model for multicore architectures. Communications of the ACM, 2009, 52(4): 65-76. https://doi.org/10.1145/1498765.1498785.

  39. Lu X Y, Wang R J, Sun X H. APAC: An accurate and adaptive prefetch framework with concurrent memory access analysis. In Proc. the 38th IEEE International Conference on Computer Design (ICCD), Oct. 2020, pp.222–229. https://doi.org/10.1109/ICCD50377.2020.00048.

  40. Lu X Y, Wang R J, Sun X H. Premier: A concurrencyaware pseudo-partitioning framework for shared last-level cache. In Proc. the 39th IEEE International Conference on Computer Design (ICCD), Oct. 2021, pp.391–394. https://doi.org/10.1109/ICCD53106.2021.00068.

  41. Liu J, Espina P, Sun X H. A study on modeling and optimization of memory systems. Journal of Computer Science and Technology, 2021, 36(1): 71-89. https://doi.org/10.1007/s11390-021-0771-8.

  42. Glew A. MLP yes! ILP no. In Proc. ASPLOS Wild and Crazy Idea Session, Oct. 1998.

  43. Qureshi M K, Lynch D N, Mutlu O, Patt Y N. A case for MLP-aware cache replacement. In Proc. the 33rd International Symposium on Computer Architecture (ISCA’06), Jun. 2006, pp.167–178. https://doi.org/10.1109/ISCA.2006.5.

  44. Sun X H, Wang D W. Concurrent average memory access time. Computer, 2014, 47(5): 74-80. https://doi.org/10.1109/MC.2013.227.

  45. Najafi H, Lu X, Liu J, Sun X H. A generalized model for modern hierarchical memory system. In Proc. Winter Simulation Conference (WSC), Dec. 2022.

  46. Lu X, Wang R, Sun X H. CARE: A concurrency-aware enhanced lightweight cache management framework. In Proc. the 29th IEEE International Symposium on High-Performance Computer Architecture (HPCA), Feb. 25–Mar. 1, 2023.

  47. Yan L, Zhang M Z, Wang R J, Chen X M, Zou X Q, Lu X Y, Han Y H, Sun X H. CoPIM: A concurrency-aware PIM workload offloading architecture for graph applications. In Proc. IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED), Jul. 2021. https://doi.org/10.1109/ISLPED52811.2021.9502483.

  48. Zhang N, Jiang C T, Sun X H, Song S L. Evaluating GPGPU memory performance through the C-AMAT model. In Proc. the Workshop on Memory Centric Programming for HPC, Nov. 2017, pp.35–39. https://doi.org/10.1145/3145617.3158214.

  49. Kannan S, Gavrilovska A, Schwan K, Milojicic D, Talwar V. Using active NVRAM for I/O staging. In Proc. the 2nd International Workshop on Petascale Data Analytics: Challenges and Opportunities, Nov. 2011, pp.15–22. https://doi.org/10.1145/2110205.2110209.

  50. Caulfield A M, Grupp L M, Swanson S. Gordon: Using flash memory to build fast, power-efficient clusters for data-intensive applications. ACM SIGPLAN Notices, 2009, 44(3): 217-228. https://doi.org/10.1145/1508284.1508270.

  51. Reed D A, Dongarra J. Exascale computing and big data. Communications of the ACM, 2015, 58(7): 56-68. https://doi.org/10.1145/2699414.

  52. Shalf J, Dosanjh S, Morrison J. Exascale computing technology challenges. In Proc. the 9th International Conference on High Performance Computing for Computational Science, Jun. 2010. https://doi.org/10.1007/978-3-642-19328-6_1.

  53. Kougkas A, Devarajan H, Sun X H. Hermes: A heterogeneous-aware multi-tiered distributed I/O buffering system. In Proc. the 27th International Symposium on High-Performance Parallel and Distributed Computing, Jun. 2018, pp.219–230. https://doi.org/10.1145/3208040.3208059.

  54. Kougkas A, Devarajan H, Sun X H. I/O acceleration via multi-tiered data buffering and prefetching. Journal of Computer Science and Technology, 2020, 35(1): 92-120. https://doi.org/10.1007/s11390-020-9781-1.

  55. Tissenbaum M, Sheldon J, Abelson H. From computational thinking to computational action. Communications of the ACM, 2019, 62(3): 34-36. https://doi.org/10.1145/3265747.

  56. Liu Y H, Sun X H, Wang Y, Bao Y G. HCDA: From computational thinking to a generalized thinking paradigm. Communications of the ACM, 2021, 64(5): 66-75. https://doi.org/10.1145/3418291.

  57. Owens J D, Houston M, Luebke D, Green S, Stone J E, Phillips J C. GPU computing. Proceedings of the IEEE, 2008, 96(5): 879-899. https://doi.org/10.1109/JPROC.2008.917757.

  58. Dean J, Ghemawat S. MapReduce: Simplified data processing on large clusters. Communications of the ACM, 2008, 51(1): 107-113. https://doi.org/10.1145/1327452.1327492.

  59. Momose H, Kaneko T, Asai T. Systems and circuits for AI chips and their trends. Japanese Journal of Applied Physics, 2020, 59(5): 050502. https://doi.org/10.35848/1347-4065/ab839f.

  60. Singh G, Alser M, Cali D S, Diamantopoulos D, Gómez-Luna J, Corporaal H, Mutlu O. FPGA-based near-memory acceleration of modern data-intensive applications. IEEE Micro, 2021, 41(4): 39-48. https://doi.org/10.1109/MM.2021.3088396.

  61. Choi Y K, Santillana C, Shen Y J, Darwiche A, Cong J. FPGA acceleration of probabilistic sentential decision diagrams with high-level synthesis. ACM Trans. Reconfigurable Technology and Systems, 2022. https://doi.org/10.1145/3561514.

  62. Ghose S, Boroumand A, Kim J S, Gómez-Luna J, Mutlu O. Processing-in-memory: A workload-driven perspective. IBM Journal of Research and Development, 2019, 63(6): Article No. 3. https://doi.org/10.1147/JRD.2019.2934048.

  63. Ghiasi N M, Park J, Mustafa H, Kim J, Olgun A, Gollwitzer A, Cali D S, Firtina C, Mao H Y, Alserr N A, Ausavarungnirun R, Vijaykumar N, Alser M, Mutlu O. GenStore: A high-performance in-storage processing system for genome sequence analysis. In Proc. the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Feb. 2022, pp.635–654. https://doi.org/10.1145/3503222.3507702.

  64. Mutlu O. Intelligent architectures for intelligent computing systems. In Proc. the 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE), Feb. 2021, pp.318–323. https://doi.org/10.23919/DATE51398.2021.9474073.

  65. Sun X H, Liu Y H. Utilizing concurrency: A new theory for memory wall. In Proc. the 29th International Workshop on Languages and Compilers for Parallel Computing, Sept. 2016, pp.18–23. https://doi.org/10.1007/978-3-319-52709-3_2.

  66. Kougkas A, Devarajan H, Lofstead J, Sun X H. LABIOS: A distributed label-based I/O system. In Proc. the 28th International Symposium on High-Performance Parallel and Distributed Computing, Jun. 2019, pp.13–24. https://doi.org/10.1145/3307681.3325405.

  67. Logan L, Garcia J C, Lofstead J, Sun X H, Kougkas A. LabStor: A modular and extensible platform for developing high-performance, customized I/O stacks in userspace. In Proc. the ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC’22), Nov. 2022, pp.309–323.

  68. Hwang K, Xu Z W. Scalable Parallel Computing: Technology, Architecture, Programming. McGraw-Hill, 1998.

  69. Hwang K. Advanced Computer Architecture: Parallelism, Scalability, Programmability. McGraw-Hill, 1993.

Author information

Correspondence to Xian-He Sun.

Supplementary Information

ESM 1 (PDF 447 kb)

About this article

Cite this article

Sun, XH., Lu, X. The Memory-Bounded Speedup Model and Its Impacts in Computing. J. Comput. Sci. Technol. 38, 64–79 (2023). https://doi.org/10.1007/s11390-022-2911-1
