Performance Modelling and Optimization of Memory Access on Cellular Computer Architecture Cyclops64
This paper focuses on the Cyclops64 computer architecture and presents an analytical model and performance simulation results for two approaches to optimizing an SVD (Singular Value Decomposition) benchmark: data preloading and loop unrolling. A performance model that breaks the total execution cycles down into their components is presented. Data preloading, implemented either with “memcpy” or with hand-optimized “inline” assembly code, and loop unrolling are implemented and compared in terms of the total number of memory access cycles. The key idea is to preload data from off-chip to on-chip memory before the computation and to store the data back afterwards. These approaches reduce the total memory access cycles and can thus improve the benchmark performance significantly.
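The preload, compute, store-back pattern described above can be sketched in portable C. This is only an illustrative sketch, not the paper's implementation: the names `scale_preloaded`, `onchip_buf`, and `TILE` are hypothetical, the static array merely stands in for on-chip memory (on a real system, placement would be controlled by the toolchain), and a simple vector scaling stands in for the SVD kernel.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical capacity of the on-chip buffer, in doubles (assumption). */
#define TILE 256

/* Stand-in for on-chip memory; here it is just an ordinary static array. */
static double onchip_buf[TILE];

/* Scale n doubles stored "off-chip" by alpha, one tile at a time. */
void scale_preloaded(double *offchip, size_t n, double alpha)
{
    for (size_t base = 0; base < n; base += TILE) {
        size_t len = (n - base < TILE) ? (n - base) : TILE;

        /* 1. Preload: copy one tile from off-chip to on-chip memory. */
        memcpy(onchip_buf, offchip + base, len * sizeof(double));

        /* 2. Compute on the on-chip copy, with the loop unrolled by 4. */
        size_t i = 0;
        for (; i + 4 <= len; i += 4) {
            onchip_buf[i]     *= alpha;
            onchip_buf[i + 1] *= alpha;
            onchip_buf[i + 2] *= alpha;
            onchip_buf[i + 3] *= alpha;
        }
        /* Remainder loop for tile lengths that are not a multiple of 4. */
        for (; i < len; ++i) {
            onchip_buf[i] *= alpha;
        }

        /* 3. Store back: copy the results to off-chip memory. */
        memcpy(offchip + base, onchip_buf, len * sizeof(double));
    }
}
```

Calling `scale_preloaded(v, n, 2.0)` doubles every element of `v`. In this sketch each element crosses the off-chip boundary only twice, once in each `memcpy`, however many times the compute step touches it; whether that pays off on a given machine depends on the relative off-chip and on-chip access latencies.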