Abstract
The Cydra 5 is a VLIW minisupercomputer with hardware designed to accelerate a broad class of inner loops, presenting unique challenges to its compilers. We discuss the organization of its Fortran/77 compiler and several of the key approaches developed to fully exploit the hardware. These include the intermediate representation used; the preparation, overlapped scheduling, and register allocation of inner loops; the speculative execution model used to control global code motion; and the machine model and local instruction scheduling approach.
Dehnert, J.C., Towle, R.A. Compiling for the Cydra 5. J Supercomput 7, 181–227 (1993). https://doi.org/10.1007/BF01205184