Abstract
I consider the problem of the domain-specific optimization of programs. I review different approaches, discuss their potential, and sketch instances of them from the practice of high-performance parallelism. Readers need not be familiar with high-performance computing.
© 2004 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Lengauer, C. (2004). Program Optimization in the Domain of High-Performance Parallelism. In: Lengauer, C., Batory, D., Consel, C., Odersky, M. (eds) Domain-Specific Program Generation. Lecture Notes in Computer Science, vol 3016. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-25935-0_5
DOI: https://doi.org/10.1007/978-3-540-25935-0_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22119-7
Online ISBN: 978-3-540-25935-0
eBook Packages: Springer Book Archive