Operator Language: A Program Generation Framework for Fast Kernels

  • Franz Franchetti
  • Frédéric de Mesmay
  • Daniel McFarlin
  • Markus Püschel
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5658)


We present the Operator Language (OL), a framework to automatically generate fast numerical kernels. OL provides the structure to extend the program generation system Spiral beyond the transform domain. Using OL, we show how to automatically generate library functionality for the fast Fourier transform and multiple non-transform kernels, including matrix-matrix multiplication, synthetic aperture radar (SAR), circular convolution, sorting networks, and Viterbi decoding. The control flow of the kernels is data-independent, which allows us to cast their algorithms as operator expressions. Using rewriting systems, a structural architecture model and empirical search, we automatically generate very fast C implementations for state-of-the-art multicore CPUs that rival hand-tuned implementations.
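As a rough illustration of what data-independent control flow means for one of the kernels named above, the sketch below writes a 4-input sorting network as a composition of compare-exchange operators. It is not OL notation or Spiral output: the names `NETWORK_4`, `compare_exchange`, `compose`, and `sort4` are hypothetical and chosen only for this example.

```python
from itertools import permutations

# Hypothetical illustration, not OL syntax: a 4-input sorting network given as
# a fixed list of compare-exchange wire pairs (i, j) with i < j.
NETWORK_4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]


def compare_exchange(i, j):
    """Return an operator that puts the smaller value on wire i, the larger on wire j."""
    def op(values):
        out = list(values)
        out[i], out[j] = min(values[i], values[j]), max(values[i], values[j])
        return out
    return op


def compose(ops):
    """Chain operators left to right into a single operator."""
    def op(values):
        for f in ops:
            values = f(values)
        return values
    return op


# The whole kernel is one operator expression. Which wires are touched, and in
# what order, is fixed in advance and never depends on the input values.
sort4 = compose([compare_exchange(i, j) for i, j in NETWORK_4])


def check_all_permutations():
    """Exhaustively confirm that the network sorts every ordering of 4 distinct keys."""
    return all(sort4(list(p)) == [0, 1, 2, 3] for p in permutations(range(4)))
```

Because the sequence of operations is known before any data arrives, an expression like this can in principle be rewritten, reordered, or unrolled mechanically, which is the property the abstract relies on.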


Keywords: library generation, program generation, automatic performance tuning, high performance software, multicore CPU



Copyright information

© IFIP International Federation for Information Processing 2009

Authors and Affiliations

  • Franz Franchetti (1)
  • Frédéric de Mesmay (1)
  • Daniel McFarlin (1)
  • Markus Püschel (1)
  1. Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, USA
