Optimal Shuffle Code with Permutation Instructions

  • Sebastian Buchwald
  • Manuel Mohr
  • Ignaz Rutter
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9214)


During compilation of a program, register allocation is the task of mapping program variables to machine registers. In doing so, the compiler may introduce shuffle code, consisting of copy and swap operations, that transfers data between registers. Three common sources of shuffle code are conflicting register mappings at joins in the control flow of the program, e.g., due to if-statements or loops; the calling convention for procedures, which often dictates that input arguments or results must be placed in certain registers; and machine instructions that only allow a subset of registers to occur as operands.
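To make the notion of shuffle code concrete, the following is a minimal sketch of how a parallel register transfer might be turned into a sequence of copies and swaps. It is a textbook-style illustration, not the method of the paper; the function names `sequentialize` and `run_ops` and the `dst -> src` dictionary encoding are assumptions made for this example.

```python
def sequentialize(moves):
    """Turn a parallel transfer {dst: src} into ordered copy/swap operations.

    Illustrative sketch only: the encoding and names are hypothetical.
    """
    pending = {d: s for d, s in moves.items() if d != s}
    ops = []
    while pending:
        sources = set(pending.values())
        # A destination that nobody still reads from can be overwritten safely.
        free = [d for d in pending if d not in sources]
        if free:
            for d in free:
                ops.append(("copy", pending.pop(d), d))
            continue
        # Otherwise only cycles remain; break one with a swap.
        d, s = next(iter(pending.items()))
        ops.append(("swap", d, s))
        del pending[d]
        # The value formerly held in d now lives in s.
        pending = {k: (s if v == d else v) for k, v in pending.items()}
        pending = {k: v for k, v in pending.items() if k != v}
    return ops


def run_ops(ops, registers):
    """Apply the operations to a {register: value} mapping and return the result."""
    regs = dict(registers)
    for kind, a, b in ops:
        if kind == "copy":      # ("copy", src, dst)
            regs[b] = regs[a]
        else:                   # ("swap", a, b)
            regs[a], regs[b] = regs[b], regs[a]
    return regs
```

For instance, `sequentialize({"r1": "r2", "r2": "r1", "r3": "r1"})` first copies `r1` into `r3` and then swaps `r1` and `r2`, since the copy must happen before the cycle overwrites `r1`.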

Recently, Mohr et al. [9] proposed speeding up shuffle code with special hardware instructions that arbitrarily permute the contents of up to five registers, and gave a heuristic for computing such shuffle code.

In this paper, we give an efficient algorithm for generating optimal shuffle code in the setting of Mohr et al. An interesting special case occurs when no register has to be transferred to more than one destination, i.e., it suffices to permute the contents of the registers. This case is equivalent to factoring a permutation into a minimal product of permutations, each of which permutes up to five elements.
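The permutation special case can be illustrated with a small sketch. The code below is not the optimal algorithm of the paper; it is a naive baseline, written under the assumption that a permutation is given as a list where `perm[i]` is the register that must receive the value currently in register `i`. It splits each cycle into pieces of at most five registers, so a cycle of length k costs ⌈(k−1)/4⌉ instructions. The names `cycles`, `factor_naive`, and `run` are hypothetical.

```python
def cycles(perm):
    """Return the non-trivial cycles of perm, where perm[src] == dst."""
    seen = set()
    result = []
    for start in range(len(perm)):
        if start in seen or perm[start] == start:
            continue
        cycle = []
        node = start
        while node not in seen:
            seen.add(node)
            cycle.append(node)
            node = perm[node]
        result.append(cycle)
    return result


def factor_naive(perm, width=5):
    """Factor perm into instructions that each permute at most `width` registers.

    Naive per-cycle baseline (an assumption of this sketch), not guaranteed minimal.
    Each instruction is a dict {src: dst}.
    """
    instructions = []
    for cycle in cycles(perm):
        while len(cycle) > width:
            head = cycle[:width]
            # Rotate the first `width` registers; all but head[0] become final.
            instructions.append(dict(zip(head, head[1:] + head[:1])))
            cycle = [cycle[0]] + cycle[width:]
        instructions.append(dict(zip(cycle, cycle[1:] + cycle[:1])))
    return instructions


def run(instructions, registers):
    """Apply each instruction as a simultaneous permutation of register contents."""
    regs = list(registers)
    for instr in instructions:
        old = list(regs)
        for src, dst in instr.items():
            regs[dst] = old[src]
    return regs
```

This baseline treats every cycle separately and is therefore not minimal in general: for the permutation `[1, 0, 3, 4, 2]`, which consists of a 2-cycle and a 3-cycle, it emits two instructions, although a single permutation of five registers would suffice. Combining cycles in one instruction is exactly the kind of saving an optimal factorization has to exploit.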


Greedy Algorithm, Transition Graph, Outgoing Edge, Directed Cycle, Calling Convention
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




  1. Blazy, S., Robillard, B.: Live-range unsplitting for faster optimal coalescing. In: Languages, Compilers, and Tools for Embedded Systems (LCTES 2009), pp. 70–79. ACM (2009)
  2. Bouchez, F., Darte, A., Rastello, F.: On the complexity of register coalescing. In: Code Generation and Optimization (CGO 2007), pp. 102–114. IEEE (2007)
  3. Buchwald, S., Mohr, M., Rutter, I.: Optimal shuffle code with permutation instructions. CoRR abs/1504.07073 (2015)
  4. Caprara, A.: Sorting by reversals is difficult. In: Computational Molecular Biology (RECOMB 1997), pp. 75–83. ACM (1997)
  5. Farnoud, F., Milenkovic, O.: Sorting of permutations by cost-constrained transpositions. IEEE Transactions on Information Theory 58(1), 3–23 (2012)
  6. Grund, D., Hack, S.: A fast cutting-plane algorithm for optimal coalescing. In: Adsul, B., Odersky, M. (eds.) CC 2007. LNCS, vol. 4420, pp. 111–125. Springer, Heidelberg (2007)
  7. Hack, S.: Register Allocation for Programs in SSA Form. Ph.D. thesis, Universität Karlsruhe (2007)
  8. Hack, S., Goos, G.: Copy coalescing by graph recoloring. SIGPLAN Notices 43(6), 227–237 (2008)
  9. Mohr, M., Grudnitsky, A., Modschiedler, T., Bauer, L., Hack, S., Henkel, J.: Hardware acceleration for programs in SSA form. In: Compilers, Architecture and Synthesis for Embedded Systems (CASES 2013). ACM (2013)
  10. Seress, Á.: Permutation Group Algorithms, vol. 152. Cambridge University Press (2003)

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Karlsruhe Institute of Technology, Karlsruhe, Germany
