blocks_3d: Software for general 3d conformal blocks

We introduce the software blocks_3d for computing four-point conformal blocks of operators with arbitrary Lorentz representations in 3d CFTs. It uses Zamolodchikov-like recursion relations to numerically compute derivatives of blocks around a crossing-symmetric configuration. It is implemented as a heavily optimized, multithreaded, C++ application. We give performance benchmarks for correlators containing scalars, fermions, and stress tensors. As an example application, we recompute bootstrap bounds on four-point functions of fermions and study whether a previously observed sharp jump can be explained using the"fake primary"effect. We conclude that the fake primary effect cannot fully explain the jump and the possible existence of a"dead-end"CFT near the jump merits further study.

Most studies pursued so far in d > 2 have focused on correlation functions of scalar operators, where the conformal blocks appearing in the bootstrap equations are relatively easy to compute. In 4d CFTs, they are expressible in terms of hypergeometric functions [15], while in 3d CFTs, they can be straightforwardly computed using Zamolodchikov-like recursion relations developed in [7,10,16,17]. These recursion relations have been implemented efficiently in the software package scalar blocks [18].
A handful of studies have been carried out for correlation functions of spinning operators, including four-point functions of fermions [19][20][21], stress-tensors [22], currents [23], and mixed correlators containing scalars and currents [24]. Each of these studies faced a huge technical hurdle of understanding how to compute the relevant conformal blocks for each combination of three-point and four-point tensor structures contributing to the correlator and then carrying out the computation in practice. These studies solved these problems on an ad hoc basis, employing a variety of different methods that do not easily scale to larger problems.
Recently, a general recursive algorithm for computing 3d conformal blocks of arbitrary spin was introduced in [25], building on the earlier recursion relations [16,26]. The algorithm uses the fact that the conformal blocks have an expansion in poles in the exchanged scaling dimension: where ∆ j,i describe a known infinite set of poles and the residues are themselves conformal blocks up to some additional residue matrices L j,i , R j,i . In addition to the scaling dimension ∆ and spin j, each block is labeled by a pair of three-point structures ab and a four-point structure I. General formulas for these matrices, as well as formulas for the ∆ → ∞ limit of the blocks needed to implement the recursion relation, were computed explicitly in [25]. A variety of different bases have been employed to describe the three-and four-point structures, including polynomial or differential bases in embedding-space structures [19,[27][28][29][30][31][32][33][34][35][36][37][38], and a q-basis which is naturally defined using the conformal frame approach [39]. In [25] an additional basis was introduced, called the SO(3) basis, where the matrices L j,i , R j,i take on a particularly simple block-diagonal form. For three-point functions between operators of SO(3) spins j 1 , j 2 , j 3 , this basis can be labeled by the possible spins (j 12 , j 123 ) appearing in the decompositions j 12 ∈ j 1 ⊗ j 2 and j 123 ∈ j 12 ⊗ j 3 .
In this work, we present the software package blocks 3d, which efficiently implements the recursive algorithm of [25]. In particular, given external operators of spins j 1 , j 2 , j 3 , j 4 , a four-point structure specified in the q-basis, and the spin combinations j 12 and j 43 , the software computes conformal block derivatives (using arbitrary-precision arithmetic) up to a specified recursion order for all allowed SO(3)-basis structures. Derivatives of conformal blocks around the crossing-symmetric configuration are returned up to a cutoff Λ, where derivatives can be taken in several different coordinates. We give performance benchmarks for the code, comparing our code first to scalar blocks. We then test it on several correlators containing scalars, fermions, and stress tensors, demonstrating that blocks 3d is feasible to use in large-scale numerical bootstrap calculations with spinning correlators.
We further explicitly demonstrate how to use the code in the context of the bootstrap for four-point functions of 3d Majorana fermions ψψψψ . We reproduce bounds on the leading parity-even operator and parity-odd scalar σ appearing in the ψ × ψ operator product expansion, previously obtained in [19]. The former shows a prominent kink, and the latter shows a sharp jump at the same value of ∆ ψ . This jump was previously conjectured to relate to the "fake primary" effect described in [21], where a parity-odd spin-1 operator V at the unitarity bound can mimic a scalar of dimension ∆ σ = 3. We impose a small gap in the spin-1 sector and observe that the jump and kink are stable, persisting up to ∆ V ∼ 2.3. Our tentative conclusion is that the feature is unlikely to be fully explained by the fake primary effect and merits further study.
This paper is organized as follows. In section 2 we describe our conventions and the bases of tensor structures that we use. We also give an exact formula for the conformal block corresponding to identity exchange in these conventions. In section 3 we describe the algorithm used by blocks 3d, the structure of the output, and the details of the implementation. In section 4 we give performance benchmarks, and in section 5 we describe in detail an example of using the software for the 4-fermion bootstrap. We conclude in section 6 by describing some possible future applications of blocks 3d. Appendices contain the link to blocks 3d code as well as the conventions and details omitted in the main text.

Definition of conformal blocks
In this section we give a precise definition of the conformal blocks that are computed by blocks 3d. Since we will only be interested in conformal blocks in 3d CFTs, we will not be any more general than required. We will work in Lorentzian signature since unitarity is the most manifest there.
Consider the Hermitian local primary operators O i (i = 1, · · · , 4) and O. Let their scaling dimensions and spins be (∆ i , j i ) and (∆, j), respectively. The spins j, j i can be integer or half-integer; we do not restrict to bosonic representations. 1 We work with the following realization of the spin-j representations: the operator O carries 2j spinor indices α k , and is completely symmetric in these indices. We choose our conventions (see details in appendix C) so that the representation matrices of the Lorentz group Spin(2, 1) are real when acting on α k , and thus we can indeed assume that O is Hermitian, The same comments apply to O i . Furthermore, we assume that O is normalized so that its time-ordered two-point function for space-like separated x 1 , x 2 is given by is a positive constant defined below, and we used the index-free notation for an auxiliary spinor s. The constant c O is given by where we leave the choice of a 0 , and the spin-dependent b j > 0, up to the user. Note that unlike c O , the phase in (2.3) is fixed by unitarity, i.e. requiring that the two-point function defines a positive-definite norm on the Hilbert space. The unitarity bounds on ∆ are When ∆ is in the interior of these bounds, i.e.
the conformal block is guaranteed to be finite. 1 In particular, the ordering of operators in the definitions plays an important role due to possible signs from permutations of fermions.
We parametrize the values of the three-point functions of the operators O i , O in the following way, where our choice of standard conformally invariant three-point tensor structures which are labeled by indices in some sets I 12O and I 43O , is described in section 2.2. Similarly, the four-point function is decomposed as where our choice of standard conformally invariant four-point tensor structures, 11) and the cross-ratios z, z, are defined in section 2.3, and the index I runs over some index set I 1234 .
With this introduction, the conformal block g ab ∆,j,I (z, z) is defined by requiring that the contribution of O and its descendants to the O 1 × O 2 (equivalently, O 3 × O 4 ) OPE in the above four-point function is given by where · · · denote contributions from other primary operators and descendants. When we use the notation g ab ∆,j,I (z, z) there is implicit dependence on the scaling dimensions and spins This definition, as well as the choices of three-and four-point structure bases described in sections 2.2 and 2.3, are the same as in [25].

Three-point structure basis
In this section we define two bases of three-point structures, the q-basis (first introduced in [39]) and the SO(3)-basis (first introduced in [25,40]). We introduce two bases because the physical properties of the three-point structures are easier to understand in the q-basis, while blocks 3d, for performance reasons explained below, uses the SO(3)-basis.
Let us explain in more detail what we mean by "defining a basis of structures." Defining a basis of three-point structures means to provide an algorithm which, given some quantum numbers ∆ i , j i , produces a finite set I 123 of functions which are linearly-independent, invariant under conformal transformations, and form a complete basis for functions of this form. Here, it is understood that conformal transformations act on f (a) in the same way as on where primary operators O i have scaling dimension and spin ∆ i , j i . We have already used, and will be using in what follows, a somewhat misleading notation for f (a) , The reason why this notation is misleading is that the left-hand side appears to depend on the concrete primary operators O i and suggests that their ordering is somehow related to the ordering of operators in physical correlators. Instead, f (a) only depends on the quantum numbers ∆ i , j i . While the ordering of these quantum numbers is important, it has nothing to do with the order of operators in correlation functions. Nevertheless, the above notation allows us to quickly summarize the quantum numbers that the tensor structures correspond to, and for this reason we prefer to use it. We hope this won't cause too much confusion.

q-basis
In this subsection we summarize the definition and properties of the q-basis of three-point structures. We don't give any proofs, for which we instead refer the reader to [39]. The q-basis structures for quantum numbers (∆ i , j i ) are labeled by triples a = [q 1 , q 2 , q 3 ], where q i are (half-)integers which range over (2.16) and are subject to The tensor structures are fixed uniquely by conformal invariance and the requirement that for x 2 = (0, 0, 1), (2.20) x 3 (L) = (0, 0, L), (2.21) the following identity holds This definition is rather implicit. However, it makes it easy to convert between the q-basis and more explicit structures such as those introduced in [19,27]: we just have to evaluate these explicit tensor structures in the same way as in the left hand side of (2.22), and express the result as a linear combination of the monomials appearing in the right-hand side of (2.22). Notice that this definition is not permutation-invariant: the representations (∆ i , j i ) that the operators O i symbolize are listed in a particular order, and the respective coordinates are set to different values above. For example, it is the coordinate corresponding to the last representation that is sent to infinity. We stress that the order is determined by the order in which the representations (represented by O i ) are listed, and not by their indices 1, 2, 3. This is important for understanding the meaning of permutation properties below.
Let us list some simple properties of these structures [39]: • If we expand a three-point function of Hermitian operators in the q-basis, then the OPE coefficients are real if all j i are integers and pure imaginary otherwise.
• Space parity acts on the q-basis structures as See appendix D for details on what is meant by "space parity".
Because of the way permutations and parity act on these structures, we will often consider the structures defined as The structures with (+) sign are parity-even and those with (−) sign are parity-odd. Furthermore, transpositions simply permute the q i in the labels of these structures, but for the (−) structures we get an extra factor of (−1) in the action of transpositions. We illustrate how these structures can be used in practice in the example of the fourfermion bootstrap in section 5.

SO(3)-basis
In this subsection we define the SO(3) basis of structures, first introduced in [25,40]. 2 We will only give the formal expressions for the structures and not explain the motivation behind them, for which we refer the reader to [25].
First, we define the following monomials in s i where m i can take the same values as q i , Note that these monomials are proportional to the ones appearing in the right-hand side of (2.22) with m i = q i . As the notation suggests, these monomials transform in the standard way under some su(2) algebra. With this in mind, we define In all of the above formulas the summation is performed over the values of the variables for which the Clebsch-Gordan coefficients are non-vanishing. We use conventions for the Clebsch-Gordan coefficients such that they are real, so that To completely specify the conventions, we give a formula for Clebsch-Gordan coefficients in appendix E. Note that according to the above definitions, |j 12 , j 123 are simply polynomials in s i . We define SO(3)-basis structures analogously to (2.22), where the values of x i are as in (2.19). That is, the SO(3) basis structures are labeled by pairs (j 12 , j 123 ) for which the polynomials |j 12 , j 123 are non-zero, i.e.
By examining the definition of polynomials |j 12 , j 123 , we see that they are explicitly defined as linear combinations of the monomials in the right-hand side of (2.22) with coefficients given by products of Clebsch-Gordan coefficients. This definition implies that we can directly write the SO(3)-basis structures as linear combinations of q-basis structures and vice versa. Since these expressions are rather bulky yet straightforward combinations of (2.28), (2.31) and (2.32), we omit them.
Since the Clebsch-Gordan coefficients are real, the reality properties of the OPE coefficients in SO(3) basis are the same as in q-basis. Most permutation properties are unfortunately not manifest in SO(3) basis. The parity of SO(3) structures is, on the other hand, manifest and is given by the sign of (−1) j 1 −j 2 +j 3 −j 123 . (2.39)

Four-point structure basis
In this section we define the q-basis for four-point tensor structures [39]. The same comments as in section 2.2 apply to the meaning of the notation that we use for four-point tensor structures, The q-basis structures are labeled by indices I = [q 1 , q 2 , q 3 , q 4 ] subject to 3 A q-basis four-point tensor structure is defined by conformal invariance and its value in a standard configuration. Specifically, let In particular, the decomposition of a four-point function into these structures can be computed by evaluating it in the configuration (2.42) and taking the limit as in the left-hand side of (2.46). For the purposes of numerical bootstrap, it suffices to assume that 0 < z, z < 1, and so all operators are spacelike-separated. That is, the precise definition of Let us list some simple properties of the q-basis four-point tensor structures [39]: • If we expand a four-point function of Hermitian operators in q-basis, then the coefficient functions are real if there are 0 or 4 fermions in the correlator, and imaginary if there are 2 fermions.
• The space parity of the q-basis stuctures is equal to (−1) i j i −q i . See appendix D for details on what is meant by "space parity".
• Four-point q-basis tensor structures have simple properties under permutations for which we refer the reader to [39]. The same comments about the meaning of permutations apply as in section 2.2.
• The coefficient functions g [q 1 q 2 q 3 q 4 ] (z, z) transform in the following way under z ↔ z, This identity also holds for individual conformal blocks.

Conformal block for identity exchange
As a simple illustration of some of the above definitions, let us compute the expression for the conformal block for identity exchange in a general 3d correlator.
The identity block appears in correlation functions of the form and its contribution is simply equal to To compute the decomposition of this block, we need to evaluate the above block in the configuration (2.42) using (2.3). We find . (2.51) We have We see that for the identity block, the only non-zero components are those with q 1 = q 2 and q 3 = q 4 . These functions are Note that the identity block is defined without a reference to the three-point bases, and in fact, the above expressions do not need to be multiplied by OPE coefficients: they directly give the contribution of the identity operator to the four-point function.

The algorithm
Approximations to conformal blocks are computed in blocks 3d using residue recursion relations [7,10,16], and specifically the general form of the 3-dimensional residue recursion relations derived in [25]. In this section we briefly review these recursion relations and how they are used in blocks 3d.
The statement of residue recursion relations in d = 3 is as follows. The conformal blocks g ab ∆,j,I (z, z) are meromorphic functions of ∆ ∈ C with simple poles and known residues. 4 For each j there is an infinite set of poles ∆ j,i , labeled by an index i in some index set i ∈ P j . The set of these poles is independent of a, b, I, with the exception that some residues may vanish for special values of theses indices. The residues at these poles take the form where summation over repeated indices is understood. Here, the matrices L j,i and R j,i are I-independent, and in general depend on ∆ 12 and ∆ 43 . The quantum numbers ∆ j,i , j j,i appearing in the right-hand side have known expressions in terms of j and i. Importantly, we have where n j,i are positive integers. The quantities ∆ j,i , n j,i , j j,i were computed in [16]. Some examples of the matrices L j,i , R j,i were computed in [7,10,16,23,24,41], and the general closed-form expressions were derived for them in [25].
To make use of (3.1), the conformal blocks are separated into two factors, and a 0 and b j are the constants appearing in the normalization of the two-point function 2.5. The functions h ab ∆,j,I (z, z) defined in this way are useful because they are holomorphic at ∆ = ∞, The functions h ab ∞,j,I (z, z) can be determined by solving the Casimir differential equation to leading order in ∆, and have been computed in a general closed form in [25], based on the results of [26].
The relation (3.1) implies the following expression for the residues of h ab ∆,j,I (z, z), Combining this with (3.5), and with the knowledge that h ab ∆,j,I (z, z) is meromorphic in ∆, we find the residue recursion relation For the purposes of the numerical conformal bootstrap we need to compute an approximation to h ab ∆,j,I (z, z) near z = z = 1 2 . It is well-known that h ab ∆,j,I (z, z) has an expansion in positive integer powers of r which converges quickly near this point [42]. The residue recursion relation (3.7) provides an efficient way of computing this power series. To see this, consider the r 0 term in this expansion. Since all n j,i > 0, we can completely neglect the sum over i. Since h ab ∞,j,I (z, z) is a known function, we can easily extract the coefficient of the r 0 term from its explicit expression.
To compute the r 1 term, we cannot neglect the sum over i anymore, but due to n j,i > 0 we only need to know the r 0 term of the h a b ∆ j,i ,j j,i ,I (z, z) that appear in the sum, which we have already computed. Repeating in this manner we can generate the r-series expansion for h ab ∆,j,I (z, z) to very high orders. (In realistic applications the order can often go to r 80 or higher.) The recursion relation (3.7) also gives a nice representation of the ∆-dependence of h ab ∆,j,I (z, z). In particular, once the r-series of h ab ∆,j,I (z, z) has been computed to the order r N , we can drop the terms with n j,i > N in the right-hand side of (3.7) and substitute the derivatives at z = z = 1 2 of the computed series for the remaining terms. In this way, we obtain an approximation of the form where D a,b;m,n i,j,I ∈ R are numbers. This can be rewritten as where P a,b;m,n j,I (∆) is a polynomial in ∆ of degree equal to the number of i ∈ P j with n j,i ≤ N . For the conformal block itself we then get (∆) + m + n. For future convenience, we also factored out the explicit dependence on b j . The output of blocks 3d is essentially the polynomials P a,b;m,n j,I (∆), with some tweaks and optimizations described below. The precise form of the output is specified in section 3.3.
In the rest of this section we briefly describe some more technical points about the algorithm used in blocks 3d.

Block structure of residue matrices
When using the recursion relation (3.7), we need to multiply the known part of the power series by matrices L j,i and R j,i . These matrices have size and O has spin j. In more complicated blocks N 3 can be relatively large. For example, for four-point stress-tensor blocks we have N 3 = 25, 5 which is to be compared with scalar blocks where N 3 = 1. Taking into account that the algorithmic complexity of matrix multiplication is O (N 3 3 ), we find that just this multiplication step is 10 4 times slower than in the case of scalar blocks.
It is, therefore, desirable to reduce N 3 as much as possible. In the case of conformal blocks for T T T T , we know that in reality, the number of three-point structures T T O is at most 2 [22]. The N 3 = 25 above comes from ignoring the permutation symmetry, space parity, and conservation properties of T T O . It thus seems to be a good idea to specialize the recursion relation (3.7) to structures which satisfy these properties. However, we do not do this in blocks 3d for the following technical reasons.
First of all, these properties are very problem-specific and would significantly complicate the input information required by blocks 3d. Furthermore, even if we wanted to implement conservation constraints, we would have to use three-point structures that solve these constraints. Such structures depend polynomially on ∆ and the corresponding function h ab ∆,j,I (z, z) is not holomorphic at ∆ = ∞. Instead, it has a high-degree pole there (for T T T T it would grow as ∆ 12 ). 6 This means that we would need to determine more functions at ∆ = ∞ than just h ab ∞,j,I (z, z), and general expressions for these functions are not currently available.
Instead of relying on permutation and conservation properties of three-point structures, we use the fact observed in [25] that the matrices L j,i and R j,i are block-diagonal in j 12 and j 43 . That is, working in the SO(3) basis defined in section 2.2, we have (3.11) We additionally take into account the fact that these either preserve or flip (depending on i) the space parity of the structure. These observations allow us to split the three-point tensor structures into groups distinguished by the value of j 12 (or j 43 ) and space parity. In the example of T T T T blocks, we split the N 3 = 25 generic structures into groups the largest of which contains only 5 structures, which gives a significant improvement in performance over the direct application of (3.1). Since the recursion relations for different values of j 12 , j 43 and I can be used completely independently, a single run of blocks 3d only computes the conformal blocks g (j 12 ,j 120 ),(j 43 ,j 430 ) ∆,j,I (z, z), with the values of j 12 , j 43 and I provided by the user. This enables easy parallelization of conformal block computations.

Pole-shifting
One drawback of using the representation (3.10) for the derivatives of the conformal blocks is that for large r-series order N the polynomials P a,b;m,n (∆) become of high degree. These polynomials are typically used as an input to the semidefinite solver SDPB [3,4], which performs slower with higher-degree polynomials. Unfortunately, taking N to be relatively large is often necessary in order to get a reliable approximation for the high-order cross-ratio derivatives of the conformal blocks.
In order to address this problem, we use the "pole-shifting" method first described in [10] and also implemented in scalar blocks. This method introduces a new truncation parameter κ and looks for approximations of the form (∆) are chosen to ensure that the difference between the two sides of (3.12) is at most O(∆ − M/2 −1 ) near ∆ = ∞ and O((∆ − ∆ 0 (j)) M/2 ) near ∆ = ∆ 0 (j), where ∆ 0 (j) is the unitarity bound for the given value of j, and M is the number of poles appearing on the right-hand side.
In practice we use large N and moderate κ, making sure that increasing κ does not affect the output of the semidefinite solver.

Optimizing for spinning four-point structures
In scalar blocks, the residue recursion relations are used to compute the derivatives of the blocks along z = z diagonal, and then the Casimir equation is used to compute the off-diagonal derivatives [5]. This strategy is useful because the computation of off-diagonal derivatives using the Casimir equation is much more efficient than running the recursion relation multiple times.
In blocks 3d this approach is not used, and the residue recursion relation is run multiple times to compute the r-series for all off-diagonal derivatives. This is because, in the spinning case, there are typically several four-point tensor structures I, and the Casimir recursion relation mixes them all together. In cases such as T T T T there are hundreds of four-point tensor structures I. Using the Casimir recursion relation would require running the recursion step for all of them, while in practice, only the blocks for a few values of I [22,23,43] are needed. Computing the off-diagonal derivatives directly from the recursion relation allows us to run the code only for these few values of I.

Coordinates for cross-ratios
There are several choices of coordinates in cross-ratio space available in blocks 3d: • z, z coordinates are defined by the conformal frame (2.42) and are related to the standard u, v coordinates by The crossing-symmetric point is z = z = 1 2 and crossing acts by z → 1 − z, z → 1 − z.
• x, t coordinates are defined through z, z via (3.14) The crossing-symmetric point is x = t = 0 and crossing acts by x → −x. These coordinates are useful because they make manifest the symmetry of various functions with respect to z ↔ z.
• y, y coordinates [44] uniformize the cut z, z-plane The crossing-symmetric point is y = y = 0 and crossing acts by y → −y, y → −y. The conformal block expansion is convergent when |y|, |y| < 1 and it is expected that the components of the extremal functional converge to finite values in these coordinates [44].
• w, s coordinates are the analogs of x, t for y, y, (3.16) The crossing-symmetric point is w = s = 0 and crossing acts by w → −w.
It is furthermore possible to compute only the "radial derivatives", i.e. the ∂ n x ∂ 0 t or ∂ n w ∂ 0 s derivatives of the conformal blocks. This option is useful, for example, in cases when there are one-dimensional degrees of freedom in the system of crossing equations [22,23].

The output
In this section we give the precise definition of the quantities output by blocks 3d. The code computes conformal blocks as defined in section 2.1, in the SO (3)  (z, z) denote such conformal blocks. These blocks do not have a definite symmetry under z ↔ z. Therefore, we define The functions g as discussed in section 3.1, in the form where the factor p ± (c 1 , c 2 ) is introduced to ensure that we take derivatives of a smooth function. Here we choose all p ± (c 1 , c 2 ) = 1 except p − (x, t) = 2 z−z and p − (w, s) = 2 y−y . In addition, r 0 = 3 − 2 √ 2 ≈ 0.1716 is the value of r at the crossing-symmetric point and κ is the user-selected truncation order for pole-shifting (section 3.1.2). These approximations are expressed in terms of x = ∆ − ∆ 0 (j), where ∆ 0 (j) is the unitarity bound. The poles x j,i = ∆ j,i − ∆ 0 (j) appearing in the above approximation as well as the polynomials P (expressed in terms of x) are output.
Given the user-specified derivative order Λ, the values of m, n which are output are determined by where the weights µ(c i ) are given by µ(t) = µ(s) = 2 and for other coordinates are equal to 1. The quantity Λ ± is defined as Λ + = Λ and Λ − = Λ − 1. For the coordinate choices z, z or y, y only half the derivatives are output due to the symmetry property under z ↔ z. The values of j 12 , j 43 , q i , ± are fixed in a given run of blocks 3d. The values of j 120 , j 430 are chosen to be compatible with the parity of the structures. For example, if the four-point tensor structure is parity-even, only the combinations of j 120 and j 430 which correspond to two parity-even or two parity-odd three-point structures are output, since the blocks vanish for other combinations.
If the exchanged operator is fermionic, i.e., j is a half-integer, the polynomials P are pure imaginary. In this case, their imaginary part is output. If j is an integer, the polynomials are real and are output directly.

Implementation details
blocks 3d is implemented in C++14 and uses the GMP [45], Boost [46], FMT [47] and Eigen [48] libraries. We repeatedly profiled the execution to detect what parts were taking a long time and aggressively optimized those parts. In a small number of cases, we had to rewrite code in an ugly fashion to reduce temporaries and memory pressure.
However, the most significant improvements came from using multiple, thread-local caches to speed up computations. For example, Clebsch-Gordan coefficients must be computed many times with identical inputs. These caches reduce execution time by more than an order of magnitude, but, unfortunately, they significantly increase memory use.
We also parallelized blocks 3d by splitting the computation across a user-specified number of threads. The work proceeds in two stages: 1. Compute the derivatives (3.19) with respect to (r, λ), where r is defined in (3.4) and λ = 1 2 log(ρ/ρ).

Convert the derivatives from (r, λ) into the output coordinates (section 3.2).
Stage 1 must finish before stage 2 can start. Each of those stages are independently parallelized across multiple threads.
In Stage 1, for a given derivative ∂ n λ , we use the recursion relation (3.7) to compute the power series in r, and, from it, the r-derivatives ∂ m r up to m + n = Λ. So all of the calculations for a given ∂ n λ can be computed independently. Symmetry under z ↔ z mean that we only have to compute even or odd λ-derivatives, so there are Λ/2 different independent computations.
We arrange these different computations in a queue, with threads taking work from the queue when they are ready. So very high Λ calculations can benefit from larger machines.
For Stage 2, it is the different values of the spin of the internal operator (j-internal) that are processed with a queue. So the degree of parallelization is limited by the number of elements in j-internal.
For the tests we have done, the limiting factor is usually Λ/2. The proportion of time taken by each stage varies from 30% to 70%, depending on the details of the problem and the hardware. As long as the calculation is large enough, we see very high utilization of all cores. This is a fairly simple way of multithreading the computation, so we did not encounter many problems with subtle multithreading bugs. In addition, we ran blocks 3d under the Helgrind thread error detector [49,50] and found no issues. We did find multithreaded performance problems with the Boost multiprecision library, but that has been rectified in the latest release of Boost (1.74).

Correctness
We have verified the correctness of blocks 3d in several ways.
We have a separate implementation in Mathematica that, while very, very slow, allowed us to validate all of the individual components as well as directly compare a complete calculation for smaller test cases.
We have verified that our implementation of h ab ∞,j,I (z, z) leads to the correct leading terms of the r-series and that the r-series generated by blocks 3d satisfies the quadratic conformal Casimir equation [51] in a number of correlation functions. Since the Casimir equation has a unique solution for a given leading term of the r-series expansion, this is a robust check of the code.
We have compared the output of blocks 3d to that of scalar blocks in the case of scalar blocks, and to the blocks computed in [22] in the case of T T T T blocks. We found a perfect match in both cases.
Finally, we have implemented the 3d four-fermion bootstrap (as described in section 5) and found agreement with the previous results [19].

Performance
In this section we present the results of some simple performance benchmarks. In section 4.1 we compare the performance of blocks 3d to that of scalar blocks [18] (for the problem of computing scalar blocks). In section 4.2 we describe current performance numbers for various examples of spinning blocks.
Benchmarks in this section were run on the Helios cluster at the Institute for Advanced Study, where each node has dual 14-core (28 cores total) 64-bit Intel Xeon Broadwell processors 7 and 128GB RAM.

Comparison to scalar blocks
In this section we compare the performance of blocks 3d with scalar blocks [18] when computing scalar conformal blocks. While both programs use the same recursion relations and pole-shifting procedures, they differ in more technical aspects, such as those discussed in section 3.1.3 and how parallelization is carried out. Table 1 shows the memory usage and runtime for two sets of parameters. The parameters for Set 1 are of medium complexity, while the parameters for Set 2 are characteristic of the hardest numerical bootstrap problems analyzed in the literature [8,12]. blocks 3d is slower than scalar blocks by a factor 2-3 and uses 10 times the memory. This is to be expected since blocks 3d has been optimized for a more general use case.  Table 1: Memory usage and total runtime for two sets of parameters for scalar blocks and blocks 3d, averaged over 10 runs each.
This makes blocks 3d less efficient for computing scalar blocks, but not impractically so. For example, blocks 3d runs will still fit comfortably on modern cluster nodes, which typically have at least 128 GB of RAM. If a project needs to compute spinning blocks, which take much, much longer than 3d scalar blocks, it may simplify the workflow to only use blocks 3d.

Spinning examples
When considering the performance of block-generating code, it is important to distinguish between two cases. The first case corresponds to conformal blocks appearing in correlation functions such as φφφφ , φφT T , T JT J , etc., where φ is some generic scalar operator and T, J are the stress-tensor and a spin-1 conserved current, respectively. 8 The common trait of these correlation functions is that the differences ∆ 12 = ∆ 1 − ∆ 2 , ∆ 43 = ∆ 4 − ∆ 3 are fixed. This could be because some operators are identical and their scaling dimensions cancel in these differences, or because some scaling dimensions are protected, such as those of T and J. Since ∆ 12 and ∆ 43 are fixed, once the set of intermediate spins, the derivative order Λ, and approximation-quality related parameters are selected, such blocks need to be computed only once. For this reason, this case will be called "static." The second case is when ∆ 12 and ∆ 43 can vary. In this situation the blocks will need to be recomputed many times in a typical bootstrap computation, which is why we will call this case "dynamic." For example, in mixed-correlator bootstrap studies of the 3d Ising CFT involving external σ, -operators, the blocks for σ σ and σ σ [7] are required. Searches over the parameter space ∆ σ , ∆ typically require on the order of ≥ 10 2 points. More complicated setups, such as the O(2) model [12] which used 3 external primary operators, require more blocks per point, and the total number of scalar blocks that need to be computed in these problems can reach 10 3 − 10 4 .
Since we are reviewing the performance of blocks 3d, we will put it in the context of the simplest setups with spinning operators that have not yet been studied with the numerical conformal bootstrap. Since quite a few single-correlator setups have already been implemented [1,19,20,22,23], we focus on problems which involve a pair of external primaries. 9 We will consider systems involving correlators of {φ, T }, {ψ, T }, as examples of relatively complicated systems, 10 and {φ, ψ} as an example of a relatively simple system. Here φ is a generic neutral or Z 2 -odd scalar, and ψ is a generic Majorana fermion. For simplicity, we do not consider non-trivial global symmetries: they tend not to greatly increase the number of conformal blocks that we need to compute, and instead simply add a layer of flavor blocks. Similarly, in counting structures, we will assume that systems preserve space parity. Ignoring parity symmetry will introduce only a constant factor change in the estimates. The systems we consider, together with their correlators and numbers of four-point tensor structures are given in table 2.
To compute blocks for a given four-point function we in general need to call blocks 3d several times. A separate call is required for each ordering of the operators (modulo Z 2 × Z 2 kinematic permutations which preserve the cross-ratios [39]), four-point tensor structure, and for every possible choice of j 12 and j 43 . The latter choice is the main determining factor for the performance of blocks 3d since the blocks computed in any given run are two matrices of sizes L 1 (j 12 ) × L 1 (j 43 ) and L 2 (j 12 ) × L 2 (j 43 ), 11 where for integer j and for half-integer j Theoretically, the algorithmic complexity of the recursion step depends on these sizes as Of these, only the mixed system involving a scalar with a spin-1 conserved current has been studied in the published literature [24]. 10 We do not consider, for example, {ψ, J} since it has smaller blocks than {ψ, T }. For another example, we do not consider the {J, T } system because it only has static blocks, and the worst-case static T T T T block is covered in, e.g., the {φ, T } system. 11 This is true for the correlators and structures we considered in our benchmarks. Depending on how space parities align, these could instead be L1(j12) × L2(j43) and L2(j12) × L1(j43). system correlator Table 2: The systems of correlators that we consider in our performance comparison, along with the number of four-point structures N 4 needed for each case. The notation a × b for N 4 means that there are a different orderings of the operators (e.g. the orderings T φT φ and T φφT ), modulo Z 2 × Z 2 permutations, for the fixed OPE channel, and each ordering has b four-point tensor structures. Such orderings have the same computational complexity. We are ignoring the 1-and 0-dimensional degrees of freedom in four-point structures of conserved operators [22,23], since those have a much smaller computational complexity than the 2dimensional degrees of freedom.
This scaling describes the data in table 4 reasonably well, accounting for most of the variation in the runtimes. 12 To get some sense of the performance for a given correlator, we can run blocks 3d with the maximal values of j 12 and j 43 for one choice of four-point tensor structure. When using the parameters in table 3, the required memory resource, and runtimes for various correlation  functions are shown in table 4.
To estimate the total time needed to compute all conformal blocks for a given correlator, we can then multiply these runtimes by the number of four-point tensor structures, as well as sum over all possible values of j 12 and j 43 assuming the scaling in (4.3). Specifically, if t is the time it takes to run blocks 3d for the maximal allowed values j 12,max , j 43,max , then we estimate the total time t tot required to compute conformal blocks for the given correlator as .
(4.4) 12 To be more precise, after dividing the total user time of these runs by (4.3), we obtain a factor of 4 difference between the highest and lowest fractions, compared to a factor of 300 difference without dividing by (4.3).  Table 3: Parameters used in our performance comparison. We use 13 threads because the parallelism is limited by Λ/2 in the current implementation. For the scaling dimensiondependent parameters delta-12, delta-43 and delta-1-plus-2 we use the values appropriate for the correlator, assigning some generic scaling dimensions to φ and ψ.  Table 4: Computing resources required for one call to blocks 3d for each kind of block, using the maximal values of j 12 , j 43 and for a single choice of four-point structure, given the parameters in table 3. where j 12,min and j 43,min equal 0 or 1 2 , and the sums proceed in integer steps. Taking into account the number of cores reserved for the computation, we can then get the approximate estimates shown in table 5 for the total CPU time required to compute the dynamic conformal blocks in the setups mentioned above (for one fixed choice of external dimensions), and the estimates in table 6 for computing some of the static blocks in these systems. Note that the T T T T correlator gives the worst-case scenario for static blocks involving scalars, fermions, and T .
These numbers show that it is practical to use blocks 3d for the numerical conformal bootstrap of the systems considered in this section. Specifically, the time to compute T T T T block dominates the static block computation time in all setups, and we estimate it to be on the order of 2700 CPU hours. Furthermore, assuming that the number of points in scaling system correlator CPU hours {φ, ψ} φψφψ T ψT ψ 300 Table 5: Estimates of CPU hours needed for the computation of dynamic blocks in each system of correlators (for one fixed choice of external dimensions). The notation for the correlators is the same as in table 2.
dimension space for which the dynamic blocks need to be computed is on the order of 10 2 −10 3 , we see that in all cases, the dynamic blocks dominate the conformal block computation time, and is estimated in total to be around 10 3 − 10 5 CPU hours depending on the problem. While this is significant, this is still below the typical computational time required to run semidefinite programming for problems of this size, which can be 10 6 CPU hours or higher.

Physical setup
In this section we apply blocks 3d to an example problem of the 3d four-fermion bootstrap. That is, we impose the crossing symmetry constraints on the four-point function of a single Majorana fermion ψ in a parity-preserving 3d CFT. The numerical bootstrap applied to this correlator was studied in great detail in [19]. The goal here is mostly to demonstrate how blocks 3d can be applied to an interesting physical problem. To have a concrete physical goal, we will revisit some of the features of the exclusion plots of [19] in the context of the recently described "fake primary" effect [21].
The first step is to identify the three-point structures of the operators that appear in the ψ × ψ OPE. Consider the three-point function ψψO , (5.2) where the operator O has spin j. Assume first that j ≥ 1. According to the discussion in section 2.2, the following q-basis three-point tensors structures are possible for this three-point function, 13 where we have defined We need to additionally impose the requirements of permutation symmetry between the first two operators as well as the parity constraints. These structures transform under the (12) permutation (see section 2.2) as Note that since ψ is a fermion we need anti-symmetric structures. We then find the following allowed structures for various types of operators that appear in ψ × ψ OPE: • Parity-even, even j: • Parity-even odd-j operators are forbidden.
• Parity-odd, even j: • Parity-odd, odd j: For j = 0 the only difference is that we have to remove the second structure for the parity-even even-j operators. The OPE coefficients corresponding to these structures are pure imaginary. We now need to determine the four-point tensor structures and the corresponding crossing equations. In principle there are 2 4 = 16 four-point tensor structures, corresponding, in the q-basis, to all possible choices of the four q i ∈ {− 1 2 , 1 2 }. Of these structures, 8 are parity-even.
It remains to determine the matrices M a j,(j 12 ,j 120 ) , i.e. to express the q-basis structures in terms of the SO(3)-basis structures. To keep the exposition short, we do this for a single q-basis structure [ 1 2 , 1 2 , −1] + . We have we can write, interpreting it as the value of the q-basis structure in the configuration (2.22), and according to definition (2.28) Using (2.34) we have where we plugged in the values of the Clebsch-Gordan coefficients. According to (2.33) we have and so altogether we find Therefore, Recall that the left-hand side was interpreted above as the value of q-basis structure in configuration (2.22), and that SO(3) basis is defined by (2.36) in the same configuration. This means that we can directly read this equation as the relation between SO(3) and qbasis three-point tensor structures. The relations for other q-basis structures can be obtained analogously.
This completes the reduction of conformal blocks that are needed for our analysis to the blocks computed by blocks 3d.

Results
We first reproduce the two simple bounds originally computed in [19]. These are the bounds on the gaps in scalar parity-even (∆ ) and parity-odd sectors (∆ σ ), as functions of the scaling dimension of the fermion ∆ ψ . The results are shown in figure 1. These plots, as well as all other plots in this section, were computed at Λ = 27. 16 These results are consistent with those of [19] (they do not exactly coincide since we used a slightly higher Λ) and show two prominent features.
The bound on ∆ has a pronounced kink somewhere in the interval Note that these ranges are valid for the given Λ = 27, and may shift at higher derivative truncation orders. Nevertheless, these ranges clearly overlap, and it was conjectured in [19] that these features correspond to an actual CFT.
In [19], these features, and in particular the jump in ∆ σ , were compared to similar features in the bootstrap of the 3d Ising CFT [7]. Furthermore, since the analysis of [19], jumps similar to that in ∆ σ have been observed in the 3d fermion bootstrap with global symmetries [20] and in the 4d fermion bootstrap [21]. A common trait of all these jumps is that the jump happens when the bound approaches the number of spacetime dimensions from below. For example, in the present case, ∆ σ is somewhat close to 3 just to the left of the jump in figure 1. 16 Λ is the upper cutoff on the total order of derivatives of the crossing equations. The other relevant numerical parameters are given in appendix B. 17 We determined the location of the jump more precisely than that of the kink only because we study the structure of the jump in more detail below.  In [21], the jumps in the 4d fermion bootstrap and 3d Ising bootstrap [7] were shown to be due to the "fake primary" effect. 18 We refer the reader to [21] for a detailed general explanation of this effect. In the setup of this paper, the statement is that the exchanges of parity-odd spin-1 operators V very close to the spin-1 unitarity bound ∆ V = 2 give exactly the same contribution to the four-point function ψψψψ as exchanges of parity-odd scalars with dimension ∆ σ = 3. Thus, unless we impose a gap on ∆ V above ∆ V = 2, we are effectively allowing an isolated parity-odd scalar contribution at ∆ σ = 3, the "fake primary." This has no effect on numerics while the gap in ∆ σ is below 3, since then this isolated contribution is a part of the continuum of other allowed contributions, but it becomes important as soon as the gap in ∆ σ crosses 3. In effect, in such a situation, we are bounding the gap to the second parity-odd scalar, assuming that the first parity-odd scalar is at ∆ σ = 3. This contributes to a discontinuity in the bound on ∆ σ .
This section aims to analyze the jump in ∆ σ observed in figure 1 in the context of the fake primary effect. While it is clear that in our setup this jump is at least partly due to the fake primary effect (because the bound on ∆ σ goes from below 3 to above 3), we would like to know whether there is some underlying CFT, as in the case of the 3d Ising bootstrap.
There are several indications, some seen already in the results of [19], which suggest that the fake primary effect is not the primary cause of the jump. First and foremost, there is a kink observed in the bound on ∆ at the same value of ∆ ψ . When the bound on ∆ is computed, no assumptions are made about the parity-odd scalar sector, and the fake primary is hidden in the continuum of allowed contributions. Therefore, it does not immediately affect the bound on ∆ . The fact that the bound on ∆ displays a kink distinguishes the current setup from the setups in [20,21], where the jumps seem to be only due to the fake primary effect. Instead, it appears more similar to the situation with the 3d Ising bounds, where there is a physical theory under the jump.  Furthermore, the jump in the ∆ σ bound appears to start below ∆ σ = 3, which is another distinguishing feature of the jumps in 3d Ising bounds [7]. To verify this, we computed the bound on ∆ σ over a fine grid of ∆ ψ values near the jump, with the results shown in figure 2. These plots strongly suggest that the discontinuity starts at ∆ σ = 2.924(1). (Again, this number is for Λ = 27.) Since we are only computing the bound at a discrete set of values ∆ ψ , we cannot logically exclude the possibility that the true discontinuity starts at ∆ σ = 3 and is entirely due to the fake primary effect. However, in that case, there must still exist an extremely pronounced continuous feature in the plot leading up to ∆ σ = 3 just to the left of the discontinuity. This should be contrasted with the jumps observed in [21] and [20], where the bound is perfectly smooth up to exactly the fake primary threshold, at which point it jumps.
The work in [21] found that the fake primary contribution to the jump can be removed in the 3d Ising model by imposing a gap above the unitarity bound in the Z 2 -odd vector sector, the role of which in our setup is played by the parity-odd vectors V . In figure 3, we show how the bound on ∆ σ is affected by the gaps ∆ V imposed on such operators. We see that the jump persists up to at least ∆ V = 2.25. The way the plot near the jump changes with ∆ V is somewhat different from what was observed in [21] for the 4d fermion bootstrap, where the jumps were concluded to be likely entirely due to the fake primary effect. But it is hard to draw sharp conclusions from this comparison.
It is, however, instructive to compare figure 3 to figure 4, where the bound on ∆ is plotted for various choices of ∆ V . From figure 4 we see that the bound is essentially independent of We have additionally checked that if we sit near the kink at {∆ ψ , ∆ } = {1.286, 4.974}, then the maximal parity-odd spin-1 dimension is ∆ V < 2.29 and the parity-even spin-2 gap must be smaller than ∆ T < 3.004. It thus seems to be a consistent scenario that the jump in ∆ σ and the kink in ∆ are both due to a local CFT which contains a parity-odd vector operator of dimension ∆ V ≈ 2.3 as well as a stress-energy tensor with ∆ T = 3.
The evidence discussed in this section appears to be inconsistent with the features in figure 1 being solely explained by the fake primary effect, but is so far consistent with the existence of a local CFT with ∆ ψ ≈ 1.3, ∆ ≈ 5, 3 ∆ σ 7, and a parity-odd vector operator of dimension ∆ V ≈ 2.3. It would be interesting to further explore and constrain this hypothetical CFT.

Conclusions
Introducing a general software tool for computing spinning 3d conformal blocks should mark the beginning of a new era for the numerical conformal bootstrap. In particular, blocks 3d will enable the study of large systems of bootstrap equations involving external spinning operators, including fermions, global symmetry currents, and the stress tensor. In turn, this should allow for the computation of new bootstrap bounds and islands, leading to rigorous determinations of observables in physically-interesting CFTs.
An immediate future direction is to apply blocks 3d to perform bootstrap computations in systems of mixed correlators containing fermions and scalars, building on the bounds obtained in [19,20]. We expect that such a system will lead to additional constraints on the CFT data of the Gross-Neveu-Yukawa models. It may also help us explore the nature of the hypothetical "dead-end" CFT which may underlie the kink/jump appearing in [19,20], in the bounds from fermion four-point functions.
It will also be interesting to perform new bootstrap computations using systems of correlators containing the stress tensor, building on the general bounds from stress-tensor four-point functions obtained in [22]. In addition to allowing access to CFT observables connected to the stress tensor (e.g., three-point coefficients T T O ), such systems should also help to produce a more refined map of the general space of 3d CFTs.
Another direction is to study mixed correlators containing non-Abelian currents (building on [23,24] and the supersymmetric generalizations [52][53][54]), together with operators charged under their global symmetries. Such systems will allow for the study of whether information about current three-point coefficients can be used to help isolate 3d CFTs. They can also serve as a prototype for studying whether inputting information about 't Hooft anomalies can help isolate interesting non-supersymmetric 4d CFTs such as the conformal window of QCD. Additionally, they can be used to explore whether such correlators can be effectively used to forbid the global symmetry enhancements that affect the structure of numerous bootstrap bounds [2,[55][56][57][58].
Overall, we are optimistic about the future of the numerical bootstrap. With the recent development of SDPB 2 [4], and now the introduction of blocks 3d, a plethora of new bootstrap problems involving external spinning operators should now become tractable. We expect that there is still much low-hanging fruit to be picked from these systems and that the conformal bootstrap will reveal new surprises for many years to come.

A Code Availability
The software blocks 3d is freely available from Gitlab at https://gitlab.com/bootstrapcollaboration/blocks_3d The work presented here was computed by the latest current version, which has the Git commit hash e37e972f5f19befa1158754ee9570c7b6a1c5913 B Details on numerics The computations described in section 5 of this paper used the parameters given in table 7 for SDPB [3,4] and blocks 3d.  Table 7: Parameters used for the numerical computations in this paper.

D Parity for the three-point and four-point structures
In this section we clarify the meaning of parity for the tensor structures. We define the parity κ of a local operator by where R µ is the unitary operator representing reflection in a spatial direction µ (x µ → −x µ ), µ = 1, 2, and R µ x is the appropriately reflected x. For κ = 1 we say that the operator is parity-even, and for κ = −1 we say that the operator is parity-odd. This definition is consistent with the usual definition of parity for tensor (integer j) operators. Motivated by this definition, for a tensor structure represented by a function f (x i , s i ) of several coordinates x i and polarizations s i we define (D. 2) It follows that R 2 µ f = f and thus all structures can be split into parity-even (R µ f = f ) and parity-odd (R µ f = −f ).
For operators of definite parity κ, correlation functions are expanded in terms of parityeven structures if the product of operator parities is even, and in terms of parity-odd structures if the product of operator parities is odd.