Modular Invariance in Superstring Theory From ${\cal N} = 4$ Super-Yang Mills

We study the four-point function of the lowest-lying half-BPS operators in the ${\cal N} =4$ $SU(N)$ super-Yang-Mills theory and its relation to the flat-space four-graviton amplitude in type IIB superstring theory. We work in a large-$N$ expansion in which the complexified Yang-Mills coupling $\tau$ is fixed. In this expansion, non-perturbative instanton contributions are present, and the $SL(2, \mathbb{Z})$ duality invariance of correlation functions is manifest. Our results are based on a detailed analysis of the sphere partition function of the mass-deformed SYM theory, which was previously computed using supersymmetric localization. This partition function determines a certain integrated correlator in the undeformed ${\cal N} = 4$ SYM theory, which in turn constrains the four-point correlator at separated points. In a normalization where the two-point functions are proportional to $N^2-1$ and are independent of $\tau$ and $\bar \tau$, we find that the terms of order $\sqrt{N}$ and $1/\sqrt{N}$ in the large $N$ expansion of the four-point correlator are proportional to the non-holomorphic Eisenstein series $E({\scriptstyle \frac{3}{2}},\tau,\bar\tau)$ and $E({\scriptstyle \frac{5}{2}},\tau,\bar\tau)$, respectively. In the flat space limit, these terms match the corresponding terms in the type IIB S-matrix arising from $R^4$ and $D^4 R^4$ contact interactions, which, for the $R^4$ case, represents a check of AdS/CFT at finite string coupling. Furthermore, we present striking evidence that these results generalize so that, at order $N^{\frac{1}{2}-m}$ with integer $m \ge 0$, the expansion of the integrated correlator we study is a linear sum of non-holomorphic Eisenstein series with half-integer index, which are manifestly $SL(2,\mathbb{Z})$ invariant.

reproduces the "stringy" correction corresponding to the eight-derivative contact interaction of the form $R^4$ (a well-known contraction of four Riemann tensors) in the effective IIB superstring action [35,36]. Several other terms of higher order in $1/\lambda$ and in $1/N$ have been matched with terms that arise in the low energy expansion of string perturbation theory [33,34].
The procedure that allows the above comparisons between string perturbation theory and ${\cal N} = 4$ correlation functions is as follows. For the four-point function mentioned above, the analytic bootstrap consistency conditions [11,33,34] determine the position dependence of the tree-level Witten diagrams corresponding to higher derivative terms in the string effective action, up to a number of undetermined coefficients. At low orders in the derivative expansion, these coefficients can be fully determined, in principle, from constraints derived using supersymmetric localization [29-34]. Indeed, as shown in [33] in the case of the ${\cal N} = 4$ SYM theory, one can obtain integrated constraints on the four-point functions of the operators in the stress-tensor multiplet by taking four derivatives of the partition function $Z$ of the mass-deformed ${\cal N} = 4$ SYM theory placed on a round $S^4$. This mass deformation of the ${\cal N} = 4$ SYM theory is referred to as the ${\cal N} = 2^*$ theory, and its $S^4$ partition function was computed by Pestun using supersymmetric localization [37]. As shown in [37], this partition function takes the form of a finite-dimensional integral over constant values of one of the vector multiplet scalars. The integrand is a product of a classical contribution, a one-loop contribution, and contributions from instantons located at the north and south poles of $S^4$ [38-41].
The procedure implemented in [33] for analyzing the ${\cal N} = 4$ integrated correlation function made use of the constraint coming from the $m = 0$ limit of the mixed derivative
$$\frac{\partial^4 \log Z}{\partial\tau\,\partial\bar\tau\,\partial m^2}\bigg|_{m=0}\,, \qquad (1.1)$$
where $m$ is the mass deformation parameter and $\tau$ is the complexified gauge coupling. As mentioned above, this leads to an explicit calculation of various terms in the double expansion in $1/N$ and $1/\lambda$ of the CFT correlation function [33,34], and these terms reproduce the analogous terms in the low energy expansion of the type IIB superstring tree amplitude that are perturbative in $g_s$.

SL(2, Z) duality of CFT correlation functions and type IIB graviton amplitudes
These connections between ${\cal N} = 4$ SYM and type IIB superstring perturbation theory are part of a much richer non-perturbative story that incorporates the constraints of S-duality. In particular, understanding how the $SL(2, \mathbb{Z})$ S-duality of the type IIB superstring theory [42] arises as the image of Montonen-Olive $SL(2, \mathbb{Z})$ duality of ${\cal N} = 4$ super-Yang-Mills theory [43-45] involves understanding the holographic connection between Yang-Mills instantons and D-instantons [46-49], which will be a crucial feature of this paper.
Before we go into more details about the $SL(2, \mathbb{Z})$ invariance properties of the IIB amplitudes and corresponding SYM correlators, we note that neither ${\cal N} = 4$ SYM nor the IIB theory on $AdS_5 \times S^5$ is entirely $SL(2, \mathbb{Z})$ invariant. As explained in [50], specifying the SYM theory requires knowing the global form of the gauge group, as well as a discrete theta angle, and general $SL(2, \mathbb{Z})$ transformations change both the global structure of the gauge group and the discrete theta angle.$^1$ However, local correlators in ${\cal N} = 4$ SYM theory (and correspondingly the amplitudes in IIB string theory) are insensitive to such subtleties, and it is in this sense that we are exploring their $SL(2, \mathbb{Z})$ invariance properties in this paper.

SL(2, Z) and the IIB superstring low energy expansion
The exact coefficients of higher-derivative interactions in the low energy expansion of the four-graviton amplitude in IIB superstring theory are $SL(2, \mathbb{Z})$-invariant functions of the complex string coupling $\tau_s = \chi_s + i/g_s$ that have been explicitly determined up to order $D^6 R^4$.
These interactions preserve a fraction of the 32 supersymmetries (they are so-called F-terms), and their form is severely constrained by supersymmetry combined with S-duality. For example, the coefficient of the 1/2-BPS $R^4$ interaction is proportional to a non-holomorphic Eisenstein series $E({\scriptstyle \frac{3}{2}},\tau,\bar\tau)$ [53-55], which will be defined in Appendix A. When expanded at small string coupling, this Eisenstein series has two terms that are power behaved in $g_s$, corresponding to genus-0 and genus-1 contributions in string perturbation theory. In addition, there is an infinite sequence of exponentially suppressed non-perturbative terms due to D-instanton effects, where the contribution of a charge-$k$ D-instanton is proportional to $e^{-2\pi k/g_s}$. Similar comments apply to the coefficient of the 1/4-BPS $D^4 R^4$ interaction, which is proportional to $E({\scriptstyle \frac{5}{2}},\tau,\bar\tau)$ [56]. Whereas the Eisenstein series satisfy Laplace eigenvalue equations, the coefficient of the 1/8-BPS interaction, $D^6 R^4$, is a novel modular function that satisfies an inhomogeneous Laplace eigenvalue equation [57] that is also reviewed in Appendix A.$^2$
$^1$ For the $SU(N)$ cases, these theories are labeled as $(SU(N)/\mathbb{Z}_k)_n$, where $k$ is a positive divisor of $N$ and $n$ is a $\mathbb{Z}_k$-valued theta angle [50]. Under the $S$ generator of $SL(2, \mathbb{Z})$, the $SU(N)$ theory (which has no discrete theta angle) is mapped to $(SU(N)/\mathbb{Z}_N)_0$ and vice versa. On the bulk side, the type IIB string theory contains a nontrivial topological sector on $AdS_5$ described by a Chern-Simons-like theory involving the NS and RR 2-form fields [51]. The discrete data involved in specifying the boundary SYM theory translates into a choice of boundary conditions for the bulk topological theory, and such boundary conditions transform nontrivially under $SL(2, \mathbb{Z})$ [51,52]. These topological subtleties are important for understanding the $SL(2, \mathbb{Z})$ properties of extended objects (such as line operators) or when considering topologically nontrivial backgrounds (such as nontrivial $H^2(M_4, \mathbb{Z}_N)$ on the 4d boundary manifold $M_4$).

SL(2, Z) and correlation functions in ${\cal N} = 4$ SYM
In the usual 't Hooft limit, the 't Hooft coupling $\lambda \equiv g_{YM}^2 N$ is kept fixed as $N \to \infty$, which requires $g_{YM}$ to be small. However, this is incompatible with $SL(2, \mathbb{Z})$ duality, which has an action on the complex coupling $\tau \equiv \frac{\theta}{2\pi} + \frac{4\pi i}{g_{YM}^2}$ given by
$$\tau \to \frac{a\tau + b}{c\tau + d}\,,$$
where $a, b, c, d \in \mathbb{Z}$ and $ad - bc = 1$. In particular, this transformation mixes weak coupling and strong coupling effects. Therefore, in order to consider the action of $SL(2, \mathbb{Z})$ on correlation functions in the large-$N$ limit, it is necessary to consider the limit in which $g_{YM}$ is fixed as $N \to \infty$. Such a limit had been considered in [33], where it was referred to as the "very strong coupling limit," and it was also considered in an analogous context in [61].$^3$ In particular, instanton effects of order $e^{-8\pi^2 k/g_{YM}^2} = e^{-8\pi^2 k N/\lambda}$, where $k$ (the instanton number) is a positive integer, survive this limit, whereas they are exponentially suppressed in the usual 't Hooft limit. In the very strong coupling limit, it is the terms of order $N^{1/2}$, $N^{-1/2}$, and $N^{-1}$ that correspond to the $R^4$, $D^4 R^4$, and $D^6 R^4$ interactions mentioned above.
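The interplay between the $SL(2, \mathbb{Z})$ action and the two large-$N$ limits can be illustrated numerically. The following is a minimal sketch, assuming the convention $\tau = \theta/2\pi + 4\pi i/g_{YM}^2$; all function names are hypothetical and chosen only for this illustration. It shows that the $S$ generator exchanges weak and strong coupling, and that the $k$-instanton weight is $N$-independent at fixed $g_{YM}$ but collapses with $N$ at fixed $\lambda$.

```python
import math

def sl2z_act(tau, a, b, c, d):
    """Apply tau -> (a tau + b)/(c tau + d) for integers with ad - bc = 1."""
    if a * d - b * c != 1:
        raise ValueError("matrix is not in SL(2, Z)")
    return (a * tau + b) / (c * tau + d)

def tau_from_couplings(theta, g_ym):
    """Complexified coupling tau = theta/(2 pi) + 4 pi i / g_YM^2 (assumed convention)."""
    return complex(theta / (2 * math.pi), 4 * math.pi / g_ym ** 2)

def g_ym_from_tau(tau):
    """Invert Im(tau) = 4 pi / g_YM^2."""
    return math.sqrt(4 * math.pi / tau.imag)

def instanton_weight(k, g_ym):
    """Magnitude exp(-8 pi^2 k / g_YM^2) of the k-instanton factor."""
    return math.exp(-8 * math.pi ** 2 * k / g_ym ** 2)

def thooft_instanton_weight(k, lam, n):
    """Same weight at fixed 't Hooft coupling lam = g_YM^2 N."""
    return instanton_weight(k, math.sqrt(lam / n))

# S maps g_YM = 0.5 at theta = 0 to g_YM' = 4 pi / g_YM.
tau = tau_from_couplings(0.0, 0.5)
tau_dual = sl2z_act(tau, 0, -1, 1, 0)
print(g_ym_from_tau(tau_dual), 4 * math.pi / 0.5)

# Fixed g_YM: the weight does not depend on N. Fixed lambda: it falls off with N.
for n in (2, 10, 100):
    print(n, instanton_weight(1, 3.0), thooft_instanton_weight(1, 100.0, n))
```

At $\theta = 0$ the $S$ transformation reduces to $g_{YM} \to 4\pi/g_{YM}$, which is why a limit that pins $g_{YM}$ to be small cannot be compatible with the duality.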
Using a similar strategy to that outlined above in the 't Hooft limit, we find that the analytic bootstrap constraints combined with the supersymmetric localization constraints coming from (1.1) yield, in the very strong coupling limit, the same Eisenstein series that appear in the low energy expansion of the type IIB four-graviton amplitude. Indeed, the constraint from (1.1) is sufficient to determine the coefficient of $N^{1/2}$ in the large-$N$ expansion of the CFT four-point function, and, as we will show, this coefficient ends up being proportional to $E({\scriptstyle \frac{3}{2}},\tau,\bar\tau)$, with precisely the right proportionality factor to match the corresponding term in the superstring amplitude.$^4$ This is a full non-perturbative precision test of AdS/CFT! Note that those contributions that are perturbative in $1/\lambda$ in the 't Hooft limit are also power-behaved in $g_{YM}$ in the limit in which $N \to \infty$ and $g_{YM}$ is fixed. Thus, the perturbative terms evaluated in earlier work reproduce the two terms that are power-behaved in $g_{YM}$ in $E({\scriptstyle \frac{3}{2}},\tau,\bar\tau)$. Our main challenge is to show that the exponentially suppressed terms in the Eisenstein series can also be reproduced from the ${\cal N} = 4$ SYM theory. Using (1.1), we find that these exponentially suppressed terms come from considering the $m = 0$ limit of the instanton contributions to the ${\cal N} = 2^*$ partition function.$^5$
$^2$ These results were rederived in [58] using on-shell superamplitude methods [59]. Furthermore, more general F-terms involving higher-point interactions have also been determined [60].
$^3$ Non-'t Hooft limits have been considered before in other gauge theories. For instance, in the 3d $U(N)_k \times U(N)_{-k}$ ABJM theory [62], the limit in which $k$ is fixed while $N \to \infty$ corresponds to M-theory on $AdS_4 \times S^7/\mathbb{Z}_k$. In the same theory, a different non-'t Hooft limit, namely that in which $N \to \infty$ with finite $\mu \equiv N/k^5$, considered in [32], is somewhat similar to the very strong coupling limit considered here.
At order $N^{-1/2}$ in the large-$N$ expansion, the constraint (1.1) is no longer enough to fully determine the CFT correlation function, but we do find that the integrated correlator (1.1) is proportional to the $E({\scriptstyle \frac{5}{2}},\tau,\bar\tau)$ modular invariant that appears in the flat space limit of the IIB amplitude. Combining the integrated constraint with the flat space string theory answer, we obtain the complete CFT correlator up to order $N^{-1/2}$. While the $E({\scriptstyle \frac{3}{2}},\tau,\bar\tau)$ modular invariant appearing at order $N^{1/2}$ multiplies a single term of schematic form $R^4$ in the effective field theory in AdS, the $E({\scriptstyle \frac{5}{2}},\tau,\bar\tau)$ invariant multiplies a linear combination of the $D^4 R^4$ interaction and $R^4/L^4$, where $L$ is the radius of AdS.
At higher orders in the 1/N expansion we do not have sufficient information to determine the four-point correlator in N = 4 SYM. We will nevertheless argue, more conjecturally, that any given order in the large-N expansion of the integrated correlation function (1.1) is a finite linear sum of non-holomorphic Eisenstein series with rational coefficients.

Outline
The rest of this paper is organized as follows. In Section 2 we give a brief review of the four-point function of the stress tensor superconformal primary operator in ${\cal N} = 4$ SYM, its relation to the string theory scattering amplitude, and the supersymmetric localization constraint coming from the mixed derivative (1.1). Section 3 describes the main technical achievement of this paper, which is the evaluation of the instanton contributions to the integrated correlation function. These contributions are associated with factors of the Nekrasov partition function [40,41] that enter into the localization result for the mass-deformed $S^4$ partition function and contribute to the ${\cal N} = 4$ integrated correlator described by (1.1). We will determine the $k$-instanton contributions to this quantity at orders $N^{1/2}$ and $N^{-1/2}$, and show that they match, respectively, the $k$th Fourier modes with respect to $\theta$ of $E({\scriptstyle \frac{3}{2}},\tau,\bar\tau)$ and $E({\scriptstyle \frac{5}{2}},\tau,\bar\tau)$. The analysis of terms that are higher order in $1/N$ in (1.1) is developed further in Section 4. We end with a discussion of our results in Section 5. Several technical details are relegated to the Appendices.
$^4$ It is worth noting that the fact that the four-point functions of operators in the stress tensor multiplet of ${\cal N} = 4$ SYM, and in particular the quantity (1.1), are $SL(2, \mathbb{Z})$ modular invariants is no surprise. Indeed, the operators belonging to the stress tensor multiplet transform with well-defined holomorphic and anti-holomorphic modular weights $(w, -w)$ under the action of $SL(2, \mathbb{Z})$ [61,63,64]. The bottom component of the multiplet we consider has weight $w = 0$, and therefore the corresponding correlators are $SL(2, \mathbb{Z})$ invariant.
$^5$ While the strict $m = 0$ limit of the ${\cal N} = 2^*$ instanton partition function is trivial, the subleading $m^2$ order is not and contributes to (1.1).
2 Four-point function in ${\cal N} = 4$ SYM
Let us start with a brief review of the setup of the four-point function of the stress tensor superconformal primary operator in SYM, its relation to the 10d IIB flat space graviton S-matrix, and constraints from supersymmetric localization. For more details, see Ref. [33].
This operator transforms in the $\mathbf{20}'$ of the $SO(6)_R$ R-symmetry, and it can be represented as a traceless symmetric tensor $S_{IJ}(\vec x)$ with $I, J = 1, \ldots, 6$ as $SO(6)_R$ fundamental indices.
In order to avoid a proliferation of indices, it is customary to contract them with null polarization vectors $Y^I$ satisfying $Y \cdot Y \equiv \sum_{I=1}^{6} Y^I Y^I = 0$. Superconformal symmetry implies that the four-point function $\langle S(\vec x_1, Y_1) \cdots S(\vec x_4, Y_4) \rangle$ of the operator $S(\vec x, Y) \equiv S_{IJ}(\vec x) Y^I Y^J$ takes the form (2.1) [65,66]. Here, $c$ is the conformal anomaly coefficient, which for an $SU(N)$ gauge group equals $c = (N^2 - 1)/4$; the quantities $U \equiv \frac{\vec x_{12}^2 \vec x_{34}^2}{\vec x_{13}^2 \vec x_{24}^2}$ and $V \equiv \frac{\vec x_{14}^2 \vec x_{23}^2}{\vec x_{13}^2 \vec x_{24}^2}$ are the usual conformally invariant cross-ratios; and $Y_{ij} \equiv Y_i \cdot Y_j$ are $SO(6)_R$ invariants. Importantly, the only non-trivial information in the correlator (2.1) is encoded in a single function of the conformal cross-ratios, $\mathcal{T}(U, V)$.
More generally, to describe the holographic correlators, it is simpler to work in Mellin space [67,68]. Let us thus define the Mellin transform $M(s, t)$ of $\mathcal{T}(U, V)$, with $u \equiv 4 - s - t$. Crossing symmetry, $M(s, t) = M(t, s) = M(s, u)$, as well as the analytic properties of the Mellin amplitude (for a detailed description see [33]), restrict $M(s, t)$ to have the $1/c$ expansion (2.4) at fixed Yang-Mills coupling, where the coefficients $\alpha$, $\beta$, $\gamma_i$, etc. are potentially non-trivial functions of $(\tau, \bar\tau)$. Here, $M_{\text{1-loop}}$ is the regularized supergravity one-loop amplitude that can be found in [34] and will not be discussed here; in this work, we will instead focus mostly on the $1/c^{7/4}$ and $1/c^{9/4}$ terms, corresponding, respectively, to the $R^4$ and $D^4 R^4$ interaction vertices. As explained in [33], the constant $\alpha$ can be found as follows. The free theory contribution in (2.1), when expanded in conformal blocks, contains twist-two operators of all spins. In the interacting $SU(N)$ gauge theory, however, one expects no operators of twist precisely two, except for those operators belonging to the stress tensor multiplet. Thus, the conformal block decomposition of the second term in (2.1) must cancel, in part, that of the first term. A careful analysis shows that this requirement fixes $\alpha$ [33,69]. Note that this argument relies crucially on the gauge group being $SU(N)$, and not $U(N)$. A $U(N)$ gauge theory contains a free $U(1)$ sector, and the $S \times S$ OPE contains many operators of twist two beyond those in the stress tensor multiplet.
At each order in 1/c, one can impose constraints on the coefficients β, γ i , etc. by either comparing with the (super)graviton four-point scattering amplitude in type IIB string theory in the flat space limit or using the quantity (1.1) (or other similar quantities) derived from supersymmetric localization. Let us first discuss the constraints from the flat space scattering amplitude, and then those from supersymmetric localization.

Constraints from the flat space limit
The IIB four-point scattering amplitude of 10d gravitons and their superpartners is restricted by supersymmetry to be proportional to a single function $f(s, t)$, as in (2.6), where $A_{SG\,tree}$ is the tree-level four-point supergravity amplitude,$^6$ and $s$ and $t$ are the Mandelstam invariants. We will also define $u \equiv -s - t$. In turn, this function has an expansion at small momentum (more correctly, the expansion is for small values of the dimensionless product between momentum and the string length $\ell_s$). The coefficient function that appears at each order in the expansion may be a nontrivial function of the complexified string coupling $\tau_s = \chi_s + i/g_s$. In fact, the functions $f_{R^4}$ and $f_{D^4 R^4}$ can be written in terms of non-holomorphic Eisenstein series as in (2.8) [54-57]. The Eisenstein series has the expansion (2.9) at small $g_s$ (see Appendix A for details), in which the divisor sum $\sigma_p(k)$ is defined as $\sigma_p(k) = \sum_{d>0,\, d|k} d^p$, and $K_{r-\frac{1}{2}}$ is the modified Bessel function of the second kind of index $r - 1/2$.
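The structure of the small-$g_s$ expansion (two power-behaved terms plus Bessel-function instanton terms weighted by divisor sums) can be checked numerically. The sketch below assumes the lattice-sum normalization $E(r,\tau,\bar\tau) = \sum_{(m,n)\neq(0,0)} \tau_2^r/|m+n\tau|^{2r}$, for which the Fourier coefficients follow from Poisson resummation; the normalization used in Appendix A is not shown in this section and may differ by an overall factor. All function names are illustrative, and only the Python standard library is used.

```python
import math

def zeta(s, n=1000):
    """Riemann zeta for real s > 1: partial sum plus an Euler-Maclaurin tail."""
    head = sum(k ** -s for k in range(1, n))
    return head + n ** (1 - s) / (s - 1) + 0.5 * n ** -s + s * n ** (-s - 1) / 12

def bessel_k(nu, x, h=0.05):
    """K_nu(x) = int_0^inf exp(-x cosh t) cosh(nu t) dt via the trapezoid rule."""
    total = 0.5 * math.exp(-x)
    j = 1
    while x * math.cosh(j * h) < 700.0:
        t = j * h
        total += math.exp(-x * math.cosh(t)) * math.cosh(nu * t)
        j += 1
    return h * total

def sigma(p, k):
    """Divisor sum sigma_p(k) = sum over positive divisors d of k of d^p."""
    return sum(d ** p for d in range(1, k + 1) if k % d == 0)

def eisenstein_fourier(r, tau, kmax=20):
    """Zero modes plus instanton terms (assumed lattice-sum normalization)."""
    t1, t2 = tau.real, tau.imag
    zero = (2 * zeta(2 * r) * t2 ** r
            + 2 * math.sqrt(math.pi) * math.gamma(r - 0.5) / math.gamma(r)
            * zeta(2 * r - 1) * t2 ** (1 - r))
    inst = 0.0
    for k in range(1, kmax + 1):
        # modes +k and -k combine into 2 cos(2 pi k tau_1)
        inst += (k ** (r - 0.5) * sigma(1 - 2 * r, k)
                 * bessel_k(r - 0.5, 2 * math.pi * t2 * k)
                 * 2 * math.cos(2 * math.pi * k * t1))
    return zero + 4 * math.pi ** r / math.gamma(r) * math.sqrt(t2) * inst

def eisenstein_lattice(r, tau, box=80):
    """Direct truncated lattice sum; converges reasonably only for larger r."""
    total = 0.0
    for m in range(-box, box + 1):
        for n in range(-box, box + 1):
            if m or n:
                total += tau.imag ** r / abs(m + n * tau) ** (2 * r)
    return total

tau = complex(0.3, 1.1)
# S-duality check at r = 3/2: the two values should agree.
print(eisenstein_fourier(1.5, tau), eisenstein_fourier(1.5, -1 / tau))
# Expansion versus direct lattice sum at r = 5/2.
print(eisenstein_fourier(2.5, tau), eisenstein_lattice(2.5, tau))
```

With $\tau_2 = 1/g_s$ and $\tau_1 = \chi_s$, the Bessel functions decay as $e^{-2\pi k/g_s}$ at weak coupling, which is the D-instanton suppression described in the text. Invariance under $\tau \to -1/\tau$ only holds once the zero modes and the instanton terms are combined with the correct relative coefficient.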
The relation between the function f (s, t) in (2.6) and the Mellin amplitude (2.4) is given by the flat space limit formula [33] f (s, t) = stu 2 11 π 2 g 2

10)
6 This is given by δ 16 (Q) stu in the superamplitude notation where Q denotes the 16-component supermomentum variable. See, for instance, [58,70]. In particular, the component corresponding to the four-where κ > 0. 7 This relation, as well as the AdS/CFT dictionary (2.12)

Constraints from supersymmetric localization
As explained in [33], supersymmetric localization also imposes constraints on the coefficients of the expansion in (2.4). While there are several possible supersymmetric localization constraints, the one studied in [33] came from the mixed derivative $\frac{\partial^4 \log Z}{\partial\tau\,\partial\bar\tau\,\partial m^2}\big|_{m=0}$ of the ${\cal N} = 2^*$ theory on a round $S^4$.
In a large c expansion, a careful analysis of the integrated constraints that follow from as well as the ansatz (2.4) gives [33] where C 1-loop is a constant that depends on the precise form of the M 1-loop amplitude that we will not study here. We have normalized the integrated four-point correlators by (2.14) In the following section, using the matrix model for the N = 2 * partition function derived by Pestun [37], we will show that, up to an additive ambiguity that is a sum of holomorphic and anti-holomorphic terms in the complexified coupling, What remains to be done is to derive Eq. (2.15), which we carry out in the following section.
3 Eisenstein series from the mass-deformed $S^4$ partition function

Setup
As shown in [37], up to an overall normalization constant, the mass-deformed partition function of the ${\cal N} = 4$ $SU(N)$ SYM theory (preserving an ${\cal N} = 2$ subalgebra) is given by (3.1), where $a_{ij} \equiv a_i - a_j$, the integration is over $N$ real variables $a_i$, $i = 1, \ldots, N$, subject to the constraint $\sum_i a_i = 0$, $H(z)$ is the product of two Barnes G-functions, and $Z_{inst}$ is the contribution from instantons localized at the poles of $S^4$ that we will come to shortly. The constrained integral over the $a_i$ can be implemented, for instance, by an integral over $N$ unconstrained $a_i$'s with a $\delta(\sum_i a_i)$ insertion. Note that the normalization constant that was dropped from (3.1) depends on the radius of the sphere, as required by the existence of a conformal anomaly, but it is independent of the coupling $(\tau, \bar\tau)$ and mass $m$. Consequently, it will drop out of the ratio (2.13) that is used to relate the sphere partition function to the four-point correlator of the $\mathbf{20}'$ operator at separated points.
Evaluating $Z(m, \tau, \bar\tau)$ for all $m$ seems to be a complicated task, and we will not pursue it here in full generality. Instead, what we need is to evaluate the second mass derivative at zero mass, $\partial_m^2 Z(m, \tau, \bar\tau)\big|_{m=0}$. Let us write the instanton partition function as $Z_{inst}(m, \tau, a_{ij}) = \sum_{k=0}^{\infty} e^{2\pi i k \tau} Z^{(k)}_{inst}(m, a_{ij})$, with $Z^{(k)}_{inst}(m, a_{ij})$ representing the contribution of the $k$-instanton sector and normalized such that $Z^{(0)}_{inst}(m, a_{ij}) = 1$. Notably, $Z_{inst}(0, \tau, a_{ij}) = 1$ [37], so the instantons do not contribute to the sphere partition function at the conformal point. Then one can argue that Eq. (3.3) holds, where the expectation value is defined in the Hermitian matrix model (3.4) at $m = 0$.$^8$ The perturbative terms $\partial_m^2 \log Z_{pert}\big|_{m=0}$ were shown in [34] to take the form (3.5),$^9$ where further perturbative terms will be discussed in Section 4. These perturbative terms match those of the expected Eisenstein series in (2.15) as defined in (2.9). To similarly match the instanton terms, we need to show that (3.6) holds at large $N$. As a warm-up, let us start with the one-instanton case $k = 1$, and then continue with the case of multiple instantons.
$^8$ We could consider the expectation values in (3.3) in either the $SU(N)$ or the $U(N)$ theories. Indeed, for any function $F$ that depends only on the differences $a_{ij}$ (as is the case in (3.3)), the partition functions for the $SU(N)$ and $U(N)$ theories with $F$ inserted differ by a multiplicative constant that is independent of the operator being inserted. It follows that normalized expectation values are equal in the $SU(N)$ and $U(N)$ theories.
$^9$ These expressions were given in Eq. (3.1) and (3.20) of [34] in the strong coupling expansion, which we can simply convert to the very strong coupling expansion by replacing $\lambda \to g_{YM}^2 N$.
Taking two derivatives with respect to the mass and evaluating the result at $m = 0$ gives (3.8). The quantity $I_1$ can be written in terms of a contour integral as in (3.9), where the integration contour is a counter-clockwise contour surrounding the poles at $z = a_j + i$. Note that the subtraction of 1 from the integrand does not contribute to the final result, but it does make the integrand decay as $1/z^2$ at $|z| \to \infty$. Thus, the integration contour in (3.9) can be taken to be the real line.
We need to evaluate the expectation value of I 1 in the Hermitian matrix model (3.4). In the saddle point approximation, where at leading order in 1/N correlation functions factorize, we can approximately write the expectation value of the second expression in (3.9) as , and rescale the a j 's similarly: It is a standard result on Hermitian matrix models that, at leading order in large N , the b j become dense and their density is described by the Wigner semicircle law as Making the replacement j (· · · ) → N db (· · · ), we can then write (3.10) approximately as The leading contribution to the integral is given by taking N → ∞ in the integrand. In this Performing the b integral: we obtain where θ(x) is the Heaviside theta function.
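The Wigner semicircle step can be illustrated with a small numerical check. The sketch below assumes the rescaled eigenvalues $b$ are supported on $[-1, 1]$ with unit-normalized density $\rho(b) = \frac{2}{\pi}\sqrt{1 - b^2}$; the rescaling actually used above is not visible in this section, so this normalization is an assumption. It verifies the normalization and second moment of the density, and evaluates one representative $b$ integral, the Fourier transform $\int db\, \rho(b)\, e^{i\omega b} = 2 J_1(\omega)/\omega$. Function names are illustrative.

```python
import math

def semicircle_average(f, n=2000):
    """Average of f(b) against rho(b) = (2/pi) sqrt(1 - b^2) on [-1, 1].

    Substituting b = sin(phi) gives rho(b) db = (2/pi) cos(phi)^2 dphi,
    which is integrated here with a midpoint rule."""
    h = math.pi / n
    total = 0.0
    for j in range(n):
        phi = -math.pi / 2 + (j + 0.5) * h
        total += math.cos(phi) ** 2 * f(math.sin(phi))
    return 2.0 / math.pi * h * total

def bessel_j1(w, terms=40):
    """Bessel J_1(w) from its power series (adequate for moderate |w|)."""
    return sum((-1) ** m / (math.factorial(m) * math.factorial(m + 1))
               * (w / 2) ** (2 * m + 1) for m in range(terms))

omega = 3.0
print(semicircle_average(lambda b: 1.0))                  # normalization
print(semicircle_average(lambda b: b * b))                # second moment
print(semicircle_average(lambda b: math.cos(omega * b)),  # Fourier transform
      2 * bessel_j1(omega) / omega)
```

Replacing a sum over eigenvalues by $N$ times such an average is what turns the matrix-model expectation value into a one-dimensional integral at leading order in $1/N$.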
The integrand is an even function of $x$, so we can just integrate from $x = 0$ to $x = \infty$ and multiply the answer by a factor of 2. On this interval, we can further change variables as in (3.16) and obtain (3.17). We can now use the integral representation of the Bessel $K_1$ function to finally write the $\sqrt{N}$ term in the large-$N$ expansion of $I_1$ as in (3.19). Combining with (3.8), this expression implies a result in agreement with the expansion of the Eisenstein series; see Eq. (3.6) in the case $k = 1$.
Obtaining the term that scales as $1/\sqrt{N}$ in (3.6) is not any harder, because this term is suppressed only by a factor of $1/N$ relative to the term we just computed, while the corrections to the approximations made in writing (3.10) and (3.12) are suppressed by $1/N^2$.
Thus, the next term in (3.19) can be obtained by simply expanding (3.12) to one more order in 1/N so that and evaluating the effect of the second term in (3.21) in the same way as above. Plugging (3.21) into (3.12), we obtain Performing the b integrals and the x → t substitution (3.16), we find Adding this expression to (3.19) and using the definition of I 1 in (3.8), we conclude that which is again in agreement with the expectation (3.6).
Note that in order to go beyond the first two orders in the 1/N expansion, one would have to take into account the 1/N 2 corrections to the saddle point evaluation of expectation values, which we will do later in Section 4.

The k > 1 instanton sector
We now consider the instanton sector with general $k > 1$. Recall that the instanton partition function $Z^{(k)}_{inst}(m, a_{ij})$ may be further expressed$^{10}$ as a sum of contour integrals around poles labeled by a vector of Young diagrams $Y = (Y_1, Y_2, \ldots, Y_N)$, where the Young diagrams $Y_i$ in $Y$ are such that the total number of boxes is $k$. A Young diagram is specified by integers $\lambda_1 \ge \lambda_2 \ge \ldots$, and the transpose Young diagram $Y^T$ has columns $\lambda^T_1 \ge \lambda^T_2 \ge \ldots$. We will often write them compactly as $[\lambda_1, \ldots, \lambda_l]$.
In the contour integral (3.26),$^{11}$ $\phi_{IJ} \equiv \phi_I - \phi_J$ and $\epsilon_\pm \equiv \epsilon_1 \pm \epsilon_2$, with $\epsilon_{1,2}$ being the squashing parameters of $S^4$. (We have $\epsilon_1 = \epsilon_2 = 1$ for a round sphere.) The multi-dimensional integration contour is determined by the Jeffrey-Kirwan (JK) prescription [75], which selects the poles to encircle based on the choice of an auxiliary vector $\eta$ in $\mathbb{R}^k$.$^{12}$ As mentioned above, the set of relevant poles is in one-to-one correspondence with the vectors of Young diagrams $Y = (Y_1, Y_2, \ldots, Y_N)$.
$^{10}$ Note that the instanton partition function for ${\cal N} = 2^*$ SYM was originally obtained for $U(N)$ gauge group [40,41]. Later, in [72,73], the $SU(N)$ instanton partition function was obtained by factorizing out $U(1)$ contributions $Z_{U(1)}$, motivated by the AGT correspondence [72]. Since $Z_{U(1)}$ is holomorphic in $\tau$ and independent of the $a_i$, it is absorbed into the holomorphic (anti-holomorphic) ambiguity of (2.15), and does not affect the physical four-point functions. For this reason, we will simply use the results for $U(N)$ instanton partition functions in the following. We thank Yuji Tachikawa for pointing out the reference [73].
$^{11}$ Our notation here is related, for instance, to equation (3) of [74] by replacements including $\phi_I \to -i\phi_I$ and $a_j \to -i a_j$.
Each box in the $j$th Young diagram $Y_j$ is labelled by its position $(\alpha, \beta)$ for positive integers $\alpha$ and $\beta$ denoting the column and the row, respectively, as measured from the bottom left corner (see Figure 1).$^{13}$ The integral then consists of oriented contours surrounding the poles (3.27). For the case of the round $S^4$, which is relevant for our consideration, we set $\epsilon_1 = \epsilon_2 = 1$.
The $k$-instanton partition function then reduces to (3.28), and the contours now surround the poles obtained from (3.27) by setting $\epsilon_1 = \epsilon_2 = 1$. Note that some of the simple poles in (3.27) for certain $Y$ are degenerate when $\epsilon_1 = \epsilon_2$, giving rise to higher order poles in (3.28). For certain purposes, (3.26) is more convenient to use because it contains only simple poles when $\epsilon_1 \neq \epsilon_2$, and one can set $\epsilon_1 = \epsilon_2 = 1$ after evaluating the residues.
In the case of k = 1, (3.28) reduces to Eq. (3.7) that we studied in the previous sections.
In this section, we are interested in the cases with $k > 1$. For instance, for $k = 2$, the relevant poles are enumerated by the $N$-tuples of Young diagrams with a total of two boxes, shown in (3.30), and the 2-instanton partition function is then obtained by the sum (3.28) over the contributions from each vector of Young diagrams shown in (3.30). In general, such a $k$-instanton partition function is very complicated and difficult to analyze, especially when $k$ is large.
However, as we will show below, at order $m^2$ in a small $m$ expansion (relevant for the four-point correlation function we consider), the $k$-instanton partition function simplifies greatly even for arbitrary $k$.
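To get a feel for how quickly the set of poles grows, one can enumerate the $N$-tuples of Young diagrams with $k$ boxes in total. The following is a minimal sketch in which each diagram is represented simply as a partition (a non-increasing tuple of positive integers); the function names are illustrative and not taken from the references.

```python
def partitions(n, max_part=None):
    """Yield all partitions of n as non-increasing tuples of positive integers."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def diagram_tuples(k, n_colors):
    """Yield all n_colors-tuples of Young diagrams with k boxes in total."""
    if n_colors == 0:
        if k == 0:
            yield ()
        return
    for boxes in range(k + 1):
        for diagram in partitions(boxes):
            for rest in diagram_tuples(k - boxes, n_colors - 1):
                yield (diagram,) + rest

# k = 2: either one diagram with two boxes (2 shapes, N positions)
# or two single-box diagrams (N choose 2 positions).
for n in (1, 2, 3, 4):
    print(n, sum(1 for _ in diagram_tuples(2, n)), 2 * n + n * (n - 1) // 2)

# The count grows quickly with k at fixed N.
print([sum(1 for _ in diagram_tuples(k, 3)) for k in range(6)])
```

The counts are the coefficients of $q^k$ in $\prod_{n \ge 1} (1 - q^n)^{-N}$, which is why a direct evaluation of all residues becomes impractical at large $k$ and why a simplification at order $m^2$ is valuable.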

Dominance of rectangular Young diagrams
In this section, we will study the instanton partition function on $S^4$ in the small-mass expansion. It turns out to be convenient to keep $\epsilon_1$ and $\epsilon_2$ general at first and send $\epsilon_1 \to \epsilon_2$ at the end.$^{14}$ We will argue that the $k$-instanton partition function at order $m^2$, i.e. $\partial_m^2 Z^{(k)}_{inst}\big|_{m=0}$, is dominated by the vectors $Y$ of the type (3.32), built from rectangular Young diagrams,
where $Y_{p \times q}$ denotes the rectangular Young diagram with $p$ rows and $q$ columns. See Appendix B and Theorem B.1 for the complete proof. In the following we present a brief summary of the proof.
Given a Young diagram Y = [λ 1 , . . . , λ l ], the partial transposition at horizontal position α, denoted by PT α , is a local operation on Y that replaces the rightmost block (subdiagram) P = [λ α , λ α+1 , . . . , λ l ] by its transpose, while leaving the rest of the Young diagram unchanged, provided that the new diagram is still a Young diagram. Otherwise, the particular partial transposition PT α is defined to act trivially on Y (for a more detailed discussion, see Appendix B). In particular PT 1 generates the usual transposition of Young diagrams.
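The partial transposition just described can be made concrete with a short sketch. It encodes one reading of the definition, under the assumption that $[\lambda_1, \ldots, \lambda_l]$ lists the column heights from left to right (so that $\alpha$ is a horizontal position); the precise conventions are those of Appendix B, which is not reproduced here. The function names are hypothetical.

```python
def transpose(cols):
    """Column heights of the transposed diagram (the row lengths of `cols`)."""
    height = cols[0] if cols else 0
    return [sum(1 for c in cols if c >= beta) for beta in range(1, height + 1)]

def partial_transpose(cols, alpha):
    """PT_alpha: transpose the block of columns alpha, ..., l (1-indexed).

    If the result is not a valid Young diagram (column heights must be
    non-increasing), PT_alpha acts trivially and the input is returned."""
    head, block = list(cols[:alpha - 1]), list(cols[alpha - 1:])
    candidate = head + transpose(block)
    is_young = all(x >= y for x, y in zip(candidate, candidate[1:]))
    return candidate if is_young else list(cols)

print(partial_transpose([3, 1, 1], 2))   # block [1, 1] becomes [2]
print(partial_transpose([1, 1, 1], 2))   # would give [1, 2]: acts trivially
print(partial_transpose([1, 1, 1], 1))   # PT_1 is the usual transposition
```

Whenever it acts non-trivially, applying the same partial transposition twice returns the original diagram, which is the property that allows contributions to be paired up.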
In general $Y$ can have many different partners from partial transpositions. For instance, for $k = 4$, we have two cases of vectors $Y$ of the type (3.32), shown in (3.33). The other Young diagram vectors $Y$ that contribute are either of the form (3.34), related by a single transposition to the first case in (3.33), or of the form (3.35). The two $Y$'s in (3.35) are related to each other by a transposition, and the second one is related to the second diagram in (3.33) via a single partial transposition involving the orange block.
$^{14}$ The same final result can be reached if we instead use (3.28), where $\epsilon_1 = \epsilon_2$ from the beginning.
Note that if we set $\epsilon_1 = \epsilon_2 = 1$ from the beginning (i.e. starting with (3.28)), the $Y$'s and their partners related by transpositions and partial transpositions degenerate, in the sense that they give exactly the same poles in the integrand of the contour integral. It is useful to introduce the arm-length $h(\alpha, \beta)$ and leg-length $v(\alpha, \beta)$ of a box at position $(\alpha, \beta)$ (see Figure 1), which are defined in terms of $\lambda_\alpha$ and $\lambda^T_\beta$, with $\lambda^T_\beta$ as defined below (3.25). In other words, $h(\alpha, \beta)$ and $v(\alpha, \beta)$ measure the number of boxes from the given box $(\alpha, \beta)$ to the right edge and the top edge, respectively, of the Young diagram.
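The arm-length and leg-length can be computed directly from the shape of a diagram. The sketch below assumes the same convention as the verbal description above: $\alpha$ labels the column and $\beta$ the row, counted from the bottom left corner, $\lambda_\alpha$ is the height of column $\alpha$, and $\lambda^T_\beta$ is the length of row $\beta$, so that $h(\alpha,\beta) = \lambda^T_\beta - \alpha$ and $v(\alpha,\beta) = \lambda_\alpha - \beta$. These explicit formulas are an assumption consistent with that description rather than a quotation of the defining equation, and the function names are illustrative.

```python
def row_lengths(cols):
    """Row lengths lambda^T_beta of a diagram given its column heights."""
    height = cols[0] if cols else 0
    return [sum(1 for c in cols if c >= beta) for beta in range(1, height + 1)]

def arm_leg(cols, alpha, beta):
    """Return (h, v) for the box in column alpha and row beta (both 1-indexed).

    h counts the boxes to the right of the box in its row,
    v counts the boxes above the box in its column."""
    if not (1 <= alpha <= len(cols) and 1 <= beta <= cols[alpha - 1]):
        raise ValueError("box is not in the diagram")
    rows = row_lengths(cols)
    return rows[beta - 1] - alpha, cols[alpha - 1] - beta

def rectangle(p, q):
    """Column heights of the rectangular diagram with p rows and q columns."""
    return [p] * q

print(arm_leg([3, 2, 2, 1], 2, 1))    # box in column 2 of the bottom row
print(arm_leg(rectangle(2, 3), 1, 1)) # bottom left corner of a 2 x 3 rectangle
print(arm_leg(rectangle(2, 3), 3, 2)) # top right corner: no arm, no leg
```

For a rectangle $Y_{p \times q}$ the bottom left box has $(h, v) = (q - 1, p - 1)$ and the top right box has $(h, v) = (0, 0)$.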
As we show in Appendix B, in general the contribution of a given $Y$ to the second mass-derivative of the instanton partition function behaves as
Figure 1: An example of a Young diagram $Y$, the coordinates $(\alpha, \beta)$ for box $s$, and the corresponding arm-length $h(s)$ and leg-length $v(s)$.
It is straightforward to see that all rectangular Young diagrams $Y_{p \times q}$ have the values of $\mu$ given in (3.39). Furthermore, all other types of Young diagram $N$-tuples either have $\sum_{i=1}^N \mu(Y_i) > 2$, and therefore their contributions manifestly vanish, or have $\sum_{i=1}^N \mu(Y_i) = 2$; but for such a $Y$ vector (where every non-empty $Y_i$ is not related to a rectangular diagram by partial transpositions) there is always a cancellation between $Y$ and a partner related by a certain involution that arises from a sequence of partial transpositions (see Appendix B).$^{16}$ For instance, consider the two $Y$'s in (3.40), which are related to one another by a partial transposition (transposing the orange block). The diagrams in these $Y$'s are not related to any rectangular Young diagrams by partial transpositions, and each has $\mu = 2$ (and therefore naively would lead to a finite contribution to the instanton partition function). But the finite contributions from these two $Y$'s in fact cancel out. This is a very general phenomenon. Again, we defer the general proof of all these statements to Appendix B.
$^{15}$ One may worry about the divergence arising from $I_{k,(\ldots,Y_{p\times q},\ldots)} \sim (\epsilon_-)^{-1}$ for $p \neq q$, but the divergence cancels in the sum $I_{k,(\ldots,Y_{p\times q},\ldots)} + I_{k,(\ldots,Y_{q\times p},\ldots)}$, leaving a finite result.
$^{16}$ Such a cancellation does not happen for (3.32) with a square Young diagram $Y_{p \times p}$, which also has $\mu = 2$ as shown in (3.39).

Instanton partition function at order $m^2$
Having shown that the rectangular Young diagrams dominate the leading, order-$m^2$ term of the instanton partition function, let us now compute the residues necessary for evaluating (3.26) or (3.28). As mentioned above, while in general the contribution of any given Young diagram to the $k$-instanton partition function is rather complicated, the quadratic term in the small mass expansion ends up being quite simple.
For instance, the instanton partition function for $k = 2$ can be computed from (3.26), in which case we have non-trivial contributions only from the two $\vec Y$'s shown in (3.41). Alternatively, we can use (3.28), in which case these two $\vec Y$'s give the same set of poles. For $k = 2$, for each Young diagram there are two ways of distributing the $\phi_I$ among the boxes of the Young diagram, which lead to two contributions to the partition function.
For instance, using the formula (3.26), there are two residues, $R_1$ and $R_2$, that we should evaluate for the first Young diagram in (3.41). For the second Young diagram (which is the conjugate of the first one), the residues are computed in the same way, but with $\epsilon_1 \leftrightarrow \epsilon_2$. Furthermore, $R_1$ and $R_2$ give the same contributions since they simply exchange $\phi_1 \leftrightarrow \phi_2$. It is thus convenient to only evaluate the residue $\mathrm{Res}_{\phi_2 = \phi_1 + i\epsilon_1}$ in $R_1$, and $\mathrm{Res}_{\phi_1 = \phi_2 + i\epsilon_1}$ in $R_2$, and leave the remaining variable (namely $\phi_1$ in the first case, and $\phi_2$ in the second case) unintegrated. (The variable that remains is the one corresponding to the box of the Young diagram that sits in the bottom left corner.) Computing the residues explicitly and summing up the contributions from both Young diagrams, we obtain the two-instanton result (3.43). In general, for the case of $k$ instantons, there are $k!$ ways of assigning the $\phi_I$'s to a given Young diagram, 17 and we integrate out $(k-1)$ of the $\phi_I$'s, again leaving the one that is assigned to the bottom left corner box unintegrated (just as in the $k = 2$ case); we denote it by $z$. The contour for the remaining $z$-integration is then a counter-clockwise contour surrounding the poles at $z = a_j + i$, with $j = 1, 2, \ldots, N$.
Let us consider another example before presenting a general $k$-instanton formula: the 4-instanton partition function, for which there are two types of Young diagrams that contribute. The first type is given by (3.44) and the second by (3.45). Computing each contribution explicitly, we again find very compact results with structures similar to that of (3.43) in the two-instanton case. For the Young diagrams in (3.44), we find an expression with $k_a = \{0, 1, 2, 3\}$, while for (3.45) we obtain one with $k_a = \{0, 1, 1, 2\}$.
The simple structures present in the examples above generalize. Indeed, we find that the contribution to the $k$-instanton partition function coming from Young diagram vectors of the form (3.32), as well as from their partial transpositions, is given by (3.48), where the integration contour of the left-over $z$ is a counter-clockwise contour surrounding the poles at $z = a_j + i$ (with $j = 1, 2, \ldots, N$). The $k_a$'s ($k$ of them) are read off from the vector of Young diagrams $\vec Y$ as in (3.32), and they are given by $k_a = \{0, 1, \cdots, p-1;\ 1, 2, \cdots, p;\ \cdots;\ q-1, q, \cdots, p+q-2\}$ (3.49). Finally, the function $f(p, q)$ appearing in (3.48) is symmetric in $p \leftrightarrow q$ and vanishes at $p = q$. The formula (3.48), which is one of our main results, was obtained by studying the pattern of many non-trivial examples. We will study its large-$N$ expansion in the next section. For the special case where the non-trivial rectangular Young diagram in $\vec Y$ is $Y_{1\times k}$, we provide a proof in Appendix C using a recursion relation satisfied by the instanton partition function [77,78]. There, the constant term arises only from the second line of (3.28), whereas the term depending on $z$ and $a_j$ involves expanding the first line of (3.28) while taking the residues around higher order poles.
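The list of $k_a$ quoted in (3.49) can be generated mechanically. The following is a minimal sketch, assuming only the pattern written above ($q$ groups of $p$ consecutive integers, the $n$-th group starting at $n-1$); the function name `k_values` is hypothetical and not part of the original text.

```python
def k_values(p: int, q: int) -> list[int]:
    """Return the k_a of (3.49) for a rectangular diagram Y_{p x q}.

    Assumption: the list consists of q groups of p consecutive integers,
    {0..p-1; 1..p; ...; q-1..p+q-2}, i.e. k_a = (group index) + (position).
    """
    if p < 1 or q < 1:
        raise ValueError("p and q must be positive integers")
    return [group + position for group in range(q) for position in range(p)]


if __name__ == "__main__":
    # The two 4-instanton cases quoted in the text.
    print(k_values(4, 1))  # single group of four consecutive integers
    print(k_values(2, 2))  # two groups of two
```

For $k = 4$ this reproduces the two sets quoted in the text, $\{0, 1, 2, 3\}$ and $\{0, 1, 1, 2\}$. The list always has $pq = k$ entries, with largest entry $p + q - 2$.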
Finally, we remark that given the structure of the $k$-instanton contribution to the non-holomorphic Eisenstein series, especially the appearance of the divisor sum (see (2.9)), it is not surprising that the relevant $\vec Y$'s are only the rectangular ones. As we will show, each Young diagram $Y_{p\times q}$ contributes a term in the divisor sum for a non-holomorphic Eisenstein series (proportional to $p^{1-2r} + q^{1-2r}$ for $E(r, \tau, \bar\tau)$). 18
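The way rectangular diagrams assemble into a divisor sum can be checked with exact rational arithmetic. The sketch below is illustrative only: it assumes that each pair $(p, q)$ with $pq = k$ and $0 < p \le q$ contributes $p^{s} + q^{s}$ with $s = 1 - 2r$, and that a square diagram $p = q$ carries a factor $1/(1+\delta_{pq})$ to avoid double counting. Both the weight and the helper names are assumptions, not taken from the paper's equations.

```python
import math
from fractions import Fraction


def rectangles(k: int) -> list[tuple[int, int]]:
    """All (p, q) with p * q == k and 0 < p <= q."""
    return [(p, k // p) for p in range(1, math.isqrt(k) + 1) if k % p == 0]


def rectangle_sum(k: int, s: int) -> Fraction:
    """Sum of p^s + q^s over rectangles, halving the square case (assumed weight)."""
    total = Fraction(0)
    for p, q in rectangles(k):
        weight = Fraction(1, 2) if p == q else Fraction(1)
        total += weight * (Fraction(p) ** s + Fraction(q) ** s)
    return total


def divisor_sum(k: int, s: int) -> Fraction:
    """sigma_s(k): sum of d^s over the positive divisors d of k."""
    return sum(Fraction(d) ** s for d in range(1, k + 1) if k % d == 0)


if __name__ == "__main__":
    # s = 1 - 2r: s = -2 for E(3/2), s = -4 for E(5/2).
    for s in (-2, -4):
        for k in range(1, 13):
            assert rectangle_sum(k, s) == divisor_sum(k, s)
    print("rectangle sums reproduce sigma_{-2} and sigma_{-4} for k = 1..12")
```

With these assumptions, the sum over rectangles with $k$ boxes equals $\sigma_{1-2r}(k)$ identically, because every divisor $d$ of $k$ appears exactly once as a side length $p$ or $q$.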

Large-N expansion
We will now compute the expectation value of I p×q in the Hermitian matrix model (3.4), in the large-N expansion. The computation is similar to that in the one-instanton case presented in Section 3.2, so we will therefore be brief here. In the large-N limit, we have with k a given in (3.49). In the above, we have approximated the sums as integrals, and we have deformed the contour by subtracting an appropriate constant from the integrand.
We find that the leading term in the $1/N$ expansion is given by (3.53). After a change of integration variable identical to (3.16), it is straightforward to show that the order-$\sqrt{N}$ term of the expectation value of $I_{p\times q}$ can be expressed in terms of a Bessel $K_1$ function, (3.55), where we have used a sum over the rectangular diagrams with $pq = k$ and $0 < p \le q$, (3.56). The order-$N^0$ term vanishes because the integrand is odd in $x$. At the next order we have a $1/\sqrt{N}$ term, whose coefficients $c_1$, $c_2$ depend on the $k_a$ given in (3.49). Again, by a change of integration variable, the integral in the order-$1/\sqrt{N}$ term of the expectation value of $I_{p\times q}$ reduces to a standard Bessel $K_2$ function, (3.59). Again, taking into account all the relevant contributions from rectangular $\vec Y$'s with $k$ boxes, the prefactor in the above formula becomes the divisor sum $\sigma_{-4}(k)$, (3.60). Combining (3.59) with the result for the leading large-$N$ term in (3.55), we obtain (3.61). Therefore we have proven (3.6). In the next section, we will study the higher order terms in the $1/N$ expansion, and show that in fact they are also given by non-holomorphic Eisenstein series.
4 Eisenstein series at higher orders in 1/N
In this section we provide additional evidence that the coefficients in the large-$N$ expansion of $\partial_m^2 \log Z|_{m=0}$, which were derived in the previous sections to the first couple of orders in $1/N$ in terms of the Eisenstein series shown in (2.15), take the form of Eisenstein series to all orders in $1/N$. In particular, we propose the expansion (4.1), stated through a fixed order in $1/N$. We can then take derivatives in $\tau$ and $\bar\tau$ to obtain an $SL(2, \mathbb{Z})$ invariant quantity. The first piece of evidence for (4.1) comes from considering the terms that are perturbative in $1/\lambda = 1/(g_{\rm YM}^2 N)$, which as discussed above were computed in [34]. 19 This match motivates the conjecture that the finite-$g_{\rm YM}$ expression for $\partial_m^2 \log Z|_{m=0}$ can be derived to any order in $1/N$ by computing the perturbative terms as described in [34], and then simply replacing those by their Eisenstein completions using (2.9).
Further evidence for (4.1) comes from considering the instanton terms $\partial_m^2 \log Z_{\rm inst}|_{m=0}$, which are written as expectation values of sums and products of eigenvalues. In the previous sections, these quantities were computed using the saddle-point expansion, which is valid to leading order in $1/N^2$ (including the term subleading in $1/N$). Subleading corrections in $1/N^2$ can be computed using topological recursion [79,80]. This method naturally applies to the resolvent $W_n(y_1, \ldots, y_n)$, which is defined as a connected expectation value in (4.4), with the $1/N^2$ expansion
$$W_n(y_1, \ldots, y_n) \equiv \sum_{m=0}^\infty \frac{1}{N^{2m}}\, W_n^m(y_1, \ldots, y_n)\,. \qquad (4.5)$$
The coefficients $W_n^m$ can be computed for finite $\lambda$ for any $n$, $m$ in a Gaussian matrix model using a recursion formula in $n$, $m$, starting with the base case $W_1^0$, as reviewed for the Gaussian $U(N)$ SYM matrix model in [34]. (See Footnote 8.) Topological recursion can then be applied to any expectation value that can be written in terms of the resolvents $W_n(y_1, \ldots, y_n)$, for instance by taking derivatives or integrals with respect to the $y_i$. Unfortunately, the operators that appear in the instanton terms (3.48) are written as products over a restricted set of eigenvalues $a_i$, which cannot be easily related to $W_n(y_1, \ldots, y_n)$. However, by expanding these products for small $a_i$, which is equivalent to a small-$g_{\rm YM}$ expansion, they can be expressed as an infinite sum of polynomials in the $a_i$, whose expectation values can then be easily related to $W_n(y_1, \ldots, y_n)$.

19 The expression here includes a further order in $1/N$ and several more orders in $1/\lambda = 1/(g_{\rm YM}^2 N)$ relative to Eqs. (3.1) and (3.20) of [34], which can be easily computed using the same methods explained in that work.
Let us begin by discussing the one-instanton case. By explicitly performing the sums and products in $I_1$ (Eq. (3.8)) for many small values of $N$, we find that $I_1$ can be expanded for small $a_i$ as in (4.6), in terms of the invariants $C_p$ defined in (4.7). The expectation values of these $C_p$ can be related to coefficients of the large-$y_i$ expansion of the $n$-body resolvents $W_n(y_1, \ldots, y_n)$ with $n \le p$. Since the $C_p$ are degree-$p$ polynomials in the $a_i$, they must be proportional to $\lambda^{p/2}$ and their $1/N^2$ expansion truncates. For instance, for the $C_p$ that are shown in (4.6), using the explicit expressions for $W_n^m$ in Appendix B of [34] we find (4.8). We can then insert these expressions into the expectation value of (4.6), set $\lambda = g_{\rm YM}^2 N$, and expand in $1/N$ to get

(4.9)
This is consistent according to (3.8) with the small g YM expansion of

Conclusion
In this paper, we studied the four-point correlator $\langle SSSS \rangle$ of the superconformal primary operator $S$ transforming in the $\mathbf{20}'$ of $SO(6)_R$ in the ${\cal N} = 4$ SYM theory, in the "very strong coupling" limit in which $N$ is sent to infinity at fixed $g_{\rm YM}$. In this limit, the action of $SL(2, \mathbb{Z})$ modular transformations on the $\langle SSSS \rangle$ correlator is manifest. In particular, we studied the constraints on $\langle SSSS \rangle$ coming from the flat space limit of the IIB string theory amplitudes, and those coming from the integrated four-point function $\tau_2^2 \partial_\tau \partial_{\bar\tau} \partial_m^2 \log Z|_{m=0}$. The latter can be computed using supersymmetric localization. Starting from Pestun's localization expression [37] for the partition function $Z$, we argued that when $\tau_2^2 \partial_\tau \partial_{\bar\tau} \partial_m^2 \log Z|_{m=0}$ is expanded in $1/N$, the first two sub-leading terms (of orders $N^{1/2}$ and $N^{-1/2}$, respectively) can be written as the Eisenstein series $E(\tfrac{3}{2}, \tau, \bar\tau)$ and $E(\tfrac{5}{2}, \tau, \bar\tau)$, respectively. Our argument is not completely rigorous because it relies on studying the $k$-instanton contribution for many values of $k$ and deducing the general pattern, but we hope that it will be possible to provide a more rigorous argument in future work. Using solely the relation between the integrated $\langle SSSS \rangle$ correlator and $\tau_2^2 \partial_\tau \partial_{\bar\tau} \partial_m^2 \log Z|_{m=0}$ from [33], we completely determined the $N^{1/2}$ term in the large-$N$ expansion of $\langle SSSS \rangle$. This term corresponds to an effective $\ell_s^{-2} R^4$ coupling in AdS$_5$, which, in the flat space limit, matches the $\ell_s^{-2} R^4$ contribution to the Type IIB graviton S-matrix as computed at finite string coupling $g_s$ in [53][54][55]. This is a precision test of AdS/CFT at finite $g_s$! We then used the $\ell_s^2 D^4 R^4$ term in the Type IIB S-matrix, which is also known at finite $g_s$, as well as the $N^{-1/2}$ term in $\tau_2^2 \partial_\tau \partial_{\bar\tau} \partial_m^2 \log Z|_{m=0}$, to completely determine $\langle SSSS \rangle$ at order $N^{-1/2}$.
In Mellin space, this expression contains two polynomial terms, both proportional to $E(\tfrac{5}{2}, \tau, \bar\tau)$, one corresponding to an $\ell_s^2 D^4 R^4$ contact term in AdS$_5$ and one to an $(\ell_s^2 / L^4) R^4$ term. Finally, using a small-$g_{\rm YM}$ expansion, we gave non-trivial evidence that each of the terms in the $1/N$ expansion of $\tau_2^2 \partial_\tau \partial_{\bar\tau} \partial_m^2 \log Z|_{m=0}$ is a finite linear combination of non-holomorphic Eisenstein series.
The fact that we can derive the full CFT correlator at order N The result of this large-N ADHM analysis is reproduced in our procedure by the first term in the small-g YM expansion of the Bessel function in the kth Fourier mode of the Eisenstein series E( 3 2 , τ,τ ) (the function F k defined in (A.7)). The fact that the dominant contribution to the Nekrasov partition function in the m → 0 limit has a single cluster of boxes should correspond to properties of the large-N analysis of the ADHM construction. However, this correspondence is difficult to make precise since our analysis is based on taking a limit of the non-conformal N = 2 * whereas conformal invariance is explicit in the large-N ADHM construction. The connection of the D-instanton measure with the SU (k) D-instanton matrix model partition function is also not obvious in our procedure. Nevertheless, the fact that our procedure packages an infinite number of perturbative corrections to the k-instanton contribution into a K-Bessel function is a most significant generalization of [48,81] and an essential requirement of SL(2, Z) covariance.
As shown in Section 4, our integrated constraint $\tau_2^2 \partial_\tau \partial_{\bar\tau} \partial_m^2 \log Z|_{m=0}$ has an expansion in half-integer powers of $1/N$ (apart from the first term). However, it is well known that the low energy expansion of the string amplitude also contains even powers of $\ell_s^2$, which lead to integer powers of $1/N$, the most relevant one being the 1/8-BPS interaction $D^6 R^4$. A more complete analysis of the holographic correspondence should therefore also include terms with integer powers of $1/N$. We expect that such terms will appear in other quantities that can be computed using supersymmetric localization, such as $\partial_m^4 \log Z|_{m=0}$ or $\partial_b^2 \partial_m^2 \log Z|_{m=0, b=1}$, where $b = \epsilon_1/\epsilon_2$ is a parameter that defines the squashing deformation of $S^4$ that appears in (3.26) (recall that up to now $\epsilon_1 = \epsilon_2 = 1$). We expect that determining these three distinct integrated four-point correlation functions should eliminate the ambiguities in determining the expansion of the AdS$_5 \times S^5$ type IIB string theory amplitudes up to order $D^6 R^4$. In other words, this procedure should uniquely determine the BPS protected interactions without the need to input known results from flat-space type IIB superstring theory.
We have so far only considered four-point correlators. It was argued in [63,64]  Such a statement would not hold for (n ≥ 5)-point functions, which transform as modular forms with non-trivial modular weights. 20 These correlation functions should correspond to higher-point superstring amplitudes that violate the U (1) R-symmetry of type IIB supergravity. Such U (1)-violating superstring amplitudes (especially those that violate U (1) maximally) are identified in [60,82], and more importantly the F-terms (terms up to the same number of derivatives as D 6 R 4 ) have also been determined using maximal supersymmetry and SL(2, Z) symmetry [60]. The coefficients of these U (1)-violating interactions are modular forms with non-zero modular weights, and it would be of interest to understand how they arise from the supersymmetric localization computation.
Lastly, it is interesting to compare the calculation presented in this paper to calculations done in the 3d ABJM theory [62] with gauge group $U(N)_k \times U(N)_{-k}$ and ${\cal N} = 6$ supersymmetry. An analogous computation that includes non-perturbative contributions can also be performed in that case [32], in the very strong coupling limit in which $N$ is taken to infinity while $N/k^5$ is held fixed. In this limit, the ABJM theory is dual to type IIA string theory on AdS$_4 \times \mathbb{CP}^3$ at finite string coupling $g_s$. However, in this case, all the non-perturbative contributions to the type IIA scattering amplitudes of the lowest closed string states vanish.
Thus, in order to obtain non-trivial non-perturbative contributions to CFT correlators that can be matched to string scattering amplitudes, one is led to consider the case of the 4d N = 4 theory that was studied in the present paper. Nevertheless, it is worth pointing out that the closest 3d analog of the formulas presented in Section 4 that include resummed instanton contributions would be the mass-deformed partition function of ABJM theory that can be computed [83] to all orders in the 1/N expansion using the Fermi gas method developed in [84].
the Aspen Center for Physics (ACP) for hospitality during the initial stages of this work.

A Non-holomorphic Eisenstein series
The first four terms in the low energy expansion of the four-graviton amplitude in the type IIB superstring theory correspond to BPS protected effective interactions that take the form where R signifies the linearised Weyl curvature tensor, which has the form (where [· · · ] denotes anti-symmetrization of the indices), where k µ is a null ten-dimensional momentum and ν σ is a graviton polarization. The symbol R 4 denotes the particular contraction of four curvature tensors that is implied by ten-dimensional N = 2 supersymmetry.
It is straightforward to show that $E(r, \tau, \bar\tau)$ is invariant under an $SL(2, \mathbb{Z})$ transformation, $\tau \to \frac{a\tau + b}{c\tau + d}$, for $a, b, c, d \in \mathbb{Z}$ and $ad - bc = 1$.
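The invariance can be checked numerically on a truncated lattice sum. The sketch below assumes the standard lattice-sum form $E(r,\tau,\bar\tau) = \sum_{(m,n)\neq(0,0)} \tau_2^r / |m + n\tau|^{2r}$; the normalization used in this paper's own definition is not visible in this part of the text, so the overall factor is an assumption (it does not affect the invariance). The function name and the cutoff are hypothetical choices.

```python
def eisenstein_lattice_sum(r: float, tau: complex, cutoff: int = 60) -> float:
    """Truncated lattice sum for E(r, tau, taubar), over |m|, |n| <= cutoff.

    Assumed normalization: sum over (m, n) != (0, 0) of tau_2^r / |m + n tau|^(2r).
    """
    tau2 = tau.imag
    if tau2 <= 0:
        raise ValueError("tau must lie in the upper half-plane")
    total = 0.0
    for m in range(-cutoff, cutoff + 1):
        for n in range(-cutoff, cutoff + 1):
            if m == 0 and n == 0:
                continue
            total += tau2 ** r / abs(m + n * tau) ** (2 * r)
    return total


if __name__ == "__main__":
    tau = 0.3 + 1.1j
    e_tau = eisenstein_lattice_sum(1.5, tau)
    e_s = eisenstein_lattice_sum(1.5, -1 / tau)   # S: tau -> -1/tau
    e_t = eisenstein_lattice_sum(1.5, tau + 1)    # T: tau -> tau + 1
    print(e_tau, e_s, e_t)
```

With a square cutoff, the $S$ transformation merely relabels $(m, n) \to (-n, m)$, so the truncated sums agree up to rounding. The $T$ transformation shifts $m \to m + n$, which does not preserve the cutoff, so agreement there holds only up to a truncation error that shrinks as the cutoff grows.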
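A non-holomorphic Eisenstein series has an expansion in Fourier modes of the form $E(r, \tau, \bar\tau) = \sum_{k \in \mathbb{Z}} F_k(r, \tau_2)\, e^{2\pi i k \tau_1}$, where the zero mode $F_0(r, \tau_2)$ consists of two power-behaved terms, (A.6), and the non-zero modes $F_k(r, \tau_2)$ are proportional to K-Bessel functions, (A.7), where the divisor sum is defined by $\sigma_p(k) = \sum_{d > 0,\ d | k} d^p$ for $k > 0$, and $\sigma_{-p}(k) = k^{-p} \sigma_p(k)$.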
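The reflection property $\sigma_{-p}(k) = k^{-p}\sigma_p(k)$ follows from pairing each divisor $d$ with $k/d$, and is easy to confirm exactly. A minimal sketch, using the standard divisor-sum definition; the function name `sigma` is a hypothetical helper.

```python
from fractions import Fraction


def sigma(p: int, k: int) -> Fraction:
    """Divisor sum sigma_p(k) = sum of d^p over positive divisors d of k (k > 0)."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    return sum(Fraction(d) ** p for d in range(1, k + 1) if k % d == 0)


if __name__ == "__main__":
    for p in (2, 4):
        for k in range(1, 25):
            # Pairing d <-> k/d turns d^(-p) into (k/d)^(-p) = k^(-p) d^p.
            assert sigma(-p, k) == Fraction(k) ** (-p) * sigma(p, k)
    print(sigma(-2, 6), sigma(-4, 6))
```

The values $p = 2$ and $p = 4$ are the ones relevant for $E(\tfrac{3}{2},\tau,\bar\tau)$ and $E(\tfrac{5}{2},\tau,\bar\tau)$, whose instanton sectors involve $\sigma_{-2}(k)$ and $\sigma_{-4}(k)$.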
The two power-behaved terms in $F_0(r, \tau_2)$ in (A.6) correspond to tree-level and $(r - \tfrac{1}{2})$-loop contributions in string perturbation theory. Using the asymptotic behavior of the K-Bessel function at large argument, $K_\nu(x) \sim \sqrt{\pi/(2x)}\, e^{-x}$, we see that the non-zero mode $F_k(r, \tau_2)$ behaves as $e^{-2\pi |k| \tau_2}$ and has the form of a charge-$k$ D-instanton contribution.
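The exponential suppression of the non-zero modes can be illustrated with a standard-library evaluation of the K-Bessel function. The sketch below uses the integral representation $K_\nu(x) = \int_0^\infty e^{-x\cosh t}\cosh(\nu t)\,dt$ with a trapezoidal rule, and assumes that the Bessel function in the $k$-th mode has index $r - \tfrac{1}{2}$ and argument $2\pi|k|\tau_2$, as suggested by the $K_1$ and $K_2$ functions and the decay rate quoted in the text. The function names, cutoff `t_max` and step count are hypothetical choices, adequate for $x \gtrsim 0.5$.

```python
import math


def bessel_k(nu: float, x: float, t_max: float = 8.0, steps: int = 8000) -> float:
    """K_nu(x) from the integral of exp(-x cosh t) cosh(nu t) over t in [0, t_max].

    Trapezoidal rule; the integrand is negligible beyond t_max for x >~ 0.5.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    h = t_max / steps
    total = 0.5 * math.exp(-x)  # t = 0 endpoint, half weight
    for i in range(1, steps + 1):
        t = i * h
        weight = 0.5 if i == steps else 1.0
        total += weight * math.exp(-x * math.cosh(t)) * math.cosh(nu * t)
    return h * total


def ratio_to_leading_asymptotics(r: float, k: int, tau2: float) -> float:
    """K_{r-1/2}(x) divided by sqrt(pi/(2x)) exp(-x), with x = 2 pi |k| tau2 (assumed)."""
    x = 2 * math.pi * abs(k) * tau2
    leading = math.sqrt(math.pi / (2 * x)) * math.exp(-x)
    return bessel_k(r - 0.5, x) / leading


if __name__ == "__main__":
    for tau2 in (1.0, 3.0, 6.0):
        print(tau2, ratio_to_leading_asymptotics(1.5, 1, tau2))
```

The ratio approaches 1 as $\tau_2$ grows, so the mode is dominated by the D-instanton weight $e^{-2\pi|k|\tau_2}$, with power-law corrections in $1/\tau_2$ that are naturally read as perturbative fluctuations around the instanton.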
The terms proportional to $E(\tfrac{3}{2}, \tau, \bar\tau)$ and $E(\tfrac{5}{2}, \tau, \bar\tau)$ in (A.1) are coefficients of the $R^4$ and $D^4 R^4$ interactions in the type IIB low energy effective action. These are, respectively, 1/2-BPS and 1/4-BPS interactions. The last term in (A.1) corresponds to a term proportional to the 1/8-BPS interaction $D^6 R^4$, with a coefficient ${\cal E}(\tau, \bar\tau)$ that satisfies an inhomogeneous Laplace eigenvalue equation [57,58]. The solution to this equation [85] is qualitatively different from an Eisenstein series. In the large-$\tau_2$ limit, the zero mode consists of power-behaved contributions, which correspond to string perturbation theory up to genus three, together with exponentially suppressed terms.
The symbol $O(e^{-4\pi |k| \tau_2})$ denotes a specific infinite sum of D-instanton/anti-D-instanton contributions with zero total instanton number (details of which are in [85]). Similarly, each mode of non-zero mode number $k$ has the form of a sum of D-instanton/anti-D-instanton contributions with instanton numbers $k_1$ and $k_2$ satisfying $k_1 + k_2 = k \neq 0$. We see in the main text that such a $(s^3 + t^3 + u^3) R^4$ contribution does not arise from our analysis of the flat-space limit of $\tau_2^2 \partial_\tau \partial_{\bar\tau} \partial_m^2 \log Z$, since its contribution to the integrated correlation function vanishes.
The four terms in the low energy expansion of the four-graviton amplitude explicitly shown in (A.1) correspond to local BPS interactions that are fully determined by supersymmetry, while higher derivative terms are not expected to be protected and have not been fully determined.

B Rectangular dominance
As discussed in the main text, the full Nekrasov partition function for the mass-deformed ${\cal N} = 2^*$ $SU(N)$ SYM with squashing parameters $\epsilon_{1,2}$ at instanton number $k$ is given by a sum (B.1) over $N$-tuples of Young diagrams $\vec Y = (Y_1, Y_2, \ldots, Y_N)$ with $k$ boxes in total, where $\epsilon_\pm \equiv \epsilon_1 \pm \epsilon_2$ as in the main text. Here, $s$ labels a box $(\alpha, \beta)$ ($\alpha$-th column and $\beta$-th row) in a given Young diagram as in Figure 1, and $h_i(s)$ and $v_i(s)$ denote the arm-length and leg-length, respectively, of the box $s$ in the diagram $Y_i$. (Each individual Young diagram $Y$ consists of columns of non-increasing heights $\lambda_1 \ge \lambda_2 \ge \cdots$. The arm-length $h$ and leg-length $v$ of the box $s$ in $Y$ are then defined as in Figure 1. Note that the definitions of $h$ and $v$ extend beyond boxes in $Y$ to the entire quadrant $(\alpha, \beta) \in \mathbb{Z}_+^2$ in the obvious way. In particular they can be negative (e.g. when $Y$ is empty).) In the rest of this appendix, we prove the following theorem.

Theorem B.1. In the limit (B.4) of the double mass derivative of (B.1), only the $\vec Y$ with a single non-empty Young diagram $Y_{\hat i}$ of rectangular shape, and those with $Y_{\hat i}$ replaced by its partial transpositions (which we define next), contribute.
Given a Young diagram Y = [λ 1 , λ 2 , . . . , λ l ] (with λ l ≥ 1), we define its partial transposition at position α, with 1 ≤ α ≤ l, to be where P denotes the block (Young subdiagram) to the right of the (α − 1)-st column, and Y \P is the complement Young subdiagram (see examples in Figure 2). In particular, the usual transposition is a partial transposition at α = 1. For notational simplicity, we will often suppress the subscript α when the context is clear. We will need the following useful properties of the map PT α (·). We start with the obvious lemma: Consequently the partial transpositions preserve the set of poles (3.29) in the contour integral for the instanton partition function. A useful corollary that follows is the following: Proof. A single partial transposition at the right-most column gives a Young diagram with width p+q −1. The fact that this is the maximal value that can be achieved by any sequence of partial transpositions follows from the previous lemma by noting that is invariant under partial transpositions, where M is any integer equal to or larger than the maximum width of Young diagrams related to Y by partial transpositions. Note that λ α for α > l is defined to be 0.
Proof. It suffices to prove that ∆ B (Y ) = ∆ B (Y T ), because for any partial transposition with respect to a subdiagram P ⊂ Y , ∆ B (Y \P ) is clearly invariant. The entries in ∆ B (Y ) consists of α + β for boxes (α, β) located on the North-East boundary of the Young diagram.
Then we also define n a (Y ) to be number of coordinates s with d(s) = a.
is invariant under partial transpositions. In fact both n 0 (Y ) and n 1 (Y ) + n −1 (Y ) are separately invariant.
Proof. Under a partial transposition that involves transposing the subdiagram P of Y , the box at (α, β) ∈ P gets mapped to the box at (β, α) ∈ P T . Thus, focusing on boxes in (namely ρ a = λα +a−1 ) with λα −1 ≥ ρ T 1 (so that it's a nontrivial operation). Focusing on the boxes in the α-th column with 1 ≤ α ≤α − 1, their d = h − v values before the partial transposition are given by whereas after the partial transposition, we have Therefore we conclude µ(Y ) is a partial transposition invariant, and clearly so is n 0 (Y ).
In particular, a rectangular Young diagram Y p×q with p columns and q rows has n 0 (Y ) = min(p, q) , n 1 (Y ) = min(p − 1, q) , n −1 (Y ) = min(p, q − 1) , (B.14) which implies that To proceed, let us make two more definitions. We define N b (Y ) ≥ 1 to be the minimal number of rectangular blocks (in a horizontal decomposition) in a Young diagram Y (see In particular µ(Y ) = 1 if and only if Y is related by partial transpositions to a diagram of the type Y p×q , for some p and q.
Proof. Since µ is invariant under partial transposition, we can take Y = Y min . By assumption, the Young diagram Y has a minimum of c rectangular blocks in a horizontal decomposition. Correspondingly, Y has c outward-pointing corners along the North-East boundary.
We obtain a finer decomposition of Y into 1 + 2 + · · · + c = c(c+1) 2 smaller rectangular blocks by drawing perpendicular lines from the c corners in an obvious fashion (see Figure 4). Each of the c blocks at the corners contributes 2 or 1 to µ depending on whether it has equal or non-equal sides. Therefore, it suffices to show that the c(c−1) 2 interior blocks each contribute non-negatively to µ(Y ).
We label these rectangular blocks as $(i, j)$ for $1 \le i, j \le c$ and $i + j \le c + 1$, and let their sizes be $p_j \times q_i$. The corner blocks are those with $i + j = c + 1$. Each interior rectangular block labeled by $(i, j)$ contributes at least $-1$ to $\mu(Y)$, and this happens precisely when either $p_j + p_{j+1} = q_{i+1}$ or $q_i + q_{i+1} = p_{j+1}$ (see Figure 5). Let us assume this is the case for the interior rectangular block $(i, j)$. Then $N_b$ can be reduced by 1 after a transposition of the subdiagram involving the rectangular blocks $(k, l)$ with $k > i$, $l \ge j$ if $p_j + p_{j+1} = q_{i+1}$, and similarly by transposing the subdiagram involving the rectangular blocks $(k, l)$ with $k \ge i$, $l > j$ if $q_i + q_{i+1} = p_{j+1}$. Since such operations are all achievable by a sequence of partial transpositions, this contradicts the fact that $Y = Y_{\min}$ minimizes $N_b$.
Thus each interior rectangular block can only contribute non-negatively to $\mu(Y)$. Therefore $\mu(Y)$ is bounded from below by the contributions of the $N_b = c$ corner blocks, and the lemma follows.

Figure 5: Reduction of $N_b$ (or corners) by partial transpositions. Note that the PT in the right diagram involves a sequence of three partial transpositions.
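The counting in (B.14) and the invariance of $\mu$ can be explored on small diagrams. The sketch below is illustrative and rests on labeled assumptions: a Young diagram is stored as its list of column heights $[\lambda_1, \ldots, \lambda_l]$; the arm- and leg-lengths are taken to be $h = \lambda^T_\beta - \alpha$ and $v = \lambda_\alpha - \beta$; $n_a$ counts boxes of $Y$ with $d = h - v = a$; and $\mu = 2n_0 - n_1 - n_{-1}$. The partial transposition is implemented as "keep the first $\alpha - 1$ columns and transpose the block to their right", rejecting cases where the result is not a Young diagram. All function names are hypothetical.

```python
def transpose(cols: list[int]) -> list[int]:
    """Transpose a diagram given by column heights (returns row lengths as columns)."""
    return [sum(1 for c in cols if c > b) for b in range(cols[0])] if cols else []


def d_values(cols: list[int]) -> list[int]:
    """d = h - v for every box, with h = lambda^T_beta - alpha, v = lambda_alpha - beta."""
    rows = transpose(cols)
    return [
        (rows[beta - 1] - alpha) - (height - beta)
        for alpha, height in enumerate(cols, start=1)
        for beta in range(1, height + 1)
    ]


def mu(cols: list[int]) -> int:
    """mu(Y) = 2 n_0 - n_1 - n_{-1}, counting boxes of Y only (assumption)."""
    d = d_values(cols)
    return 2 * d.count(0) - d.count(1) - d.count(-1)


def rectangle(p: int, q: int) -> list[int]:
    """Y_{p x q}: p columns, each of height q."""
    return [q] * p


def partial_transpose(cols: list[int], alpha: int) -> list[int]:
    """Transpose the block made of columns alpha, alpha+1, ... and re-attach it."""
    if not 1 <= alpha <= len(cols):
        raise ValueError("alpha out of range")
    new = cols[: alpha - 1] + transpose(cols[alpha - 1:])
    if any(left < right for left, right in zip(new, new[1:])):
        raise ValueError("result is not a Young diagram")
    return new


if __name__ == "__main__":
    square = rectangle(2, 2)
    hook_like = partial_transpose(square, 2)  # transposes the right-most column
    print(hook_like, mu(square), mu(hook_like))
```

Under these assumptions a rectangle has $\mu = 1$ for $p \neq q$ and $\mu = 2$ for $p = q$, and transposing the right-most column of $Y_{p\times q}$ yields a diagram of width $p + q - 1$ with the same $\mu$, in line with the corollary and lemma above.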
We will also need the following lemma concerning the general properties of terms that appear in the summand of (B.1).
Lemma B.6. For a pair of Young diagrams Y 1 and Y 2 , the following function Proof. We use the following identity from Appendix A.1 of [77], which holds when M is any positive integer larger than the widths and heights of Y 1 and We first take Y to be such that the only nonempty Young diagram is Yˆi = Y . The quantity Z Y defined in (B.1) takes the form Our goal here is to understand properties of Z Y in the limit − → 0 (and + → 2) in relation to the shape of the Young diagram Y .
The quantity F 2 (Y ) is manifestly finite and nonzero (for generic a i ) for all m, ± , while F 1 (Y ) has a subtler behavior in the limit − → 0. To understand how F 1 (Y ) behaves in this limit, we write it as where we have decomposed the product over and F 0 1 , F ± 1 , F r 1 correspond to products over these disjoint subsets, respectively. Note that according to our previous definition, n 0 = |Y 0 | and n ±1 = |Y ± |. More explicitly, we have where, for notational simplicity, we suppressed the s dependence in h(s). For the quantity (B.4) of interest, we need to take the limit − → 0. In this limit, F 0 1 could potentially vanish, F ± 1 could potentially blow up, and F r 1 is finite and nonzero. Expanding in small − (and taking + = 2), we have From lemma B.5, µ(Y ) = 2n 0 − n 1 − n −1 ≥ 1, which implies that there can be at most a simple pole in − from F 1 at order m 2 . For the purpose of extracting the order m 2 term from Z Y , we can take the truncation (which we denote by ) (B.28) We recollect the relevant contributions to F 1 as and F In addition, Similarly, it suffices to set m = 0 for F 2 , where we only kept the terms up to first order in − . Here .

(B.36)
We have thus spelled out which parts of (B.22) and (B.23) matter for evaluating the limit Thus if µ(Y ) > 2, it vanishes identically. We then move onto cases of Y with µ(Y ) = 2, and its contribution to (B.4) is then deduced from to be It is easy to see that the corresponding Y min takes the form as in Figure 6 from stacking 3 rectangular boxes of sizes p 1 × q 1 , p 2 × q 1 , and p 1 × q 2 respectively where the sides satisfy (see proof of Lemma B.5) Under partial transpositions, we will show that the p 2 × q 1 block of Y min gets mapped to a p 1 p 2 q 1 q 2 Figure 6: Y min for a Young diagram Y with µ(Y ) = 2 and not related to a rectangular diagram by partial transpositions.
Young subdiagram of the resulting Young diagram Y . We define W Y to be this particular subdiagram in Y (see Figure 7 for an example). Then we define the Young diagram Y (correspondingly the Young diagram vector Y with a single non-empty Young diagram Y î = Y ) to be obtained from Y by a sequence of partial transpositions that send W Y to its transpose while keeping the rest of the diagram fixed.(It is always possible to find such a sequence of partial transpositions.) 22 We denote this involution by ι Y .
We show below that the contributions of $\vec Y$ and of its image under $\iota_Y$ cancel in the limit (B.4). This is a consequence of the properties of $F_1^{(0)}(Y)$ and $F_2^{(0)}(Y)$ under partial transpositions: the latter is even whereas the former is odd for the particular type of $Y$ and involution $\iota_Y$ we consider here.

22 The existence of such a sequence of partial transpositions is guaranteed for $Y$ of the form in Figure 8 (with the orange block $W_Y$ possibly replaced by its partial transpositions). In the later part of the section, we will prove that a $Y$ with $\mu(Y) = 2$ that is not related to rectangular diagrams by partial transpositions takes the form shown in Figure 8. Here we assume this is the case. Note that $\mu(Y) = 2$ demands that the bottom-left $Y_{t\times r}$ sub-diagram (gray block) contributes $-1$ to $\mu(Y)$, since each of the three exterior colored blocks along the North-East border (orange or gray) contributes at least 1 to $\mu(Y)$. This means we have either $\lambda^T_1 - r = \lambda_r$ or $\lambda^T_t = \lambda_1 - t$ (see the proof of Lemma B.5). In the former case, $\iota_Y$ is achieved by $\mathrm{PT}_{r+1}$ followed by $\mathrm{PT}_{r+t+1}$; in the latter case, $\iota_Y = \mathrm{PT}_1 \cdot \mathrm{PT}_{r+t+1} \cdot \mathrm{PT}_{t+1} \cdot \mathrm{PT}_1$.
We first show that F (0) 2 (Y ) is invariant under partial transposition with respect to an arbitrary subdiagram P . We have the decomposition where λ T β does not appear, the first factor in (B.42) from Y \P does not change under P → P T . It is also easy to see that the second factor is also invariant. For (α, β) ∈ P and (β, α) ∈ P T with P = [ρ 1 , . . . , ρ m ], are mapped to each another through Thus we've shown Let's now show that that F  Figure 8 satisfy for 1 ≤ α ≤ r, and this is sufficient to ensure that the left white block in Figure 8 does not contain entries with h − v equal to 0 or ±1. Suppose we perform a partial transposition PT x (Y ) for some 1 < x ≤ r, then the resulting Young diagram is again of the form in Figure 8 with the change in the parameters, and to ensure that the (new) left white block as in Figure 8 does not contain entries with h − v equal to 0 or ±1, we need which follows from (B.50). A similar argument applies to the third case, focusing on the right bottom white block in Figure 8.
Note that r, t are non-negative integers. If t = 0, the only gray block is the one on the top left; if r = 0, the only gray block is the one on the bottom right. In particular Y min in Figure 6 corresponds to the special case r = 0, t = p 1 and p = p 2 , q = q 1 .
We have thus proven that for the special type of Young diagram Y and involution ι Y considered here.
Putting together the facts that F To see this, recall that the summand Z Y of (B.1) in this case decomposes as the following product where each factor only depends on the Young diagram(s) in the subscript (e.g. Z Y 1 only involves the product in (B.1) with i = 1 and j = 2 or j = 1 and i = 2). From the argument we presented for the case Y = {Y, ∅, . . . , ∅}, it's easy to see that Z Y 1 is invariant under 28) and (B.30)). Thus it suffices to prove that in the limit (B.4) which indeed follows from Lemma B.6.
We have thus finished the proof of theorem B.1: the limit (B.4) of double mass derivatives of the Nekrasov partition function (B.1) is dominated by Y with a single non-empty Young diagram of rectangular shape (and its partial transpositions).

C Recursion relations
In this appendix we give a proof of (3.48) for the special cases in which the nontrivial Young diagram in the vector $\vec Y$ in (3.32) is of the form $Y_{1\times k}$ or $Y_{k\times 1}$. Recall that the contribution of a Young diagram vector $\vec Y$ to the $k$-instanton partition function is given by (C.1). For the round $S^4$, setting $\epsilon_1 = \epsilon_2 = 1$, we have (C.2).
In this appendix, we will first keep 1 , 2 arbitrary, and set them equal to 1 at the end. That is because with general 1 , 2 , the instanton partition function only has simple poles and obeys a simple recursion relation [77,78]. Indeed, it is straightforward to see that Z k, Y (m, a ij , i ) satisfies the following recursion relation Z k+1, Y + (m, a ij , i ) = Z k, Y (m, a ij , i ) 1 k + 1 where Y + (with k + 1 boxes) is a Young diagram vector that is obtained from Y by adding one more box. The contour for the integration over φ is determined by the position of the box that we add to Y during the recursion procedure, andφ J are the poles for evaluating Z k, Y (m, a ij , i ), which are determined by Y according to (3.27).
We will here consider the contributions from Young diagram vector Y as in (3.32) where the nontrivial Young diagram is a single row Young diagram Y 1×k or a single column Young diagram Y k×1 . These Young diagrams can be constructed recursively by adding one box at a time, and one can solve the recursion relation straightforwardly for the instanton partition function at order m 2 .
Here, we only keep the terms of order $O((\epsilon_1 - \epsilon_2)^0)$. The contribution from the $\vec Y$ with nontrivial Young diagram $Y_{k\times 1}$ is obtained by transposition, which simply exchanges $\epsilon_1$ with $\epsilon_2$. The singular terms cancel in the sum of these two contributions, and the final result is given by $I_{1\times k} = \left[ I_{1\times k}(\epsilon_1, \epsilon_2) + I_{k\times 1}(\epsilon_1, \epsilon_2) \right]$, where we have set $\epsilon_1 = \epsilon_2 = 1$ at the end of the computation. Having determined the $N = 1$ case, the generalization of the above integrand to arbitrary $N$ is straightforward. This can be seen by using (C.2) as follows.
If $I_{1\times k}$ were computed from (C.2), we would take consecutive residues surrounding the poles at $\phi_{IJ}^2 + 1 = 0$. Since these are higher-order poles, one needs to expand the integrand when taking the residues. The constant term $\frac{4}{1+\delta_{1k}} \left( 1 + \frac{1}{k^2} \right)$ of $I_{1\times k}$ clearly comes from the second line in (C.2), which is independent of $N$, whereas the term containing $z - a_1$ requires expanding the first line in (C.2), and its generalization to general $N$ is obvious. Therefore, for general $N$ we have $\cdots \frac{2i(k+1)(k-1)^2}{k\,(z - a_j + ki)(z - a_j + (k-1)i)(z - a_j)}$, (C.9) and the expression (after summing over $k!$ identical contributions) agrees with the general formula given in (3.48).
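The combinatorial steps used in this appendix can be illustrated with a minimal Python sketch. It is not the computation of the paper: it only shows (i) how a single-row diagram $Y_{1\times k}$ is built by adding one box at a time, (ii) how transposition maps it to $Y_{k\times 1}$, and (iii) how a simple pole at $\epsilon_1 = \epsilon_2$ cancels once a contribution is added to its image under $\epsilon_1 \leftrightarrow \epsilon_2$. The names `add_box`, `transpose`, `single_row`, `toy_contribution` and `symmetrized` are hypothetical, and `toy_contribution` is an assumed stand-in function, not the integrand of (C.1) or (C.2).

```python
from fractions import Fraction


def add_box(diagram, row):
    """Return a new Young diagram (tuple of row lengths) with one box added to `row`."""
    rows = list(diagram)
    if row > len(rows):
        raise ValueError("cannot skip rows")
    if row == len(rows):
        rows.append(0)  # start a new row at the bottom
    rows[row] += 1
    # A valid Young diagram has weakly decreasing row lengths.
    if any(rows[i] < rows[i + 1] for i in range(len(rows) - 1)):
        raise ValueError("result is not a Young diagram")
    return tuple(rows)


def transpose(diagram):
    """Exchange rows and columns of a Young diagram."""
    if not diagram:
        return ()
    return tuple(sum(1 for r in diagram if r > c) for c in range(diagram[0]))


def single_row(k):
    """Build the single-row diagram with k boxes by adding one box at a time."""
    diagram = ()
    for _ in range(k):
        diagram = add_box(diagram, 0)
    return diagram


def toy_contribution(eps1, eps2):
    """Hypothetical stand-in with a simple pole at eps1 == eps2 (NOT the paper's integrand)."""
    return eps1 / (eps1 - eps2)


def symmetrized(eps1, eps2):
    """Sum of the toy contribution and its image under eps1 <-> eps2."""
    return toy_contribution(eps1, eps2) + toy_contribution(eps2, eps1)


if __name__ == "__main__":
    row = single_row(4)
    print(row, transpose(row))
    # Each term alone grows like 1/(eps1 - eps2); the sum stays finite.
    print(symmetrized(Fraction(1001, 1000), Fraction(1)))
```

Exact rational arithmetic (`Fraction`) is used so that the cancellation of the pole is not obscured by floating-point rounding when the two arguments are close to each other.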

D Instantons at higher order in 1/N
In this appendix we compute the instanton contributions to $\partial_m^2 \log Z \big|_{m=0}$ to $O(N^{-9/2})$ in a small $g_{\rm YM}$ expansion to subleading order for instanton numbers $k = 2, \ldots, 12$. The result matches the conjecture in (4.1) for $\partial_m^2 \log Z \big|_{m=0}$ at finite $g_{\rm YM}$ in terms of Eisenstein series, which generalizes the match found for the perturbative terms and the $k = 1$ instanton in the main text. By explicitly performing the sums and products in $I_{p\times q}$ (3.48) for many small values of $N$, we find that $I_{p\times q}$ can be expanded for small $a_i$ as $I_{p\times q}(N, a_{ij}) = I^{(0)}_{p\times q}(N) + I^{(2)}_{p\times q}(N)\, C_2(a_{ij}) + \cdots$, (D.1) where $C_2$ is defined in (4.7). When $p = q$, which includes the one-instanton case (4.6), we found closed-form expressions for $I^{(0)}_{p\times q}(N)$ and $I^{(2)}_{p\times q}(N)$, but for $p \neq q$ we could only find recursion relations in $N$. In either case, these formulae can be expanded explicitly at large $N$. For instance, for $k = 2$ we find the recursion relations:

(D.4)
The cases $k = 3, \ldots, 12$ are increasingly complicated, so we include them in an attached Mathematica file.
We can then use the expectation value of $C_2$ in (4.8) with $\lambda = g_{\rm YM}^2 N$ to compute $I_{p\times q}$ to $O(N^{-9/2})$, which describes the $p \times q$ instanton terms in (4.1). This is a very nontrivial check of the conjectured finite $g_{\rm YM}$ expression for $\partial_m^2 \log Z \big|_{m=0}$.