Diquark properties from full QCD lattice simulations

We study diquarks on the lattice in the background of a static quark, in a gauge-invariant formalism with quark masses down to almost physical $m_\pi$. We determine mass differences between diquark channels as well as diquark-quark mass differences. The lightest and next-to-lightest diquarks have"good"scalar, $\bar{3}_F$, $\bar{3}_c$, $J^P=0^+$, and"bad"axial vector, $6_F$, $\bar{3}_c$, $J^P=1^+$, quantum numbers, and a bad-good mass difference for $ud$ flavors, $198(4)~\rm{MeV}$, in excellent agreement with phenomenological determinations. Quark-quark attraction is found only in the"good"diquark channel. We extract a corresponding diquark size of $\sim 0.6~\rm{fm}$ and perform a first exploration of the"good"diquark shape, which is shown to be spherical. Our results provide quantitative support for modeling the low-lying baryon spectrum using good light diquark effective degrees of freedom.

mass is gauge-invariant. Since the mass of such a static-light-light baryon diverges in the continuum limit, the quantities of interest are mass differences between various diquark channels. The diquark size can also be obtained in a gauge-invariant way, from the spatial decay rate of the quark density-density correlator at fixed time [21][22][23][24][25][26].
We adopt the second, gauge-invariant approach of [22,23,26]. Measurements are taken on dynamical n f = 2 + 1, 32 3 × 64, clover-improved Wilson fermion gauge configurations with lattice spacing a ≈ 0.090 fm generated by PACS-CS [27,28] and publicly available from the JLDG repository [29]. Five ensembles, with pion masses m π = 164, 299, 415, 575, 707 MeV, are considered, allowing us to study the dynamical light-quark mass dependence of diquark properties and perform a short, controlled extrapolation to physical m π . We re-use the (gauge-fixed, wall-source) quark propagators from [30,31]. To connect with previous quenched studies, we also employ a new 32 3 × 64, a ≈ 0.092 fm, quenched ensemble with valence pion mass m π = 909 MeV. Static quark propagators are computed using the method of [32,33] with HYP1 smearing. See Appendix B for further details.

Diquark spectroscopy
We first quantify the expected reduction in the "good" diquark mass by studying the staticlight-light baryon spectrum. With Q the static quark, c, C denoting charge conjugation, and light quarks in a D Γ = q c CΓq diquark configuration, where Γ acts in Dirac space, we measure the baryon correlators (2.1) Γ = γ 5 , γ 5 γ 0 for "good", 0 + diquarks, γ i for "bad", 1 + diquarks, and 1 and γ 5 γ i , for the "not-even-bad", odd-parity 0 − and 1 − diquarks. We also measure the correlators of static-light meson operators [QΓq]. The static quark (m Q → ∞) acts as a spectator; its mass cancels in mass differences, exposing the diquark spectrum. We consider diquarks with light-light (ud), light-strange ( s, = u, d) and strangestrange (ss ) flavors on 5 ensembles with different light-quark, hence pion, masses. Note in particular that s denotes a hypothetical additional strange valence quark. It is introduced to allow a study of a good diquark with both quarks having the same (strange) quark mass, which the good diquark flavor antisymmetry makes inaccessible to two identical s quarks. Technical aspects of the analysis are summarized in App. C. Figure 1 (top panel) shows the dependence on m π of the ud 0 + versus 1 + , 0 − and 1 − diquark mass differences. The results provide quantitative support for the phenomenological diquark approach, which considers only good 0 + diquarks. Explicitly, the good 0 + ud diquark lies lowest in the spectrum, 100-200 MeV below the bad 1 + ud diquark. The negative-parity 0 − and 1 − ud diquarks lie even higher, ∼ 0.5 GeV above the good diquark and will thus play no role in the low-energy physics. The same pattern is observed in the s and ss sectors. Figure 1 (middle panel) compares the ud, s and ss (1 + − 0 + ) splittings. The curves in Fig. 1 are fits using Ansätze guided by limiting cases. Explicitly, the (1 + − 0 + ) mass difference goes to a constant in the chiral limit and decreases as ud-u ls-s ls-l pheno. ud-u 1/(m q 1 m q 2 ), with m π ∼ (m q 1 + m q 2 ), in the heavy-quark limit. The simplest interpolation between these limits is the two-parameter form with n = 0, 1, 2 for q 1 q 2 = ss , s and ud, respectively. Here, A fixes the chiral limit behavior, while B separates the light-and heavy-quark regimes. The fits clearly describe the data very well. A similar Ansatz proposed in [22], with n twice as large, produces a much poorer fit. Note that the ud, s and ss curves all intersect at the flavor-symmetric n f = 3 point, m u,d → m s . The parameter values and physical-point mass differences are listed in the top half of with fits to the Ansatz, where C fixes the chiral limit value and D separates the light-and heavy-q 1 quark regimes. In the latter, the mass splitting must grow linearly with the mass m q 1 , which dictates n = 1 if q 1 is a heavy quark, n = 2 otherwise. The bottom half of Tab. 1 lists the fit parameter values and resulting extrapolated physical-point mass differences. The excellent agreement with phenomenological expectations of our results for all of the δ(1 A, provides strong evidence that we have successfully identified the ground-state heavy baryon signals and that, as expected, residual discretization effects are small. This justifies investigating the structure of the diquark correlations in those baryon ground states using fixed-time density-density correlators, described in more detail below. Appendix A also provides a brief outline of other approaches that have been used to estimate the goodbad diquark splittings.
An additional interesting relation between the bad-good diquark and ∆ − N splittings is discussed in App. C.

Diquark structure
Having successfully identified the relevant ground-state baryon signals, we now turn to an investigation of the light diquark structures in those states. To do so, we compute the fixed-time density-density correlators: where ρ( x, t) =q( x, t)γ 0 q( x, t) and O Γ are the baryon operators used before. Γ characterizes the diquark channel. With the static quark at the origin, the light-quark source and sink points are ( 0, t src ) and ( 0, t snk ). The currents are inserted at t m = (t snk + t src )/2 to maximize projection onto the ground state. The relative positions of the static source and current insertions x 1 , x 2 , can be characterized by r qq = x 2 − x 1 , S = ( x 1 + x 2 )/2, the separation between the static source and diquark midpoint, and φ, the angle between r qq and S, as shown in Fig. 2. We define dropping the label Γ when this produces no confusion. For fixed S and r qq , the distance from the static source to the closer of the two insertion points is minimized (maximized) for φ = π (π/2). If the proximity of a static source disrupts the diquark correlation in a given channel, this disruption will thus be largest for φ = π and smallest for φ = π/2. We therefore focus our attention on ρ 2 for these two cases. When φ = π/2, | x 1 | = | x 2 | ≡ R, and we may instead characterize the relative positions using R and the angle Θ between x 1 and x 2 . We define ρ ⊥ 2 (R, θ) ≡ ρ 2 (r qq , S, π/2) and ρ 2 (r qq , S) ≡ ρ 2 (r qq , S, π). Our calculations average over all spatial translations.
The impact of the light-quark interactions on the spatial correlation between light quarks for different Γ is displayed in Fig. 3 (left), which shows the density-density correlations ρ ⊥ 2 (R, Θ, Γ) as a function of cos(Θ). For illustration, we show results for all Γ at  Each m π has its own color. Data sets have been normalised at r qq = 0 and offset vertically. Results for all available R are shown together in one colored set. Each colored band comes from the combined fit used to determine the diquark size r 0 (m 2 π ). (Right) Resulting good diquark size r 0 versus m 2 π , compared to results of other lattice studies in the literature. The vertical line denotes physical m π . R = 4.1 a for the ensemble with m π = 575 MeV. As cos(Θ) increases from −1 to +1, r qq decreases from 2R to 0. The clear increase in ρ 2 seen in the good diquark channel is absent in all other channels [34]. The strengths of the quark-quark attractions in the good and bad diquark channels are further quantified in Fig. 3 (right), which shows the m π dependence of the ratios ρ ⊥ 2 (R, Θ = 0, Γ)/ρ ⊥ 2 (R, Θ = π/2, Γ = γ 5 ) for Γ = γ 5 and γ i . The ratio is 2 or more for the good diquark across the whole range of m π , but consistent with 0 for the bad diquark, with no evidence for any m π dependence, apart from a possible low-m π enhancement for the good diquark. The results confirm a significant attractive quark-quark spatial correlation in the good diquark channel not present in the bad diquark channel, for all m π studied here.
With a significant attractive good diquark spatial correlation established, we can refine our picture of the good diquark by studying its size and shape. We consider first the case φ = π/2. At fixed R, ρ ⊥ 2 (R, Θ, Γ = γ 5 ) depends only on Θ or, equivalently, r qq = R 2(1 − cos(Θ)). We find this dependence well represented by an exponential form, ρ ⊥ 2 (R, r qq ) ∼ exp(−r qq /r 0 ), for each value of R. As R decreases and the diquark moves closer to the static quark, one might expect the latter to distort such diquark correlations and cause r 0 to vary. We see no evidence for such a variation, so long as R > r qq , and thus, in the left panel of Fig. 4, display results for all R together, for each m π . We take r 0 as our definition of the good diquark size and fix its value from a combined fit to data for all such R. The resulting r 0 (m 2 π ) are displayed, and compared to those obtained in Refs. [22,24], in the right panel of Fig. 4. Recall that the parameters of our quenched ensemble match exactly those of [22]. Our results are in very good agreement with [22], and with both the quenched and dynamical results of [24].
Increasing m q 1,2 should, on its own, produce a more compact object. The accompanying decrease in good diquark attraction seen in Tab. 1 will, however, work in the opposite direction. We see some evidence that the former effect dominates for larger m π (above ∼ 400 MeV) though, in the limit of infinitely massive sea quarks (i.e. the quenched case) the diquarks are definitely larger. As Fig. 4 (right) indicates, in full QCD, over a range of m π , r 0 is of the order of 0.6 fm, a size similar to that of static-light mesons when measured the same way [35]. This result is also in good agreement with that of a phenomenological, relativistic quark-diquark model study of nucleon form factors [36], which found a fitted 0 + diquark form factor corresponding to an rms diquark size of ∼ 0.54 fm. It should be noted that the determined diquark size does not affect the spectroscopy of models that include diquarks as effective degrees of freedom. Our results clearly support diquark modelling of the baryon structure which allows for the possibility of a non-zero diquark size.
Finally, we can learn about the good diquark shape, by comparing the density-density correlation falloff for the relative radial (φ = π) and tangential (φ = π/2) orientations of x 2 − x 1 and S (sketched in Fig. 7 of App. D). We define separate radial ( ) and tangential (⊥) size parameters, r 0 and r ⊥ 0 , from exponential fits to the data for ρ ⊥ 2 (R, Θ) and ρ 2 (r qq , S), detailed in App. D and shown in the left and right panels of Fig. 8.
The ratio r ⊥ 0 /r 0 provides a measure of whether the diquarks are prolate, oblate, or neither. The results are shown in Fig. 5. We find r ⊥ 0 /r 0 (m 2 π ) 1 within errors for all m π , indicating that the diquarks have a near-spherical shape. This is consistent with the scalar, J = 0 nature of the good diquark, though the presence of the static quark could, in principle, have induced a diquark polarization. There appears no need to include a dipole term in diquark models.

Summary and conclusions
Using a gauge-invariant setup, we have studied the masses and shapes of diquarks carrying different quantum numbers. Our study is the first to consider n f = 2 + 1 flavors of dynamical quarks with a range of u, d masses corresponding to m π as low as 164 MeV. This allows for a small, controlled extrapolation to physical m π ≈ 135 MeV. The resulting diquark mass differences presented in Fig. 1 and Tab. 1 confirm the special status of the "good" diquark channel, which shows an attraction of 198(4) MeV over the "bad" channel, more over the others. A simple interpolation Ansatz Eq. (2.2) accurately describes how this attraction varies with m π , and with the diquark flavor composition. Extrapolation of our results to the continuum limit is still required, but this has been found to amount to a small correction, at the percent level, in other hadronic mass measurements on the same gauge configurations [28,[37][38][39]. We have also measured the mass difference between a good diquark and an [anti]quark, as per Tab. 1.
We have also shown that the q −q attraction responsible for the bad-good diquark mass differences induces a compact spatial correlation, present in the "good" diquark channel only. The associated "good" diquark size, extracted from the spatial decay rate of quark density-density correlations, is O(0.6) fm, similar to that of ordinary mesons and baryons [35], and varies little with light-quark mass.
Finally, we have tried to refine the diquark picture further, by studying the shape of quark density-density correlations in a good diquark, in the background of a heavy, static quark. It turns out that good diquarks are nearly spherical, with no signal within errors of a departure from this simplest shape.
The information obtained above may prove useful, both in identifying channels favorable to the existence of low-lying tetraquark or pentaquark states, and in obtaining rough estimates of their expected masses. Such qualitative guidance has, in fact, already been exploited in identifying double-open-heavy-flavor, SU (3) F flavor3 F , J P = 1 +QQ qq channels as favorable to the existence of exotic tetraquark states. In such channels, a localized four-quark configuration benefits from the attractive good-light-diquark and color 3 c heavy-antidiquark Coulomb interactions, neither of which is accessible for two wellseparated heavy-light mesons. This observation motivated both phenomenological and lattice explorations of potential binding in such doubly heavy tetraquark channels 1 , and experimental searches for bound doubly heavy tetraquark states, the latter culminating in the LHCb discovery of the exotic doubly charmed T cc tetraquark state [54,55]. Multiple recent lattice studies using interpolating operators designed to access the expected good-light-diquark configuration now also provide clear evidence for the existence of a J P = 1 + , SU (3) F3F multiplet of doubly bottom strong-interaction-stable tetraquark states [30,45,46,[48][49][50][51][52][53]. An analogous qualitative diquark-based argument identifies the singly-heavy J P = 1/2 + , I = 1/2,Qsudd channel as one potentially favorable to the existence of an exotic pentaquark resonance. Explicitly, while at most one good light diquark can exist in a state consisting of a well-separated heavy-light meson and light-quark baryon, a localized singly heavy five-quark state can contain two good light diquarks, one non-strange and one strange. 2 The possibility that the short-distance part of the associated singly heavy meson-light baryon system might have an attractive component resulting from this localized "extra-good-light-diquark" configuration motivates further study of this channel. Outstanding issues still to be investigated are potential distortions of the good diquark 1 See Refs. [30,31,[40][41][42][43][44][45][46][47][48][49][50][51][52][53] and earlier references therein. 2 In such a singly heavy pentaquark channel, the four light quarks can be organized into two good light diquark pairs only if the four-quark spin and color are 0 and 3c, respectively. To satisfy Pauli statistics, a low-lying state with no internal spatial excitation must then have four-quark flavor 3F , and hence contain at least one u, one d and one s quark. correlation caused by the presence of additional light quarks and/or the impact of Pauli blocking in channels, like this, where more than one good light diquark may be present. These are questions that might be amenable to investigation using microscopic models which survive the tests of predicting very shallow binding in the T cc channel and binding energies compatible with now well-established lattice results for the non-strange and strange doubly bottom J P = 1 + ,3 F channels. Bound or resonant singly-heavy J P = 1/2 + , I = 1/2,Qsudd pentaquark states, if they exist, would have four open flavors and hence, like the doubly heavy tetraquarks, be manifestly exotic.
Our three sets of results -bad-good diquark mass difference, good diquark size and shape -paint a diquark picture entirely consistent with that used in diquark models and provide clear, quantitative support for the good diquark picture. Diquark models are playing an important role in explorations of possible multi-quark exotics, especially in channels too complex to permit a complete theoretical analysis. In such channels, various light quark pairings into compact good diquark composites are likely to occur, especially when heavy c, b quark sources are present (see e.g. [40][41][42] for phenomenological and [30,31,[43][44][45][46][47][48][49][50][51][52][53] for lattice studies). Which pairing is energetically favored depends sensitively on the numerical values of the parameters of the diquark model. Our study may sharpen these values, and help improve the reliability of such diquark analyses.

Acknowledgements
Calculations were performed on the HPC clusters HPC-QCD@CERN and Niagara@SciNet supported by Compute Canada. AF thanks M. Bruno, B. Colquhoun, P. Fritzsch, J. Green and M. Hansen for discussions. PdF thanks R. Fukuda for his help with an earlier, not completed version of this project [26], and K. Fukushima for discussions. RL and KM acknowledge the support of grants from the Natural Sciences and Engineering Research Council of Canada. Together we thank R. J. Hudspith for reviewing an initial version of this manuscript.

A Phenomenological Expectations
Ref. [1] discussed in detail how to obtain phenomenological estimates for the static-limit values of the bad-good diquark and diquark-antiquark mass differences using combinations of single-charm and single-bottom meson and baryon masses chosen so O(1/m Q ) contributions cancel, bringing the results closer to the static limit. Comparing the estimates for a given splitting obtained using charm input to that obtained using bottom input provides an assessment of how close to the static limit the bottom-based estimate is likely to be. This data-based approach is obviously very closely related to the gauge-invariant, static limit approach used to obtain our lattice results above. In this appendix we provide an update of the numerical analysis of Ref. [1]. We also remind the reader of a number of other, generally more model-dependent approaches, that have been used to obtain estimates of the good-bad diquark splittings.
Ref. [1] gives expressions for the combinations needed to provide phenomenological estimates for four of the splittings we have measured. Explicitly, the combination provides an estimate for δ(1 + − 0 + ) ud , the combination an estimate for δ(1 + − 0 + ) us , the combination with P Qu and V Qu the ground-state, heavy-light pseudoscalar and vector mesons, an estimate for δ(Q[ud] 0 + −Qu), and the combination with P Qs and V Qs the ground-state, heavy-strange pseudoscalar and vector mesons, an estimate for δ(Q[us] 0 + −Qs). For a given static-limit splitting, the most accurate estimate should be that obtained using bottom hadron input, while the difference between the charm-and bottom-based estimates should provide a conservative assessment of the deviation of the bottom-based estimate from the actual static-limit value. At the time Ref. [1] was written, information on bottom hadron masses was limited, and only one of these four splittings, δ(Q[ud] 0 + −Qu), could be estimated with both charm and bottom input. The agreement between the two was excellent. It is now possible to estimate all four splittings using both charm and bottom input. Using PDG 2021 input [56], we find, for δ(1 + − 0 + ) ud , 210 MeV using charm input and 206 MeV using bottom input; for δ(1 + − 0 + ) us , 148 MeV using charm input and 145 MeV using bottom input; for δ(Q[ud] 0 + −Qu), 313 MeV using charm input and 306 MeV using bottom input; and, for δ(Q[us] 0 + −Qs) 398 MeV using charm input and 397 MeV using bottom input. It follows that the bottom-based phenomenological estimates for the static-limit splittings should be reliable to O(7) MeV or better. Moreover, these estimates agree well with the results shown in Tab. 1. A number of other approaches have also been used to estimate the ud and s good-bad diquark splittings.
The good-bad diquark mass splittings have also been obtained from diquark-quark model analyses of the non-strange and strange baryon spectrum in which the diquark mass enters as a free parameter of the model and is obtained as part of the fit to the spectrum. An iterative, phenomenological version of this approach [62] produced the results δ(1 + −0 + ) ud = 205 MeV and δ(1 + −0 + ) us = 140 MeV, while a more microscopic, relativistic model, which did not, however, allow for the possibility of mixing between quark-scalardiquark and quark-axial-vector-diquark configurations, obtained a larger result, δ(1 + − 0 + ) ud = 350 MeV, for the ud diquark splitting [63]. A modified version of this model, which significantly improves the quality of the model fit to known 3 * and 4 * baryon resonances, obtained by adding a term to the effective interaction that allows such mixing to occur, in contrast, produces a result δ(1 + − 0 + ) ud = 210 MeV in good agreement with both our lattice determination and the updated version of the phenomenological estimate of Ref. [1].
An alternate implementation of the microscopic quark-diquark model approach first fits the parameters of a model two-body quark-antiquark effective interaction with one-gluonexchange color dependence, F q · Fq, to the meson spectrum, then uses this interaction, with F q · Fq replaced by the corresponding quark-quark one-gluon-exchange factor, F q · F q , to determine the nominal3 c , J P = 0 + and 1 + masses, and hence the 1 + -0 + splittings. The meson sector model used in Ref. [64] (which has a two-body confinement interaction involving a linear combination of scalar and vector structures) produces the results δ(1 + − 0 + ) ud = 199 MeV and δ(1 + − 0 + ) us = 121 MeV [64], while the Godfrey-Isgur model [65] used in Ref. [66] (with its purely scalar two-body confinement form) produces the results δ(1 + − 0 + ) ud = 149 MeV and δ(1 + − 0 + ) us = 106 MeV.
In view of the good agreement between the updated versions of the charm-and bottombased estimates of Ref. [1], we consider the bottom-based results to represent the best phenomenological estimates of the gauge-invariant static limit splittings we measure on the lattice. The other approaches, which are more model-dependent, but have the advantage of being applicable to non-strange and strange baryon sector, produce results in reasonable to good agreement with the heavy-quark based phenomenological estimates for the good-bad diquark splittings, with more recent versions of the analyses typically producing improved agreement. It is, of course, possible that the additional light quarks present in non-strange and strange baryons might affect the structure of the good light diquark correlation in those systems, causing it to differ from that found in singly heavy baryon systems. The agreement of the results for the diquark splittings obtained from (albeit model-dependent) analyses of the light baryon sector with the heavy-quark-based phenomenological estimates is thus of interest since it supports the picture in which the same good light diquark correlations serve as useful effective degrees of the freedom in light and heavy baryon sectors.

B Lattice ensembles and propagators
For the numerical studies, we re-use the set of propagators from [30,31] determined on the publicly available n f = 2 + 1 flavor full QCD gauge ensembles provided by the PACS-CS'09 collaboration [27,28] via the JLDG repository [29]. The quoted values of m π and lattice spacing originate from our own previous re-determination [39,67]. These gauge configurations have been used extensively within the lattice community. A known caveat is the slight mistuning of the strange sea quark mass [27]. The value of the hopping parameter  Table 2. Parameters of the lattice calculation. n spec meas and n struct meas indicate how many measurements in total were made of the baryon/meson correlators for the spectroscopy study, and of the densitydensity correlators for the structure analysis, respectively. For the latter analysis, the sink-source time propagation is set to (t snk − t src ) = 16, with the currents inserted at t m = 8, see also the sketch in Fig. 2 (left).
which produces the physical strange quark mass is, however, known, and we set the strange valence quark mass to this value, thus introducing a tiny amount of partial quenching.
To connect with previous studies in the quenched setup discussed in [23], and in more detail in [22], we generated a new ensemble with the same lattice parameters, in particular with coupling β = 6.0 and hopping parameter κ = 0.153 for propagator inversions. This corresponds to a valence pion mass m v π = 909 MeV. All propagators were computed using the deflated SAP-GCR solver [68] and have Coulomb gauge-fixed wall sources, where the gauge was fixed using the FACG-algorithm in the implementation of [69,70].
We can re-use the propagators without further inversions since the gauge-fixed wall sources enable the contraction of the density correlators without an additional sequential source propagator. Choosing (t snk −t src ) = 16 enables us to perform multiple measurements on each configuration. Setting t m midway between source and sink minimizes excited-state contamination.
For the static quark we compute propagators on the fly via (t 2 > t 1 ) [32]: where we dropped the exponential prefactor, since it amounts to a constant shift in the masses that either drops out in the difference or is irrelevant to the results. To reduce statistical fluctuations, the gauge links are smeared using HYP smearing (type 1 [33,71]) in all 4 dimensions, which introduces some non-locality in time. In all cases we made sure that the propagation time t is large enough and the number of smearing steps small enough to ensure negligible effects aside from the boosted signal. Furthermore we considered several smearing setups and smearing radii. We observed comparable results and selected the one giving the best signal-to-noise properties. Our lattice parameters are listed in Tab. 2. (1 + -0 + ) ud , (×3/2) -N PDG Figure 6. Agreement of the bad-good diquark mass splitting with the prediction [1], δ(∆ − N ) = 3/2 × δ(1 + − 0 + ) ud .

C Lattice spectroscopy analysis details
To study the mass differences between good and bad diquarks, shown in top and middle panels of Fig. 1, we fix the energies in the following way: First, we analyze the smeared and unsmeared correlators separately. For each dataset, we consider one-and two-state fits. The fit window in Euclidean time is the longest for which both fits give ground-state energies consistent within errors. The ground-state energies of the smeared and unsmeared data sets are then averaged, and the larger of the two uncertainties assigned as the final error.
When performing the extrapolations to physical m π , we also considered combined fits with the Ansatz Eq. (2.2) using free n and shared B, but did not find an improvement and therefore quote results from individual fits. As a further consistency check note that all three extrapolations intersect for large m π at the n f = 3 flavor-symmetric point without having enforced this expectation through a shared parameter.
In the bottom panel of Fig. 1 we show the difference in mass between single-static octet baryons and static-light pseudoscalar mesons, as explained in the text. In this data the excited state contamination is much larger and we extract the masses by fitting both smeared and unsmeared data with a two-state Ansatz. As before we take the values from the longest time interval where the fitted ground states agree within errors, average the results, and quote the larger of the two uncertainties as our error. For the ud − u, s − s and s − cases we observe the expected n f = 3 degeneracy as m u,d → m s .
As a final diquark spectroscopy investigation, we compare the bad-good diquark mass difference with the ∆-N mass splitting for each of our 5 ensembles. This comparison is motivated by the observation [1] that, in the one-gluon-exchange approximation, and chiral limit, δ(∆ − N ) = 3 2 δ(1 + − 0 + ). Fig. 6 compares the left-and right-hand sides of this relation, the red curve showing the appropriately rescaled version of the δ(1 + − 0 + ) ud fit of the middle panel of Fig. 1 and the blue curve a similar fit to the ∆-N data with A = 3 2 × 0.203(9), B = 1.10(11) GeV. Agreement between the two is excellent in the chiral limit, and remains very good over the whole mass range. Measurements of the ∆ − N splitting were newly performed for this study, using the same propagators as for the static baryons shown so far. Since in this case we do not benefit from the cancellation of the heavy quark mass, and nucleon correlators generally suffer from the well known signal-to-noise Q Q prolate: oblate: problem, we expect larger uncertainties in this study. The situation is further complicated as the ∆ baryon is a resonance in nature: a single operator analysis, as performed here, can capture only its rough features. To stabilise the extraction of the masses here, we fitted the channel pairs Γ = (γ i , σ i0 ) and (γ 5 , γ 5 γ 0 ) simultaneously with single exponentials and chose to quote the parameters from the longest combined plateau. For the physical-point extrapolation, the results were fitted to the same Ansatz Eq. (2.2), in the same way as before.

D Lattice structure analysis details
The diquark size r 0 can be estimated from the fitted rate of the exponential decay, ∼ exp(−r qq /r 0 ), of the density-density correlator ρ ⊥ 2 (R, r qq ), with r qq the distance between the two current insertion points: see Fig. 4 (left).
The colored bands (one for each m π ) are the result of performing a combined fit for all available R to a single exponential with shared size parameter r 0 and separate amplitudes. Note that our lattice spatial size is about 5r 0 , so we neglect corrections caused by periodic boundary conditions which were studied in [24]. We checked the dependence of r 0 (R) on R through individual fits and found no significant dependence for R ∈ [3 : 6]. Similar findings were reported in [22]. Of course, if R is increased beyond ∼ 1 fm, the effective string between the static quark and the diquark will break and a light baryon will form, with qualitatively different diquark correlations. Our study does not consider this largedistance regime. Also, for a given R, we normally would quote the number from the largest stable fit window. However, due to our chosen geometry we may expect interference from the static quark when r qq R, and we limit the fit window accordingly. Separate sizes r ⊥ 0 and r 0 can be defined for the tangential and radial geometries shown in Fig. 7. The ratio of sizes r ⊥ 0 /r 0 then gives a measure of whether the diquarks are spherical (r ⊥ 0 /r 0 = 1), prolate (r ⊥ 0 /r 0 < 1), or oblate (r ⊥ 0 /r 0 > 1). To estimate r ⊥ 0 and r 0 , we measure ρ 2 (r qq , S, φ) for the two geometries of Fig. 7, with r ⊥ 0 and r 0 corresponding to φ = π/2 and π in Fig. 2, respectively. When the line labelled S in Fig. 2 points along the x axis, the current insertion points for the radial configuration (the blue points in Fig. 7) are x 1,2 = (S ± r , 0, 0), while those for the tangential configuration MeV results for the r qq -dependence of ρ 2 (r qq , S, φ) for tangential (φ = π/2, left panel) and radial (φ = π, right panel) quark-quark orientations. The colored error bands are the results of combined fits to data for each of S = 4a, 5a and 6a.
(the red points in Fig. 7) are x 1,2 = (S, ±r ⊥ , 0). For simplicity, we take the line labelled S to always lie in one of the x, y or z axis directions, considering all such permutations. Focusing first on the radial case, x 1 + x 2 = 2S is constant at fixed S and independent of r . Here we define our radial size parameter, r 0 , at this fixed S, by fitting the r dependence to the form ρ 2 (S, r ) ∼ exp(−r /r 0 ) . (D.1) Since no obvious S dependence is observed, we arrive at r 0 by analyzing the data in a combined fit for several S using the same fit method applied before. In the tangential case, a complication arises. Our previously introduced size parameter, r 0 , was defined through a fit to ρ ⊥ 2 (R, r ⊥ ) with variable r ⊥ , but fixed R. Since, however, R = (r ⊥ ) 2 + S 2 , when r ⊥ varies at fixed R, S also varies. This is not the fixed-S situation used to define r 0 . We thus need to define an alternate tangential size parameter, r ⊥ 0 , through a fit to data with variable r ⊥ but fixed S, in order to compare tangential and radial size parameters both defined at fixed S. As seen above, the density-density correlation ρ ⊥ 2 (R, r ⊥ ) at fixed r ⊥ varies with R. We find this dependence well described by an exponential form ∼ exp(−2R/R 0 ). The dependence of the density-density correlation on r ⊥ at fixed S in the tangential configuration can then be obtained by fitting the product ρ ⊥ 2 (R, r ⊥ ) exp(+2R/R 0 ), evaluated at fixed S and variable r ⊥ , to the form exp(−r ⊥ /r ⊥ 0 ). Since the fixed-S tangential r ⊥ = 0 and radial r = 0 configurations are geometrically degenerate, it is convenient to instead use the form exp(−r ⊥ /r ⊥ 0 ) to fit the modified product ρ ⊥ 2 (S, 2r ⊥ ) = ρ 2 (R, 2r ⊥ ) exp(+ with R 0 a second fit parameter, and the right-hand side evaluated at fixed S. The extra r ⊥ -independent factor, exp(−2S/R 0 ), ensures that, in the limit that r ⊥ → 0 and hence R → S, the quantity being fit reduces to ρ ⊥ 2 (R = S, r ⊥ = 0). Since this is identical to the analogous zero-separation quantity, ρ 2 (S, r = 0), which enters the fit used to determine r 0 , this choice ensures a common normalization for the tangential and radial fits, at the r ⊥ = r = 0 point common to both.
In the left and right panels of Fig. 8 we show, for the m π = 575 MeV ensemble and S ranging from 4 to 7 times the lattice spacing, the dependences ofρ ⊥ 2 (S, 2r ⊥ ) on r ⊥ and ρ 2 (S, 2r ) on r , respectively. The results are normalized soρ ⊥ 2 (S, 2r ⊥ ) and ρ 2 (S, 2r ) take the common value 1 at S = 4a and zero separation.
While the parameters r ⊥ 0 and R 0 have been determined in the two-parameter fit described above, R 0 could, in principle, also be determined by fitting ρ ⊥ 2 (R, r ⊥ ), with r ⊥ fixed to zero, to the form ∼ exp(−2R/R 0 ). We found that inserting the resulting R 0 into the Ansatz Eq. (D.2) and subsequently fitting r ⊥ 0 produced no improvement over the direct two-parameter fit result, once errors were propagated and correlations taken into account.
In both the radial and tangential cases, we have propagated all errors within a bootstrap procedure, which allows us to also evaluate the uncertainty on the ratio r ⊥ 0 /r 0 consistently. Results for this ratio are shown as a function of m 2 π in Fig. 5 of the main text. We observe that the errors on the data limit the precision of the analysis, with a stable result, for example, not even attainable for the m π = 164 MeV ensemble. This is due in part to the noisiness of the results at large S, which were not precise enough to constrain R 0 , r ⊥ 0 and r 0 further.