Nonperturbative Corrections to Soft Drop Jet Mass

We provide a quantum field theory based description of the nonperturbative effects from hadronization for soft drop groomed jet mass distributions using the soft-collinear effective theory and the coherent branching formalism. There are two distinct regions of jet mass $m_J$ where grooming modifies hadronization effects. In a region with intermediate $m_J$ an operator expansion can be used, and the leading power corrections are given by three universal nonperturbative parameters that are independent of all kinematic variables and grooming parameters, and only depend on whether the parton initiating the jet is a quark or gluon. The leading power corrections in this region cannot be described by a standard normalized shape function. These power corrections depend on the kinematics of the subjet that stops soft drop through short distance coefficients, which encode a perturbatively calculable dependence on the jet transverse momentum, jet rapidity, and on the soft drop grooming parameters $z_{\rm cut}$ and $\beta$. Determining this dependence requires a resummation of large logarithms, which we carry out at LL order. For smaller $m_J$ there is a nonperturbative region described by a one-dimensional shape function that is unusual because it is not normalized to unity, and has a non-trivial dependence on $\beta$.


Introduction
Measurements of jet observables in QCD provide a key tool to test perturbative, resummed, nonperturbative, and Monte Carlo descriptions of QCD dynamics and also are used to probe the presence of new physics. A typical observable receives contributions from perturbative momentum regions, where the description requires fixed order calculations often supplemented with resummations of large logarithms, as well as from the nonperturbative momentum region, related to hadronization. Remarkable progress has been achieved from high precision perturbative calculations, where examples include event shapes in e + e − collisions at next-to-next-to-next-toleading-log order +O(α 3 s ) [1][2][3][4][5][6][7][8][9], and Higgs production with a jet veto [10][11][12][13][14][15][16][17][18] with a resummation of logarithms at next-to-next-to-leading-log order.
Nonperturbative hadronization corrections can also often be described rigorously from QCD with the help of factorization theorems, for instance by examining operators built out of nonperturbative modes [19] within the soft-collinear effective field theory (SCET) [20][21][22][23][24]. This program has been successfully carried out for e + e − event shapes [7,8,19,25], thrust for DIS with a jet [26], or the jet-mass in pp collisions [27,28]. For other methods for examining power corrections using nonperturbative models and other analytic techniques, see for example [29][30][31][32]. The corresponding nonperturbative parameters frequently involve light-like Wilson lines making them hard to evaluate using Lattice QCD. However, their functional dependence and universality can still be determined, and the hadronization effects can then be described by fitting one or more additional parameters in a way consistent with field theory. Often the most important information about hadronization can be encoded in a single parameter, which is the first moment of an underlying nonperturbative shape function.
Another method of accounting for hadronization corrections is to rely on hadronization models that are implemented within Monte Carlo parton shower event generators like Pythia [33] and Herwig [34]. The parameters in these models are fixed by tuning them to certain standard observables, and then used to predict hadronization effects in other observables. An advantage of this method is that hadronization effects can be predicted for any observable. However, it is often hard to estimate the accuracy of these models since, unlike the factorization based methods, they are based on an extrapolation rather than systematic expansions. Another problem with the Monte Carlo method is that hadronization parameters are tuned to data using a perturbative accuracy that is often limited to (roughly) next-to-leading logarithmic (NLL) order [35,36] or less (though there are a few notable exceptions that include [37][38][39][40][41][42][43][44]). When higher order perturbative precision is available, the use of these Monte Carlo based hadronization estimates becomes problematic since the tuning partially absorbs perturbative corrections beyond NLL, potentially leading to double counting which is hard to control.
Together with these advances in understanding perturbative and nonperturbative dynamics of jets, there has also been a surge of interest in jet substructure techniques [45][46][47][48][49]. This includes in particular the use of jet grooming to remove soft radiation from jets, with the goal of reducing the effects from hadronization, underlying event, and pileup. Theoretically, the most widely studied jet groomer is the soft drop algorithm [48] (which includes as a special case the earlier modified mass drop algorithm [45,50]). Soft drop groomed observables have been recently measured by ATLAS [51] and CMS [52]. Perturbative methods have been developed to carry out calculations for groomed jets [50,[53][54][55][56], and soft-drop factorization theorems have been derived for D 2 [57], the 2-point energy correlator, e (α) 2 [55], and the jet-mass and angularities for inclusive jets [58,59]. Resummation of groomed event shapes at an e + e − collider such as soft drop thrust, hemisphere jet mass, and narrow invariant jet mass were studied in Ref. [60], and related fixed order corrections at next-to-next-to-leading order (NNLO) accuracy were calculated in Ref. [61]. Resummation of the soft drop jet mass for top jets was studied in Ref. [62] and e (α) 2 for bottom quarks in Ref. [63]. So far, either Monte Carlo hadronization models, models based on scaling from single gluon emission, or naive analytical shape functions have been used to estimate hadronization corrections for these observables, and no attempt has been made at obtaining a rigorous operator based description of hadronization corrections after jet grooming. Scaling results based on the kinematics of single gluon emission in soft drop were considered in Refs. [50,64], and shape function models for soft drop hadronization corrections have been employed for jet angularities [55], D 2 [57,65], and heavy quark induced jets [62,63].
In this paper we develop a factorization based description of hadronization corrections for jet observables after soft drop grooming. We focus for concreteness on the groomed jet mass m J for massless jets, though our approach can also be applied to other observables. Soft drop decouples the dependence of the hadronization from aspects of the event that are not associated with the groomed jet being studied, thus our results apply equally well for e + e − and pp collisions. We consider two distinct regions for the jet mass, each having a distinct description for their leading hadronization corrections: (1.1) These relations will be discussed in detail in Sec. 3. Here Λ QCD ∼ 0.5 GeV is the typical nonperturbative scale for QCD, and Q = 2E J = 2p T cosh(η J ) is a hard scale given by twice the jet energy, which is related to the jet p T and rapidity η J . We assume that Q m J and note that perturbative resummation is important for both of these regions. Finally, we have defined the smaller soft drop induced scale which depends on the soft drop parameters z cut and β (see Sec. 2.1 below for the definitions of z cut , z cut , and β).
In the SDOE region, we will demonstrate that the leading hadronization effects are given by two nonperturbative matrix elements defined by a field theory based operator expansion, where κ = q, g distinguishes quark versus gluon initiated jets. Furthermore, we will show that the Υ κ 1 (β) parameter has a linear dependence on β in the SDOE region: This yields in total three universal hadronic parameters, Ω • • 1κ , Υ κ 1,0 , and Υ κ 1,1 which depend on a universal geometry, that we describe below, and on Λ QCD . The matrix elements Ω • • 1κ and Υ κ 1 (β) are multiplied by perturbative Wilson coefficients that depend on z cut and β, and the jet kinematic variables. The resulting power corrections depend on the kinematics of the subjet that stops soft drop. We set up a formalism for determining these coefficients and calculate them with leading logarithmic (LL) resummation accounting for running coupling effects. The expansions used to derive this form for the power corrections imply that we cannot connect them to the power corrections appearing for the ungroomed jet mass by taking β → ∞.
In the SDNP region we find that the leading hadronization effects are given by a non-trivial shape function F ⊗ κ (k, β). Unlike other examples of shape functions derived in the literature, F ⊗ κ (k, β) is not normalized to unity when integrated over all k. The function F ⊗ κ (k, β) also depends on the color charge κ of the hard particle that initiates the jet, i.e. it depends on whether one considers a quark or a gluon initiated jet. However, it does not depend on p T , η J , z cut , the jet-radius R or other quantities related to the hard process.
The outline for the further sections is as follows: In Sec. 2 we review the soft drop grooming algorithm and setup the leading power soft drop factorization theorem in a convenient manner for our analysis. We then describe the interface between the partonic cross section and the nonperturbative corrections. In Sec. 3 we describe the relevant EFT modes that are responsible for the leading power corrections in the SDOE and SDNP regions. We derive the factorization of the measurement and the matrix elements in the SDOE region in Sec. 4, which leads to definitions for the power corrections Ω • • 1 and Υ 1 (β). In Sec. 5 we calculate the perturbative Wilson coefficients of these power corrections, which are needed to describe the hadronization corrections in the SDOE region. In Sec. 6 we analyze the SDNP region using tools of EFT and derive the properties of the shape function that describes power corrections in this region. A comparison with previous work is presented in Sec. 7. Section 8 presents a parton shower Monte Carlo (MC) event generator study where we confront our field theory based description of the hadronization corrections in the SDOE region with MC results at parton and hadron level. In particular, we test the agreement of MCs with our predictions for universality by fitting the power corrections in the SDOE region to results from MC hadronization models. We conclude in Sec. 9.

Soft Drop Algorithm and Jet Mass
The soft drop algorithm [48] considers a jet of radius R, reclusters the particles into a angular ordered cluster tree of subjets using the Cambridge-Aachen (CA) algorithm [66,67], and then removes peripheral soft radiation by sequentially comparing subjets i, j in the tree. The grooming stops when a soft drop condition specified by fixed parameters z cut and β is satisfied by a pair of subjets. For pp collisions the condition is where R ij is the angular distance in the rapidity-φ plane, R 2 ij = 2 cosh(η i − η j ) − cos(φ i − φ j ) , and in general R 0 is a parameter that is part of the definition of the soft drop algorithm which is often chosen to be the jet radius. For e + e − collisions the condition is This is illustrated in Fig. 1 where Θ sd = 1 − Θ sd represents the pass/fail test being applied by the soft drop groomer. Once Eq. (2.1) or Eq. (2.2) is satisfied all subsequent constituents in the tree are kept, thus setting a new jet radius R g < R for the groomed jet.
In the limit R ij 1, with jet constituents close to the jet axis, we can rewrite Eq. (2.1) in terms of the energies E i = p T i cosh η i and polar angles θ ij 1, so that the pp formula becomes where here we introduced the shorthand notatioñ z cut = z cut cosh β η J R β 0 (pp case) . (2.4) To obtain this result we used R 2 ij cosh η i cosh η j θ 2 ij and cosh η i = cosh η j + O(θ ij ) cosh(η J ) to write the extra factors in terms of the jet's rapidity η J . For e + e − , the result in the θ ij 1 limit takes the same form in Eq. (2.3), but with z cut = z cut √ 2 sin R ee With soft drop grooming the jet mass is defined by starting with the constituents of the jet of radius R and summing only over the constituents J sd that remain after soft-drop has been applied, (2.7) Figure 2. Pythia prediction for m 2 J /E 2 J distribution indicating three different regions of the spectrum. Here we take R ee 0 = π 2 and the inequality " " in the "p + cs p + Λ " constraint for the SDOE region is replaced by a factor of 5.
In Fig. 2 we show a Pythia8 Monte Carlo prediction for this groomed jet mass spectrum with z cut = 0.1 and β = 2. We distinguish three relevant regions of the spectrum: the soft drop nonperturbative region (SDNP) to the far left (left of the magenta dashed line), the soft drop operator expansion region (SDOE) in the middle (between the dashed lines), and the ungroomed resummation region on the far right where soft drop turns off but the log resummation for the ungroomed case is still active (right of the green dashed line). The distinction between the SDNP and SDOE regions is determined by Eq. (1.1), while the distinction between the SDOE and ungroomed resummation region is given by soft drop operator expansion (SDOE): ungroomed resummation region: For the hemisphere case in e + e − one has R = π/2. For R = 1, in the e + e − case this boundary roughly corresponds to m 2 J /E 2 J = z cut which we use in Fig. 2. In all three of these regions the resummation of large logarithms is important, though the precise nature of this resummation is different. There is an additional transition between resummation and fixed-order regions which occurs on the very far right of Fig. 2, near where dσ/dm J goes to zero (not indicated in the plot).

Partonic Factorization for Light Quark and Gluon Jets
In this section we review the pioneering partonic massless soft drop factorization theorem derived in [55,68], working in the same limit In Secs. 3-6 we will extend this factorization based description to hadron level by including the leading final state hadronization effects. This will be achieved by incorporating field theoretically derived nonperturbative matrix elements for the SDOE region, and by setting up a novel shape function for the SDNP region. Consider the groomed jet mass measurement from jets initiated by light quarks or gluons. A partonic factorization formula for the soft dropped jet mass in the limit in Eq. (2.9), was derived in Ref. [55]: Here ⊗ denotes a convolution, κ sums over quark and gluon jets, theÑ κ are normalization factors, andJ κ andS κ C are dimensionless jet and collinear-soft functions. TheÑ κ determine what fraction of the jets are produced by gluons or quarks given an underlying hard process. For pp it incorporates the initial state parton distribution functions (PDFs). It also contains the global soft function, the hard function, as well as any other functions describing other aspects of the event. The jet mass spectrum is determined by the inclusive jet functionJ κ that encodes the distribution of collinear radiation in the jet and a collinear-soft functionS C that describes the influence of soft radiation retained by soft drop.S C depends on Q and the soft drop parameters z cut and β. The dependence on the variables Q and z cut is remarkably simple since they appear only in a single combination with m J [55]. At (modified) LL order, Eq. (2.10) agrees with the results derived in the original soft drop paper [48] and at LL for β = 0 with the results for modified Mass Drop Tagger derived in Ref. [50] using the coherent branching formalism. Results for β ≥ 0 have also been derived in Refs. [56,64].
We choose to write the partonic factorization theorem for the jet mass spectrum in a more explicit form as where we show the light-cone momentum components (p + , p − , p ⊥ ) ≡ (n J · p,n J · p, p ⊥ ) , (2.13) relative to the jet axisn J , with n J = (1,n J ) andn J = (1, −n J ). In terms of these coordinates the angle θ relative to the jet axis is given by sin(θ) = | p ⊥ |/p z which for θ 1 gives Hence the collinear particles spread out relative to the jet axis with a typical angle θ c ∼ 2m J /Q. The scaling for the global soft modes that contribute to N κ is This soft scaling differs from that of the ultrasoft mode, p µ us ∼ m 2 J /Q(1, 1, 1), which is relevant in the ungroomed case.
Finally, rather than using a dimensionless collinear-soft functionS κ C as in Eq. (2.10), we use a collinear-soft function with dimension (−2 − β)/(1 + β): where the subscript µ on the RHS indicates that this matrix element is renormalized in the MS scheme. The normalization convention we adopt for this definition ensures that S κ c is truly a function of only the three variables shown, which makes manifest the non-trivial connection between + , Q and z cut derived in Ref. [55], which imply that it is only a function of + Q 1 1+β cut . Note that in the pp case this combination is independent of the jet rapidity η J since the cosh η J factors in Q 1 1+β cut /Q cancel: Note that the prefactor of Q so that the collinear-soft modes probe the edge of the groomed jet, while the collinear modes are well inside. At one-loop the result for the collinear-soft function in the MS scheme is x + is the standard plus function which integrates to zero on x ∈ [0, 1], C q = C F , and C g = C A . Here we see explicitly that the collinear-soft function is only a function of the three variables shown in the arguments in the left hand side of Eq. (2.20). The momentum space result in Eq. (2.20) agrees with the Laplace space result in Eq.(E.4) of Ref. [55].
The form of the convolution shown in Eq. (2.11) follows from the fact that the jet mass, when decomposed into contributions from the collinear and collinear-soft modes, is where the ellipses are higher order in the power counting. Here p 2 n is the argument of the jet function J κ , and p + cs = + is the variable that appears in S κ c . The fact that the sum of collinear and collinear-soft momenta gives the total jet mass leads to the convolution. Including the resummation of large logarithms, the factorization theorem in Eq. (2.11) becomes where U S G and U J are RG evolution kernels. Note that the hard scale µ h indicates an upper limit for an evolution that takes place inside N κ , and since this factor also includes the boundary condition at µ h it is formally µ h independent. In the third line of Eq. (2.22) we have defined a notation for the partonic cross sections dσ κ /dm 2 J for the individual κ = q, g channnels which we will use later on. This formula does not account for final state hadronization effects. The canonical global soft, jet, and collinear-soft scales appearing in Eq. (2.22) are (2.23) The renormalization group evolution between these scales sums the associated large logarithms.
Note that we will always have µ J µ cs , since µ cs /µ J = (m J /Q) 1, and also µ gs µ cs , since µ cs /µ gs ∼ (m 2 J /QQ cut ) 1+β 2+β 1. On the other hand there is no universal hierarchy between the scales µ gs and µ J . The functions N κ involve both the hard scale µ h ∼ Q ∼ p T and the global-soft scale µ gs ∼ Q cut , and additional logarithms of z cut from the hierarchy Q Q cut can be summed inside N κ if desired. The integrations in Eq. (2.22) can be evaluated analytically with standard techniques and give predictions at LL, NLL, etc, for the soft drop groomed spectrum. Results up to NNLL were obtained in Ref. [55] for β = 0 and β = 1.

Nonperturbative Modes for Soft Drop
To determine the leading hadronization corrections we first determine for our observable the dominant nonperturbative modes Λ with momenta (p + Λ , p − Λ , p ⊥ Λ ). This is done separately for the operator expansion (SDOE) and nonperturbative (SDNP) regions, see Eqs. (1.1) and (2.8).
In Fig. 3a we show all the relevant perturbative and nonperturbative SCET modes for m J values in the SDOE region. In Fig. 3b we show the relevant modes when m J is in the SDNP region. These figures shows only scaling relations, so that the momentum scaling of modes at different locations are separated by a inequality. Any particles with momenta satisfying a ∼ relation appear at the same point, with the scaling of the mode at that point. Here C denotes the collinear modes appearing in J κ , which sit on the blue measurement curve labeled m 2 J = Qp + at small θ c ∼ 2m J /Q. The slanted orange line for z z cut θ β bounds the momentum region removed by soft drop and is labeled by "slope = β". The magenta point labeled CS denotes the collinear-soft modes which determine S κ c , whose scaling was given in Eq. (2.18), and is determined by the intersection of the slanted orange line and the blue curve. Finally, S denotes the global soft modes that are groomed away. Their presence is required for renormalization group (RG) consistency, as part of the calculation of the no-emission probability, and are included in N κ . Note that ultra-soft modes, sensitive to other parts of the event, sit at the intersection of the blue curve and y-axis, and are removed by soft drop. Figure 3a for the SDOE region is identical to the mode picture in [55] with two exceptions that did not matter there but are important for our analysis. The first is that we have added the brown line, p 2 ∼ Λ 2 QCD , which denotes where the dominant modes responsible for hadronization effects are located. The second is that the shaded orange region, which denotes the region removed by soft drop, is truncated in the θ direction by the vertical orange line at the angle θ R g where the iterative grooming stops, see [48]. At leading power in the SDOE region, soft drop is always a) b) Figure 3. Modes appearing in the soft drop factorization formula for jets initiated from massless quarks or gluons. Panel a) shows a m J value in the operator expansion region, while panel b) shows a m J in the nonperturbative region. Here z is the energy fraction relative to the total jet energy, and θ is the polar angle relative to the jet-axis. The hard modes H at z ∼ 1 and θ ∼ 1 are not shown.
stopped by comparing a perturbative CS subjet with the subjet containing the collinear particles. Thus the vertical line occurs at the location of the CS mode which has a parametrically larger angle than the collinear mode. To understand this, note that the CS modes saturate the soft drop condition, i.e. they sit at the largest possible angle from the jet axis and have large enough energies to enable them to pass soft drop. So when pairs of subjets are tested as we traverse the CA clustering tree backwards, there will be a subjet in the tree that carries all the C particles, and another with one or more CS particles. As long as at least one CS subjet is kept by soft drop, then the stopping subjet will have collinear-soft scaling.
In the SDOE region a perturbative CS subjet will always stop soft drop, yielding R g ∼ θ cs . This then determines the dominant nonperturbative mode for the SDOE region, labeled by Λ in Fig. 3a, which has the same parametric angle as the CS mode. Its (p + , p − , p ⊥ ) momentum components therefore have the scaling These Λ modes yield the most important nonperturbative contribution to the jet mass since, as seen from Eq. (2.21), in the collinear-soft limit the contribution to the jet mass is given by Qp + and the Λ modes have the largest p + momentum component among all modes on the brown line that are not completely removed by soft drop. To see this we note that for massless modes p + ∼ Qz(θ/2) 2 ∼ Λ QCD (θ/2), where the second ∼ follows from p 2 ∼ Λ 2 QCD when θ 1. Thus the nonperturbative mode with the largest p + is the one with the largest θ that is not removed by soft drop, given by θ cs /2 ∼ ζ cs . Comparing p + momenta in Eqs. (2.18) and (3.1) we see that for p + 1, which is precisely the equation for the SDOE region in Eq. (1.1). In this region the hadronization contributions from Λ are power corrections. We note that the dominant power corrections appear only from corrections to the collinear-soft sector. The nonperturbative corrections in the collinear sector are subleading since nonperturbative corrections to the jet function are suppressed by Λ QCD /m J 1. We could also consider the case where there is no perturbative CS particle retained by soft drop in the SDOE region, so that all CS subjets are eliminated. In this case soft drop will be stopped by a nonperturbative mode whose momentum would sit at a location determined by extending the slanted orange line until it intersects the brown curve, indicated by the Λ mode in Fig. 4. However, in the SDOE region the probability for such an event is exponentially suppressed by the ratio of Sudakov exponentials that arise in the cases with or without a perturbative CS mode, describing the respective no radiation probability. This is because in the scenario without a CS mode there is a significantly larger region of phase space without an emission until the brown line is reached where a NP particle with Λ scaling is always found that stops soft drop. The Λ in Fig. 4 is thus a subleading mode for power corrections in the SDOE region.
In contrast, for p + cs ∼ p + Λ the CS and Λ modes become parametrically close, merging into a single mode, which is labeled by ΛCS in Fig. 3b. Using Eqs. (2.18) and (3.1) we find that this parametric relation corresponds to jet masses m 2 J ∼ QΛ QCD (Λ QCD /Q cut ) 1 1+β . This scaling relation agrees with Ref. [55]. Jet masses with these or smaller values correspond to the SDNP region, quoted above in Eq. (1.1). 1 Here it is a ΛCS mode in SCET that stops the soft drop groomer. In this region we have θ cs /2 ∼ ζ np and the scaling where ζ np is still parametrically 1. Since the ΛCS modes sit on the blue line in Fig. 3b, there are leading nonperturbative corrections to the jet mass spectrum in this SDNP region.
The above nonperturbative modes are the new ingredients needed for our analysis of the SDOE and SDNP regions. These modes with p 2 ∼ Λ 2 QCD will determine the nonperturbative matrix elements (SDOE) or functions (SDNP) that contribute to the jet mass cross section. Note that the need to consider the Sudakov exponentials for the analysis of power corrections in the SDOE region is novel, and differs from SCET analyses in other contexts, such as ungroomed event shapes. This implies that at least leading logarithmic (LL) resummation will need to be considered in order to properly incorporate the dominant hadronization corrections in the SDOE region. The manner in which the nonperturbative corrections appear can also depend on the order in resummed perturbation theory considered. For our analysis of power corrections we will make use of leading-logarithmic resummation, leaving results at higher order to future work.
Using the SCET based techniques of Refs. [7,25,73,74], which here involves factorizing perturbative and nonperturbative contributions in the collinear-soft region, we can derive a factorized form for these power corrections. This involves factorizing the measurement operator and matrix elements, as well as considering the role of CA clustering, which we consider in Secs. 4.1-4.4 in order to obtain the final result for the SDOE region in Sec. 4.5. The extension to the SDNP region is considered in Sec. 6.

Nonperturbative Corrections in the Operator Expansion Region
In this section we reconsider the soft drop jet mass factorization formula in order to include the leading nonperturbative effects related to final state hadronization in the SDOE region. The expansions for the SDOE region are based on the ratios from comparing Eqs. (2.18) and (3.1), 1 . (4.1)

Expansion of the Measurement Operator
The measurement operator for the groomed jet mass cross section involves CA clustering, subsequent grooming, and the measurement of the observable on the remaining set of particles as illustrated in Fig. 1. In this section we extend the analysis of the partonic measurement operator to include leading power corrections. Before we embark on the calculation we first set up some useful notation. Consider first the case of plain jet mass (without grooming). The differential cross section can be written as where H κ IJ encodes the dependence on the hard production process and initial state, O κ represents the SCET operator which includes fields for κ = q or g which initiate the jet, X are the final state particles in the jet region of interest, X are other final state particles, and I, J are shorthand for any color and spin indices. Since soft drop decouples the jet mass distribution from the behavior of particles outside the jet region, we can focus on X for our analysis. The measurement operator in the soft or collinear-soft limit can be written aŝ where the operatorp + measures the p + momentum of final state radiation. The action ofp + on a n-particle final state is simply the sum of all the individual +-momentum components: In the case of the groomed jet mass the situation is more complicated since the particles cannot be simply selected from a simple geometrical region due to CA clustering and the soft drop test. Hence, thep µ operator above is not sufficient to define the groomed jet mass measurement operator consistently. To ameliorate the problem, we define a "soft drop momentum operator", p µ sd (X;z cut , β), that takes into account the CA clustering and grooming for a given multiparticle reference state |X . Its action on a single particle state |i that may or may not be in |X is defined as followsp µ sd (X;z cut , β)|i = p µ Θ sd p µ , {p µ j ; j ∈ X};z cut , β |i (4.5) The operator Θ sd p µ i , {p µ j ; j ∈ X};z cut , β is defined to be 1 if the particle |i is not groomed away and 0 otherwise. For simplicity we suppress the dependence on the soft drop parametersz cut and β in the argument of Θ sd below. The usefulness ofp µ sd (X) becomes apparent when its action on a multiparticle state is considered: If the particles {i 1 , i 2 , . . . i n } are contained in X then each particle |i α is individually tested for passing soft drop amongst the particles in |X . Thusp µ sd (X) yields a measurement of the groomed jet momentum. In this notation, the measurement operator for the groomed jet mass for the final state |X of collinear-soft particles is simplŷ The action ofδ sd on |X will precisely collect the p + momenta of only those particles that remain after grooming.
Having established some notation, we are now prepared to consider leading hadronization corrections in the SDOE region. Upon including nonperturbative (NP) particles, represented by a multiparticle state |X Λ , together with perturbative particles |X , Eq. (4.7) now readŝ The NP contribution to the jet mass is then given by where p + Λsd is the sum of all p + momenta of the nonperturbative particles kept after grooming. Here we have made use of the fact that the combined state |XX Λ factorizes into |X |X Λ in the SDOE region. This follows because the Λ and CS modes have the same boost but hierarchically different momentum components, and hence factorize in their respective Lagrangians. Furthermore, the momentum operator now uses the full hadronic state as its reference state:  Figure 5. Schematic of a CA clustered tree and its simplification in the SDOE region. The vertical separation between the branches corresponds to the angular distances and the CA clustering proceeds from right to left. The colors correspond to the modes displayed in Fig. 3. The angular locations of perturbative subjets in the SDOE region at leading power remain unperturbed as nonperturbative particles are added.

1.
A "shift" correction: contribution to the observable from the NP radiation kept in the groomed jet, given by Q p + Λsd in Eq. (4.9), 2. A "boundary" correction: modification of the soft drop test for a perturbative subjet in presence of NP radiation, as seen from the X Λ dependence of Θ sd in Eq. (4.10).
We will show below that both of these power corrections modify the shape of the spectrum and cannot simply be included via a shape function. In general both of these corrections are tied to the clustering history of other perturbative particles in the jet, hence complicating the nonperturbative factorization. However, there are some key simplifications one can make in the SDOE region up to LL accuracy, which we address in the following.
To help visualize the problem we show in Fig. 5 the same schematic as Fig. 1, but with the momentum scaling of the branches made explicit. The labels and the colors correspond to the EFT modes shown in Fig. 3. The perturbative branches are effectively immersed in a bath of nonperturbative particles distributed at all angles, corresponding to the brown line in Fig. 3. The scaling of the combined subjet depends on the dominant mode of the pair, with the collinear modes having the highest energy. Hence, the collinear subjet undergoes the smallest relative change in the subjet momentum during clustering. The soft drop grooming will be stopped by a comparison involving branches that have collinear and CS scaling. As discussed above in Sec. 3, in the SDOE region there is always a perturbative CS subjet that stops soft drop.
In general, the angular location of a subjet changes at each stage of clustering as the subjets are combined, as shown in the left figure in Fig. 5. At leading power in the SDOE region the shift to the momentum of perturbative subjets on adding NP particles is small and can be ignored. At the first subleading power where the hadronization effects enter, the NP particles that determine the shift term are the ones that belong to the same leading power collinear or CS subjets. In calculating the shift term we thus ignore the effects of NP particles on the clustering and soft drop comparison of perturbative subjets, as shown in the right schematic in Fig. 5, and hence can usep + sd (X, X Λ ) p + sd (X) for the shift term. In contrast, the boundary term separately captures the effect of the leading NP modification to the subjet geometric regions, which modifies the amount of perturbative p + momentum kept in the groomed jet.

Collinear soft
Collinear Figure 6. The catchment area (blue and pink shaded regions) of nonperturbative particles (wavy lines) set by the perturbative collinear and collinear-soft subjets with angular ordering of perturbative subjets. The particles shown in gray are groomed away. NP particles are not assumed to obey angular ordering.

Expansion for the Shift Term
At LL one can make a further approximation of assuming strong angular ordering of the perturbative emissions. This dramatically simplifies the complexity of the CA clustering in the measurement operator. Strong angular ordering implies that all the perturbative emissions subsequent to the one that stops soft drop lie at much smaller angles. We illustrate in Fig. 6 the region of momentum space that forms the catchment area of the kept NP particles at LL (blue and pink shaded regions). The perturbative emissions that occur after the emission that stops soft drop will also lie within this region. Each cone is centered on one of the subjets that stops soft drop, and the conic sections correspond to the regions where the nonperturbative radiation is collected by each of these subjets. To extend this formalism to NLL would require considering modifications of the catchment area in Fig. 6 due to inner resolved perturbative subjets (which are not necessarily strongly ordered), which we leave to future work.
Looking down the jet axis, in the small angle approximation, the size and the alignment of the region is determined by {θ cs , φ cs }, the polar and azimuthal location of the CS subjet measured relative to the jet axis, as shown in Fig. 7a. Since the collinear subjet carries the majority of the jet energy, we assume that the jet axis is aligned with that of the collinear subjet. As a result the contribution of the NP particles to the observable via the shift term will only come from the catchment area of the collinear and collinear-soft subjet. A similar geometry but for a different application of pile-up and underlying event subtraction in the case of CA clustering was also explored in Ref. [75].
Thus, at leading log we can make Eq. (4.9) manifest with a simpler geometrical constraint for the shift term in the SDOE region: where Q p + Λsd is the contribution of the nonperturbative particles to the groomed jet mass. Note the use here of state |X in the operatorp + sd (X) as opposed top + sd (X, X Λ ) in Eq. (4.11), since the The catchment area of nonperturbative modes relevant for a) the shift term and b) the boundary term up to LL, pictured from above looking down the jet axis (taken along the z-direction). These modes are clustered with either the collinear subjet located on the jet axis (blue dot), or the stopping collinear-soft subjet (pink cross) as indicated by the shaded brown regions. The overlapping circles both have radius θ cs .
difference is a subleading power correction as explained above. The operatorp + • • (θ cs , φ cs ) gives the p + momentum of all the particles clustered with the collinear or CS subjet, and is defined as: where the operator Θ

• •
NP is defined to be 1 when a NP subjet in X Λ is clustered with either the collinear or CS subjets, as given by the shaded region in the p x -p y plane shown in Fig. 7a. The operatorp µ • • acts on a nonperturbative multiparticle state the same way asp µ sd does in Eq. (4.6). The condition θ cs 1 in the SDOE region implies that the two circles simply have radius θ cs , yielding a compact expression for Θ where ∆φ = φ Λ −φ cs is the relative azimuthal angle in the plane perpendicular to the jet axis, and θ Λ the polar angle relative to the jet axis. In the second line, we identified the only arguments that the operator depends on, namely the angular locations of the CS and the NP subjets in momentum space.

Expansion for the Boundary Term
We now turn to the boundary term. We first rewrite Eq. (4.10) aŝ with the boundary power correction being given by Unlike the shift term in Eq. (4.11), that only contributed through the catchment area defined by the final collinear and CS subjets, the effect of the boundary term is to modify the soft drop condition by ∆Θ sd (X, X Λ ) for every step of comparison. In the collinear-soft limit the soft drop test for a soft subjet with perturbative momentum p µ , accounting for the additional momentum q µ from hadronization effects, reads We can expand the expression above in the limit where all the components of q µ are parametrically smaller than those of p µ , while the angles are of the same order θ p ∼ θ q : with Θ p sd denoting the soft drop condition applied on p alone. Here The boundary correction is also influenced by momentum that is removed from the subjet due to hadronization, in which case the correction is which is just the negative of that in Eq. (4.17). Eqs. (4.17) and (4.18) describe scenarios where hadronization causes NP momentum to enter and leave the soft subjet respectively. For the analysis of boundary correction we can ignore the power suppressed nonperturbative corrections to the collinear subjet momentum. The correction in Eq. (4.15) can affect the collinear-soft function S κ c as well as the normalization factor N κ in Eq. (2.11) that accounts for the global soft modes. However, the CA clustering and the scaling of the NP modes in SCET implies that only the boundary correction to the collinear-soft modes need to be considered for the leading order NP power corrections. In order for a NP mode to modify the soft drop condition for a perturbative global soft mode at LL (or NLL), it must sit at the same parametric angle (due to CA clustering), which yields p µ Λ(us) ∼ Λ QCD (1, 1, 1). These corrections in the global soft region consist entirely of subjets that fail to pass the soft drop condition. However, p µ Λ /p µ cs p ν Λ(us) /p ν gs for all components in the light cone basis, so the power corrections to global soft are always further suppressed, as can be seen from the momentum scalings in Eqs. (2.15), (2.18) and (3.1). There are also additional modifications in the collinear-soft region from subjets that fail soft drop, which first become non-trivial at O(α 2 s ). These subjets do contribute to the leading power NP corrections, but are beyond LL accuracy, as we discuss in more detail in App. B.2.2. Hence for the remainder of this section we will only focus on the correction to the soft drop test for the collinear-soft subjet that stops soft drop.
The geometric region at LL that corresponds to the catchment area of the CS subjet is shown in Fig. 7b. The projection operator that selects the NP emissions in the CS subjet is given by with ∆φ = φ Λ − φ cs . This is shown as the brown shaded region in Fig. 7b that represents the region where the NP particles are clustered with the collinear-soft subjet (and not with the collinear subjet). We also define Θ NP ≡ 1 − Θ NP , which describes the complimentary region.
For the collinear-soft subjet that stops soft drop, the results in Eqs. (4.17) and (4.18) can be combined to give the leading result for the eigenvalue of the operator ∆Θ sd (X, X Λ ), as where q cs (θ cs , φ cs , β) is the non-perturbative momentum entering into or leaving from the collinearsoft subjet (the shaded region in Fig. 7b) that is at the angles θ cs and φ cs . It can be defined by the eigenvalue equationp (θ cs , φ cs , β) |X Λ = q cs (θ cs , φ cs , β) |X Λ , where the relevant operator for the boundary term is given bŷ Here the Θ NP term yields the corrections corresponding with Eq. (4.17), while the Θ NP term yields those for Eq. (4.18). The results from non-perturbative matrix elements in the operator expansion enable us to encode effects where a simple replacement of partonic momenta by hadronic momenta variables does not suffice. In App. A we present further justification for Eq. (4.21) via a simple illustrative example. Note thatp(θ cs , φ cs , β) in Eq. (4.21) is linear in β. This will lead to the linear dependence of the boundary power correction on β that was noted above in Eq. (1.4). For β = 0, the boundary power corrections are particularly simple, where from Eqs. (4.17) and (4.18) we can see that they entirely result from an expansion in the minus momentum of the nonperturbative and collinearsoft modes. This has no analogue in the case of the plain (i.e. ungroomed) jet mass since that measurement solely involves the plus component. The groomed jet mass requires a minimum p − = Qz cut for the collinear-soft mode to pass, which is susceptible to power corrections that are of the same order as the shift corrections to the jet mass measurement. For β > 0 the boundary power correction has additional angular dependence at the same order, as seen from Eq. (4.16) or Eq. (4.21). We also note from examining Eqs. (4.1) and (4.21) that we cannot take β too large if we want the SDOE expansion in Eqs. (4.17) and (4.18) to remain valid. This implies a constraint on the β values: so that the limit β → ∞ is not compatible with the expansions in the SDOE region. In the numerical analysis in Secs. 5 and 8 (also in Fig. 2 above), we replace the inequality defining the SDOE region in Eq. (1.1) by 1/5. Hence, to avoid violating Eq. (4.22) we will impose β ≤ 2 for our numerical analysis.

Rescaling
Summarizing the results from the previous sections, the two nonperturbative power corrections to the jet mass measurement in the SDOE regions from a set of NP particles {q µ i } using the LL approximation can be expressed as Here, (m 2 J ) cs refers to the contribution to the jet mass from the collinear-soft sector and the NP particles {q i }, and ∆φ i = φ q i − φ cs . The first term is the leading contribution to the measurement on the perturbative collinear-soft mode and the next two terms result from the shift and boundary corrections respectively. The measurement operatorsp µ • • (θ cs , φ cs ) andp (θ cs , φ cs , β) in Eq. (4.24) are a simplification over the full soft drop operatorp µ sd (X, X Λ ) since the same constraint now applies to all the NP subjets without involving additional nontrivial modifications due to CA clustering. The catchment area only depends on the kinematics of the perturbative radiation, that, however, varies at each point in the jet mass spectrum -a novel feature for the groomed jet mass. This is related to the observation in Sec. 3 that the Λ mode has the same parametric angle as the CS mode. Hence, the dominant NP radiation lies on and inside the boundary of the catchment area defined by the overlapping cones in Fig. 7 where θ Λ ∼ θ cs , and moves with the location of CS mode along the spectrum. The next step towards deriving nonperturbative factorization is then to factorize this perturbative dependence of the power corrections from the purely nonperturbative contribution to the observable.
We observe from Eqs. (4.13) and (4.19) that only the ratio of polar angles θ Λ /θ cs and relative azimuthal angles φ Λ − φ cs appear in the projection operators. We thus make the following change of variables from the momenta q i ∼ p Λ ∼ Λ QCD by: This implies In terms of rescaled momentum k µ i in Eq. (4.24) the projection operators defined in Eqs. (4.13) and (4.19) read In Fig. 8 we show the catchment area of the nonperturbative particles in the k x -k y plane. In the rescaled Θ

• •
NP and Θ NP the second argument 1 corresponds to the unit radius appearing in these figures. Note that in contrast to Fig. 7 the axes are now k x,y /k − .
We can think of the rescaling in Eq. (4.25) as boosting the nonperturbative momenta along the jet axis, and rotating by φ cs in the plane perpendicular to the jet axis, which we can approximate to be aligned with the collinear subjet. To see this explicitly we first define the Lorentz operator Λ(γ, φ) for a boost γ along the jet direction and a rotation by φ: where R φ is a 2 × 2 rotation matrix in the transverse plane. Hencê withΛ −1 being the corresponding inverse Lorentz transformation. Then the operators in Eqs. (4.12) and (4.21) for the shift and boundary power corrections assume a simple form The measurement in the square brackets is performed on the state |X Λ as seen in Eq. (4.11), and thus yields momenta k µ i ∼ Λ QCD in both the cases, whereas the simple angular factors outside are purely perturbative. Thus we see that, despite their ± superscripts, the new variables k + i and k − i are invariant under physical boosts along the jet axis, which follows from their definitions in Eq. (4.25) and Eq. (4.30), since any boost to q ± i is compensated by that to θ cs /2 = p + cs /p − cs . Thus we observe that by performing the measurement on the nonperturbative subjets in an appropriately boosted frame we are able to completely factorize the perturbative and the nonperturbative dependence of the power corrections induced through the angles of the subjet. We note that the small angle approximation is crucial for this derivation. As already mentioned above near Eq. (4.22), in the limit β → ∞ the modes that stop soft drop have θ ∼ 1, and we are no longer able to factor out the perturbative dependence of the measurement. Hence, our results for the soft drop power corrections do not have any simple connection to power corrections for the plain jet mass spectrum.

CA Clustering within the Nonperturbative Sector
We remind the reader that our method of treating hadronization uses nonperturbative modes with virtuality p 2 Λ ∼ Λ 2 QCD that in conjunction with the perturbative modes account for the full hadronic cross section. Both the NP and perturbative modes are described by different fields in the SCET Lagrangian, with their own individual contributions to matrix elements. Concerning Eqs. (4.12) and (4.21), we note that a key feature of the operatorsp µ • • (θ cs , φ cs ) andp (θ cs , φ cs , β), used to calculate the power corrections, is that they implement a single-particle and purely geometrical constraint on the NP emissions state |X Λ based on the location of the perturbative subjets and their catchment areas. As a consequence, in the SDOE region we were able to decouple the effects of CA clustering between the perturbative and nonperturbative sectors. However, the CA clustering is still important within the nonperturbative sector: the angular locations of the NP branches can change significantly if other NP branches with similar momentum scaling get paired with it. In this section we address this issue by clarifying what we mean precisely by "NP subjets" for the purpose of defining our NP source function.
As an example we consider two scenarios with two NP particles and two perturbative branches as shown in Fig. 9: in scenario (a) both of the NP particles get clustered with the perturbative tree at different stages, and in scenario (b) they get clustered together first and the combined branch is then paired with a perturbative branch (here the CS branch). In the scenario a) both the nonperturbative particles can be combined with the CS subjet only if each of them falls in the catchment area shown in Fig. 7, and hence individually satisfy Θ • • NP = 1. After the first NP particle closer to the CS subjet is clustered, the angular location of the resulting subjet direction is roughly the same, and hence the same geometrical constraint applies for the second NP particle clustered later. In scenario (b), however, one of the NP particles may not lie in the region of overlapping cones, because only the combination of them needs to. Hence, in order to make our operators in Eqs. (4.12) and (4.21) account for such cases it is mandatory to make the meaning of the state |X Λ more precise.
Given a set of particles to be reclustered for grooming, the EFT provides a natural Lorentz invariant distinction between perturbative and NP particles without introducing a hard momentum cutoff. Physically, the momentum distribution of non-perturbative particles peaks at smaller momenta in the SDOE region. We demand that the operatorsp µ • • (θ cs , φ cs ) andp µ (θ cs , φ cs , β) should be applied to "NP subjets" instead of being tested on individual NP particles, where these NP subjets are obtained by CA clustering of all the NP particles treating perturbative particles as "beam" directions. These NP subjets are defined in the following manner: 1. All NP particles are called NP subjets.
2. The pair of NP subjets with smallest relative angular distance ∆θ is grouped into a new NP subjet, if the angular separation of each of the two NP subjets to any perturbative particle is larger than ∆θ. The grouping of the NP subjets continues until the latter condition fails.
The set of NP subjets that result from this grouping defines the "multi-particle" state |X Λ that is tested according to the geometrical constraint set by the collinear and CS subjets. We also note that the NP subjets then themselves have energy ∼ Λ QCD . With this refinement of the meaning of |X Λ Eqs. (4.12) and (4.21) now account for all the clustering cases with arbitrary number of particles.
Note that the steps outlined above yield the same CA clustered tree as one would obtain via the usual CA procedure that starts with clustering the closest pair regardless of their energy. We will make use of this procedure in the Monte Carlo studies presented below in Sec. 8 to demonstrate the validity of our kinematic approximations in the SDOE region.

Factorization for Matrix Elements
Having simplified the form of the measurement operator we now consider the nonpertubative factorization for the corresponding matrix elements. In this section we shed light on the properties of the power corrections for groomed event shapes, via fixed order calculations working at LL in the perturbative emissions and using Feynman gauge, in order to demonstrate the factorization of the perturbative and nonperturbative parts of the matrix element. The part of these perturbative calculations involving nonperturbative modes simply serve as a proxy to probe the corresponding matrix elements.
In standard event shapes, without jet grooming, the nonperturbative effects are often sourced by Wilson lines that know only about the direction of the energetic collinear parton [7,19,25,76]. This is due to the fact that the leading power correction is governed by soft modes that cannot resolve the details of collinear splittings of quarks and gluons that constitute the internal perturbative structure of the jet. For e + e − → dijets [19], or jet production in pp with small jet radius R [28], the matrix element of nonperturbative radiation becomes invariant under boosts along the collinear direction. This is a desirable property given the simplifications we obtained for the groomed jet measurement operators in Eqs. (4.31) and (4.32) on boosting the NP sector to an appropriate reference frame along the jet direction. In our analysis below, the abelian graphs without gluon splitting exhibit a factorization of nonperturbative and perturbative matrix elements (without making a boost since they are boost invariant). In contrast, the non-abelian graphs, where the NP gluon is emitted from the collinear-soft gluon, yield an expression that is apparently not factorized into perturbative and nonperturbative matrix elements. However, the change of variables in Eq. (4.25), corresponding to a boost and a rotation of the NP sector alone that depend on the angles of the collinear-soft subjet, does in fact yields a factorization of the perturbative and nonperturbative matrix elements in the new frame. This transformation yields precisely the same perturbative Wilson coefficients as the abelian diagrams, showing that the transformation in Eq. (4.25) is essential not only for factorization of the measurement but also for the matrix element.
To determine the perturbative coefficients multiplying the non-perturbative matrix elements in the operator expansion we follow the logic of Ref. [25], where a source NP gluon replaces the NP mode. As discussed above, the dominant effect of NP modes is induced only via the collinear-soft function S c , and thus we do not consider nonperturbative effects in the global soft function or other functions. We consider a case of a quark or a gluon initiated jet in the single emission picture with a perturbative gluon p µ , that has the collinear-soft scaling, and a nonperturbative gluon q µ . With this set up we demonstrate how the nonperturbative power corrections can be factorized from the perturbative matrix element. The corresponding Feynman diagrams are shown in Figs. 10 and 11. We expand the interactions to the leading non-trivial power, which leads to eikonal couplings to the energetic source lines. We do not consider cut vacuum polarization graphs for quarks or gluons as they yield subleading nonperturbative corrections. The nonperturbative gluons are brown, whereas the perturbative gluons are magenta. With the dashed line representing plus momentum measurement with the soft drop test, δ( + −p + sd ), these graphs precisely correspond to the O(α s ) perturbative corrections with an additional nonperturbative gluon.

Abelian Graphs
We first consider the abelian graphs shown in Fig. 10. The result for the one loop collinear-soft function in Feynman gauge including the effect of the NP gluon reads where the first term is simply the perturbative O(α s ) collinear-soft function quoted above in Eq. (2.20), and the remaining terms are the power corrections with an integral over the momentum q of the non-perturbative source gluon. Hereμ 2 = (µ 2 e γ E /(4π)) and C κ = C F or C A for quark and gluon initiated jets respectively. The details of the derivation are presented in App. B. At first order in the power expansion, only diagrams where the NP gluon attaches after the perturbative gluon (next to the cut) contribute. Thus in Fig. 10 the 1st graph does not contribute, but the the 2nd and 3rd graphs do. The power corrections for the shift and the boundary terms involve phase space integrals over the perturbative gluon Here NP and Θ NP were defined in Sec. 4.1. The results in Eq. (4.34) refer to the remaining pieces after subtracting the perturbative S κ,pert c from the full expression S κ,had c . In Eq. (4.33) we have further factored out the matrix element for the NP gluon, such that the term in the square brackets serves as a proxy for a nonperturbative source function:F ab.
with the 'ab.' superscript emphasizing that this is derived from the abelian graphs, and the subscript κ that it is dependent on the jet initiating parton. HereC(q) is defined in Eq. (B.8). (At the end of our analysis the source functions will be replaced by a full non-perturbative function rather than some perturbative approximation.) In Eq. (4.33) we did not add diagrams with virtual NP gluons because they only affect the overall normalization. We comment further on the normalization of the nonperturbative source function below. The two graphs in Fig. 10 where only a NP gluon crosses the cut do not contribute in the SDOE region, as discussed further in App. B. We observe thatF (q µ ) and the measure d d q are individually invariant under boosts along the jet direction. On performing the boost and the rotation defined in Eq. (4.25) taking θ cs = θ p and φ cs = φ p we find As a result of which Eq. (4.33) becomes where the power corrections Ω • • 1 and Υ 1 involve measurements on the NP radiation in the boosted frame: with the projection operators given by Eq. (4.27). Since k ± are boost invariant, so are the full definitions in Eqs. (4.38) and (4.39). We note that the shift term is β andz cut independent, whereas the boundary term has a linear dependence on β. The power corrections in Eq. (4.33) are multiplied by perturbative Wilson coefficients given by

Non-Abelian Graphs
We now turn to the non-abelian contributions shown in Fig. 11. Here the NP gluon is radiated off the perturbative gluon. These diagrams contribute at the same order as the abelian ones, while graphs that are not shown (such as a cut gluon vacuum polarization graph) are higher order in the power expansion. The sum over all the non-abelian graphs is discussed in App. B and the result reads Unlike the abelian graphs it appears that we cannot simply carry out the p + and p − integrations to express S had, n.a.
we find that the nonperturbative and the perturbative factors in in the last term of Eq. (4.41) completely decouple: . cut , β, µ − Ω n.a.
where the perturbative coefficients ∆S • • c and ∆S c are exactly the same as in the abelian case, see Eq. (4.40), and the nonperturbative moments are given by HereF n.a. (k µ ) is a proxy for the nonperturbative source function for the non-abelian graphs: This expression demonstrates that the power corrections we are considering here can not be expressed in terms of a non-perturbative matrix element of a Wilson line operator. From the analysis of the non-abelian graphs we see that the rescaling in Eq. (4.25) is crucial to achieve a separation of the nonperturbative matrix elements and perturbative Wilson coefficients. The factorization property of the source function continues to hold in the presence of additional perturbative gluons at LL. This occurs because the nonperturbative gluon momentum is relevant only in one denominator, which is either from a gluon propagator or Wilson line regardless of the number of perturbative emissions. In App. B.2.1 we perform an explicit check that this property holds with two perturbative gluons. We note that in our one-loop fixed order analysis we only considered the boundary power correction to the subjet stopping soft drop. In case of multiple perturbative emissions there are subjets with collinear-soft scaling that fail soft drop and their contribution to the normalization of the cross section can receive a boundary power correction as well. However, we show in App. B.2.2 that such corrections only enter beyond LL order.
Overall we can combine the abelian and non-abelian results from Eqs. (4.37) and (4.44) to obtain: with Ω • • 1κ = Ω ab. 1κ + Ω n.a. 1 and Υ κ 1 (β) = Υ ab. 1κ (β) + Υ n.a. 1 (β). We note that we did not explicitly consider diagrams with virtual nonperturbative gluons in the analysis above, however they do not affect in any way the conclusion of this factorization analysis. The essential point is that our analysis clarifies that the interface between NP and perturbative modes, and in particular the shift and boundary corrections, are governed by a nonperturbative source functionF κ (k µ ) which, eventually, has to be determined using methods outside perturbation theory.

O(α s ) Matching Coefficients
In this section we evaluate the Wilson coefficients in Eq.
We parameterize the hadron level differential cross section with the leading power shift and boundary nonperturbative corrections in the SDOE region in the form . (4.51) The form of both results in Eq. (4.51) makes explicit that they are the same order in the power counting, as argued above. We also note that for the pp case the combinations in Eq. (4.51) are independent of cosh η J since (4.52) Note that the presence of an explicit α s factor in the first order SDOE power corrections is explained by the need for a perturbative emission which stops soft drop. Furthermore we emphasize that the coefficients for the shift and boundary power corrections contain large logarithms from higher orders so one should not conclude that the O(α s ) terms shown in Eq. (4.51) suffice to determine these coefficients at the leading log level. Due to the inherent separation of scales that is present with the collinear-soft, jet, and global soft scales, these results will be dressed by further perturbative emissions which produce a tower of leading logarithms.

Results for Power Corrections from the Operator Expansion
Based on the factorization for the power corrections derived in Secs. 3-4.4 it is natural to write the hadronic cross section as where the leading power corrections are projections on a non-perturbative source distributioñ F κ (k µ ) given by Here we have used the fact that Υ 1 (β) is linear in β. Thus the leading non-perturbative power corrections in the SDOE region are expressed in terms of three hadronic parameters, Ω • • 1κ , Υ κ 1,0 , and Υ κ 1,1 . These parameters are each O(Λ QCD ), depend on whether the jet is initiated by a quark or gluon via κ = q, g, and are independent of all other variables. 2 Since the momentum variables k ± are defined as boost invariant along the jet axis, so are these hadronic parameters. We stress that these parameters are only defined for groomed jet mass in the SDOE region, with geometry determined by the Θ .
(4.57) Equation (4.56), or equivalently Eq. (4.57), are our main results for the operator expansion with leading power corrections in the SDOE region.
In the next section we will show that a leading double logarithmic series is absent from C κ 1 and C κ 2 when defined as in Eq. (4.56), so that their parametric forms include terms Here we work at LL order with a running coupling, and hence only single logarithms from the running of α s (µ) are included. The determination of the full set of terms k=1 (α s L) k requires a NLL calculation for d∆σ • • κ /dm 2 J and d∆σ κ /dm 2 J which we leave to future work. We remind the reader that the factors in Eq. (4.58) are such that in the pp case the combination of factors of Q and coefficients C κ 1,2 yield cosh η J independent results for these power corrections, see Eq. (4.52).
It can be convenient to express the shift power correction as the moment of a one dimensional distribution: Interestingly, Eq. (4.60) implies that Ω • • 1 in Eq. (4.59) is effectively obtained from an unnormalized distribution, which differs completely from the case of ungroomed event shapes: At this order we can reexpress the results in terms of a normalized distribution F • • κ (q ) defined as Thus using Eqs. (4.56) and (4.63) we arrive at an expression of the hadronic cross section valid for terms up to the leading nonperturbative corrections: Written in this manner we see more explicitly that the C κ 1 term in the first line acts like a jet mass dependent shift, while the C κ 1 and C κ 2 in the second line contribute to a jet mass dependent normalization correction. This form may be more useful than Eq. (4.56) as it allows the higher order power corrections to be modeled by higher order moments of F • • κ (k). Combining the nonperturbative factorization in Eq. (4.64) with the partonic factorization theorem in Eq. (2.11) we obtain Note that Eq. (4.65) contains a nontrivial form of convolution between the collinear-soft function S c and the normalized shape function F • • κ due to the appearance of the Wilson coefficient C κ 1 (m 2 J ). This encapsulates the effect of the jet mass dependent NP catchment area. In contrast, for an ungroomed event shape, such as thrust or a hemisphere mass, the dijet region receives leading power NP corrections that represent a constant jet mass independent shift to the spectrum. Furthermore, for the leading power corrections there is also no analog of the m J dependent normalization corrections.

Resummation for Matching Coefficients for Hadronic Corrections
The goal of this section is to calculate the perturbative Wilson coefficients C κ 1 and C κ 2 in Eq. (4.56), and demonstrate that they do not contain LL double logarithms. We make use of a combination of the coherent branching formalism [77] and input about the nature of the expansion from SCET. Since C κ 1 and C κ 2 are properties of the groomed jet, it suffices to calculate them for quark dijets from an e + e − collision, with the extension to gluon jets obtained by a simple replacement. We will also quote the corresponding results for pp collisions.
In what follows, we first review the derivation of the partonic resummation formula for the groomed jet mass using the coherent branching formalism presented in Refs. [48,50]. We then make use of the key results derived from the EFT analysis above to setup a coherent branching calculation of the Wilson coefficients in Eq. (4.51). In coherent branching, the resummation is implemented via a sum over real emissions, where one can, in analogy to a coherent branching parton shower, track the kinematic information of the sequence of emissions, making the calculation quite intuitive. This novel coherent branching calculation corresponds to carrying out a resummation for an observable (here m 2 J ) while simultaneously weighting its phase space by a function of another observable (the stopping angle, θ cs ). Schematically, the calculation of C κ 1 and C κ 2 corresponds to evaluating the following resummed averages of the powers of the soft drop stopping angle: We note that it is not obvious how to define this result from coherent branching alone, since coherent branching does not provide the power expansion or the formulation of fields needed for describing non-perturbative corrections. On the other hand, at LL order in resummed perturbation theory, using the coherent branching formalism is simpler than setting up the necessary (novel) SCET formalism for the resummation of large logarithms.

Review of Parton Level Resummation in Coherent Branching
We start with a series of angular-ordered emissions off an energetic parton. These are being clustered in a jet of radius R, with the previous emissions off the parton being at wider angles, θ 1 > θ 2 > . . . > θ n . Subsequently the radiation is groomed. We now assume that the CA clustering proceeds such that, at each step, an emission is paired with the central collinear subjet, and not with another emission. Thus at every stage of unclustering we will recover the emissions in the order they were emitted [77][78][79][80]. It is useful to definẽ which satisfiesθ i ≤ 1. We also define z i as the energy fraction of the i'th emission with respect to the jet's energy. At this point we replace the angular ordering with ordering in the variable [50,56]: Using this chain of ordered emissions we review the known resummation for the parton level jet mass cross section in this section. This provides the basis for the calculation of the resummed expressions for the Wilson coefficients C 1 (m 2 J ) and C 2 (m 2 J ) discussed in Sec. 5.2. In the collinear-soft and soft limit the contribution to the jet mass from the i'th emission is Thus we see that the ordering in ρ i is equivalent to the ordering in the contribution to the jet mass of each emission carried away from the collinear parton. For simplicity we consider a quark jet for our discussion, noting that the result for gluon jets at NLL is simply obtained via a substitution for the color factor C F → C A and the splitting function p gq (z) → p gg (z). We will therefore frequently suppress the κ = q subscript in the following. For convenience, we also use a short hand notation for the single emission phase space and matrix element: Here p gq (z) is the one-loop collinear splitting function which reads We also include a running coupling in Eq. (5.5) as a part of the LL resummation. Here the running coupling is evaluated at the scale of the ⊥ momentum of the emission with respect to the jet axis. At LL accuracy we can limit ourselves to the case that the emission with the largest ρ n that passes soft drop also sets the value of measured jet mass m 2 J . Emissions with ρ i < ρ n (for i > n) that are kept can then be considered unresolved. Thus the ordering in ρ i ensures that there is a natural upper cutoff for the unresolved emissions. Further, in the LL approximation the ρ i can be assumed to be strongly ordered, with the inequality '>' in Eq. (5.3) replaced by strong inequality ' '. We also can ignore the perturbative radiation off the emitted gluons. If the stopping pair that sets the value of the groomed jet mass m J is found after n unclusterings then the normalized perturbative differential cross section is given by where Θ i sd and Θ i sd are soft drop passing and failing conditions for the i th subjet: The −δ(m 2 J ) and −1 terms in Eq. (5.7) correspond to the virtual contributions in the passing and failing subjets respectively and the terms Θ(ρ i − ρ i+1 ) impose the ordering. The term with Θ n sd represents a unique scenario where all the n emissions in the jet fail soft drop and a zero contribution to the jet mass m 2 J is obtained. The scenario where there is yet another emission after the n th one is accounted for by the (n + 1) th term in the sum, and so on. Hence, there is no double counting in the resummation formula in Eq. (5.7).
It is convenient to work with the cumulant of the cross section defined bŷ which leads to exponentiation for the emissions: where the radiator R q for quark jets in the Sudakov factor reads Note that the radiator and cumulant only depend on R in combinations with other variables. The cumulant requires m 2 J ≥ 0 and we will leave the Θ(m 2 J ) implicit below. For the gluon jet radiator R g we can replace C F p gq(z ) → C A p gg(z ) to capture the leading logarithms. Lastly, the exponentiation in Eq. (5.10) resulted from the replacement of the angular ordering of the emissions by ρ i ordering in Eq. (5.3), which yields the same constraint on the n th emission as the previous ones.
The result for the differential jet mass distribution in e + e − annihilation then reads where we find it convenient to work with the result in terms of C q 0 defined as whereθ is given bỹ Note that to simplify the notation we have here defined C q 0 , R q andθ with abbreviated arguments, and we continue to use this notation below. The ζ cs is defined in Eq. (2.18) and appeared in the definition of CS mode, which captures the scaling of the softest subjet that satisfies the soft drop condition. We see that in Eq. (5.13) the angle of the stopping subjet lies between the angle of the collinear modes and the collinear-soft modes: where the minimum condition accounts for the transition to the ungroomed resummation region whenθ (m 2 J ) ∼ 1. For a quark jet from a pp collision with R 1, the angles are cutoff by θ i ≤ R/ cosh η J , so we would defineθ i = θ i cosh η J /R to have 0 ≤θ i ≤ 1. Repeating the analysis for this case we obtain whereθ is now given bỹ Eq. (5.16) also involves the radiator for the pp case which is with Note that with the rescaled variables the results for C q(pp) 0 (m 2 J ),θ (m 2 J ), and R pp q are all independent of cosh η J . In the case of θ (m 2 J ) the displayed cosh η J dependence cancels with the implicit dependence contained in the Q and Q cut , see Eq. (4.52).

Resummed Matching Coefficients C 1 and C 2 for Subleading Power
We now turn to a calculation of the matching coefficients C 1 and C 2 of the leading power nonperturbative corrections, which we carry out at LL order.

Resummation for the Shift Power Correction
In Sec. 4.1 we showed that the shift correction corresponds to the total NP radiation captured by the groomed jet area. From Eq. (4.31) we see that this correction can be obtained by performing a measurement in an appropriately boosted frame and rescaling the result by θ cs /2, the opening angle of the soft drop stopping pair. From the analysis in Sec. 4.3 we learned that the nonperturbative and perturbative matrix elements can be factored by boosting the nonperturbative momentum to an appropriate reference frame, see Eq. (4.25). Using these results the jet mass cumulant including the shift correction is given bŷ In Eq. (5.20) the measurement on the NP radiation is performed in the boosted frame with respect to the collinear-soft momentum of emission n, which corresponds to setting θ cs to θ n = Rθ n in Eq. (4.31). The termF (k µ ) is the NP source function. Separating the perturbative and nonperturbative components we get where and just as above in Eqs. (4.38) and (4.45) we have identified The series in Eq. (5.22) exponentiates as in Eq. (5.10) yielding which have rewritten in the desired form of Eq. (4.56). Then from Eqs. (5.24) and (5.12) we can identifỹ Note that in the ratio e −Rq /C q 0 of the Sudakov exponential and the leading power cross section, all the double logarithmic terms cancel out.
From Eq. (5.26) we see that C 1 is given by the average resummed opening angle of the stopping pair that passes soft drop at a given value of m J . Our result is similar to the average of the groomed jet radius, R g , calculated in Ref. [81], except that the result is evaluated at a fixed value of m J that sets the range of possible values for θ stop in Eq. (5.15). We find that a rough approximation is We now give the corresponding result for C q 1 (m 2 J ) for a pp collider. Here we find where z cut and θ pp (m 2 J ) are given above in Eqs. (5.19) and (5.17). Here a rough approximation is (5.31)

Resummation for the Boundary Power Correction
We now turn to the boundary power correction that originates from the fact that the soft drop comparison for the collinear-soft subjet receives a correction given by Eq. (4.20). The perturbative cumulant with these corrections then readŝ where ∆Θ n sd is the correction in the soft drop test for the stopping subjet n stated in Eq. (4.20). Performing the change of variables in Eq. (4.25) we can again factorize the measurement such that where just as Eqs. (4.39) and (4.46) we identify We note again that Υ 1 (β) is linear in β. The power correction can be factorized from the perturbative component as: whereC q 2 is given bỹ .
Hereθ (m 2 J ) is defined as in Eq. (5.14). Thus we see thatC q 2 (m 2 J ) is the average of 2/θ evaluated at the boundary of soft drop at a given m 2 J . The result in the pp case can be computed in a similar manner, and is given by with z cut and θ pp (m 2 J ) are given above in Eqs. (5.19) and (5.17).

Numerical Results for C 1 and C 2
Having derived the expressions for C 1 (m 2 J ) and C 2 (m 2 J ) we now present some numerical results for quark jets κ = q. We use 2-loop running for α s (µ): with α s (µ 0 = m Z ) = 0.118 and Given that for Q = 500-1000 GeV and the typical values of z cut we adopt both the collinear-soft and jet scales lie well below the top mass, we employ the evolution for n f = 5 dynamical flavors. We choose to freeze the coupling at α s (1.5 GeV) for scales µ ≤ 1.5 GeV.  point the description of power corrections with Eq. (4.56) no longer applies. It is instead replaced by an analogous formula with C 2 = 0, C 1 = 1, and a different definition for the parameter Ω • • 1κ → Ω 1κ (R), see [28] for its definition.
In Fig. 13 we show corrections due to the shift and boundary power corrections using Eq. (4.56). For the nonperturbative parameters we take the representative values of Ω • • 1 = 1.0 GeV, Υ 1,0 = 0.7 GeV, Υ 1,1 = 0.4 GeV and consider several values of z cut and β. The curves named "shift" and "boundary" correspond to the two leading power correction terms in Eq. (4.56) divided by the parton level cross section dσ/dm 2 J . The "shift" term involves derivative of the partonic cross section times C 1 (m 2 J ) and is proportional to Ω • • 1κ , and the "boundary" term is the correction to the normalization due to C 2 (m 2 J ) times Υ κ 1 (β) = Υ κ 1,0 + βΥ κ 1,1 : The region between the vertical dashed and dashed-dotted lines indicates the appropriate SDOE region where this description is valid. We see that the power corrections amount to about ∼ 1-10% in the SDOE region and grow larger as we approach the SDNP region. These plots display the absolute value of the power corrections, and hence do not illustrate the possibility that the two nonperturbative corrections may have different signs. In general we expect a positive value for the shift parameter Ω • • 1κ . We will see in Sec. 8.3 that MC event generators favor a negative value for Υ κ 1,0 .

Hadronization Effects in the Nonperturbative Region
Next we consider the nonperturbative (SDNP) region of the groomed jet mass spectrum, as defined in Eq. (1.1). Here the nonperturbative effects contribute at leading order, and we show that they can be accounted for via a shape function. We demonstrate that this shape function exhibits universality by being independent of the jet kinematics Φ J and z cut , but does depend on β in a non-trivial fashion. Furthermore, it exhibits an unusual feature that it is not normalized to 1.

Shape Function in the Nonperturbative Region
Consider the factorization for the cross section in the SDNP region, as illustrated in Fig. 3b. In this region we have the same hierarchy of modes as were considered for the perturbative cross section in the SDOE region, with {H,C,S,CS} mapped to {H,C,S,ΛCS}. The essential difference is that now the collinear-soft ΛCS mode is nonperturbative. Since the steps in the derivation of the factorization theorem with SCET that lead to Eq. (2.11) do not rely on a perturbative expansion, the factorization formula in Eq. (2.11) and its properties are also valid for the SDNP region.
However now the collinear-soft function S κ c + Q 1 1+β cut , β, µ defined in Eq. (2.16) is nonperturbative since it involves modes that have p 2 ∼ Λ 2 QCD . Here + represents the total plus momentum of the collinear-soft sector. In the following we show how to best describe this collinear-soft function, and how this leads to a conceptually and practically viable version of the factorization theorem of Eq. (2.11) involving a shape function with prescribed properties.
Note that in the SDNP region there is no analog of a separation into shift and boundary corrections as seen in the SDOE region, since here the NP subjets themselves determine when the soft drop is stopped. So there is a single leading order effect to the jet mass measurement from particles in the SDNP region which enters through S κ c . In Ref. [55] it was shown that the Q and z cut dependence in the collinear-soft function S κ c in Eq. (2.16) only enters in the combination + Q 1 1+β cut . Although the analysis in Ref. [55] relied on considering the phase space integrals for on-shell particles, it is also straightforward to see that it also holds when considering off-shell particles in the nonperturbative region (as is necessary when considering the interpolating operators for hadrons). The proof follows from expressing the momenta p µ i in the lab frame in terms of dimensionless rescaled coordinates {k a i , k ⊥ i , k b i } which in our notation corresponds to the following change of variables As discussed in detail in Ref. [55], the CA clustering and soft drop test on a cluster of particles become independent of Q cut upon making this rescaling. 3 The Wilson lines appearing in the definition in Eq. (2.16) are also invariant under this rescaling. Finally, Lorentz invariant combinations of momenta such as also involve only the combination + Q 1 1+β cut . Applying this change of variables in Eq. (2.16) gives a rescaling to the explicit momentum operatorp + and the momentum operators in Θ sd (p µ i ), which can be arranged into the form where the new operatorsk i give the rescaled momenta of Eq. (6.1). Thus the proof for the dependence on only this combination holds also with having offshellness of O(Λ 2 QCD ). Note that due to the ultraviolet structure of the collinear-soft function that there are contributions from p −2 ⊥ , which leads to another source of + dependence in Eq. (6.3) through distributions involving ( + Q 1 1+β cut /µ 2+β 1+β ). Therefore one should not simply pull the distribution variable + in Eq. (6.3) out of the δ-function.
The function S κ c depends on three arguments, but we can factor out its dependence on the MS UV renormalization scale µ by following Refs. [73,74]. First we define the Fourier transform cut , β, µ , (6.4) so that the variable y has mass dimensions −(2 + β)/(1 + β) and the functionS κ c is dimensionless. In position space the RGE equation forS κ c (y, β, µ) is multiplicative. Therefore we can define a new µ independent nonperturbative shape functionF ⊗ κ (y, β) viã S κ c (y, β, µ) ≡S κ,pert c (y, β, µ)F ⊗ κ (y, β) , (6.5) the perturbative collinear-soft functionS κ,pert c is included to the order in the α s expansion (or in resummed perturbation theory) required for the precision of the prediction. When using MS for the computation ofS pert c this leads to the nonperturbative functionF ⊗ κ being defined in the MS scheme. At this point other solutions with reduced infrared-sensitivity, which could allow to control renormalon effects, are also possible following the approach of Ref. [73]. The superscript ⊗ in Eq. (6.5) is meant to distinguish it from the source function introduced in Sec. 4.3 that describes the Λ mode in SDOE region.
The collinear-soft functionS κ c depends on the jet initiating hard parton κ = q, g. Furthermore, all possible logarithmic dependence on y is associated with the µ dependence captured byS κ,pert c , implying that it is confinement which restricts the possible form of the shape functionF ⊗ κ . In momentum space this implies that all moments of F ⊗ κ exist, so that it falls off at large momentum faster than any power law (such as exponentially). In position space this implies that all derivatives of the function exist at the origin y = 0. Next we define the momentum space shape function F ⊗ κ using a dimension 1 momentum space variable k NP via NP to k NP and absorb the resulting Jacobian as part of the definition of F ⊗ κ (k NP , β). With this definition we naturally have the scaling k NP ∼ Λ QCD and we can assume that the function F ⊗ κ (k NP , β) typically only has non-trivial support in this momentum range. With the definition in Eq. (6.6), the Fourier transform of Eq. (6.5) now yields a convolution with this momentum space shape function, This is the final form that we will use for describing the collinear-soft function in the SDNP region. Note that the shape function F ⊗ κ (k NP , β) depends only on β and the jet initiating parton κ, but not on Q or z cut , demonstrating this universality. We also note that the condition that only the combination + Q 1 1+β cut appears in S κ c already implies the form of the convolution in Eq. (6.7). An important difference to the case of ungroomed event shapes is that the normalization of F ⊗ κ is not constrained to unity. In the ungroomed case the normalization condition follows from the fact that the shape function represents a unitary non-perturbative redistribution of the partonic plus momentum. Moreover, in both the nonperturbative peak region and the operator expansion region, the leading nonperturbative mode is a wide angle soft mode that scales as Λ QCD (1, 1, 1), enabling a connection to be made between these two regions, constraining the normalization condition and the way how the hadronic parameters enter [73]. In contrast, in the groomed case, the power corrections in the SDOE region involved both a term with a derivative of the cross section, as well as terms without. This can be seen from Eq. (4.65) where in the context of our SDOE analysis we went as far as possible to rewrite the result to a form involving a one-dimensional shape function. The form of terms involving Υ κ 1 (β) and dC κ 1 /dm 2 J makes it clear that the same normalization condition can not be derived in this case, so the shape function F ⊗ κ of Eq. (6.7) is not normalized to unity. Furthermore, there is no simple connection between the shape function F ⊗ κ (k NP , β) and the function F • • κ (k) of Eq. (4.65) since the perturbative m Jdependent Wilson coefficients C 1 and C 2 must emerge when we transition from the SDNP to the SDOE region. This is because there are two different leading nonperturbative modes, the ΛCS mode for the SDNP region and the Λ mode for the SDOE region, so the transition requires a more complicated interpolation between them which we do not consider further here. In the next section we discuss an approach for building a model for F ⊗ κ (k NP , β) in order to study this function in greater detail.

A Model for the SDNP Region Shape Function
Here we discuss a model for the nonperturbative function F ⊗ κ (k NP , β) in the SDNP region which enables a more detailed exploration of its properties. Unlike the perturbative CS mode in the SDOE region, the relevant nonperturbative mode for this region, ΛCS, both stops the groomer and is responsible for the nonperturbative effects. Consider a nonperturbative subjet with momentum q µ that is being tested for soft drop:  Figure 14. The catchment area of nonperturbative modes in the SDNP region is shown. The brown cross indicates a NP subjet that stops soft drop in the SDNP region and has an angle θ q < θ ΛCS (q ⊥ ). Blue dot refers to the collinear parton aligned with the jet axis. The brown shaded region corresponds to the region where any other NP subjets could lie clustered with the collinear subjet. In b) the same geometry is shown in the rescaled coordinates.
where we have expressed the angle in terms of q ⊥ and q − since the subjet does not need to obey an onshell condition, but is still boosted. Here we have identified for a given q ⊥ the maximum angle a NP subjet can have to pass soft drop: In the SDNP region, CA clustering of commensurate momenta will lead to significant changes in the direction of the subjet as the particles are clustered. However, we can simplify the analysis by considering subjets of NP particles defined via the procedure described above in Sec. 4.2. In the SDNP region the only perturbative subjet relevant is the collinear jet-initiating parton. This implies that once the NP particles have been clustered to yield a set of NP subjets, the subsequent CA clustering will only combine an NP subjet with the collinear subjet.
We show in Fig. 14a a NP subjet at angle θ q that stops soft drop, which from Eq. (6.8) satisfies θ q ≤ θ ΛCS (q ⊥ ). The brown shaded region is the catchment area for any other NP subjets that contribute to the jet mass which have been clustered with the collinear parton. To construct our model, we consider two NP subjets with momenta q µ and r µ , where q µ stops soft drop. In the SDNP region we then express the collinear-soft function as follows The distribution of the two subjets in the SDNP region including its µ dependence is described by a nonperturbative source function F ⊗ κ (q µ , r µ , µ). The function does not depend on the soft drop parameters which are implemented explicitly by the measurement in Eq. (6.10). (For simplicity we do not write the symmetric case when r µ stops soft drop and q µ is kept or rejected.) Here Θ ⊗ NP denotes the catchment area for r µ related to the collinear subjet when q µ stops soft drop at angle θ q . It can be expressed with the help of the operators defined in Eqs. (4.13) and (4.19) above, which we introduced in our analysis for shift and boundary terms: where ∆φ = φ r − φ q . This is shown as the brown shaded area in Fig. 14a. In Eq. (6.10) the first term in the second line corresponds to the case where r µ is already a part of the collinear subjet when q µ is being tested and hence it contributes to the measurement. Since we have assumed that q µ stops soft drop, the second term in Eq. (6.10) describes the scenario where r µ lies outside the combined catchment area of q µ and the collinear subjet and fails soft drop. The expression in Eq. (6.10) can be further simplified by expressing all nonperturbative momenta in terms of variables in an appropriately boosted frame with respect to the q µ subjet in a fashion similar to Eq. (4.25). Here the analog is to choose the relevant angle for the boost to be θ ΛCS (q ⊥ ), the maximum angle allowed for the momentum q µ of the stopping NP subjet: Note that since θ ΛCS (q ⊥ ) does not transform under boosts along the jet direction, the variables k µ in Eq. (6.12) are not boost invariant and transform as q µ does under boosts (unlike the rescaled variables in the SDOE case in Eq. (4.25)). We will assume that the nonperturbative function F ⊗ κ (q µ , r µ , µ) is invariant under boosts along the jet direction, and thus F ⊗ κ (q µ , r µ , µ) = F ⊗ κ (k µ ,k µ , µ). Applying this rescaling we can rewrite Eq. (6.10) in the form We thus see that after the change of variable, the dependence on Q cut drops out in the soft drop passing and failing conditions for the two subjets consistent with our discussion in Sec. 6.1. In Fig. 14b we show the catchment area in the rescaled coordinates. We have also introduced an auxiliary variable k + that encodes the total NP + momentum.
We can rewrite Eq. (6.13) including µ evolution down to a low nonperturbative scale µ Λ ∼ Λ QCD via: where we introduced k 2+β 1+β NP as a convenient integration variable, and note that the δ function sets k 2+β 1+β . At NLL the perturbative collinear-soft function only consists of a RG evolution factor since the boundary condition at tree level is a δ-function, sô As a result of this Eq. (6.14) becomes: The result in Eq. (6.16) is the analog of what we derived above in Eq. (6.7) from general considerations. The result in Eq. (6.17) provides an explicit model for F ⊗ κ (k NP , β) in terms of a two-variable source function F ⊗ κ (k µ ,k µ , µ Λ ) that is independent of the soft drop parameters. This expression clearly illustrates the non-trivial β dependence of the non-perturbative shape function. Note that this model also includes the same overall factors that appeared in the prefactor of Eq. (6.6).
Using the properties of the Θ functions, we can also investigate the normalization of F ⊗ κ (k NP , β): Thus we see explicitly that the presence of the soft drop grooming induced Θ-function projection operators decreases the normalization. A practical assumption for this model would be then to take F ⊗ κ (k µ ,k µ , µ Λ ) as normalized, which would then imply

Comparison with Previous Work
Nonperturbative corrections to groomed jet mass spectra have been studied previously as an extension of perturbative calculations, both with Monte Carlo studies [48,50,55,56,64], analytic models [50,64] and shape function models [55]. In general, hadronization corrections are found to be reduced by the jet grooming [48]. Here we will make a comparison of our results with the analytic estimates for hadronization corrections from Refs. [50,55,64].
For pp collisions Ref. [50] considered the modified mass-drop tagger, which closely corresponds to soft drop with β = 0. Analogs of both the shift and boundary corrections were considered in the region where m 2 J > Λ 2 QCD /z cut , which corresponds to the SDOE region in Eq. (1.1) with β = 0. These estimates were extended to β > 0 in Ref. [64] where hadronization corrections were considered due to a shift in m J and a reduction in p T of soft subjets from nonperturbative particles, Here z)) is the effective radius of the jet formed by a single perturbative splitting, and Λ hadr ∼ Λ QCD is a nonperturbative parameter common to both the corrections. Eq. (7.1) was then used to calculate the leading power corrections from these two types of hadronization effects to the jet mass cross section by averaging over z with the soft drop passing constraint: Here dσ/dm J is the partonic cross section and an expansion in are the first subleading terms from this expansion. The logarithm can also be expressed as lnz −1 sd = ln θ (m 2 J )/θ c (m 2 J ) , and hence is the same as the first logarithm that appears in an α s expansion of C 0 (m 2 J ) in Eq. (5.13). The two corrections in Eq. (7.2) have the same scaling for z sd 1. Taking dσ/dm J ∼ (α s lnz −1 sd )/m J , the lowest order terms in α s have the scaling Comparing Eq. (7.3) with our Eqs. (4.53) and (4.55) we find that our results agree as far as the scaling for the term is concerned. When compared in more detail there are, however, significant differences between the model of Refs. [50,64] shown in Eq. (7.2) and our final pp results from Eqs. (4.56, 5.30, 5.40), which we now discuss. One difference is that our shift correction in Eq. (4.56) involves a derivative of the differential jet mass cross section, implying that the shift corrections in Eqs. (4.56) and (7.2) do not agree at LL order, and hence differ for example in their m J and β dependence. Comparing our boundary correction with the p T -shift term, we see that both results are proportional to the leading power cross section at this order, so perturbatively the only difference is due to running coupling effects kept in our C pp 2 , compared to the fixed coupling that was used to obtain the expressions in Eq. (7.2) (this difference is also present for the shift term). The results also differ because ours depend on three hadronic parameters for each of the quark and gluon induced jets . In contrast only one hadronic parameter Λ hadr is employed in the two corrections in Eq. (7.2) for both the m J and p T shift correction terms with color prefactors C κ that do not appear in this manner for us (since in general both abelian and non-abelian attachments of nonperturbative gluons are present as sources). Since different projection operators appear in the definition of Ω • • 1κ and Υ κ 1 (β) in Eq. (4.54) there is no relation that connects these two quantities in our treatment. The Υ κ 1,1 also signifies additional linear β dependence in our boundary correction that is not present in the p T -shift correction of Eq. (7.2), since Refs. [50,64] did not account for the additional β dependence from expanding the soft drop condition in their treatment of Eqs. (4.17) and (4.18). Finally, in our treatment the signs of our Υ κ 1,i are apriori not determined as the softer subjet can in principle both gain or lose momentum from hadronization.
For a numerical comparison of the m J -and β-dependence of our shift and boundary corrections with the corresponding results of Ref. [64] given in Eq. (7.2) we consider the quark jet case, and set Ω • • 1q = C F Λ had = 1 GeV, Υ q 1,0 = C A Λ had = 1 GeV, and Υ q 1,1 = 0 GeV. We then consider the following ratios: where the numerators (labeled "MSS") are predictions from Ref. [64] in Eq. (7.2), and the denominators (labeled "HMPS") are our predictions for shift and the boundary terms in Eqs. (5.25) and (5.37). We plot these ratios in Fig. 15, where the three panels correspond to three choices of β. Differences can be as large as 30% for β = 0, and they become even larger for larger β. For the dashed curves we keep only LL terms from the splitting function, p gq (z) = 1/z, and other z dependent terms, and use a fixed scale for the coupling, so that higher order terms that are treated differently in the two formulas are uniformly dropped. For the solid curves we use the full O(α s ) splitting function p gq (z) given in Eq. (5.6). The differences between dashed and solid curves is at the 10-20% level. The difference in the m J dependence appears to be rather moderate for the shift term, but is more significant for the boundary term except when β = 0. Note that the presence of a non-zero hadronic parameter Υ q 1,1 will induce an even larger differences. In Ref. [55] hadronization corrections to soft dropped jets were investigated in the SDNP region with a simple convolution with a normalized model function F shape to obtain an alternate estimate for these corrections, based on the analogy with event shapes in e + e − collisions. Their model corresponds to using cut , β, µ F model (k + ) , (7.6) in place of the last line in our Eq. (4.65). In [55] the approximate agreement of this implementation with Monte Carlo was taken as evidence that there might indeed exist a normalized shape function for describing hadronization in groomed jet observables. Given that we have derived a shape function description of the power corrections for the jet mass in Eq. (6.7) in the SDNP region we can directly compare with Eq. (7.6). In particular we note that Eq. (7.6) has the same form as our result in the SDNP region, given in Eq. (6.7) (though the integration variable is not strictly speaking a +-momentum). The most important difference, however, is that the norm of our shape function F ⊗ κ (k NP , β) is not constrained, and is instead determined by an additional nonperturbative parameter. Our F ⊗ κ (k NP , β) is independent of z cut , but depends on β and the jet initiating parton κ = q, g.
The scaling analysis of Ref. [55] effectively corresponds to considering nonperturbative modes Λ with the scaling As shown in Fig. 4, for the SDOE region these Λ modes differ from the Λ modes in Eq. (3.1) which represent the true dominant NP modes in the SDOE region. The Λ modes do no account for the existence of soft drop stopping angle, whcih is represented by the vertical orange line and which effectively reduces the jet radius to R g < R. In our SDOE analysis we have shown that the Λ modes are the dominant nonperturbative modes. Their effects exceed those of the Λ modes since they have parametrically larger p + momenta, and lead to only an α s suppression rather than extra Sudakov suppression. In the SDNP region, m J has decreased to a region where the CS, Λ, and Λ modes of Fig. 4 all have the same scaling, so that Eq. (7.7) and Eq. (3.2) describe the same mode. We note, however, that the SDNP region does not have a simple connection to the SDOE region, so that the latter does not yield a simple normalization constraint for the SDNP the shape function, F ⊗ κ (k NP , β) of Eq. (6.7). In Ref. [59] a model equivalent to that of Ref. [55] in Eq. (7.6) was employed to describe soft drop angularity distributions [82]. Our analysis of nonperturbative modes and conclusions  can also be applied for angularities as long as one considers angularities that are sufficiently jet mass like and away from the broadening limit, i.e. a < 1. We leave a detailed analysis of the power corrections to groomed angularities to future work. In Ref. [63] the angularity a = 1.5 was considered where the collinear mode becomes nonperturbative before the collinear-soft modes do. In this case our conclusions from the jet mass analysis do not apply.

Monte Carlo Studies
In this section we present a Monte Carlo study to test our predictions for power corrections in the SDOE region. We consider three Monte Carlos: Pythia 8.235 [83,84], Vincia 2.2 [85] and Herwig 7.1 [34,86] with their respective default tunes for hadronization. The jet finding and soft drop grooming is implemented using the SoftDrop plugin in the FastJet 3.3 package [48,87]. We choose to simulate the e + e − → qq dijet process at Q = 500 and 1000 GeV, take R ee 0 = π/2 and reconstruct two leading R = 1 jets with the anti-kT jet-finding algorithm [88].

Comparing Wilson Coefficient Results with Monte Carlo
We show in Fig. 16  In order to extract C 2 (m 2 J ) from Monte Carlo we implement the δ function in Eq. (5.36) by modifying the soft drop condition such that and take a numerical derivative with respect to of the jet mass cross section in a given jet mass bin. This yieldsC 2 (m 2 J ) in Eq. (5.38) times the partonic cross section in that bin. In Fig. 17 we compare the C 2 (m 2 J ) obtained in this way from the Monte Carlo using the modification of the soft drop test in Eq. (8.1) with our analytical result from Eqs. (5.38) and (5.39). Note that we do not show results for C 2 (m 2 J ) from the Monte Carlo further to the left of the pink vertical line, i.e., beyond the SDOE region where the formalism with C 2 no longer applies. We observe that unlike the case of C 1 (m 2 J ) discussed in Fig. 16, Pythia and Vincia results for C 2 (m 2 J ) differ from Herwig, with Herwig being in a better agreement with our calculation of C 2 .

Catchment Area Geometry versus Jet Mass
The key ideas that allowed us to describe the NP corrections in the SDOE and SDNP regions were our assumptions about the geometry of the catchment area of the NP subjets around the collinear and collinear-soft jet axes. Upon rescaling the NP subjet momenta according to Eq. (4.25) we were able to factorize the perturbative contributions and the nonperturbative matrix elements. To test the underlying kinematic approximations we show in Fig. 18 heat maps illustrating the distribution of the rescaled angular location of the NP subjets in the plane perpendicular to the jet axis for different bins of the jet masses from hadron level Vincia with at Q = 1000 GeV with z cut = 0.1 and β = 1. Here hotter colors correspond to a higher density of NP subjets. In accordance with our definition for NP subjets in Sec. 4.2 we uncluster the groomed jet until we find the first subjet that has energy E ≤ 1.0 GeV. We then rescale the angle of this subjet with  respect to the jet axis using Eq. (4.25) and our calculation for the opening angle θ cs = 2 C 1 (m 2 J ) for the corresponding jet mass value of the jet. We then rotate the subjet in the azimuthal plane by φ cs , where the more energetic (collinear) subjet lies at the origin, and the softer subjet of the stopping pair is rotated onto the positive x-axis (according to Fig. 8). These transformation allow us to visualize the 2 dimensional distribution of the angular locations of the NP subjet in the rescaled coordinates shown in Figs. 8 and 14b.
We divide the distributions in several jet mass bins in order to visualize how the catchment area geometry changes when going from the SDNP to the SDOE region, and from the SDOE to the ungroomed resummation region. These distinct regions of the jet mass spectrum have been defined as in Eqs. (1.1) and (2.8) and are illustrated in Fig. 19. We can see from Fig. 18 that there are prominent voids around the collinear subjet and (once present) the perturbative collinear-soft subjet. These are primarily due to the fact that both hadronization and the CA algorithm cluster non-perturbative particles/hadrons with the energetic subjets close to the collinear or collinearsoft subjet axes, and these clusters always lie deeper than the first NP subjet in the CA clustering tree. The top left panel corresponds to the events in the SDNP region. Here we see that the NP subjets are clustered along with the collinear subjet at angles ∼ θ cs . The region extends slightly to the right of the collinear subjet catchment area since the stopping subjets are themselves nonperturbative and are included in the distribution. In the subsequent three panels we enter the SDOE region and see the expected geometry from Fig. 8a emerging. Since we rescaled the angles of the NP subjets by our perturbative prediction, θ cs = 2C 1 , it is a nontrivial check of our discussions in Sec. 4 that the rightmost "black hole" at the collinear-soft subjet location in the center right panel is indeed centered at k x /k − 1 and k y /k − 0. The resulting regions of high density also confirm our expectation that the dominant NP modes are determined by the θ cs angle, leading to the halos of radius 1 around the stopping subjets. In the bottom two panels

Event
Generator We also include a goodness of fit measure with χ 2 min /dof following the method described in the text.
we exit the SDOE region and start to enter the ungroomed resummation region. Here we see a distortion in the distributions due to the effect of the ungroomed jet boundary, which is a circle of radius R = 1.

Testing Universality with Fits for Hadronic Parameters
We now test the compatibility of the hadronization models of the three Monte Carlos with our description of power corrections in the SDOE region. In Fig. 20 the solid curves show the hadron level jet mass spectrum for z cut = 0.1, β = 1, and Q = 500 GeV for the three Monte Carlos in the SDOE region. We note that while the hadron level predictions of the three Monte Carlos are quite close, they differ significantly in their parton level output, indicating in turn significant differences concerning their hadronization models.
To carry out a fit to determine the compatibility of the MC hadronization models with our description of hadronization, we consider a grid of jet mass spectra for the following sets of Following Eq. (4.56) we then fit for the hadronic parameters Ω • • 1q , Υ q 1,0 and Υ q 1,1 in the SDOE region, taking the corresponding partonic MC cross section with hadronization turned off as the perturbative cross section dσ/dm 2 J , and use our analytical results for C q 1 (m 2 J ) and C q 2 (m 2 J ) in Eqs. (5.26) and (5.38). We define the χ 2 -function as the sum of squared difference between the MC hadron level and the MC parton level cross sections including the power corrections of Eq. (4.56). We adopt m J independent uncertainties in the SDOE region. For the size of uncertainty, we first take 5% as the approximate size of the hadronization corrections to the normalized cross section in this region, as is suggestive from Fig. 20, and then define the uncertainty to be employed in the χ 2 -function to be 10% of that. We then fit in the SDOE region with approximately 20-30 bins, for all combinations of Q, z cut and β values shown in Eq. (8.2). The result of the minimization is summarized in Tab. 1.
We first note that the results of the fits all yield O(1 GeV) values of the three universal hadronic parameters as expected. Although the extracted results from Pythia and Vincia differ, in both cases our three universal hadronic parameters provide an excellent description of the MC hadronization model. To further probe the reason for a rather poor fit results from default Herwig 7.1.4 we carried out a fit for the p T B tune defined in Ref. [89]. The default Herwig 7.1.4 parton shower uses the jet mass preserving kinematic reconstruction, which was pointed out to be problematic in Ref. [36] where the shower-cut dependence of the parton level thrust distributions was analyzed. In Ref. [36] a better agreement with their analytical QCD results was observed with the p T preserving reconstruction originally proposed in Ref. [90] (see also [34]) which is the basis of the p T B tune. This observation is further supported by studies in Refs. [35,91]. We find here that the p T B tune, with 4 α s (m Z ) = 0.118, yields similar values of the hadronic parameters 4 Note that the pT B tune of Ref. [89] also prescribes a change in the reference strong coupling value from α CMW s (mZ ) = 0.127 (corresponding to αs(mZ ) = 0.118 in the MS scheme) to α CMW s (mZ ) = 0.1087 as a part of the tune. However, we found that upon using the α CMW s (mZ ) = 0.1087 the fitting procedure failed to produce stable, and we therefore used the pT B tune with α CMW s (mZ ) = 0.127. as the Herwig 7.1.4 but with a much improved χ 2 . The correlations between the fit results for Ω • • 1q , Υ q 1,0 and Υ q 1,1 for our fits for Vincia are displayed in Fig. 21. Although there is some visible correlation between the three parameters, they are all still well constrained at the central values of the fits.
In Fig. 22 we show a selection of the fit results (blue curves) for Vincia hadron level results (red curves) and parton level results (black dashed curves) for a subset of the distributions included in the fits, using representative values of Q = 500 GeV, z cut = {0.1, 0.2} and β = {0, 1, 2}. We see that the three fitted values of the hadronic parameters Ω • • 1q , Υ q 1,0 and Υ q 1,1 are sufficient to  describe the hadronization corrections for a range of Q, z cut and β values. We obtain similar good quality results for Q = 1000 and other values of z cut and β not shown in Fig. 22.
A prediction of Eq. (4.54) is that the boundary correction depends linearly on β. To check this within the context of our analysis we also fitted for Υ q 1 (β) for each value of β separately, while fixing Ω • • 1q to the global fit value shown in Tab. 1. This allows us to compare the fit results for individual β values against our prediction for Υ q 1 (β) using the global fit values of Υ q 1,0 and Υ q 1,1 . The result of this exercise is displayed in Fig. 23. We observe a good agreement of the linear prediction and the explicit fit results for Pythia, Vincia and Herwig p T B, well within the ±20% estimated uncertainty from our predictions for C q 1 and C q 2 . In Fig. 24 we also test the z cut -independence of the boundary power correction parameters Υ q 1,0 and Υ q 1,1 . While the fits for these parameters obtained from Pythia and Vincia are roughly z cut independent, as expected from our theoretical considerations, we find a residual z cut dependence for the two versions of Herwig. We see, however, for Herwig some reduction of the z cut dependence when using the p T B tune since the individual fits are significantly closer to the global fit. We conclude that the MC results support the universality of the Υ q 1,i parameters at the expected level of precision of ±20% induced from the uncertainty in C q 1 and C q 2 . Finally, in Fig. 25 we test the universality of the shift correction with respect to variations in z cut and β. Here we fit for Ω • • 1q for individual z cut and β values (colored symbols) while fixing Υ q 1,0 and Υ q 1,1 to their global fit values. The results from these individual fits are then compared with those from the global fit values shown in Tab. 1 (horizontal solid and dashed lines). We find an independence to the value of z cut at the 10% level for Pythia and Vincia, and up to 15% for Herwig 7.1 p T B. Again the MC results support the expected level of universality for Ω • • 1q .

Conclusion
We have presented a study of the dominant nonperturbative corrections to the groomed jet mass spectrum based on quantum field theory calculations in the framework of soft collinear effective theory and the coherent branching formalism. We considered the operator expansion region (SDOE), where the hadronization effects are power corrections, and the nonperturbative region (SDNP) of the jet mass spectrum, where they become leading order effects. In the SDOE region we identified two leading nonperturbative power corrections, called the "shift" and "boundary" corrections, which are related to the contribution of the nonperturbative radiation to the jet mass and to modifications in the soft drop test due to nonperturbative radiation respectively. The two power corrections have a perturbative dependence on the angle of the soft drop stopping collinear-soft subjet that sets the catchment area for the nonperturbative particles. We showed that this stopping angle dependence can be factored into the Wilson coefficient by an appropriate rescaling of the momenta of the nonperturbative modes. This allowed us to derive factorization at the level of both the measurement function and the nonperturbative matrix element. It resulted in jet mass dependent perturbative Wilson coefficients for the two effects, which involve the perturbative functions C 1 (m 2 J ) and C 2 (m 2 J ), that encode non-trivial dependence of the leading power corrections on the kinematic parameters p T , η J , the jet radius R, and grooming parameters z cut and β. We have explicitly calculated the coefficients C 1 (m 2 J ) and C 2 (m 2 J ) using the coherent branching formalism at LL order with a running coupling. As a result the leading power corrections in the SDOE region can be described by three universal hadronic parameters Ω • • 1κ , Υ κ 1,0 and Υ κ 1,1 that are independent of these parameters and have dimension of mass. Here κ = q for a quark initiated jet, while κ = g for a gluon initiated jet, and the parameters can differ for these two cases. In the SDNP region we showed that the leading order nonperturbative corrections can be described by a shape function that depends on β which has a normalization less than one. We tested our conceptual analysis on the structure of hadronization corrections in the SDOE region by carrying out Monte Carlo studies with Pythia 8.235, Vincia 2.2, and Herwig 7.1. We extracted C 1 (m 2 J ) and C 2 (m 2 J ) from the Monte Carlos and found that they agree well with our LL calculations. We confirmed the kinematic approximations of our conceptual studies by analyzing the angular distributions of nonperturbative subjets produced by MC generators, and we found the expected geometries for the SDNP and SDOE jet mass regions. We then tested the hadronization models of the MC generators in detail by fitting for the three universal parameters Ω • • 1q , Υ q 1,0 and Υ q 1,1 for quark jets in e + e − annihilation in the SDOE region. We found that the fitted values of the three parameters are O(1) GeV, and consistently describe the MC hadronization corrections for a range of Q, z cut and β values. The parameters also showed the predicted universality for each of Pythia, Vincia and Herwig, though with different values for the hadronic parameters. An improvement of Herwig 7.1.4 results was observed upon considering a different parton shower kinematic reconstruction from Ref. [90]. Thus our predictions for hadronization corrections were successfully confirmed by this comparison to MCs.
The results of this work enable a more precise description of hadronization effects for the groomed jet mass distribution, and make it possible to consider obtaining improved precision measurements of QCD parameters such as α s and top mass m t from the LHC data using the groomed observables. We note that some of our results are also used in our theoretical analysis of soft drop groomed top quark jets in Ref. [62], where it is pointed out that these hadronization corrections are also universal between massive and massless quarks. While we focused on the case of jet mass in this work, we emphasize that the same techniques can also be applied to other groomed observables such as angularities [82]. We also envisage, that our study may contribute to better calibrate and tune hadronization models of Monte Carlos, which have so far relied only on ungroomed observables. The universal hadronic parameters we obtained in our analysis involve combinations of +, −, and ⊥ components of nonperturbative momenta and are hence more sophisticated than those obtained from classic event shapes. Thus, our results for the power corrections for the groomed jet mass have interesting implications for dedicated tuning studies of MC hadronization models.

A Measurement Operator for the Boundary Term
To illustrate the key physical ideas for understanding the operator for the boundary term in Eq. (4.21) we consider a simple example. Consider two perturbative subjets with energies E 1 and E 2 forming a jet with energy E J such that We will consider the first subjet to be collinear-soft and the second to be collinear, so For simplicity we consider the case with β = 0, such that the soft drop criteria is purely a comparison of energies. Then the soft drop comparison is and we are interested in the regime where z cut 1. Turning on hadronization, involves including NP radiation with energy E Λ ∼ Λ QCD /ζ cs E 1 . This radiation encodes both the non-perturbative rearrangement of momenta into and out of the subjets as well as the binding of partons to hadrons. Since E 1 E 2 , the leading hadronization effects will occur when hadronization modifies the energy of the E 1 subjet. There will be a reduction in energy from the regrouping done by hadronization causing a loss of particles from the 1 subjet to other subjets (here subjet 2). Similarly there can be an increase to E 1 from hadronization if the nonperturbative regrouping adds particles to the 1 subjet.
We can probe these non-perturbative effects by using a single NP subjet of energy E Λ that participates in the CA clustering. At the stage where this NP subjet is not yet clustered, hadronization has changed the energy of subjet 1 to E 1 = E 1 − E Λ 1 . We also note that E 2 = E 2 − E Λ 2 is the energy of subjet 2 after hadronization. Case by case the E Λ i could be positive or negative, but on aggregate we expect a positive value. Here E Λ 2 is always negligible relative to E 2 , but can appear in comparisons with E 1 . By energy conservation the total jet energy is We consider the effect on the soft drop condition in Eq. (A.3) on the hadronized subjets 1 , 2 and Λ, and expand back in terms of the variables of the perturbative subjets 1 and 2. From the point of view of CA clustering, we have three situations: (i) the CS subjet 1 and NP subjet are clustered first, followed by the collinear subjet 2 , (ii) collinear subjet 2 and NP subjet are clustered first, followed by the CS subjet 1 , (iii) the subjets 1 and 2 are clustered first, followed by the NP subjet.
The case that occurs is entirely determined by a geometric test of the angles of the NP subjet, which divides up the phase space as 1 = Θ NP . From the beginning we drop corrections E Λ and E Λ i relative to E J and E 2 and only retain those appearing with E 1 or E 1 . For case (i), we have where , and the ellipses denote higher order power corrections. For case (ii) there is no correction from the CA clustering since the NP subjet is grouped with the collinear subjet. This is also the situation for case (iii) where the NP subjet is simply groomed away by soft drop in the SDOE region, since it is the comparison of 1 and 2 that stops soft drop.
Expanding to the first non-trivial order, these cases give Noting that Θ NP = Θ The terms in Eq. (A.7) track flow of energy in and out of the subjets. Generalization of this example to β > 0 is straightforward. The final step is to determine an analog for Eq. (A.7) from the point of view of the operator expansion with NP fields. In this context the NP fields enable us to describe effects where a simple correspondence between partonic and hadronic expressions breaks down without double counting the hadronization effects related to particles that remain in the same region. In the OPE with a single non-perturbative Λ field, there is only a single q − momentum, so that at this order the OPE gives q − 1 = q − 2 = q − . Thus the NP source fields satisfying Θ NP (θ Λ , θ 1 , ∆φ 1 ) = 1 correspond to the term for region (i), and those with Θ NP (θ Λ , θ 1 , ∆φ 1 ) = 1 correspond to the terms for regions (ii) and (iii). This leads to the result quoted in Eq. (4.20).

B Collinear-Soft Function with a Probe Nonperturbative Gluon
Here we provide some details of the results presented in Sec. 4.3 required to demonstrate the factorization of nonperturbative corrections, and to show that the boundary correction from soft drop failing subjets is subleading at LL accuracy.

B.1.1 Measurement Operator in SDOE region
We first start by considering the measurement operator for the diagrams in Figs. 10 and 11. The momentum of the collinear-soft perturbative gluon is taken to be p, which can be real or virtual, and that of the NP source gluon to be q, which is always real. Here the total plus momentum + of the real radiation kept by soft drop is measured. For the real graphs with p passing the cut the measurement operator is given by whereas for the virtual graphs for p this is simply a soft drop test on the NP gluon q: M q = Θ q sd δ( + − q + ) + Θ q sd δ( + ) .

(B.2)
Here the unbarred Θ's are complements of the Θ's, so Θ = 1 − Θ. The three lines in Eq. (B.1) correspond to the three CA clustering situations we discussed above in App. A. In the first line in Eq. (B.1), p and q are first combined together into a subjet and the combined momentum p + q is tested for soft drop with the collinear jet. In the second line, the first step of clustering combines q with the collinear parton, with q representing momentum lost by the CS subjet, and hence the soft drop test is first applied on p − q. The third line with Θ • • NP corresponds to the case when p and the collinear jet are clustered first, and thus the first soft drop test compares q and the collinear+p subjet. In this case the NP particle lies outside the shaded region in Fig. 7a.
The superscripts on the Θ sd and Θ sd operators correspond to the momentum that gets soft drop tested with the collinear parton. They represent soft drop passing and failing conditions respectively. In accordance with Eq. (4.21) and the discussion in App. A, in the first line where q is clustered with the softer subjet, the momentum tested for soft drop is p + q. The next two cases in the second and the third line reduce the momentum of p µ from the point of view of the soft drop condition. On the other hand, the NP particle contributes to the total momentum as long as it is part of either the collinear or the collinear-soft subjet, as shown by the arguments of the δ-functions.
In the SDOE region, the probability of a NP emission to pass soft drop in this region alone is exponentially Sudakov suppressed. Hence, we set Θ where ∆Θ sd is given by In simplifying to Eq. (B.3) we made use of the following relations related to the geometric regions: We have written Eq. (B.3) such that the second and third lines correspond to the shift correction in the + momentum and the boundary correction, respectively, as we saw above in Eqs. (4.23) and (4.24). Note that we dropped the subleading q + shift in the boundary correction term. The term −δ( + − p + ) in the shift correction in the second line in Eq. (B.3) results from separating out the partonic measurement.

B.1.2 Abelian Graphs
In the following we give details of the computation of the abelian graphs shown in Fig. 10 and discussed in Sec (abelian, real) = C κ (4C κ − 2C A )(g 2μ2 ) 2 d d p (2π) d C(p) n ·n n · pn · p d d q (2π) dC (q) n ·n n · qn · q M p+q , where q µ is the momentum of the nonperturbative gluon and C κ = C F or C A is the color factor for quark or gluon being the parent parton, respectively. C(p) andC(q) implement the onshell cut conditions: C(p) = 2π δ(p 2 )Θ(p 0 ) ,C(q) = 2π δ(q 2 − m 2 )Θ(q 0 ) , (B.8) and the MS factorμ 2 = (µ 2 e γ E /(4π)) . In Eq. (B.8) m 2 ∼ Λ 2 QCD is a mass term for the nonperturbative source gluon, which for this calculation can be thought of as a proxy for the mass of the NP subjet. We get two Eikonal factors for both the gluons emitted from the parent parton. The virtual diagrams (examples are the last two diagrams in Fig. 10) add to give (abelian, virtual) = −C κ (4C κ − 2C A )(g 2μ2 ) 2 d d p (2π) d C(p) n ·n n · pn · p d d q (2π) dC (q) n ·n n · qn · q M q .
Note that the δ( + ) term again cancels in the difference M p+q − M q . This yields the result stated in Eq. (4.41).

B.2 Analysis with Two Perturbative Emissions
B.2.1 Confirmation of the NP Factorization at O(α 2 s ) In this appendix we perform a cross check on the factorization of the nonperturbative matrix element by using two perturbative emissions, one of which stops the soft drop and the other being either a failing or an unresolved emission. We show examples of such graphs in Fig. 26. For simplicity, in this exercise, we will only evaluate the real radiation graphs, where two perturbative gluons and one nonperturbative source gluon run across the cut. The graphs with one or both the perturbative gluons being virtual simply serve to cancel certain terms and correspond to the 5 Unlike for the abelian graphs, here it is known that there can be additional 1/ UV poles which generate anomalous dimensions for the power correction parameters [25], which here would sum single logarithms generated between ΛQCD and the collinear-soft scale. Consideration of these effects is beyond the scope of this work. cases already considered above in App. B.1. At LL it suffices to consider the diagrams with both the perturbative gluons having eikonal couplings to the energetic collinear parton. The soft drop stopping gluon has the momentum p µ 2 and the additional gluon has p µ 1 . At LL we can limit ourselves to the case where the angles θ p 1 and θ p 2 are strongly ordered, such that we have the following two scenarios: p µ 1 fails soft drop : R θ p 1 θ p 2 , p µ 1 is an unresolved emission : Concerning attachments of the NP gluon, at leading order, we again only need to consider diagrams where the NP gluon is either emitted last from the collinear parton (described by a Wilson line), or is attached to one or both of the perturbative gluons. From Eqs. (4.35) and (4.47) we see that the combined NP source function from the abelian and non-abelian graphs derived with a single perturbative emission is given bỹ . (B.16) Here the variables (k + , k − , k ⊥ ) are related to the NP gluon momentum q µ by the rescaling in Eq. (4.25) with θ cs = θ p 2 being the angle of the stopping gluon with respect to the jet axis.
The goal is to check that also in the presence of an additional perturbative emission, the same source function of Eq. (B.16) emerges at LL accuracy. We anticipate that in presence of additional perturbative emissions at LL, we will obtain an additional C κ for each emission. The contributions from all the real emission diagrams of the kind shown in Fig. 26, can be then decomposed into the following color basis: (B.17) The graphs involving two triple gluon vertices, such as the last two diagrams in Fig. 26, and many more, contribute with a C κ (C A /2) 2 color factor. For simplicity, we will not include these diagrams in this analysis and hence will only consider contributions involving the first two color structures in Eq. (B.17), which are generated by the abelian graphs and the non-abelian graphs involving a single triple gluon vertex. We first consider the measurement operator M( + , p µ 1 , p µ 2 , q µ ) for the case of two perturbative emissions. It is a higher order generalization of the two emission case in Eq. (B.3) and can be expressed as follows M( + , p µ 1 , p µ 2 , q µ ) = M( + , p µ 1 , p µ 2 ) + Θ(θ p 1 − θ p 2 )Θ p 1 sd ∆M( + , p µ 2 , q µ ) (B.18) + Θ(θ p 2 − θ p 1 )∆M( + , p µ 1 , p µ 2 , q µ ) . The first term on the right hand side is the measurement for the leading power term that only involves the two perturbative emissions p µ 1 and p µ 2 . Since we can ignore clustering of p µ 1 and p µ 2 due to the strong ordering limits, it reads: M( + , p µ 1 , p µ 2 ) = Θ(θ p 1 − θ p 2 )Θ p 1 sd Θ p 2 sd δ( + − p + 2 ) + Θ p 2 sd δ( + ) (B.19) While the NP matrix element for the abelian result in Eq. (B.26) remains the same, the terms appearing in the non-abelian results in parentheses in Eq. (B.25) yield the following expressions (B.28) where we have used the small angle approximation and on-shell condition, We see that upon rescaling the momentum q µ with respect to θ p 2 leaves a residual θ p 1 /θ p 2 dependence in the terms where the gluon is attached to p 1 . We can, however, simplify the expressions by making use of the fact that only a strong ordering of the angles as in Eq. (B.15) will contribute to the LL result. In these two limits we find Thus, taking either of the limits in Eq. (B.30) we get the same result for the non-abelian graphs in Eq. (B.25), which simplify to (1 n.a., real, LL) = α s π (µ 2 e γ E ) .
Thus, upon adding the results in Eqs. (B.26) and (B.31) we find that for the C 3 κ and C 2 κ C A terms we obtain the following expression: We thus see that Eq. (B.32) involves precisely the same source function as identified in Eq. (B.16) from the analysis of one perturbative emission, and hence, following Eq. (B.27), the same hadronic parameters. This confirms our expectation that the OPE involves precisely the same NP matrix element for any number of perturbative emissions at LL order.